Hardware

Hardware for Deep Learning (machine learning)

https://www.tiziran.com/topics/hardware

I have experimented with many different hardware platforms for training and running deep learning applications. The list below shows my suggestions, comparisons, and expectations for different hardware: embedded AI, and implementing distributed data parallel and distributed model parallel solutions.


#hardware #deep_learning #IoT #training_machine_learning_model #tiziran

Laptop:

  • NVIDIA Geforce RTX 3080 Ti

    • Razer Blade 17 - 17.3 inch gaming laptop (NVIDIA GeForce RTX 3080 Ti, Intel i9-12900H, 4K UHD 144Hz display, 32GB DDR5 RAM, 1TB SSD)

Desktop:

  • eGPU

    • Razer RC21-01310100-R351 Core X External Graphics Card Case = ~ 300 Euro + GPU

    • Cooler Master MasterCase EG200 External GPU Enclosure - Thunderbolt 3 Compatible eGPU Enclosure, 1 PWM 92mm Fan, V550 SFX Gold Fully Modular PSU, USB Hub, Vertical Laptop Support - EU Plug = ~ 300 Euro + GPU

  • GPU

    • NVIDIA GeForce RTX 3090, 24GB GDDR6X, 384-bit

    • MSI GeForce RTX 3090 GAMING TRIO 24G Gaming Graphics Card - NVIDIA RTX 3090, GPU 1740MHz, 24GB GDDR6X memory = ~ 2800 Euro

IoT:

  • Raspberry Pi 3 (you need an accelerator)

  • Raspberry Pi 4 (you need an accelerator)

  • Intel® Neural Compute Stick 2

    • Intel® Distribution of OpenVINO™ Toolkit

    • I attached it to a Raspberry Pi 4 via USB 3 and it works very well for many deep learning models (see the OpenVINO sketch after this list)

  • Google Coral

    • I attached it to a Raspberry Pi 4 via USB 3 and it works very well for TensorFlow models

    • Why TensorFlow Lite on the edge: lightweight, low latency, privacy, improved power consumption, efficient models ready to use

  • NVIDIA Jetson Nano (2GB and 4GB RAM)

    • I tested Multi-Class Multi-Object Multi-Camera Tracking (MCMOMCT); under heavy workloads it can run for up to 30 minutes

  • NVIDIA Jetson AGX Xavier

  • NVIDIA AGX Orin = ~ 1900 Euro

  • OpenCV AI Kit

    • OAK = ~ 100 Euro

    • OAK-D = ~ 200 Euro

    • OAK-D + WiFi = ~ 250 Euro

    • OpenCV AI Kit: OAK-D-PoE = ~ 250 Euro

    • OAK-D Lite = ~ 100 Euro
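
As a rough illustration of running a model on the Intel® Neural Compute Stick 2 with OpenVINO, the sketch below loads an IR model and targets the MYRIAD device. It assumes the OpenVINO 2022.x Python API; the model path and input shape are placeholders, not from this page.

    # Minimal OpenVINO inference sketch (assumed API: openvino.runtime, 2022.x).
    # "model.xml" and the input shape are illustrative placeholders.
    import numpy as np
    from openvino.runtime import Core

    core = Core()
    model = core.read_model("model.xml")                          # IR produced by the Model Optimizer
    compiled = core.compile_model(model, device_name="MYRIAD")    # NCS2 device

    input_data = np.zeros((1, 3, 224, 224), dtype=np.float32)    # dummy input
    result = compiled([input_data])[compiled.output(0)]
    print(result.shape)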

My experience

I have tested many different hardware platforms for various computer vision applications in the areas of IoT and robotics.

Raspberry Pi 4

How to upgrade the Raspberry Pi 4 EEPROM (boot recovery, released 2020-09-14) to install and boot from a USB 3 SSD:

  1. update Raspberry Pi 4 EEPROM boot recovery

  2. install Ubuntu 20 on SSD

  3. change config.txt and add "program_usb_boot_mode=1" at the end of the file

  4. remove the microSD card and boot from the SSD

Smart AI IoT, Robotics, 3D SLAM, AR, VR

I have worked with many different hardware platforms such as:

  • Raspberry pi 3

  • Raspberry pi 4

  • Intel® Neural Compute Stick 2

  • Google Coral

  • NVIDIA Jetson Nano

  • NVIDIA Jetson AGX Xavier

    • The best hardware I have tested

    • I have attended many conferences and summits in the area of hardware for deep learning

  • OpenCV AI Kit

Camera

I worked with many different cameras such as:

  • Camera Module V1

  • Camera Module V2

  • Camera Module V2.1

  • multispectral camera

  • USB webcam

  • IP camera

  • high resolution camera > 8K

  • depth camera

  • stereo camera

What is important?

  • camera calibration is important

  • Quantum efficiency [%] (spectral response)

  • Sensor size [inches or mm] and pixel size [micro meter]

  • Dynamic Range [dB]

  • Image noise and signal-to-noise ratio (SNR), PSNR, SSIM: greater SNR yields better contrast and clarity, as well as improved low-light performance

  • Interface comparison:

      Interface     Cable length (m)   Max bandwidth (MB/s)   Multi-camera   Cable cost   Real time   Plug & play
      FireWire      4.5                64                     *              *            **          **
      GigE          100                100                    **             **           *           *
      USB           8                  350                    *              *            **          **
      Camera Link   10                 850                    -              -            **          -
      USB-C         10                 40 Gbit/s

  • Distortions, scaling factors, and overall quality matter; calculate the minimum sensor resolution, determine your sensor size and focal length

      • sensor resolution = image resolution = 2 * (field of view (FOV) / smallest feature); a small worked example follows this list

  • some online tools: baslerweb.com, edmundoptics.com, flir.com

  • to sum up

    • use a USB-C camera; it will help with future hardware upgrades and is easy to use with fewer issues

    • find your best trade-off between working distance (WD) and field of view (FOV)

    • sometimes you cannot have everything in life!

    • your lens aperture (f/#) is your friend, use it!

    • a larger depth of field (DOF) requires a larger f/#

    • lens performance curves are the ultimate documentation to read when selecting a lens

    • understanding them properly requires good knowledge of optics, but it is totally worth it
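
As a small worked example of the sensor-resolution rule of thumb above (sensor resolution = 2 * FOV / smallest feature); the FOV and feature-size values here are illustrative assumptions, not measurements from this page:

    # Minimum sensor resolution = 2 * (field of view / smallest feature)
    # The values below are illustrative assumptions.
    fov_mm = 100.0               # horizontal field of view, in mm
    smallest_feature_mm = 0.1    # smallest feature that must be resolved, in mm

    min_resolution_px = 2 * (fov_mm / smallest_feature_mm)
    print(min_resolution_px)     # 2000.0 -> at least 2000 pixels along the horizontal axis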

Scaled-YOLOv4: scaling the model based on the available hardware


Cost

Update 26.April.2021

How to use computer vision with deep learning on IoT devices: running machine learning inference on the edge requires some extra steps.

I tested several hardware platforms such as Raspberry Pi 3, Raspberry Pi 4, Intel® Neural Compute Stick 2, OpenCV AI Kit, Google Coral, NVIDIA Jetson Nano, etc., and different operating systems: real-time operating systems (RTOS), NASA cFS (core Flight System), Real-Time Executive for Multiprocessor Systems (RTEMS), ...

anomaly detection, object detection, object tracking, ...

Use special frameworks or libraries for edge devices (a minimal TensorFlow Lite inference sketch follows this list):

  • NVIDIA TensorRT

  • TensorFlow Lite: TensorFlow Lite for Microcontrollers (e.g. gesture recognition with OpenMV/TensorFlow, studio.edgeimpulse.com)

  • TensorFlow.js

  • PyTorch Lightning

  • PyTorch Mobile

  • Intel® Distribution of OpenVINO Toolkit

  • CoreML

  • ML kit

  • FRITZ

  • MediaPipe

  • Apache TVM

  • TinyML: enabling ultra-low-power machine learning at the edge (tiny machine learning, e.g. with Arduino)

  • Libraries: ffmpeg, GStreamer, celery,

  • GPU libraries for Python: PyCUDA, NumbaPro, PyOpenCL, CuPy
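
A minimal TensorFlow Lite inference sketch for an edge device; the model file name is a placeholder, and the tflite_runtime package is an assumption (the full tensorflow package exposes the same Interpreter):

    # Minimal TensorFlow Lite inference sketch for an edge device.
    # "model.tflite" is a placeholder for your converted model.
    import numpy as np
    import tflite_runtime.interpreter as tflite

    interpreter = tflite.Interpreter(model_path="model.tflite")
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Dummy input matching the model's expected shape and dtype
    dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
    interpreter.set_tensor(input_details[0]["index"], dummy)
    interpreter.invoke()

    prediction = interpreter.get_tensor(output_details[0]["index"])
    print(prediction.shape)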

Moreover, think about the deep learning model for your specific hardware at the first stage.

In some cases you need to optimize the model for inference. There are many techniques to use, such as:

  • Pruning

  • Quantization

  • Distillation Techniques

  • Binarized Neural Networks (BNNs)

  • Apache TVM (incubating) is a compiler stack for deep learning systems

  • Distributed machine learning and load balancing strategy

  • Low rank matrix factorization (LRMF)

  • Compact convolutional filters (Video/CNN)

  • Knowledge distillation

  • Neural Networks Compression Framework (NNCF)

  • Parallel programming

How


Pruning

Model pruning: reducing redundant parameters that are not sensitive to performance. Aim: remove all connections whose absolute weights are below a threshold. 🤔 Go for a bigger network with many layers and then prune; this often works much better and faster.
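
A minimal sketch of magnitude-based pruning, here using PyTorch's pruning utilities; the layer size and the 50% sparsity level are illustrative assumptions (the text does not name a specific library):

    # Zero out the smallest-magnitude weights of a layer (magnitude pruning).
    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    layer = nn.Linear(256, 128)                                # example layer
    prune.l1_unstructured(layer, name="weight", amount=0.5)    # prune 50% of weights by |w|
    prune.remove(layer, "weight")                              # make the zeros permanent

    sparsity = (layer.weight == 0).float().mean().item()
    print(f"weight sparsity: {sparsity:.2%}")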

Quantization

The best way is to use Google's library (TensorFlow Lite / the TensorFlow Model Optimization Toolkit), which supports the most comprehensive set of methods.

Quantization compresses the model by reducing the number of bits used to represent the weights; it effectively constrains the number of different weight values we can use inside our kernels. Per-channel quantization of weights improves performance through model compression and latency reduction.
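
A minimal post-training quantization sketch with the TensorFlow Lite converter; the saved-model path is a placeholder, and this is one common route rather than the only workflow:

    # Post-training quantization via the TensorFlow Lite converter.
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # placeholder path
    converter.optimizations = [tf.lite.Optimize.DEFAULT]                     # enables weight quantization
    tflite_quant_model = converter.convert()

    with open("model_quant.tflite", "wb") as f:
        f.write(tflite_quant_model)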

Distillation Techniques

Training a compact neural network with the distilled knowledge of a large model: distillation (knowledge transfer) from an ensemble of big networks into a much smaller network, which learns directly from the cumbersome model's outputs and is lighter to deploy.

Distill-Net: Application-Specific Distillation of Deep Convolutional Neural Networks for Resource-Constrained IoT Platforms
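
A minimal knowledge-distillation loss sketch in PyTorch; the temperature and weighting values are illustrative assumptions. The student is trained against the teacher's softened outputs plus the ground-truth labels.

    # Knowledge distillation: soft-target KL term + hard-label cross-entropy.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
        # Match the teacher's softened output distribution
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        # Standard cross-entropy against the ground-truth labels
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard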

Binarized Neural Networks (BNNs)

Binarized networks are not well supported by GPU hardware such as the Jetson Nano; implementations are mostly CPU-based.

Apache TVM (incubating) is a compiler stack for deep learning systems

Challenges with large-scale deep neural networks: they are computationally expensive and memory intensive, which hinders their deployment on devices with low memory resources and in applications with strict latency requirements.

Other issues:

  • data security: large models tend to memorize everything, including PII

  • bias, e.g. profanity, when trained on large-scale public data

  • self-discovering: instead of manually configuring conversational flows, automatically discover them from your data

  • self-training: let your system train itself with new examples

  • self-managing: let your system optimize by itself

  • knowledge distillation
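
As a rough sketch of compiling a model with Apache TVM, assuming an ONNX model file and the Relay Python API; the file name, input name, and shape are placeholders:

    # Compile an ONNX model with TVM's Relay frontend for a CPU target.
    import onnx
    import tvm
    from tvm import relay

    onnx_model = onnx.load("model.onnx")                        # placeholder model file
    shape_dict = {"input": (1, 3, 224, 224)}                    # placeholder input name/shape
    mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target="llvm", params=params)    # "llvm" = generic CPU target
    lib.export_library("compiled_model.so")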

Distributed machine learning and load balancing strategy

Run models using all available processing power (CPU, GPU, DSP, AI chip) together to enhance inference performance. Dynamic pruning of kernels aims at parsimonious inference by learning to exploit and dynamically remove the redundant capacity of a CNN architecture. Partitioning techniques based on convolution-layer fusion dynamically select the optimal partition according to the availability of computational resources and network conditions.

Low rank matrix factorization (LRMF)

There exist latent structures in the data; by uncovering them we can obtain a compressed representation of the data. LRMF factorizes the original matrix into lower-rank matrices while preserving latent structures and addressing the issue of sparseness.
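
A minimal sketch of low-rank factorization of a weight matrix via truncated SVD; the matrix size and target rank are illustrative assumptions:

    # Approximate a weight matrix with two smaller low-rank factors.
    import numpy as np

    W = np.random.randn(512, 256)      # original weight matrix (example size)
    rank = 32                          # target rank

    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]         # 512 x 32 factor
    B = Vt[:rank, :]                   # 32 x 256 factor
    W_approx = A @ B                   # rank-32 approximation of W

    print(W.size, A.size + B.size)     # parameter count before vs. after factorization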

Compact convolutional filters (Video/CNN)

Designing special structural convolutional filters to save parameters: replace over-parametric filters with compact filters to achieve an overall speedup while maintaining comparable accuracy.
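
One well-known compact-filter design is the depthwise-separable convolution (as used in MobileNet); below is a minimal PyTorch sketch with illustrative channel handling:

    # Depthwise-separable convolution: depthwise conv followed by a 1x1 pointwise conv.
    import torch.nn as nn

    class DepthwiseSeparableConv(nn.Module):
        def __init__(self, in_ch, out_ch, kernel_size=3):
            super().__init__()
            self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                       padding=kernel_size // 2, groups=in_ch)
            self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

        def forward(self, x):
            return self.pointwise(self.depthwise(x))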

Knowledge distillation

Neural Networks Compression Framework (NNCF)

AI Edge: how to run inference with deep learning models on edge/IoT devices; enabling efficient, high-performance accelerators and optimization for deep learning.

If the objects are large and we do not need small anchors: in MobileNet we can remove the part of the network related to small objects; in YOLO we can reduce the number of anchors. Decreasing the input image size also helps, but it reduces accuracy.

Parallel programming, clean code, and design patterns.