Hardware
Jetson Nano + OpenCV AI Kit OAK-D Lite depth camera
Camera
Width: 91 mm, Height: 28 mm, Length: 17.5 mm, Baseline: 75 mm, Weight: 61 g
Chips:
Robotics Vision Core 2 (RVC2), which integrates the Intel Movidius Myriad X VPU
ML inference speed:
Model, Input size, FPS, Latency [ms]
MobileOne S0, 224x224, 165.5, 11.1
YOLOv8n, 416x416, 31.3, 56.9
YOLOv8n, 640x640, 14.3, 123.6
YOLOv8s, 416x416, 15.2, 111.9
YOLOv8m, 416x416, 6.0, 273.8
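The FPS and latency columns measure different things: latency is the time one frame spends in the pipeline, while FPS is throughput. By Little's law, throughput times latency gives the average number of frames being processed at once, so a value above 1 suggests the device overlaps inferences instead of running them strictly one after another. A minimal sketch using the numbers from the table above (this interpretation is my assumption, not part of the benchmark):

```python
# Benchmark numbers copied from the table above: (model, input, FPS, latency in ms)
BENCHMARKS = [
    ("MobileOne S0", "224x224", 165.5, 11.1),
    ("YOLOv8n", "416x416", 31.3, 56.9),
    ("YOLOv8n", "640x640", 14.3, 123.6),
    ("YOLOv8s", "416x416", 15.2, 111.9),
    ("YOLOv8m", "416x416", 6.0, 273.8),
]


def frames_in_flight(fps: float, latency_ms: float) -> float:
    """Little's law: average concurrency = throughput * latency."""
    return fps * latency_ms / 1000.0


def sequential_fps(latency_ms: float) -> float:
    """FPS you would get if each frame had to finish before the next one starts."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    for name, size, fps, lat in BENCHMARKS:
        print(f"{name:13s} {size:8s} in flight: {frames_in_flight(fps, lat):.2f}  "
              f"sequential FPS: {sequential_fps(lat):.1f}  measured FPS: {fps}")
```

For every row the product lands between about 1.6 and 1.9 (165.5 FPS x 0.0111 s is about 1.84 for MobileOne S0), so the measured FPS is higher than 1000 / latency alone would suggest.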
Hardware for Deep Learning (machine learning)
https://www.tiziran.com/topics/hardware
I experiment with many different hardware platforms to train and run deep learning applications. The list below gives my suggestions, comparisons and expectations for different hardware: embedded AI, and implementing distributed data-parallel and distributed model-parallel solutions.
#hardware #deep_learning #IoT #training_machine_learning_model #tiziran
Laptop:
NVIDIA GeForce RTX 3080 Ti
Razer Blade 17 - 17.3 inch gaming laptop (NVIDIA GeForce RTX 3080 Ti, Intel i9-12900H, 4K UHD 144 Hz display, 32 GB DDR5 RAM, 1 TB SSD)
Desktop
eGPU
Razer RC21-01310100-R351 Core X External Graphics Card Case = ~ 300 Euro + GPU
Cooler Master MasterCase EG200 External GPU Enclosure - Thunderbolt 3 Compatible eGPU Enclosure, 1 PWM 92mm Fan, V550 SFX Gold Fully Modular PSU, USB Hub, Vertical Laptop Support - EU Plug = ~ 300 Euro + GPU
GPU
NVIDIA GeForce RTX 3090, 24 GB GDDR6X, 384-bit memory bus
MSI GeForce RTX 3090 GAMING TRIO 24G Gaming Graphics Card - NVIDIA RTX 3090, GPU 1740MHz, 24GB GDDR6X memory = ~ 2800 Euro
IoT:
Raspberry Pi 3 (needs an accelerator)
Raspberry Pi 4 (needs an accelerator)
Intel® Neural Compute Stick 2
Intel® Distribution of OpenVINO™ Toolkit
I attached it to a Raspberry Pi 4 over USB 3; it works very well for many deep learning models.
Google Coral
I attached it to a Raspberry Pi 4 over USB 3; it works very well for TensorFlow models.
Why TensorFlow Lite on the edge: lightweight, low latency, privacy, lower power consumption, efficient ready-to-use models
NVIDIA Jetson Nano ( 2GB and 4GB ram)
I tested Multi-Class Multi-Object Multi-Camera Tracking (MCMOMCT) on it; under heavy workloads it can run for up to 30 minutes.
NVIDIA Jetson AGX Xavier
NVIDIA Jetson AGX Orin = ~ 1900 Euro
OpenCV AI Kit
OAK = ~ 100 Euro
OAK-D = ~ 200 Euro
OAK-D + Wi-Fi = ~ 250 Euro
OAK-D-PoE = ~ 250 Euro
OAK-D Lite = ~ 100 Euro
My experience
I tested many different hardware platforms for different computer vision applications in the areas of IoT and robotics.
AI on the edge: how to run deep learning inference on edge/IoT devices; enabling efficient, high-performance accelerators; optimization of deep learning models
#computervision #AI #objectdetection #objecttracking #ml #research #CNN #gans #convolutionalneuralnetworks #ai #vr #reinforcementlearning #mlops #aiforbusiness #science #researcher #phd #cameracalibration #opticalflow #videostablization #humanoidrobot #localization #3dSLAM #reconstruction #pointcloud #mixedreality #edgecomputing #raspberrypi #intelstick #googlecoral #jetsonnano #nvidiavgpu #tensorflowjs #pytorch #opencv #aikit #caffee #DIGITS #c++ #python #ubuntu #farshidpirahansiah #tiziran.com #farshid #pirahansiah #robotics #MultiCameraMultiClassMultiObjectTracking #deeplearning #machinelearning #artificialintelligence #tensorflow #3dvision #sterovision #depthmap #RCNN #machinevision #imageprocessing #patternrecognition #compiler #RISC-V #RNN #fullStackDeepLearning #productinnovation #patents #TensorRT #ApacheTVM #TFLite #PyTorchmobile #dockers #gRPC #RESTAPIs #GraphQL
Raspberry Pi 4
How to update the Raspberry Pi 4 EEPROM (boot recovery released 2020-09-14) to install the OS on, and boot from, a USB 3 SSD:
Update the Raspberry Pi 4 EEPROM with the boot recovery image.
Install Ubuntu 20 on the SSD.
Edit config.txt and add "program_usb_boot_mode=1" at the end of the file.
Remove the microSD card and boot from the SSD.
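The config.txt step can be scripted so that running it twice does not add the setting twice. A minimal sketch, assuming Ubuntu on the Pi mounts the boot partition at /boot/firmware (on Raspberry Pi OS it is usually /boot); the helper name append_once is hypothetical:

```python
from pathlib import Path

# Assumption: Ubuntu on the Pi mounts the boot partition at /boot/firmware.
# Raspberry Pi OS usually uses /boot/config.txt instead; adjust as needed.
CONFIG_PATH = Path("/boot/firmware/config.txt")
BOOT_LINE = "program_usb_boot_mode=1"


def append_once(path: Path, line: str) -> bool:
    """Append `line` at the end of `path` unless it is already present.

    Returns True if the file was modified, False otherwise.
    """
    text = path.read_text() if path.exists() else ""
    if line in (existing.strip() for existing in text.splitlines()):
        return False
    if text and not text.endswith("\n"):
        text += "\n"  # make sure the new setting starts on its own line
    path.write_text(text + line + "\n")
    return True


if __name__ == "__main__":
    # Writing to the boot partition needs root, e.g. `sudo python3 this_script.py`.
    changed = append_once(CONFIG_PATH, BOOT_LINE)
    print("updated" if changed else "already present")
```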
Smart AI IoT, Robotics, 3D SLAM, AR, VR
I worked with many different hardware platforms, such as Raspberry Pi 3 and 4, Intel® Neural Compute Stick 2 (with the Intel® Distribution of OpenVINO™ Toolkit), Google Coral, NVIDIA Jetson Nano and NVIDIA Jetson AGX Xavier; my notes on each are in the IoT list above.
The best hardware
I attended many conferences and summits on hardware for deep learning, such as:
AI Hardware Europe Summit (July 2020)
Apache TVM And Deep Learning Compilation Conference (December 2020)
RISC-V Summit (December 2020)
OpenCV AI Kit
Camera
I worked with many different cameras such as:
Camera Module V1
Camera Module V2
Camera Module V2.1
multispectral camera
USB webcam
IP camera
high-resolution camera (> 8K)
depth camera
stereo camera
What is important?
camera calibration is important
Quantum efficiency [%] (spectral response)
Sensor size [inches or mm] and pixel size [µm]
Dynamic Range [dB]
Image noise and signal-to-noise ratio (SNR), PSNR, SSIM: a higher SNR yields better contrast and clarity, as well as improved low-light performance
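A sketch of these noise metrics in plain Python, so the formulas are explicit: SNR = 10 log10(P_signal / P_noise), PSNR = 10 log10(MAX^2 / MSE), and sensor dynamic range = 20 log10(full-well capacity / read noise). The full-well and read-noise values are datasheet parameters (in electrons) that you supply; the function names are hypothetical. For real images, OpenCV's cv2.PSNR computes the PSNR directly.

```python
import math


def snr_db(signal: list[float], noisy: list[float]) -> float:
    """SNR = 10*log10(signal power / noise power), where noise = noisy - signal."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum((n - s) ** 2 for s, n in zip(signal, noisy)) / len(signal)
    return 10.0 * math.log10(p_signal / p_noise)


def psnr_db(reference: list[float], test: list[float], max_value: float = 255.0) -> float:
    """PSNR = 10*log10(MAX^2 / MSE); MAX is 255 for 8-bit images."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def dynamic_range_db(full_well_e: float, read_noise_e: float) -> float:
    """Sensor dynamic range = 20*log10(full-well capacity / read noise), both in electrons."""
    return 20.0 * math.log10(full_well_e / read_noise_e)
```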
interface, max cable length [m], max bandwidth [MB/s], multi-camera, cable cost, real time, plug and play
FireWire, 4.5, 64, *, *, **, **
GigE, 100, 100, **, **, *, *
USB 3.0, 8, 350, *, *, **, **
Camera Link, 10, 850, -, -, **, -
USB-C, 10, 40 Gbit/s
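To check whether an interface can carry an uncompressed stream, multiply width x height x bytes per pixel x FPS and compare the result with the max bandwidth column. A minimal sketch; the 80% headroom for protocol overhead is an assumption, USB-C is left out because its row has no MB/s value, and the USB and link rows are taken to be USB 3.0 and Camera Link:

```python
# Max bandwidth per interface in MB/s, from the interface table above.
INTERFACE_MB_S = {
    "FireWire": 64,
    "GigE": 100,
    "USB 3.0": 350,
    "Camera Link": 850,
}


def required_mb_s(width: int, height: int, fps: float, bytes_per_pixel: float = 1.0) -> float:
    """Bandwidth of an uncompressed stream in MB/s (1 MB = 10**6 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e6


def interfaces_that_fit(width: int, height: int, fps: float,
                        bytes_per_pixel: float = 1.0, headroom: float = 0.8) -> list[str]:
    """Interfaces whose max bandwidth, derated by `headroom` (assumed 80%), covers the stream."""
    need = required_mb_s(width, height, fps, bytes_per_pixel)
    return [name for name, cap in INTERFACE_MB_S.items() if cap * headroom >= need]


if __name__ == "__main__":
    # 1080p, 8-bit monochrome, 60 FPS: about 124 MB/s.
    print(required_mb_s(1920, 1080, 60), interfaces_that_fit(1920, 1080, 60))
```

For 1080p mono at 60 FPS only USB 3.0 and Camera Link fit; switching to 3 bytes per pixel (RGB) leaves only Camera Link.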
Optics: distortion and scaling factors matter, and so does quality. Calculate the minimum sensor resolution, then determine your sensor size and focal length.
sensor resolution = image resolution = 2 × (field of view (FOV) / smallest feature), per image axis; the factor 2 puts at least two pixels across the smallest feature
some online tools: baslerweb.com, edmundoptics.com, flir.com
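The sensor resolution rule is simple enough to script. A sketch with an assumed example of a 200 x 150 mm field of view and a 0.25 mm smallest feature; the function names are hypothetical:

```python
import math


def min_sensor_resolution(fov_mm: float, smallest_feature_mm: float,
                          pixels_per_feature: float = 2.0) -> int:
    """Minimum pixels along one axis: 2 * (FOV / smallest feature), rounded up."""
    return math.ceil(pixels_per_feature * fov_mm / smallest_feature_mm)


def min_sensor(fov_w_mm: float, fov_h_mm: float, smallest_feature_mm: float) -> tuple[int, int]:
    """(width_px, height_px) needed to put two pixels across the smallest feature."""
    return (min_sensor_resolution(fov_w_mm, smallest_feature_mm),
            min_sensor_resolution(fov_h_mm, smallest_feature_mm))


if __name__ == "__main__":
    # Assumed example: 200 x 150 mm field of view, 0.25 mm smallest feature.
    print(min_sensor(200, 150, 0.25))
```

Here min_sensor(200, 150, 0.25) gives (1600, 1200), so any sensor with at least 1600 x 1200 pixels would do; the next step is choosing sensor size and focal length to map that FOV onto it.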
to sum up
Use a USB-C camera: it eases future hardware upgrades and is easy to use, with fewer issues.
Find your best trade-off between working distance (WD) and field of view (FOV).
sometimes you cannot have everything in life!
your lens aperture (f/#) is your friend, use it!
a larger DOF requires a larger f/#
lens performance curves are the ultimate documentation to read when selecting a lens
Understanding them properly requires a good knowledge of optics, but it is totally worth it.
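To see why a larger f/# gives a larger depth of field, the usual thin-lens approximation DOF ≈ 2·N·c·u² / f² can be evaluated directly (N = f-number, c = acceptable circle of confusion, u = working distance, f = focal length; valid when u is much larger than f and well below the hyperfocal distance). A sketch with assumed values: a 16 mm lens at 500 mm and c = 0.005 mm, roughly one 5 µm pixel:

```python
def approx_dof_mm(f_number: float, coc_mm: float, distance_mm: float, focal_mm: float) -> float:
    """Total depth of field ~ 2*N*c*u^2 / f^2 (valid for u >> f and u << hyperfocal)."""
    return 2.0 * f_number * coc_mm * distance_mm ** 2 / focal_mm ** 2


if __name__ == "__main__":
    # Assumed setup: 16 mm lens, 500 mm working distance, c = 0.005 mm.
    for n in (2.8, 4.0, 5.6, 8.0):
        print(f"f/{n}: DOF ~ {approx_dof_mm(n, 0.005, 500, 16):.1f} mm")
```

DOF grows linearly with N: about 27 mm at f/2.8 versus about 78 mm at f/8. The price is less light on the sensor (and more diffraction blur at high f/#), so exposure time or illumination has to compensate.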
Scaled-YOLOv4: scaling the model based on the hardware
Cost
Mobile, Open Hardware, RISC-V System-on-Chip (SoC) Development Kit
Hardware
NVIDIA Jetson Xavier NX Developer Kit
Wi-Fi
SparkFun GPS-RTK Dead Reckoning pHAT
microSD card
Mophie Powerstation USB-C 20000
ZED 2 Stereo Camera
3D-printed box
AWS
AWS S3
AWS p2.xlarge EC2 instances
AWS Sagemaker
3D printed humanoid robot: NimbRo-OP2 and NimbRo-OP2X hardware
Post Product to customer by
Update: 26 April 2021
How to use computer vision with deep learning on IoT devices: running machine learning inference on the edge requires some extra steps.
I tested several hardware platforms, such as Raspberry Pi 3, Raspberry Pi 4, Intel® Neural Compute Stick 2, OpenCV AI Kit, Google Coral and NVIDIA Jetson Nano, and different operating systems: real-time operating systems (RTOS), NASA cFS (core Flight System) and RTEMS (Real-Time Executive for Multiprocessor Systems).
anomaly detection, object detection, object tracking, ...
Use special frameworks or libraries for edge devices:
NVIDIA TensorRT
TensorFlow Lite: TensorFlow Lite for Microcontrollers; gesture recognition with OpenMV / TensorFlow / studio.edgeimpulse.com
TensorFlow.js
PyTorch Lightning
PyTorch Mobile
Intel® Distribution of OpenVINO Toolkit
CoreML
ML kit
FRITZ
MediaPipe
Apache TVM
TinyML: enabling ultra-low-power machine learning at the edge (tiny machine learning with Arduino)
Libraries: FFmpeg, GStreamer, Celery
GPU library for python: PyCUDA, NumbaPro, PyOpenCL, CuPy
Moreover, design the deep learning model for your specific hardware from the first stage.
In some cases you need to optimize the model for inference. Useful techniques include:
Pruning
Quantization
Distillation Techniques
Binarized Neural Networks (BNNs)
Apache TVM (incubating) is a compiler stack for deep learning systems
Distributed machine learning and load balancing strategy
Low rank matrix factorization (LRMF)
Compact convolutional filters (Video/CNN)
Knowledge distillation
Neural Networks Compression Framework (NNCF)
Parallel programming
How
Pruning
Model pruning: remove redundant parameters to which the performance is not sensitive. Aim: remove all connections whose absolute weight is below a threshold. 🤔 Start with a bigger network with many layers and then prune; pruning works much better and faster that way.
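A minimal sketch of magnitude (threshold) pruning on a flat list of weights: every weight whose absolute value is below the threshold is set to zero, and the threshold can be picked to reach a target sparsity. Real frameworks do this per layer with masks and usually fine-tune afterwards; the function names here are hypothetical:

```python
def magnitude_prune(weights: list[float], threshold: float) -> tuple[list[float], float]:
    """Zero every weight with |w| < threshold; return (pruned weights, sparsity)."""
    pruned = [w if abs(w) >= threshold else 0.0 for w in weights]
    sparsity = sum(1 for w in pruned if w == 0.0) / len(pruned)
    return pruned, sparsity


def threshold_for_sparsity(weights: list[float], target: float) -> float:
    """Threshold that prunes roughly the `target` fraction of smallest-magnitude weights."""
    magnitudes = sorted(abs(w) for w in weights)
    k = int(target * len(magnitudes))
    return magnitudes[k] if k < len(magnitudes) else float("inf")


if __name__ == "__main__":
    w = [0.5, -0.02, 0.3, 0.01, -0.7, 0.04]
    thr = threshold_for_sparsity(w, 0.5)
    print(thr, magnitude_prune(w, thr))
```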
Quantization
The best way is to use Google's library, which supports the most comprehensive set of methods.
Quantization compresses the model by reducing the number of bits used to represent the weights; it effectively constrains the number of distinct weight values we can use inside our kernels. Per-channel quantization of the weights improves performance through model compression and latency reduction.
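A sketch of symmetric per-channel int8 quantization: each output channel (row) gets its own scale, max|w| / 127, so a channel with small weights is not forced onto the coarse scale of a channel with large weights. Plain Python with hypothetical function names; production toolkits add calibration, zero points and fused integer kernels:

```python
def quantize_per_channel(weights: list[list[float]], num_bits: int = 8):
    """Symmetric per-channel quantization: one scale per output channel (row)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    q_weights, scales = [], []
    for channel in weights:
        max_abs = max(abs(w) for w in channel)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        # Round to the nearest integer level and clamp to [-qmax, qmax].
        q_weights.append([max(-qmax, min(qmax, round(w / scale))) for w in channel])
        scales.append(scale)
    return q_weights, scales


def dequantize(q_weights: list[list[int]], scales: list[float]) -> list[list[float]]:
    """Map integer levels back to floats: w ~ q * scale."""
    return [[q * s for q in row] for row, s in zip(q_weights, scales)]


if __name__ == "__main__":
    w = [[0.5, -1.0, 0.25], [0.01, 0.02, -0.04]]
    q, s = quantize_per_channel(w)
    print(q, s)
    print(dequantize(q, s))
```

With these example weights the second channel gets a scale of 0.04 / 127 instead of 1 / 127, so its worst-case rounding error (scale / 2) is 25 times smaller than with a single per-tensor scale.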
Distillation: train a compact neural network with the distilled knowledge of a large model. Knowledge is transferred from an ensemble of big networks into a much smaller network, which learns directly from the cumbersome model's outputs and is lighter to deploy.
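The distillation idea can be written as a loss. A sketch of the soft-target loss from Hinton et al. (2015): the student matches the teacher's temperature-softened outputs (KL divergence, scaled by T²) while also fitting the true labels. The temperature T = 4 and weight alpha = 0.5 are assumed defaults, not tuned values, and the function names are hypothetical:

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: list[float], teacher_logits: list[float],
                      labels_onehot: list[float], temperature: float = 4.0,
                      alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(labels, student)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    p_hard = softmax(student_logits, 1.0)  # ordinary prediction for the label term
    ce = -sum(y * math.log(p) for y, p in zip(labels_onehot, p_hard) if y > 0)
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.0]
    print(distillation_loss([0.0, 1.0, 2.0], teacher, [1, 0, 0]))
```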
Distillation Techniques
Distill-Net: Application-Specific Distillation of Deep Convolutional Neural Networks for Resource-Constrained IoT Platforms
Binarized Neural Networks (BNNs)
BNNs are not supported by GPU hardware such as the Jetson Nano; they mostly run on the CPU.
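Why BNNs map well to CPUs: with weights and activations constrained to ±1 and packed into bits, a dot product becomes XNOR plus popcount instead of floating-point multiply-adds. A minimal sketch of that trick in plain Python (function names hypothetical):

```python
def binarize(values: list[float]) -> int:
    """Pack signs into an int: bit i = 1 if values[i] >= 0 (+1), else 0 (-1)."""
    bits = 0
    for i, v in enumerate(values):
        if v >= 0:
            bits |= 1 << i
    return bits


def xnor_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two {-1, +1} vectors of length n from their packed bits.

    Matching bits contribute +1 and differing bits -1, so
    dot = matches - (n - matches) = 2 * popcount(XNOR(a, b)) - n.
    """
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    return 2 * matches - n


if __name__ == "__main__":
    a = [0.3, -1.2, 0.8, -0.1, 2.0]
    w = [1.0, 1.0, -0.5, -2.0, 0.7]
    print(xnor_dot(binarize(a), binarize(w), len(a)))
```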
Apache TVM (incubating) is a compiler stack for deep learning systems
Challenges with large-scale deep neural networks:
computationally expensive
memory intensive
this hinders their deployment on devices with low memory resources and in applications with strict latency requirements
Other issues:
data security: they tend to memorize everything, including PII
bias, e.g. profanity: they are trained on large-scale public data
self-discovering: instead of manually configuring conversational flows, automatically discover them from your data
self-training: let your system train itself with new examples
self-managing: let your system optimize itself
knowledge distillation
Distributed machine learning and load balancing strategy
Run models on all available processing units (CPU, GPU, DSP, AI chip) together to improve inference performance. Dynamic kernel pruning aims at parsimonious inference by learning to exploit and dynamically remove the redundant capacity of a CNN architecture. Partitioning techniques based on convolution-layer fusion dynamically select the optimal partition according to the available computational resources and the network conditions.
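A minimal sketch of the partitioning idea: choose the layer at which to split a CNN between an edge device and a server, given per-layer latencies on each side and the size of the data crossing the network at each cut. This is a plain exhaustive search, not the layer-fusion method itself; all numbers and names are hypothetical:

```python
def best_split(device_ms: list[float], server_ms: list[float],
               transfer_mb: list[float], bandwidth_mb_s: float) -> tuple[int, float]:
    """Choose split k: layers [0, k) run on the device, layers [k, n) on the server.

    transfer_mb[k] is the data sent when splitting at k (k = 0 sends the raw input,
    k = n sends only the final output), so len(transfer_mb) == n + 1.
    Returns (best_k, total_latency_ms).
    """
    n = len(device_ms)
    best = None
    for k in range(n + 1):
        latency = (sum(device_ms[:k])
                   + transfer_mb[k] / bandwidth_mb_s * 1000.0
                   + sum(server_ms[k:]))
        if best is None or latency < best[1]:
            best = (k, latency)
    return best


if __name__ == "__main__":
    # Hypothetical 4-layer network profile.
    dev = [5.0, 20.0, 20.0, 10.0]
    srv = [1.0, 2.0, 2.0, 1.0]
    mb = [1.2, 3.0, 0.5, 0.1, 0.02]
    for bw in (10.0, 100.0):
        print(bw, "MB/s ->", best_split(dev, srv, mb, bw))
```

With these numbers a slow 10 MB/s link favors running three layers on the device (split at 3, about 56 ms), while at 100 MB/s sending the raw input to the server wins (about 18 ms), which is why the partition has to follow the network conditions.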
Low rank matrix factorization (LRMF)
There are latent structures in the data; by uncovering them we can obtain a compressed representation of the data. LRMF factorizes the original matrix into lower-rank matrices while preserving the latent structures and addressing the issue of sparseness.
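A minimal sketch of low-rank factorization applied to a weight matrix: an m x n matrix is approximated by r pairs of vectors, storing r·(m + n) numbers instead of m·n (for a 512 x 512 layer at rank 32: 32,768 instead of 262,144). Here the factors are found with power iteration and deflation in plain Python; in practice you would use a truncated SVD from a linear-algebra library. Helper names are hypothetical:

```python
import random

Matrix = list[list[float]]
Vector = list[float]


def matvec(a: Matrix, x: Vector) -> Vector:
    """A x"""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def t_matvec(a: Matrix, y: Vector) -> Vector:
    """A^T y"""
    return [sum(a[i][j] * y[i] for i in range(len(a))) for j in range(len(a[0]))]


def low_rank(a: Matrix, rank: int, iters: int = 100, seed: int = 0):
    """Approximate A by `rank` outer products u_k v_k^T (power iteration + deflation).

    Each v_k is a unit right singular vector; u_k = A_residual v_k = sigma_k * left vector.
    """
    rng = random.Random(seed)
    residual = [row[:] for row in a]
    us, vs = [], []
    for _ in range(rank):
        v = [rng.uniform(-1.0, 1.0) for _ in range(len(a[0]))]
        for _ in range(iters):
            v = t_matvec(residual, matvec(residual, v))  # one step of (A^T A) v
            norm = sum(x * x for x in v) ** 0.5
            if norm == 0.0:
                break  # residual is zero: nothing left to capture
            v = [x / norm for x in v]
        u = matvec(residual, v)
        us.append(u)
        vs.append(v)
        # Deflation: remove the captured component before finding the next one.
        residual = [[residual[i][j] - u[i] * v[j] for j in range(len(v))]
                    for i in range(len(u))]
    return us, vs


def reconstruct(us: list[Vector], vs: list[Vector]) -> Matrix:
    return [[sum(u[i] * v[j] for u, v in zip(us, vs)) for j in range(len(vs[0]))]
            for i in range(len(us[0]))]


def factorized_params(m: int, n: int, r: int) -> int:
    """Numbers stored for a rank-r factorization of an m x n matrix (versus m * n)."""
    return r * (m + n)
```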
Compact convolutional filters (Video/CNN)
Design special structured convolutional filters to save parameters: replace over-parameterized filters with compact filters to achieve an overall speedup while maintaining comparable accuracy.
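Depthwise separable convolution (as used in MobileNet) is one well-known compact filter design: a k x k depthwise filter per input channel followed by a 1 x 1 pointwise convolution that mixes channels. A sketch of the parameter arithmetic (biases ignored), with an assumed 3 x 3 layer mapping 128 to 256 channels:

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """One k x k x c_in filter per output channel."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution mixing channels
    return depthwise + pointwise


if __name__ == "__main__":
    std = standard_conv_params(3, 128, 256)
    sep = depthwise_separable_params(3, 128, 256)
    print(std, sep, round(std / sep, 2))
```

That is 294,912 versus 33,920 parameters, roughly 8.7 times fewer for this layer.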
Knowledge distillation
Neural Networks Compression Framework (NNCF)
If the objects are large, we do not need small anchors:
In MobileNet we can remove the small part of the network that relates to small objects; in YOLO, reduce the number of anchors. Decreasing the input image size also speeds up inference, but it reduces accuracy.
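A sketch of the anchor idea for YOLO-style detectors: if the smallest object you care about is, say, 30 px on its shorter side, smaller anchors can be dropped. The anchor list is YOLOv3's default from the Darknet yolov3.cfg (at 416 x 416 input); the 30 px limit and the helper names are assumptions. The second helper estimates the saving from a smaller input: for a fully convolutional detector, compute scales roughly with the number of input pixels.

```python
# YOLOv3 default anchors (width, height) in pixels, from the Darknet yolov3.cfg.
YOLOV3_ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
                  (59, 119), (116, 90), (156, 198), (373, 326)]


def keep_large_anchors(anchors: list[tuple[int, int]], min_side_px: int) -> list[tuple[int, int]]:
    """Drop anchors whose shorter side is below the smallest expected object."""
    return [(w, h) for w, h in anchors if min(w, h) >= min_side_px]


def relative_cost(new_size: int, old_size: int) -> float:
    """Approximate compute ratio when a square input is resized (FLOPs ~ pixel count)."""
    return (new_size / old_size) ** 2


if __name__ == "__main__":
    print(keep_large_anchors(YOLOV3_ANCHORS, 30))
    print(f"320 vs 416 input: {relative_cost(320, 416):.2f}x compute")
```

keep_large_anchors(YOLOV3_ANCHORS, 30) keeps 6 of the 9 anchors; in YOLOv3 the three dropped ones are exactly those used by the highest-resolution (stride 8) detection head, which is the part of the network you could remove. relative_cost(320, 416) is about 0.59, i.e. roughly 40% less compute, at some cost in accuracy.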
Parallel programming, clean code and design patterns