Jetson Nano + OpenCV AI KIT OAK-D-LITE depth camera

Myriad X is integrated into the Robotics Vision Core 2

Hardware for Deep Learning (machine learning)


I experiment with many different kinds of hardware to train and run deep learning applications. The list below shows my suggestions, comparisons, and expectations for different hardware, covering embedded AI and the implementation of distributed data-parallel and distributed model-parallel solutions.


#hardware #deep_learning #IoT #training_machine_learning_model #tiziran 




My experience 

I have tested many different hardware platforms for computer vision applications in the areas of IoT and robotics.

Raspberry Pi 4 

How to upgrade the Raspberry Pi 4 EEPROM (boot recovery, released 2020-09-14) to install and boot from a USB 3 SSD

Smart AI IoT, Robotic, 3D SLAM, AR, VR

I worked with many different hardware platforms, such as:


I worked with many different cameras such as:

What is important?

Scaled-YOLOv4: scaling the model based on the hardware


Update 26.April.2021

How to use computer vision with deep learning on IoT devices. Running machine learning inference on the edge requires some extra steps.

I tested several hardware platforms, such as the Raspberry Pi 3, Raspberry Pi 4, Intel® Neural Compute Stick 2, OpenCV AI Kit, Google Coral, and NVIDIA Jetson Nano. Different operating systems: real-time operating system (RTOS), NASA cFS (core Flight System), and Real-Time Executive for Multiprocessor Systems (RTEMS).

anomaly detection, object detection, object tracking, ...

Use special frameworks or libraries for edge devices:

Moreover, think about which deep learning model suits your specific hardware at the very first stage.

In some cases you need to optimize the model for inference. There are many techniques you can use, such as:


Distributed machine learning and load balancing strategy


Model pruning: reducing redundant parameters to which the performance is not sensitive. Aim: remove all connections with absolute weights below a threshold. 🤔 Go for a bigger network with many layers and then prune it; this works much better and faster.
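As a minimal sketch of the thresholding idea above, the following NumPy snippet zeroes every weight whose absolute value falls below a threshold. The function name `magnitude_prune` and the example values are hypothetical and only illustrate the idea; a real workflow would normally fine-tune the network after pruning.

```python
import numpy as np

def magnitude_prune(weights, threshold):
    """Zero out every weight whose absolute value is below threshold."""
    mask = np.abs(weights) >= threshold   # True where the connection is kept
    return weights * mask, mask

# Hypothetical weight matrix of a small layer.
w = np.array([[0.80, -0.02, 0.30],
              [0.01, -0.60, 0.04]])

pruned, mask = magnitude_prune(w, threshold=0.05)
sparsity = 1.0 - mask.mean()              # fraction of removed connections
print(pruned)
print("sparsity:", sparsity)
```

The returned mask can be reused during fine-tuning to keep the pruned connections at zero.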


The best way is to use Google's library, which supports the most comprehensive set of methods.

Quantization compresses the model by reducing the number of bits used to represent the weights. It effectively constrains the number of different weight values we can use inside our kernels. Per-channel quantization of the weights improves performance through model compression and latency reduction.
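The per-channel idea can be illustrated with a small NumPy sketch. It assumes symmetric int8 quantization with one scale per output channel (axis 0); the function names are hypothetical, and production toolchains add calibration and activation quantization on top of this.

```python
import numpy as np

def quantize_per_channel(weights):
    """Symmetric int8 quantization with one scale per output channel (axis 0)."""
    qmax = 127                                          # largest int8 magnitude used
    flat = weights.reshape(weights.shape[0], -1)
    max_abs = np.abs(flat).max(axis=1)
    scale = np.where(max_abs > 0, max_abs / qmax, 1.0)  # avoid division by zero
    shape = (-1,) + (1,) * (weights.ndim - 1)           # broadcast over channels
    q = np.clip(np.round(weights / scale.reshape(shape)), -qmax, qmax)
    return q.astype(np.int8), scale

def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    shape = (-1,) + (1,) * (q.ndim - 1)
    return q.astype(np.float32) * scale.reshape(shape)

# Hypothetical conv kernel: 4 output channels, 3 input channels, 3x3 filters.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3, 3, 3))
q, scale = quantize_per_channel(w)
max_error = np.abs(dequantize(q, scale) - w).max()
print("storage: 8 bits per weight instead of 32, max error:", max_error)
```

Each channel gets its own scale, so a channel with small weights is not forced onto the coarse grid needed by a channel with large weights.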

Knowledge distillation: training a compact neural network with the distilled knowledge of a large model. Distillation (knowledge transfer) moves the knowledge of an ensemble of big networks into a much smaller network, which learns directly from the cumbersome model's outputs and is lighter to deploy.
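A common way to let the small network learn from the large model's outputs is to match temperature-softened class probabilities. The sketch below assumes that formulation; the function names and the temperature value are illustrative choices, and for an ensemble the teacher logits could be the average over its members.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross-entropy between softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = np.log(softmax(student_logits, temperature))
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())

# Hypothetical logits for one sample with three classes.
teacher = np.array([[5.0, 1.0, 0.0]])
student_close = np.array([[4.5, 1.2, 0.1]])
student_far = np.array([[0.0, 1.0, 5.0]])

print(distillation_loss(student_close, teacher))
print(distillation_loss(student_far, teacher))
```

A student whose outputs resemble the teacher's gets a lower loss, which is the signal used to train the compact network.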

Distillation Techniques

Distill-Net: Application-Specific Distillation of Deep Convolutional Neural Networks for Resource-Constrained IoT Platforms

Binarized Neural Networks (BNNs)

BNNs are not supported by GPU hardware such as the Jetson Nano; they are mostly CPU-based.
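To show why BNNs map naturally onto CPU bit operations, here is a minimal sketch of the XNOR-popcount dot product that usually sits at the core of a binarized layer. All function names are hypothetical, and packing into a Python integer is only for illustration; real implementations pack into machine words.

```python
import numpy as np

def binarize(x):
    """Map real values to {-1, +1} using the sign (0 maps to +1)."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def pack_bits(b):
    """Pack a {-1, +1} vector into an int; bit i is set when b[i] == +1."""
    value = 0
    for i, v in enumerate(b):
        if v == 1:
            value |= 1 << i
    return value

def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1, +1} vectors of length n."""
    mask = (1 << n) - 1
    agree = bin(~(a_bits ^ b_bits) & mask).count("1")  # positions with equal sign
    return 2 * agree - n                               # agreements minus disagreements

rng = np.random.default_rng(0)
n = 64
a = binarize(rng.normal(size=n))
b = binarize(rng.normal(size=n))
fast = xnor_popcount_dot(pack_bits(a), pack_bits(b), n)
reference = int(np.dot(a.astype(int), b.astype(int)))
print(fast, reference)
```

The multiply-accumulate of a normal layer is replaced by one XOR, one NOT, and one bit count.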

Apache TVM (incubating) is a compiler stack for deep learning systems

Challenges with large-scale deep neural network models: they are computationally expensive and memory intensive, which hinders their deployment on devices with low memory resources and in applications with strict latency requirements.

Other issues:

Data security: models tend to memorize everything, including PII.

Bias, e.g. profanity: models are trained on large-scale public data.

Self discovering: instead of manually configuring conversational flows, automatically discover them from your data.

Self training: let your system train itself with new examples.

Self managing: let your system optimize by itself.

Knowledge distillation

Distributed machine learning and load balancing strategy

Run models that use all available processing power (CPU, GPU, DSP, AI chip) together to enhance inference performance. Dynamic pruning of kernels aims at parsimonious inference by learning to exploit and dynamically remove the redundant capacity of a CNN architecture. Partitioning techniques use convolution layer fusion to dynamically select the optimal partition according to the availability of computational resources and network conditions.

Low rank matrix factorization (LRMF)

There exist latent structures in the data; by uncovering them we can obtain a compressed representation of the data. LRMF factorizes the original matrix into lower-rank matrices while preserving latent structures and addressing the issue of sparseness.
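One simple way to obtain such a factorization is a truncated SVD, sketched below with NumPy. This is an assumed choice for illustration; other factorization methods exist, and the function name and matrix sizes are hypothetical.

```python
import numpy as np

def low_rank_factorize(matrix, rank):
    """Return A (m x rank) and B (rank x n) whose product approximates matrix."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # scale the kept left singular vectors
    b = vt[:rank, :]
    return a, b

# Hypothetical 64 x 32 weight matrix with an underlying rank-2 structure.
rng = np.random.default_rng(0)
m = rng.normal(size=(64, 2)) @ rng.normal(size=(2, 32))

a, b = low_rank_factorize(m, rank=2)
original_params = m.size              # 64 * 32 = 2048
factored_params = a.size + b.size     # 64 * 2 + 2 * 32 = 192
print(original_params, factored_params, np.allclose(a @ b, m))
```

In a network, one dense layer with the original matrix becomes two smaller layers with A and B, which stores fewer parameters when the rank is low.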

Compact convolutional filters (Video/CNN)

Design special structural convolutional filters to save parameters: replace over-parameterized filters with compact filters to achieve an overall speedup while maintaining comparable accuracy.
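As a worked example of the parameter saving, the sketch below compares a standard convolution with a depthwise separable one, which is one well-known compact filter design (used here as an assumed example, not necessarily the one meant above). Biases are ignored and the layer sizes are hypothetical.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Weights of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

# Hypothetical layer: 128 input channels, 256 output channels, 3 x 3 kernel.
standard = standard_conv_params(128, 256, 3)          # 294912
compact = depthwise_separable_params(128, 256, 3)     # 33920
print(standard, compact, round(standard / compact, 2))
```

For this layer the compact design needs roughly 8.7 times fewer weights.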

Knowledge distillation

Neural Networks Compression Framework (NNCF)

AI Edge: how to run inference with deep learning models on edge/IoT devices. Enabling efficient, high-performance accelerators and optimization for deep learning

If the objects are large, we do not need small anchors.

In MobileNet we can remove the small part of the network that relates to small objects. In YOLO, reduce the number of anchors. Decreasing the input image size also helps, but it reduces the accuracy.
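A rough sketch of both ideas follows. It assumes the commonly published default YOLOv3 anchors (width, height in pixels for a 416 x 416 input); the helper names and the `min_side` cutoff are hypothetical. In a real model, dropping anchors also means removing or retraining the matching detection head.

```python
# Commonly published YOLOv3 default anchors (assumed here for illustration).
YOLOV3_ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
                  (59, 119), (116, 90), (156, 198), (373, 326)]

def keep_large_anchors(anchors, min_side):
    """Keep only anchors whose shorter side is at least min_side pixels."""
    return [(w, h) for (w, h) in anchors if min(w, h) >= min_side]

def relative_conv_cost(new_size, base_size):
    """Approximate compute of conv layers relative to the base input size."""
    return (new_size / base_size) ** 2   # cost scales with the pixel count

large_only = keep_large_anchors(YOLOV3_ANCHORS, min_side=50)
cost = relative_conv_cost(320, 416)
print(large_only)
print(round(cost, 3))
```

Going from a 416 to a 320 pixel input cuts the convolution work to about 59 percent, at the price of losing detail on small objects.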

Parallel programming, clean code, and design patterns