Hardware

My experience

I have tested many different hardware platforms for various computer vision applications in the areas of IoT and robotics.

Raspberry Pi 4

How to update the Raspberry Pi 4 EEPROM (boot recovery image released 2020-09-14) so it can install and boot from a USB 3 SSD:

  1. update Raspberry Pi 4 EEPROM boot recovery

  2. install Ubuntu 20 on the SSD

  3. edit config.txt and add "program_usb_boot_mode=1" at the end of the file (see the sketch after these steps)

  4. remove the micro SD card and boot from the SSD
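
A minimal sketch of step 3 in Python (it assumes config.txt lives at /boot/config.txt, which can differ between OS images, and that the script runs with root privileges):

```python
# Append "program_usb_boot_mode=1" to config.txt if it is not already present.
CONFIG = "/boot/config.txt"  # assumed path; some Ubuntu images use /boot/firmware/config.txt

with open(CONFIG, "r+") as f:
    contents = f.read()
    if "program_usb_boot_mode=1" not in contents:
        f.write("\nprogram_usb_boot_mode=1\n")  # add the flag at the end of the file
```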

Smart AI, IoT, robotics, 3D SLAM, AR, VR

RISC-V

I have worked with many different hardware platforms, such as:

  • Raspberry Pi 3

  • Raspberry Pi 4

  • Intel® Neural Compute Stick 2

    • Intel® Distribution of OpenVINO™ Toolkit

    • I attached it to a Raspberry Pi 4 over USB 3 and it works very well for many deep learning models (see the inference sketch after this list)

  • Google Coral

    • I attached it to a Raspberry Pi 4 over USB 3 and it works very well for TensorFlow models

    • Why TensorFlow Lite on the edge: lightweight, low latency, privacy, improved power consumption, efficient models ready to use

  • NVIDIA Jetson Nano

    • I tested Multi-Class Multi-Object Multi-Camera Tracking (MCMOMCT); under heavy workloads it can keep running for up to about 30 minutes

  • NVIDIA Jetson AGX Xavier

    • The best hardware

    • I attended many conferences and summits in the area of hardware for deep learning, such as:

  • OpenCV AI Kit
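
Below is a minimal sketch of running inference on the Neural Compute Stick 2 (MYRIAD device) attached to a Raspberry Pi 4; the model files ("model.xml"/"model.bin") and the test image are placeholders, and the code assumes the OpenVINO 2022.x Python runtime (older releases use a different API):

```python
# Run a pre-converted OpenVINO IR model on the Intel Neural Compute Stick 2.
import cv2
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")            # IR produced by the Model Optimizer
compiled = core.compile_model(model, "MYRIAD")  # MYRIAD = NCS2 VPU plugin

input_layer = compiled.input(0)
n, c, h, w = input_layer.shape                  # assumes an NCHW image input

image = cv2.imread("test.jpg")
blob = cv2.resize(image, (w, h)).transpose(2, 0, 1)[np.newaxis].astype(np.float32)

result = compiled([blob])[compiled.output(0)]
print("Top-1 class:", int(np.argmax(result)))
```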

Camera

I have worked with many different cameras, such as:

  • Camera Module V1

  • Camera Module V2

  • Camera Module V2.1

  • multispectral camera

  • USB webcam

  • IP camera

  • high resolution camera > 8K

  • depth camera

  • stereo camera

What is important?

  • Camera calibration is important

  • Quantum efficiency [%] (spectral response)

  • Sensor size [inches or mm] and pixel size [micro meter]

  • Dynamic Range [dB]

  • Image noise and signal-to-noise ratio (SNR), PSNR, SSIM: a greater SNR yields better contrast and clarity, as well as improved low-light performance

  • Interface comparison (max cable length, max bandwidth, multi-camera support, cable cost, real-time capability, plug and play):

      | Interface   | Max cable length [m] | Max bandwidth [MB/s] | Multi-camera | Cable cost | Real-time | Plug and play |
      | ----------- | -------------------- | -------------------- | ------------ | ---------- | --------- | ------------- |
      | FireWire    | 4.5                  | 64                   | *            | *          | **        | **            |
      | GigE        | 100                  | 100                  | **           | **         | *         | *             |
      | USB         | 8                    | 350                  | *            | *          | **        | **            |
      | Camera Link | 10                   | 850                  | -            | -          | **        | -             |
      | USB-C       | 10                   | 40 Gbit/s            |              |            |           |               |

  • Distortion, scaling factors, and image quality are important; calculate the minimum sensor resolution, and determine your sensor size and focal length

      • sensor resolution = image resolution = 2 * ( field of view (FOV) / smallest feature ); see the worked example after this list

  • some online tools: baslerweb.com, edmundoptics.com, flir.com

  • To sum up:

    • use a USB-C camera; it will help with future hardware upgrades and is easy to use with fewer issues

    • find your best trade-off between working distance (WD) and field of view (FOV)

    • sometimes you cannot have everything in life!

    • your lens aperture (f/#) is your friend, use it!

    • a larger depth of field (DOF) requires a larger f/#

    • lens performance curves are the ultimate documentation to read when selecting a lens

    • understanding them properly requires good knowledge of optics, but it is totally worth it.
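
A minimal worked example of the sensor-resolution rule of thumb above (the FOV and feature sizes are made-up values):

```python
# Nyquist-style rule: at least two pixels across the smallest feature to resolve.
def min_sensor_resolution(fov_mm: float, smallest_feature_mm: float) -> float:
    """sensor resolution [px] = 2 * (field of view / smallest feature)"""
    return 2 * (fov_mm / smallest_feature_mm)

# e.g. a 100 mm wide field of view with 0.1 mm defects to detect
pixels_needed = min_sensor_resolution(fov_mm=100.0, smallest_feature_mm=0.1)
print(f"Minimum horizontal resolution: {pixels_needed:.0f} px")  # -> 2000 px
```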

Scaled-YOLOv4: scaling the model based on the target hardware


Cost

Update 26.April.2021

How to use computer vision with deep learning on IoT devices: running machine learning inference on the edge requires some extra steps.

I tested several hardware platforms such as the Raspberry Pi 3, Raspberry Pi 4, Intel® Neural Compute Stick 2, OpenCV AI Kit, Google Coral, NVIDIA Jetson Nano, etc., and different operating systems: real-time operating systems (RTOS), NASA cFS (core Flight System), Real-Time Executive for Multiprocessor Systems (RTEMS), ...

Typical tasks: anomaly detection, object detection, object tracking, ...

Use specialized frameworks or libraries for edge devices:

  • NVIDIA TensorRT

  • TensorFlow Lite: TensorFlow Lite for Microcontrollers, e.g. gesture recognition with OpenMV/TensorFlow or studio.edgeimpulse.com (see the inference sketch after this list)

  • TensorFlow.js

  • PyTorch Lightning

  • PyTorch Mobile

  • Intel® Distribution of OpenVINO Toolkit

  • CoreML

  • ML kit

  • FRITZ

  • MediaPipe

  • Apache TVM

  • TinyML: enabling ultra-low-power machine learning at the edge (tiny machine learning with Arduino)

  • Libraries: FFmpeg, GStreamer, Celery

  • GPU libraries for Python: PyCUDA, NumbaPro, PyOpenCL, CuPy
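
A minimal sketch of TensorFlow Lite inference on an edge device using the tflite_runtime package; "model.tflite" is a placeholder, and on a Google Coral accelerator you would additionally load the Edge TPU delegate:

```python
# Load a .tflite model and run one inference with dummy input data.
import numpy as np
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input with the right shape/dtype; replace with a real preprocessed image.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()

prediction = interpreter.get_tensor(output_details[0]["index"])
print("Output shape:", prediction.shape)
```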

Moreover, think about the deep learning model for your specific hardware at the first stage.

In some cases you need to optimize the model for inference. There are many techniques, such as:

  • Pruning

  • Quantization

  • Distillation Techniques

  • Binarized Neural Networks (BNNs)

  • Apache TVM (incubating) is a compiler stack for deep learning systems

  • Distributed machine learning and load balancing strategy

  • Low rank matrix factorization (LRMF)

  • Compact convolutional filters (Video/CNN)

  • Knowledge distillation

  • Neural Networks Compression Framework (NNCF)

  • Parallel programming

How


Pruning

Model pruning: reduce redundant parameters that are not sensitive to performance. Aim: remove all connections whose absolute weights fall below a threshold. 🤔 Going for a bigger network with many layers and then pruning it works much better and faster.
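
A minimal sketch of magnitude-based pruning, using PyTorch's built-in pruning utilities as one possible implementation (the layer shape and the 30% pruning ratio are arbitrary examples, not values from these notes):

```python
# Zero out the connections with the smallest absolute weights.
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(128, 64)
prune.l1_unstructured(layer, name="weight", amount=0.3)  # prune smallest 30% by |w|
prune.remove(layer, "weight")                            # make the pruning permanent

sparsity = (layer.weight == 0).float().mean().item()
print(f"Weight sparsity after pruning: {sparsity:.0%}")
```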

Quantization

The best way is to use Google's library (TensorFlow Lite / the TensorFlow Model Optimization Toolkit), which supports the most comprehensive set of methods.

Quantization compresses a model by reducing the number of bits used to represent the weights; it effectively constrains the number of different weight values we can use inside our kernels. Per-channel quantization of weights further improves performance through model compression and latency reduction.
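
A minimal sketch of post-training quantization with the Google tooling mentioned above; "saved_model/" is a placeholder for a trained TensorFlow SavedModel:

```python
# Convert a SavedModel to a quantized TensorFlow Lite model.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables weight quantization

tflite_model = converter.convert()
with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)
```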

Distillation Techniques

Training a compact neural network with the distilled knowledge of a large model: distillation (knowledge transfer) from an ensemble of big networks into a much smaller network that learns directly from the cumbersome model's outputs and is therefore lighter to deploy.

Distill-Net: Application-Specific Distillation of Deep Convolutional Neural Networks for Resource-Constrained IoT Platforms
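
A minimal sketch of a soft-target distillation loss in PyTorch; the temperature T, the weighting alpha, and the random tensors are illustrative assumptions, not values from the reference above:

```python
# Blend the KL loss against the teacher's soft targets with the usual cross-entropy.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage with random logits for a 10-class problem.
s, t = torch.randn(8, 10), torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
print(distillation_loss(s, t, y).item())
```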

Binarized Neural Networks (BNNs)

BNNs are not supported by GPU hardware such as the Jetson Nano; implementations are mostly CPU-based.

Apache TVM (incubating) is a compiler stack for deep learning systems

Challenges with large-scale deep neural networks: they are computationally expensive and memory intensive, which hinders their deployment on devices with low memory resources and in applications with strict latency requirements. Other issues: data security (models tend to memorize everything, including PII) and bias, e.g. profanity, when trained on large-scale public data. Related directions: self-discovering (instead of manually configuring conversational flows, automatically discover them from your data), self-training (let your system train itself with new examples), self-managing (let your system optimize itself), and knowledge distillation.

Distributed machine learning and load balancing strategy

Run models using all available processing power (CPU, GPU, DSP, AI accelerator) together to improve inference performance. Dynamic pruning of kernels aims at parsimonious inference by learning to exploit and dynamically remove the redundant capacity of a CNN architecture. Partitioning techniques based on convolution-layer fusion dynamically select the optimal partition according to the available computational resources and network conditions.

Low rank matrix factorization (LRMF)

There exist latent structures in the data; by uncovering them we can obtain a compressed representation of the data. LRMF factorizes the original matrix into lower-rank matrices while preserving the latent structures and addressing the issue of sparseness.
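
A minimal sketch of the idea with a truncated SVD in NumPy; the matrix size and the chosen rank are arbitrary examples:

```python
# Low-rank factorization of a weight matrix via truncated SVD.
import numpy as np

W = np.random.randn(256, 512)           # e.g. a fully connected layer's weights
U, S, Vt = np.linalg.svd(W, full_matrices=False)

r = 32                                   # chosen rank << min(256, 512)
W_low = (U[:, :r] * S[:r]) @ Vt[:r, :]   # rank-r approximation of W

params_before = W.size
params_after = U[:, :r].size + Vt[:r, :].size   # store two thin factors instead
print(f"{params_before} -> {params_after} parameters, "
      f"relative error {np.linalg.norm(W - W_low) / np.linalg.norm(W):.3f}")
```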

Compact convolutional filters (Video/CNN)

Design special structural convolutional filters to save parameters: replace over-parameterized filters with compact filters to achieve an overall speedup while maintaining comparable accuracy.
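
A minimal sketch of one common compact-filter design, replacing a standard 3x3 convolution with a depthwise-separable one (PyTorch; the channel counts are arbitrary examples):

```python
# Compare parameter counts of a standard conv vs. a depthwise-separable conv.
import torch
import torch.nn as nn

standard = nn.Conv2d(64, 128, kernel_size=3, padding=1)

separable = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64),  # depthwise 3x3
    nn.Conv2d(64, 128, kernel_size=1),                        # pointwise 1x1
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), "->", count(separable), "parameters")

x = torch.randn(1, 64, 32, 32)
assert standard(x).shape == separable(x).shape  # same output shape, far fewer weights
```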

Knowledge distillation

Neural Networks Compression Framework (NNCF)

AI Edge: how to run inference with deep learning models on edge/IoT devices; enabling efficient, high-performance accelerators and optimization for deep learning.

If the objects are large and we do not need small anchors: in MobileNet we can remove the part of the network related to small objects, and in YOLO we can reduce the number of anchors. Decreasing the input image size also helps, but it reduces accuracy.

Parallel programming, clean code, and design patterns.