I would like to share some of my experience with AI projects.

Before starting a machine learning project, ask these questions and prepare accordingly: What is your inference hardware? What is the use case? What is the model interface? How will you monitor performance after deployment, and how can you approximate that monitoring before deployment? How will you build a model and iteratively improve it? How will you deploy the model at the end? What is your metric? How do you split your data (training and validation)?

Preparation: ML Project Workflow

  • What is your inference hardware?

  • specify the use case

  • specify model interface

  • how would we monitor performance after deployment?

  • how can we approximate post-deployment monitoring before deployment?

  • build a model and iteratively improve it

  • deploy the model

  • monitor performance

    • What is your metric?

    • How do you split your data?
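The last two questions (metric and data split) can be pinned down in a few lines before any model exists. Here is a minimal sketch, assuming a classification task, accuracy as the metric, and a shuffled 80/20 hold-out split. The names `split_train_val` and `accuracy` are illustrative and do not come from any library, and the toy data is made up.

```python
import random

def split_train_val(samples, val_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and split it into (train, val) lists."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Toy data (assumption): (feature, label) pairs, label is 1 when feature >= 50.
data = [(x, int(x >= 50)) for x in range(100)]
train, val = split_train_val(data)

# Trivial baseline "model": always predict the majority class of the training set.
majority = round(sum(label for _, label in train) / len(train))
val_labels = [label for _, label in val]
baseline_acc = accuracy(val_labels, [majority] * len(val))
print(f"train={len(train)} val={len(val)} baseline accuracy={baseline_acc:.2f}")
```

Having a baseline number like this early makes it easier to judge whether a later, more complex model is actually an improvement.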

Before training a deep learning model

  • Start training with a large model, because

    • it tends to converge faster and overfit less during training

    • it is easier to compress, and compresses further, in the final stage

      • model compression and acceleration: reducing parameters without significantly decreasing the model performance

  • Data: how to build and enhance a good data set for your deep learning project: use the same config and data for training and inference, remove redundant data (delete data you don't need), get more data, handle missing data, use data augmentation techniques or GANs to generate more data, re-scale and balance the data, transform your data (change data types), and select features based on the data set and use case

      • Remove data you don't need: drop redundant samples

      • get more data

      • Invent more data

        • data augmentation

      • Re-scale data

        • balance datasets

      • Transform your data

      • Feature selection based on dataset and use case

      • ML-augmented video object tracking: applying and evaluating multiple algorithmic models improves the ability to scale object tracking in high-density video compositions.
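Three of the data steps listed above (removing redundant samples, re-scaling, and balancing) can be sketched with the standard library alone. This is a minimal illustration under my own assumptions: samples are `(features, label)` pairs with numeric features, "redundant" means exact duplicates, re-scaling is min-max to [0, 1], and balancing is random oversampling of the minority classes. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def drop_duplicates(samples):
    """Remove exact duplicate (features, label) pairs, keeping first occurrences."""
    seen = set()
    unique = []
    for features, label in samples:
        key = (tuple(features), label)
        if key not in seen:
            seen.add(key)
            unique.append((features, label))
    return unique

def min_max_rescale(samples):
    """Rescale every feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*(features for features, _ in samples)))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    rescaled = []
    for features, label in samples:
        scaled = [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(features, lows, highs)
        ]
        rescaled.append((scaled, label))
    return rescaled

def oversample_to_balance(samples, seed=0):
    """Randomly repeat minority-class samples until all classes have equal counts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

# Made-up toy data: one duplicated sample and a 3:1 class imbalance after dedup.
raw = [([1.0, 10.0], "a"), ([1.0, 10.0], "a"), ([2.0, 20.0], "a"),
       ([3.0, 30.0], "a"), ([5.0, 50.0], "b")]
clean = oversample_to_balance(min_max_rescale(drop_duplicates(raw)))
print(Counter(label for _, label in clean))
```

In practice I would apply oversampling to the training split only, so that repeated samples do not leak into the validation set.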

Training a deep learning model

  • Automated hyperparameter tuning

    • Using Hyperparameter tuning / Hyperparameter optimization tools

    • AutoML

    • genetic algorithm

    • population based training

    • bayesian optimization

  • You need to set some parameters and config for training

      • Diagnostics

      • Weight Initialization

      • Learning rate

      • Activation function

      • Network Topology

      • Batches and Epochs

      • Regularization

      • Optimization and Loss

      • Early Stopping
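To make two of the items above concrete (hyperparameter tuning and early stopping), here is a minimal sketch that combines them. It assumes the simplest possible setup: random search over the learning rate only, a one-parameter model `y = w * x` trained by gradient descent, and early stopping with a patience counter on the validation loss. Random search is my stand-in for the heavier tools listed (Bayesian optimization, population based training, AutoML); the function names and toy data are hypothetical.

```python
import random

def train_with_early_stopping(learning_rate, train, val, max_epochs=200, patience=5):
    """Fit y = w * x by gradient descent; stop when validation loss stops improving."""
    w = 0.0
    best_loss, best_w = float("inf"), w
    stale_epochs = 0
    epochs_run = 0
    for _ in range(max_epochs):
        epochs_run += 1
        # Gradient of the mean squared error with respect to w on the training set.
        grad = sum(2 * (w * x - y) * x for x, y in train) / len(train)
        w -= learning_rate * grad
        val_loss = sum((w * x - y) ** 2 for x, y in val) / len(val)
        if val_loss < best_loss - 1e-9:
            best_loss, best_w, stale_epochs = val_loss, w, 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break  # no improvement for `patience` epochs in a row
    return best_loss, best_w, epochs_run

def random_search(train, val, n_trials=20, seed=0):
    """Sample learning rates log-uniformly and keep the best validation result."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-4, 0)  # log-uniform in [1e-4, 1]
        loss, w, epochs = train_with_early_stopping(lr, train, val)
        if best is None or loss < best["val_loss"]:
            best = {"learning_rate": lr, "val_loss": loss, "w": w, "epochs": epochs}
    return best

# Made-up noiseless data with true slope 3; train and val use interleaved x values.
train = [(i / 20, 3 * i / 20) for i in range(20)]
val = [((i + 0.5) / 20, 3 * (i + 0.5) / 20) for i in range(20)]
best = random_search(train, val)
print(best)
```

The learning rate is sampled on a log scale because useful values usually span several orders of magnitude; the same loop extends to more hyperparameters by sampling a dictionary per trial.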

Continuous delivery

  • evolve with latest detection models

  • more data (no labels)

    • semi-supervised learning: big self-supervised models are strong semi-supervised learners
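One simple way to use extra data without labels is pseudo-labeling (self-training). This is a much simpler technique than the big self-supervised models mentioned above, and I use it here only as an illustration. The sketch assumes 1-D features, a nearest-centroid classifier, and a hypothetical confidence rule: a pseudo-label is accepted only when the nearest centroid is closer than the runner-up by at least `min_margin`.

```python
def centroids(labeled):
    """Mean feature value per class for (x, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict_with_margin(cents, x):
    """Return (label, margin): nearest centroid and the gap to the second nearest."""
    dists = sorted((abs(x - c), y) for y, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def self_train(labeled, unlabeled, min_margin=1.0, max_rounds=5):
    """Repeatedly add confidently pseudo-labeled points to the labeled set."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        confident, uncertain = [], []
        for x in remaining:
            label, margin = predict_with_margin(cents, x)
            (confident if margin >= min_margin else uncertain).append((x, label))
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        remaining = [x for x, _ in uncertain]
    return centroids(labeled), remaining

# Made-up data: two labeled points and five unlabeled ones.
labeled_points = [(0.0, "neg"), (10.0, "pos")]
unlabeled_points = [1.0, 2.0, 9.0, 8.5, 5.1]
final_centroids, still_unlabeled = self_train(labeled_points, unlabeled_points)
print(final_centroids, still_unlabeled)
```

The point near the class boundary (5.1) stays unlabeled, which is the intended behavior: ambiguous samples should not be pseudo-labeled and fed back into training.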

After training a deep learning model

  • Parameter pruning

    • model pruning: reducing redundant parameters which are not sensitive to the performance.

      • aim: remove all connections with absolute weights below a threshold

  • Quantization

    • compresses by reducing the number of bits used to represent the weights

    • quantization effectively constrains the number of different weights we can use inside our kernels

    • per-channel quantization for weights, which improves performance by model compression and latency reduction.

  • Low rank matrix factorization (LRMF)

    • there exist latent structures in the data; by uncovering them we can obtain a compressed representation of the data

    • LRMF factorizes the original matrix into lower rank matrices while preserving latent structures and addressing the issue of sparseness

  • Compact convolutional filters (Video/CNN)

    • designing special structural convolutional filters to save parameters

    • replace over-parameterized filters with compact filters to achieve overall speedup while maintaining comparable accuracy

  • Knowledge distillation

    • training a compact neural network with distilled knowledge of a large model

    • distillation (knowledge transfer) from an ensemble of big networks into a much smaller network, which learns directly from the cumbersome model's outputs and is lighter to deploy

  • Binarized Neural Networks (BNNs)

  • Apache TVM (incubating) is a compiler stack for deep learning systems

  • Neural Networks Compression Framework (NNCF)
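Three of the compression ideas above are small enough to sketch in plain Python: magnitude pruning (zero the weights below a threshold), per-channel quantization (one scale per output channel), and the soft-target term of knowledge distillation. This is a rough illustration under my own assumptions, not how TVM or NNCF implement them: weights are plain lists, quantization is symmetric and uniform, and the distillation loss is the KL divergence between temperature-softened teacher and student outputs. All function names are hypothetical.

```python
import math

def prune_by_magnitude(weights, threshold):
    """Zero every weight whose absolute value is below `threshold`."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def quantize_uniform(weights, num_bits=8):
    """Symmetric uniform quantization: floats -> signed integers plus one scale."""
    q_max = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    q = [max(-q_max, min(q_max, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map quantized integers back to approximate float weights."""
    return [v * scale for v in q]

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by `temperature`."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Made-up kernel with two output channels of very different magnitude.
kernel = [[0.5, -0.02, 0.25], [4.0, 0.01, -2.0]]
pruned = [prune_by_magnitude(row, threshold=0.05) for row in kernel]
per_channel = [quantize_uniform(row) for row in pruned]  # one scale per channel
restored = [dequantize(q, scale) for q, scale in per_channel]
print(pruned)
print(per_channel)
print(distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

Using one scale per channel matters here because the second channel is roughly eight times larger than the first; a single shared scale would spend most of the 8-bit range on the large channel and quantize the small one coarsely.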

Deep learning model in production

  • security: controls access to model(s) through secure packaging and execution

  • Test

  • auto training

  • use parallel processing and libraries such as GStreamer
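As a small illustration of the parallel-processing point, the sketch below fans independent frames out to a thread pool using Python's standard `concurrent.futures`. GStreamer itself is not used here. `process_frame` is a hypothetical stand-in for per-frame inference, and frames are assumed to be independent of each other.

```python
from concurrent.futures import ThreadPoolExecutor

def process_frame(frame):
    """Hypothetical stand-in for per-frame inference: mean pixel intensity."""
    return sum(frame) / len(frame)

def process_frames_in_parallel(frames, max_workers=4):
    """Run process_frame on every frame using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order even if workers finish out of order.
        return list(pool.map(process_frame, frames))

# Made-up "frames": tiny lists of pixel values.
frames = [[i, i + 1, i + 2] for i in range(8)]
results = process_frames_in_parallel(frames)
print(results)
```

Threads pay off when the per-frame work releases the GIL (I/O, or inference inside a native library); for CPU-bound pure-Python work, `ProcessPoolExecutor` from the same module is usually the better fit.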






My Keynote (February 2021)

  1. introduction

  2. Machine Learning/ Deep Learning

Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed

  1. supervised Machine Learning

    1. Deep Convolutional Neural Networks (DCNN) Architecture

    2. Visualizing and Understanding Convolutional Networks

    3. Object Detection by Deep Learning

    4. Video Tracking

    5. Style Transfer

  2. semi-supervised Machine Learning/ Deep Reinforcement learning (DRL)

    1. Google

    2. Deep Reinforcement learning (DRL)

  3. unsupervised Machine Learning

    1. Auto Encoder

  4. Generative Adversarial Networks (GANs)

  5. Tools

  6. Pre trained model

  7. Effect of Augmented Datasets to Train DCNNs

  8. Training for more classes

  9. Optimization

  10. Hardware

  11. Production setup

  12. post development

  13. Business: Gartner Hype Cycle for Emerging Technologies, 2025

Advanced and practical

  1. Inside CNN

    1. Deep Convolutional Neural Networks Architecture

    2. Convolution

    3. Convolution Layer

    4. Conv/FC Filters

    5. Activation Functions

    6. Layer Activations

    7. Pooling Layer

    8. Dropout; L2 pooling

    9. Why

      1. Max-pooling is useful

      2. How to see inside each layer and find important features

  1. Hands on python for deep learning

  2. Fundamental deep learning

  3. Installation: TensorFlow, PyTorch

  4. Using PC+eGPU for training video tracking

Summary of the summit


  • Effective and precise face detection based on color and depth data


      • containing or not containing a face

      • Eigenface, Fisherface, waveletface, PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), Haar wavelet transform, and so on.

      • Viola–Jones detector

      • illumination changes and occlusion

      • depth information is used to filter the regions of the image where a candidate face region is found by the Viola–Jones (VJ) detector

      • - the first filtering rule is defined on the color of the region; since some false positives have colors not compatible with the face (e.g. shadows on jeans) a skin detector is applied to remove the candidate face regions that do not contain skin pixels;

      • - the second filtering rule is defined on the size of the face: using the depth map it is quite easy to calculate the size of the candidate face region, which is useful to discard smallest and largest faces from the final result set;

      • - the third filtering rule is defined on the depth map to discard flat objects (e.g. candidate faces found in a wall) or uneven objects (e.g. candidate face found in the leaves of a tree). Combining color and depth data the candidate face region can be extracted from the background and measures of depth and regularity are used for filtering out false positives.

      • The size criteria simply remove the candidate faces not included in a fixed range size ([12.5,30] cm). The size of a candidate face region is extracted from the depth map according to the following approach.

      • image below

  • Gaussian mixture 3D morphable face model

  • Face Synthesis for Eyeglass-Robust Face Recognition

  • GeneGAN: Learning Object Transfiguration and Attribute Subspace from Unpaired Data

  • FacePoseNet: Making a Case for Landmark-Free Face Alignment

  • Learning to Regress 3D Face Shape and Expression from an Image without 3D Supervision

  • Unsupervised Eyeglasses Removal in the Wild

  • How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks)


    • (a) we construct, for the first time, a very strong baseline by combining a state-of-the-art architecture for landmark localization with a state-of-the-art residual block, train it on a very large yet synthetically expanded 2D facial landmark dataset and finally evaluate it on all other 2D facial landmark datasets.

    • (b) We create a guided by 2D landmarks network which converts 2D landmark annotations to 3D and unifies all existing datasets, leading to the creation of LS3D-W, the largest and most challenging 3D facial landmark dataset to date (~230,000 images).

    • (c) Following that, we train a neural network for 3D face alignment and evaluate it on the newly introduced LS3D-W.

    • (d) We further look into the effect of all “traditional” factors affecting face alignment performance like large pose, initialization and resolution, and introduce a “new” one, namely the size of the network.

    • (e) We show that both 2D and 3D face alignment networks achieve performance of remarkable accuracy which is probably close to saturating the datasets used.

    • Training and testing code as well as the dataset can be downloaded from https: //
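Going back to the first item of this summary (face detection with color and depth data), the size filtering rule can be sketched as follows. The original figure with the exact size computation is not reproduced in these notes, so the conversion from pixels to centimeters is my assumption: a pinhole-camera model where physical width = pixel width × depth / focal length. Only the [12.5, 30] cm range comes from the text. The function names, the focal length, and the convention that a depth of 0 means "no reading" are all hypothetical.

```python
from statistics import median

MIN_FACE_CM, MAX_FACE_CM = 12.5, 30.0  # size range quoted in the text

def face_width_cm(box, depth_map_cm, focal_length_px):
    """Estimate the physical width of a candidate box (assumed pinhole model)."""
    x, y, w, h = box
    depths = [
        depth_map_cm[r][c]
        for r in range(y, y + h)
        for c in range(x, x + w)
        if depth_map_cm[r][c] > 0  # assumption: 0 marks a missing depth reading
    ]
    if not depths:
        return None  # no usable depth inside the box
    # The median is less sensitive to background pixels than the mean.
    return w * median(depths) / focal_length_px

def filter_by_size(boxes, depth_map_cm, focal_length_px):
    """Keep only candidate boxes whose estimated width is in the allowed range."""
    kept = []
    for box in boxes:
        width = face_width_cm(box, depth_map_cm, focal_length_px)
        if width is not None and MIN_FACE_CM <= width <= MAX_FACE_CM:
            kept.append(box)
    return kept

# Made-up scene: a flat 20x20 depth map at 100 cm and a toy focal length of 50 px.
depth_map = [[100.0] * 20 for _ in range(20)]
candidates = [(2, 2, 8, 8), (0, 0, 2, 2), (0, 0, 20, 20)]  # (x, y, width, height)
kept_boxes = filter_by_size(candidates, depth_map, focal_length_px=50.0)
print(kept_boxes)
```

With these toy numbers the three candidates come out at 16 cm, 4 cm and 40 cm wide, so only the first one survives the size rule.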