Video Tracking

Resilient object detection and tracking on edge devices and in the cloud (AWS):

The best object tracking methods run on a GPU, so the versions of the deep learning frameworks involved are crucial. For example, the latest JetPack release (the OS image for the Jetson Nano) uses the latest CUDA, but PyTorch currently supports only up to CUDA 10.1. The options are therefore to install an older JetPack version on the Jetson Nano or to compile PyTorch from source. I compiled PyTorch, which took a few hours and raised many issues that had to be solved.

Ubuntu 20 has no support for CUDA 10, so you need to install CUDA 11 and compile PyTorch together with many libraries. Installing CUDA 10.x on Ubuntu 20 is possible, but resolving the conflicts takes time, and eGPU support is a further issue on older Ubuntu releases. On macOS, installation is easy because there is no GPU support to set up, but many of the libraries and frameworks used by tracking source code require their GPU versions. Even installing the CPU versions of all libraries does not guarantee that a tracking method will run.

Speed is another aspect. In my experience, tracking is slow even on a GPU: YOLOv3, one of the fastest object detectors, processes up to 15 FPS on Full HD video on an RTX 2070.

Tracking methods vary widely. The first generation is based entirely on classical computer vision. The second generation combines the Kalman filter with more advanced computer vision (SIFT). The third generation uses deep learning together with some methods from the previous generations, such as the Kalman filter. The fourth generation combines two deep learning methods. The latest generation uses complete end-to-end models such as RNNs.

Object tracking works across combinations of conditions, such as moving objects alone, or both moving objects and a moving camera in dynamic environments. From the moment an object appears in the frame until it disappears, the tracker follows it and identifies it as a single object, no matter how many FPS the video has.
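Since the framework and CUDA versions decide whether a tracker runs at all, it can help to print them before trying any tracking code. The following is a minimal sketch, assuming Python 3 and an optional PyTorch install. The function name `describe_torch_env` and the dictionary keys are hypothetical, chosen only for this illustration; the PyTorch attributes it reads (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`) are the standard ones.

```python
import importlib.util
import platform


def describe_torch_env():
    """Collect the version facts that decide whether GPU tracking code can run.

    Hypothetical helper for illustration; it works with or without PyTorch.
    """
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "torch_cuda_build": None,   # CUDA version PyTorch was built against
        "cuda_available": False,    # True only if a usable GPU is visible
    }
    if info["torch_installed"]:
        try:
            import torch
        except (ImportError, OSError):
            # A broken install (e.g. missing CUDA libraries) counts as unusable.
            info["torch_installed"] = False
            return info
        info["torch_version"] = torch.__version__
        info["torch_cuda_build"] = torch.version.cuda  # None on CPU-only builds
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(f"{key}: {value}")
```

If `torch_cuda_build` is `None`, or `cuda_available` is `False`, the installed PyTorch is effectively CPU-only on that machine, and tracking code that requires the GPU version is unlikely to run as is.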

Tracking

  • Classic object tracking

        • Classic feature detection (SIFT and SURF), combined with a machine learning algorithm like KNN or SVM for classification, or with a descriptor matcher like FLANN for object detection.

        • Kalman filtering, sparse and dense optical flow.

        • Example: Simple Online and Realtime Tracking (SORT), which uses a combination of the Hungarian algorithm and a Kalman filter.

  • Single object tracking (SOT) has been a hot topic over the last decade. Early visual tracking methods rely on extracting hand-crafted features of candidate target regions, and use matching algorithms or hand-crafted discriminative classifiers to generate tracking results.

  • The multi-object tracking (MOT) track aims to recover the trajectories of objects in video sequences, which is an important problem in computer vision with many applications, such as surveillance, activity analysis, and sports video analysis.

  • Video object detection datasets. The video object detection task aims to detect objects of different categories in video sequences.

  • Multi-object tracking datasets

  • Large-scale benchmark multi-class multi-object tracking datasets

  • The VisDrone dataset is captured in various unconstrained scenes and focuses on four core problems in computer vision: image object detection, video object detection, single object tracking, and multi-object tracking.

  • The accuracy of detection methods suffers from degraded object appearances in videos, such as motion blur, pose variations, and video defocus. Exploiting temporal coherence and aggregating features across consecutive frames may be two effective ways to handle this issue.

    • Temporal coherence. A feasible way to exploit temporal coherence is to use object trackers.

    • Feature aggregation. Aggregating features across consecutive frames is also a useful way to improve performance.

  • List of Datasets

    • MOT20

    • KITTI Tracking

    • MOTChallenge 2015

    • UA-DETRAC Tracking

    • DukeMTMC

    • Campus

    • MOT17

    • UAVDT-MOT

    • VisDrone
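To make the SORT example from the classic tracking notes above more concrete, here is a minimal sketch of its association step: matching predicted track boxes to new detections by IoU. It rests on several assumptions. The track boxes are taken to be already predicted (in SORT, by the Kalman filter). The optimal assignment is found by brute force over permutations as a stand-in for the Hungarian algorithm, which is only practical for a handful of objects. The names `iou` and `associate`, the `(x1, y1, x2, y2)` box format, and the 0.3 threshold are illustrative choices, not taken from any particular implementation.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, iou_threshold=0.3):
    """Match predicted track boxes to detections by maximising total IoU.

    Brute-force stand-in for the Hungarian algorithm (factorial cost).
    Returns (matches, unmatched_tracks, unmatched_detections), where
    matches is a list of (track_index, detection_index) pairs.
    """
    n_t, n_d = len(tracks), len(detections)
    best_pairs, best_total = [], -1.0
    if n_t <= n_d:
        # Assign each track a distinct detection.
        candidates = (list(zip(range(n_t), p)) for p in permutations(range(n_d), n_t))
    else:
        # More tracks than detections: assign each detection a distinct track.
        candidates = (list(zip(p, range(n_d))) for p in permutations(range(n_t), n_d))
    for pairs in candidates:
        total = sum(iou(tracks[t], detections[d]) for t, d in pairs)
        if total > best_total:
            best_pairs, best_total = pairs, total

    # Reject assigned pairs whose overlap is too small to be the same object.
    matches = sorted(
        (t, d) for t, d in best_pairs
        if iou(tracks[t], detections[d]) >= iou_threshold
    )
    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    unmatched_tracks = [t for t in range(n_t) if t not in matched_t]
    unmatched_detections = [d for d in range(n_d) if d not in matched_d]
    return matches, unmatched_tracks, unmatched_detections


if __name__ == "__main__":
    tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
    detections = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
    print(associate(tracks, detections))
```

In a full tracker, each matched detection would update its track, unmatched detections would start new tracks, and tracks left unmatched for several frames would be dropped. This is how an object keeps a single identity from the moment it appears until it disappears.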

Source code

Self collected datasets

Video labeling

Some examples