IoT Scholarship Foundation

Face recognition using OpenCV 4.1.1 is fast but not accurate.

Image classification with deep learning works very well, with high accuracy.

Object detection runs at around 5 frames per second with MobileNet SSD.

Using Coral is very good - it needs some modifications to install - pose estimation is very fast and accurate.

Using the Intel Movidius Neural Compute Stick 2 is good because we can use OpenCV and Python, plus a wide range of deep learning frameworks.



=====================================================

  • OpenCV: A computer vision (CV) library filled with many different computer vision functions and other useful image and video processing and handling capabilities.

  • MQTT: A publisher-subscriber protocol often used for IoT devices due to its lightweight nature. The paho-mqtt library is a common way of working with MQTT in Python (see the sketch after this list).

  • Publish-Subscribe Architecture: A messaging architecture whereby it is made up of publishers, that send messages to some central broker, without knowing of the subscribers themselves. These messages can be posted on some given “topic”, which the subscribers can then listen to without having to know the publisher itself, just the “topic”.

  • Publisher: In a publish-subscribe architecture, the entity that is sending data to a broker on a certain “topic”.

  • Subscriber: In a publish-subscribe architecture, the entity that is listening to data on a certain “topic” from a broker.

  • Topic: In a publish-subscribe architecture, data is published to a given topic, and subscribers to that topic can then receive that data.

  • FFmpeg: Software that can help convert or stream audio and video. In the course, the related ffserver software is used to stream to a web server, which can then be queried by a Node server for viewing in a web browser.

  • Flask: A Python framework useful for web development and another potential option for video streaming to a web browser.

  • Node Server: A web server built with Node.js that can handle HTTP requests and/or serve up a webpage for viewing in a browser.
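
To make the publisher, subscriber and topic concepts concrete, here is a minimal paho-mqtt sketch; the broker address, port, topic name and payload are assumptions for illustration, not values from the course.

```python
import json
import paho.mqtt.client as mqtt

MQTT_HOST = "localhost"        # assumed broker address
MQTT_PORT = 1883               # assumed default MQTT port
MQTT_KEEPALIVE_INTERVAL = 60

# Publisher: send a message on the "class" topic to the broker
publisher = mqtt.Client()
publisher.connect(MQTT_HOST, MQTT_PORT, MQTT_KEEPALIVE_INTERVAL)
publisher.publish("class", json.dumps({"class_names": ["car"]}))
publisher.disconnect()

# Subscriber: listen on the same topic, without knowing who publishes to it
def on_message(client, userdata, msg):
    print(msg.topic, msg.payload.decode())

subscriber = mqtt.Client()
subscriber.on_message = on_message
subscriber.connect(MQTT_HOST, MQTT_PORT, MQTT_KEEPALIVE_INTERVAL)
subscriber.subscribe("class")
subscriber.loop_forever()      # blocks and dispatches incoming messages
```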

The "edge" means local (or near local) processing

NOT just anywhere in the cloud

Edge applications are often used where low latency is necessary

Also used where a network may not always be available

Can come from a desire for real-time decision making

Lesson 1: Introduction to AI at the Edge

Lesson 2: Leveraging Pre-Trained Models

Lesson 3: The Model Optimizer

Lesson 4: The Inference Engine

Lesson 5: Deploying an Edge App

No need to send data to the cloud -> more secure, less impact on the network

Which of these are reasons for development of the Edge?

Proliferation of devices, need for low-latency compute, need for disconnected devices.

  • In this course, we’ll largely focus on AI at the Edge using the Intel® Distribution of OpenVINO™ Toolkit.

  • First, we’ll start off with pre-trained models available in the OpenVINO™ Open Model Zoo. Even without needing huge amounts of your own data and costly training, you can deploy powerful models already created for many applications.

  • Next, you’ll learn about the Model Optimizer, which can take a model you trained in frameworks such as TensorFlow, PyTorch, Caffe and more, and create an Intermediate Representation (IR) optimized for inference with OpenVINO™ and Intel® hardware.

  • Third, you’ll learn about the Inference Engine, where the actual inference is performed on the IR model.

  • Lastly, we'll hit some more topics on deploying at the edge, including things like handling input streams, processing model outputs, and the lightweight MQTT architecture used to publish data from your edge models to the web.

Very important: listen to the first two videos again.

Classification

Yes/no

Classes (1,000 in the ImageNet competition)

The full ImageNet dataset has around 20,000 classes

Detection

Find objects and location

Bounding boxes where object is

Combined with some form of classification

Segmentation

Classifies segments of an image (classifies each and every pixel)

Semantic segmentation

All objects of the same class are one

Instance segmentation

Each object of a class is separate (two cats will get different colors)

Pose estimation

Text recognition

GANs

sudo ./downloader --name vehicle-attributes-recognition-barrier-0039 --precisions INT8 -o /home/workspace

Pre-processing:

Varies by model

Color channel order matters (RGB vs. BGR)

Image resizing

Normalization

```python
def preprocessing(input_image, height, width):
    '''
    Resize to the network's expected size, transpose to CxHxW,
    and add a batch dimension.
    '''
    image = cv2.resize(input_image, (width, height))
    image = image.transpose((2, 0, 1))
    image = image.reshape(1, 3, height, width)
    return image
```
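
The function above only resizes and reorders dimensions. If a particular model also expects RGB input or normalized pixel values, a step like the following could be added before the transpose; the mean and scale values here are placeholders, not values from any specific model:

```python
import cv2
import numpy as np

def normalize(image, mean=(127.5, 127.5, 127.5), scale=127.5):
    # OpenCV loads images as BGR; convert to RGB if the model expects RGB
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # Subtract a per-channel mean and divide by a scale factor
    return (image.astype(np.float32) - np.array(mean)) / scale
```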

python app.py -i "images/blue-car.jpg" -t "CAR_META" -m "/home/workspace/models/vehicle-attributes-recognition-barrier-0039.xml" -c "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"

python app.py -i "images/sitting-on-car.jpg" -t "POSE" -m "/home/workspace/models/human-pose-estimation-0001.xml" -c "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"

python app.py -i "images/sign.jpg" -t "TEXT" -m "/home/workspace/models/text-detection-0004.xml" -c "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"

source /opt/intel/openvino/bin/setupvars.sh -pyver 3.5

python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model frozen_inference_graph.pb --tensorflow_object_detection_api_pipeline_config pipeline.config --reverse_input_channels --tensorflow_use_custom_operations_config /opt/intel/openvino/deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json

export MOD_OPT=/opt/intel/openvino/deployment_tools/model_optimizer

python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model squeezenet_v1.1.caffemodel --input_proto deploy.prototxt

python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model model.onnx

There are two main command-line arguments for cutting a model with the Model Optimizer, intuitively named --input and --output; they are used to feed in the layer names that should become the new entry or exit points of the model.
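
As a hypothetical example of the syntax (the layer names below are made up and would need to match layers that actually exist in your model):

```bash
python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model frozen_inference_graph.pb --input new_entry_layer --output new_exit_layer
```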

-l $CLWS/cl_cosh/user_ie_extensions/cpu/build/libcosh_cpu_extension.so

~/inference_engine_samples_build/intel64/Release/classification_sample_async -i $CLT/pics/dog.bmp -m $CLWS/cl_ext_cosh/model.ckpt.xml -d CPU -l $CLWS/cl_cosh/user_ie_extensions/cpu/build/libcosh_cpu_extension.so

```python
import argparse
import time

import cv2

from helpers import load_to_IE, preprocessing

CPU_EXTENSION = "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"


def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Load an IR into the Inference Engine")
    # -- Create the descriptions for the commands
    m_desc = "The location of the model XML file"
    i_desc = "The location of the image input"
    r_desc = "The type of inference request: Async ('A') or Sync ('S')"

    # -- Create the arguments
    parser.add_argument("-m", help=m_desc)
    parser.add_argument("-i", help=i_desc)
    parser.add_argument("-r", help=r_desc)
    args = parser.parse_args()

    return args


def async_inference(exec_net, input_blob, image):
    ### Perform asynchronous inference
    ### Note: Returns the exec_net
    exec_net.start_async(request_id=0, inputs={input_blob: image})
    while True:
        status = exec_net.requests[0].wait(-1)
        if status == 0:
            break
        else:
            time.sleep(1)
    return exec_net


def sync_inference(exec_net, input_blob, image):
    ### Perform synchronous inference
    ### Note: Returns the result of inference
    result = exec_net.infer({input_blob: image})
    return result


def perform_inference(exec_net, request_type, input_image, input_shape):
    '''
    Performs inference on an input image, given an ExecutableNetwork
    '''
    # Get input image
    image = cv2.imread(input_image)
    # Extract the input shape
    n, c, h, w = input_shape
    # Preprocess it (applies for the IRs from the Pre-Trained Models lesson)
    preprocessed_image = preprocessing(image, h, w)

    # Get the input blob for the inference request
    input_blob = next(iter(exec_net.inputs))

    # Perform either synchronous or asynchronous inference
    request_type = request_type.lower()
    if request_type == 'a':
        output = async_inference(exec_net, input_blob, preprocessed_image)
    elif request_type == 's':
        output = sync_inference(exec_net, input_blob, preprocessed_image)
    else:
        print("Unknown inference request type, should be 'A' or 'S'.")
        exit(1)

    # Return the exec_net for testing purposes
    return output


def main():
    args = get_args()
    exec_net, input_shape = load_to_IE(args.m, CPU_EXTENSION)
    perform_inference(exec_net, args.r, args.i, input_shape)


if __name__ == "__main__":
    main()
```

https://docs.openvinotoolkit.org/latest/classInferenceEngine_1_1Blob.html

Note: There is one small change from the code on-screen for running on Linux machines versus Mac. On Mac, cv2.VideoWriter uses cv2.VideoWriter_fourcc('M','J','P','G') to write an .mp4 file, while Linux uses 0x00000021.
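
A small sketch of how that difference could be handled in a single code path; treating every non-Mac platform as Linux here is a simplifying assumption:

```python
import platform
import cv2

# 'MJPG' fourcc on Mac, the 0x00000021 value on Linux (per the note above)
if platform.system() == 'Darwin':
    fourcc = cv2.VideoWriter_fourcc('M', 'J', 'P', 'G')
else:
    fourcc = 0x00000021

out = cv2.VideoWriter('out.mp4', fourcc, 30, (100, 100))
```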

```python
import argparse
import cv2
import numpy as np


def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Handle an input stream")
    # -- Create the descriptions for the commands
    i_desc = "The location of the input file"

    # -- Create the arguments
    parser.add_argument("-i", help=i_desc)
    args = parser.parse_args()

    return args


def capture_stream(args):
    ### Handle image, video or webcam
    image_flag = False
    if args.i == 'CAM':
        args.i = 0
    elif args.i.endswith('.jpg') or args.i.endswith('.bmp'):
        image_flag = True

    ### Get and open video capture
    capture = cv2.VideoCapture(args.i)
    capture.open(args.i)

    # Create a video writer for the output, unless the input is a single image
    if not image_flag:
        out = cv2.VideoWriter('out.mp4', cv2.VideoWriter_fourcc('M','J','P','G'), 30, (100, 100))
    else:
        out = None

    while capture.isOpened():
        flag, frame = capture.read()
        if not flag:
            break
        key_pressed = cv2.waitKey(60)
        if key_pressed == 27:
            break

        ### Re-size the frame to 100x100
        image = cv2.resize(frame, (100, 100))

        ### Add Canny Edge Detection to the frame,
        ### with min & max values of 100 and 200,
        ### then use np.dstack to make a 3-channel image
        edges = cv2.Canny(image, 100, 200)
        edges = np.dstack((edges, edges, edges))

        ### Write out the frame, depending on image or video
        if image_flag:
            cv2.imwrite("out.jpg", edges)
        else:
            out.write(edges)
        #cv2.imshow('display', edges)

    ### Close the stream and any windows at the end of the application
    if not image_flag:
        out.release()
    capture.release()
    cv2.destroyAllWindows()


def main():
    args = get_args()
    capture_stream(args)


if __name__ == "__main__":
    main()
```

===============


```python
import argparse
import json
import socket
import sys

import cv2
import numpy as np
from random import randint

from inference import Network
### Libraries for MQTT and FFmpeg
import paho.mqtt.client as mqtt

INPUT_STREAM = "test_video.mp4"
CPU_EXTENSION = "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"
ADAS_MODEL = "/home/workspace/models/semantic-segmentation-adas-0001.xml"

CLASSES = ['road', 'sidewalk', 'building', 'wall', 'fence', 'pole',
    'traffic_light', 'traffic_sign', 'vegetation', 'terrain', 'sky', 'person',
    'rider', 'car', 'truck', 'bus', 'train', 'motorcycle', 'bicycle', 'ego-vehicle']

# MQTT server environment variables
HOSTNAME = socket.gethostname()
IPADDRESS = socket.gethostbyname(HOSTNAME)
MQTT_HOST = IPADDRESS
MQTT_PORT = 3004  ### Port for MQTT
MQTT_KEEPALIVE_INTERVAL = 60


def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Run inference on an input video")
    # -- Create the descriptions for the commands
    i_desc = "The location of the input file"
    d_desc = "The device name, if not 'CPU'"

    # -- Create the arguments
    parser.add_argument("-i", help=i_desc, default=INPUT_STREAM)
    parser.add_argument("-d", help=d_desc, default='CPU')
    args = parser.parse_args()

    return args


def draw_masks(result, width, height):
    '''
    Draw semantic mask classes onto the frame.
    '''
    # Create a mask with color by class
    classes = cv2.resize(result[0].transpose((1,2,0)), (width, height),
        interpolation=cv2.INTER_NEAREST)
    unique_classes = np.unique(classes)
    out_mask = classes * (255 / 20)

    # Stack the mask so FFmpeg understands it
    out_mask = np.dstack((out_mask, out_mask, out_mask))
    out_mask = np.uint8(out_mask)

    return out_mask, unique_classes


def get_class_names(class_nums):
    class_names = []
    for i in class_nums:
        class_names.append(CLASSES[int(i)])
    return class_names


def infer_on_video(args, model):
    ### Connect to the MQTT server
    client = mqtt.Client()
    client.connect(MQTT_HOST, MQTT_PORT, MQTT_KEEPALIVE_INTERVAL)

    # Initialize the Inference Engine
    plugin = Network()

    # Load the network model into the IE
    plugin.load_model(model, args.d, CPU_EXTENSION)
    net_input_shape = plugin.get_input_shape()

    # Get and open video capture
    cap = cv2.VideoCapture(args.i)
    cap.open(args.i)

    # Grab the shape of the input
    width = int(cap.get(3))
    height = int(cap.get(4))

    # Process frames until the video ends, or process is exited
    while cap.isOpened():
        # Read the next frame
        flag, frame = cap.read()
        if not flag:
            break
        key_pressed = cv2.waitKey(60)

        # Pre-process the frame
        p_frame = cv2.resize(frame, (net_input_shape[3], net_input_shape[2]))
        p_frame = p_frame.transpose((2,0,1))
        p_frame = p_frame.reshape(1, *p_frame.shape)

        # Perform inference on the frame
        plugin.async_inference(p_frame)

        # Get the output of inference
        if plugin.wait() == 0:
            result = plugin.extract_output()
            # Draw the output mask onto the input
            out_frame, classes = draw_masks(result, width, height)
            class_names = get_class_names(classes)
            speed = randint(50, 70)

            ### Send the class names and speed to the MQTT server
            ### Hint: The UI web server will check for a "class" and
            ### "speedometer" topic. Additionally, it expects "class_names"
            ### and "speed" as the json keys of the data, respectively.
            client.publish("class", json.dumps({"class_names": class_names}))
            client.publish("speedometer", json.dumps({"speed": speed}))

        ### Send frame to the ffmpeg server
        sys.stdout.buffer.write(frame)
        sys.stdout.flush()

        # Break if escape key pressed
        if key_pressed == 27:
            break

    # Release the capture and destroy any OpenCV windows
    cap.release()
    cv2.destroyAllWindows()

    ### Disconnect from MQTT
    client.disconnect()


def main():
    args = get_args()
    model = ADAS_MODEL
    infer_on_video(args, model)


if __name__ == "__main__":
    main()
```

#1

#wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz

#2

#tar -xvf ssd_mobilenet_v2_coco_2018_03_29.tar.gz

#3

# python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model /home/workspace/ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph.pb --tensorflow_object_detection_api_pipeline_config /home/workspace/ssd_mobilenet_v2_coco_2018_03_29/pipeline.config --reverse_input_channels --tensorflow_use_custom_operations_config /opt/intel/openvino/deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json

# 4

#

#python app.py -m frozen_inference_graph.xml -ct 0.6 -c BLUE

python app.py | ffmpeg -v warning -f rawvideo -pixel_format bgr24 -video_size 1280x720 -framerate 24 -i - http://0.0.0.0:3004/fac.ffm

E4:

```python
import argparse
import cv2
import numpy as np


def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Handle an input stream")
    # -- Create the descriptions for the commands
    i_desc = "The location of the input file"

    # -- Create the arguments
    parser.add_argument("-i", help=i_desc)
    args = parser.parse_args()

    return args


def capture_stream(args):
    ### Handle image, video or webcam
    image_flag = False
    if args.i == 'CAM':
        args.i = 0
    elif args.i.endswith('.jpg') or args.i.endswith('.bmp'):
        image_flag = True

    ### Get and open video capture
    capture = cv2.VideoCapture(args.i)
    capture.open(args.i)

    if not image_flag:
        # On Mac: out = cv2.VideoWriter('out.mp4', cv2.VideoWriter_fourcc('M','J','P','G'), 30, (100,100))
        # On Linux, use the 0x00000021 fourcc instead:
        out = cv2.VideoWriter('out.mp4', 0x00000021, 30, (100, 100))
    else:
        out = None

    while capture.isOpened():
        flag, frame = capture.read()
        if not flag:
            break
        key_pressed = cv2.waitKey(60)
        if key_pressed == 27:
            break

        ### Re-size the frame to 100x100
        image = cv2.resize(frame, (100, 100))

        ### Add Canny Edge Detection to the frame,
        ### with min & max values of 100 and 200
        edges = cv2.Canny(image, 100, 200)

        ### Make sure to use np.dstack after to make a 3-channel image
        edges = np.dstack((edges, edges, edges))

        ### Write out the frame, depending on image or video
        if image_flag:
            cv2.imwrite("out.jpg", edges)
        else:
            out.write(edges)

    ### Close the stream and any windows at the end of the application
    if not image_flag:
        out.release()
    # (display:144): Gtk-WARNING **: cannot open display: :1
    #cv2.imshow('display', edges)
    capture.release()
    cv2.destroyAllWindows()


def main():
    args = get_args()
    capture_stream(args)


if __name__ == "__main__":
    main()
```

Let's say you have a cat and two dogs at your house.

```python
import argparse
import cv2

from inference import Network

INPUT_STREAM = "pets.mp4"
CPU_EXTENSION = "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"


def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Run inference on an input video")
    # -- Create the descriptions for the commands
    m_desc = "The location of the model XML file"
    i_desc = "The location of the input file"
    d_desc = "The device name, if not 'CPU'"

    # -- Add required and optional groups
    parser._action_groups.pop()
    required = parser.add_argument_group('required arguments')
    optional = parser.add_argument_group('optional arguments')

    # -- Create the arguments
    required.add_argument("-m", help=m_desc, required=True)
    optional.add_argument("-i", help=i_desc, default=INPUT_STREAM)
    optional.add_argument("-d", help=d_desc, default='CPU')
    args = parser.parse_args()

    return args


def assess_scene(result, counter, incident_flag):
    '''
    Based on the determined situation, potentially send
    a message to the pets to break it up.
    '''
    if result[0][1] == 1 and not incident_flag:
        timestamp = counter / 30
        print("Log: Incident at {:.3f} seconds.".format(timestamp))
        print("Break it up!")
        incident_flag = True
    elif result[0][1] != 1:
        incident_flag = False

    return incident_flag


def infer_on_video(args):
    # Initialize the Inference Engine
    plugin = Network()

    # Load the network model into the IE
    plugin.load_model(args.m, args.d, CPU_EXTENSION)
    net_input_shape = plugin.get_input_shape()

    # Get and open video capture
    cap = cv2.VideoCapture(args.i)
    cap.open(args.i)

    incident_flag = False
    counter = 0

    # Process frames until the video ends, or process is exited
    while cap.isOpened():
        # Read the next frame
        counter += 1
        flag, frame = cap.read()
        if not flag:
            break
        key_pressed = cv2.waitKey(60)

        # Pre-process the frame
        p_frame = cv2.resize(frame, (net_input_shape[3], net_input_shape[2]))
        p_frame = p_frame.transpose((2,0,1))
        p_frame = p_frame.reshape(1, *p_frame.shape)

        # Perform inference on the frame
        plugin.async_inference(p_frame)

        # Get the output of inference
        if plugin.wait() == 0:
            result = plugin.extract_output()
            ### TODO: Process the output
            incident_flag = assess_scene(result, counter, incident_flag)

        # Break if escape key pressed
        if key_pressed == 27:
            break

    # Release the capture and destroy any OpenCV windows
    cap.release()
    cv2.destroyAllWindows()


def main():
    args = get_args()
    infer_on_video(args)


if __name__ == "__main__":
    main()
```

# Integrate the Inference Engine - Solution

Let's step through the tasks one by one, with a potential approach for each.

> Convert a bounding box model to an IR with the Model Optimizer.

I used the SSD Mobilenet V2 architecture from TensorFlow from the earlier lesson here. Note

that the original was downloaded in a separate workspace, so I needed to download it again

and then convert it.

```

python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model frozen_inference_graph.pb --tensorflow_object_detection_api_pipeline_config pipeline.config --reverse_input_channels --tensorflow_use_custom_operations_config /opt/intel/openvino/deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json

```

> Extract the results from the inference request

```

self.exec_network.requests[0].outputs[self.output_blob]

```

> Add code to make the requests and feed back the results within the application

```

self.exec_network.start_async(request_id=0, inputs={self.input_blob: image})

...

status = self.exec_network.requests[0].wait(-1)

```

> Add a command line argument to allow for different confidence thresholds for the model

I chose to use `-ct` as the argument name here, and added it to the existing arguments.

```

optional.add_argument("-ct", help="The confidence threshold to use with the bounding boxes", default=0.5)

```

I set a default of 0.5, so it does not need to be input by the user every time.

> Add a command line argument to allow for different bounding box colors for the output

Similarly, I added the `-c` argument for inputting a bounding box color.

Note that in my approach, I chose to only allow "RED", "GREEN" and "BLUE", which also

impacts what I'll do in the next step; there are many possible approaches here.

```

optional.add_argument("-c", help="The color of the bounding boxes to draw; RED, GREEN or BLUE", default='BLUE')

```

> Correctly utilize the command line arguments in #3 and #4 within the application

Both of these will come into play within the `draw_boxes` function. For the first, a new line should be added before extracting the bounding box points that checks whether `box[2]` (i.e. the confidence of a given box) is above `args.ct` - assuming you have added `args.ct` as an argument passed to the `draw_boxes` function. If not, the box should not be drawn. Without this, every box would be drawn, including a ton of very unlikely bounding box detections.

========== app.py

```python
import argparse
import cv2

from inference import Network

INPUT_STREAM = "test_video.mp4"
CPU_EXTENSION = "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"


def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Run inference on an input video")
    # -- Create the descriptions for the commands
    m_desc = "The location of the model XML file"
    i_desc = "The location of the input file"
    d_desc = "The device name, if not 'CPU'"
    ### TODO: Add additional arguments and descriptions for:
    ### 1) Different confidence thresholds used to draw bounding boxes
    ### 2) The user choosing the color of the bounding boxes

    # -- Add required and optional groups
    parser._action_groups.pop()
    required = parser.add_argument_group('required arguments')
    optional = parser.add_argument_group('optional arguments')

    # -- Create the arguments
    required.add_argument("-m", help=m_desc, required=True)
    optional.add_argument("-i", help=i_desc, default=INPUT_STREAM)
    optional.add_argument("-d", help=d_desc, default='CPU')
    args = parser.parse_args()

    return args


def draw_boxes(frame, result, args, width, height):
    '''
    Draw bounding boxes onto the frame.
    '''
    for box in result[0][0]:  # Output shape is 1x1x100x7
        conf = box[2]
        if conf >= 0.5:
            xmin = int(box[3] * width)
            ymin = int(box[4] * height)
            xmax = int(box[5] * width)
            ymax = int(box[6] * height)
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (0, 0, 255), 1)
    return frame


def infer_on_video(args):
    ### TODO: Initialize the Inference Engine
    plugin = Network()

    ### TODO: Load the network model into the IE
    plugin.load_model(args.m, args.d, CPU_EXTENSION)
    net_input_shape = plugin.get_input_shape()

    # Get and open video capture
    cap = cv2.VideoCapture(args.i)
    cap.open(args.i)

    # Grab the shape of the input
    width = int(cap.get(3))
    height = int(cap.get(4))

    # Create a video writer for the output video
    # The second argument should be `cv2.VideoWriter_fourcc('M','J','P','G')`
    # on Mac, and `0x00000021` on Linux
    out = cv2.VideoWriter('out.mp4', 0x00000021, 30, (width, height))

    # Process frames until the video ends, or process is exited
    while cap.isOpened():
        # Read the next frame
        flag, frame = cap.read()
        if not flag:
            break
        key_pressed = cv2.waitKey(60)

        ### TODO: Pre-process the frame
        p_frame = cv2.resize(frame, (net_input_shape[3], net_input_shape[2]))
        p_frame = p_frame.transpose((2,0,1))
        p_frame = p_frame.reshape(1, *p_frame.shape)

        ### TODO: Perform inference on the frame
        plugin.async_inference(p_frame)

        ### TODO: Get the output of inference
        if plugin.wait() == 0:
            result = plugin.extract_output()
            ### TODO: Update the frame to include detected bounding boxes
            frame = draw_boxes(frame, result, args, width, height)

        # Write out the frame
        out.write(frame)

        # Break if escape key pressed
        if key_pressed == 27:
            break

    # Release the out writer, capture, and destroy any OpenCV windows
    out.release()
    cap.release()
    cv2.destroyAllWindows()


def main():
    args = get_args()
    infer_on_video(args)


if __name__ == "__main__":
    main()
```

========================================================app-custom.py

```python
import argparse
import cv2

from inference import Network

INPUT_STREAM = "test_video.mp4"
CPU_EXTENSION = "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"


def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Run inference on an input video")
    # -- Create the descriptions for the commands
    m_desc = "The location of the model XML file"
    i_desc = "The location of the input file"
    d_desc = "The device name, if not 'CPU'"
    ### TODO: Add additional arguments and descriptions for:
    ### 1) Different confidence thresholds used to draw bounding boxes
    ### 2) The user choosing the color of the bounding boxes
    c_desc = "The color of the bounding boxes to draw; RED, GREEN or BLUE"
    ct_desc = "The confidence threshold to use with the bounding boxes"

    # -- Add required and optional groups
    parser._action_groups.pop()
    required = parser.add_argument_group('required arguments')
    optional = parser.add_argument_group('optional arguments')

    # -- Create the arguments
    required.add_argument("-m", help=m_desc, required=True)
    optional.add_argument("-i", help=i_desc, default=INPUT_STREAM)
    optional.add_argument("-d", help=d_desc, default='CPU')
    optional.add_argument("-c", help=c_desc, default='BLUE')
    optional.add_argument("-ct", help=ct_desc, default=0.5)
    args = parser.parse_args()

    return args


def convert_color(color_string):
    '''
    Get the BGR value of the desired bounding box color.
    Defaults to Blue if an invalid color is given.
    '''
    colors = {"BLUE": (255,0,0), "GREEN": (0,255,0), "RED": (0,0,255)}
    out_color = colors.get(color_string)
    if out_color:
        return out_color
    else:
        return colors['BLUE']


def draw_boxes(frame, result, args, width, height):
    '''
    Draw bounding boxes onto the frame.
    '''
    for box in result[0][0]:  # Output shape is 1x1x100x7
        conf = box[2]
        if conf >= args.ct:
            xmin = int(box[3] * width)
            ymin = int(box[4] * height)
            xmax = int(box[5] * width)
            ymax = int(box[6] * height)
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), args.c, 1)
    return frame


def infer_on_video(args):
    # Convert the args for color and confidence
    args.c = convert_color(args.c)
    args.ct = float(args.ct)

    ### TODO: Initialize the Inference Engine
    plugin = Network()

    ### TODO: Load the network model into the IE
    plugin.load_model(args.m, args.d, CPU_EXTENSION)
    net_input_shape = plugin.get_input_shape()

    # Get and open video capture
    cap = cv2.VideoCapture(args.i)
    cap.open(args.i)

    # Grab the shape of the input
    width = int(cap.get(3))
    height = int(cap.get(4))

    # Create a video writer for the output video
    # The second argument should be `cv2.VideoWriter_fourcc('M','J','P','G')`
    # on Mac, and `0x00000021` on Linux
    out = cv2.VideoWriter('out.mp4', 0x00000021, 30, (width, height))

    # Process frames until the video ends, or process is exited
    while cap.isOpened():
        # Read the next frame
        flag, frame = cap.read()
        if not flag:
            break
        key_pressed = cv2.waitKey(60)

        ### TODO: Pre-process the frame
        p_frame = cv2.resize(frame, (net_input_shape[3], net_input_shape[2]))
        p_frame = p_frame.transpose((2,0,1))
        p_frame = p_frame.reshape(1, *p_frame.shape)

        ### TODO: Perform inference on the frame
        plugin.async_inference(p_frame)

        ### TODO: Get the output of inference
        if plugin.wait() == 0:
            result = plugin.extract_output()
            ### TODO: Update the frame to include detected bounding boxes
            frame = draw_boxes(frame, result, args, width, height)

        # Write out the frame
        out.write(frame)

        # Break if escape key pressed
        if key_pressed == 27:
            break

    # Release the out writer, capture, and destroy any OpenCV windows
    out.release()
    cap.release()
    cv2.destroyAllWindows()


def main():
    args = get_args()
    infer_on_video(args)


if __name__ == "__main__":
    main()
```

=============================================inference.py

```python
'''
Contains code for working with the Inference Engine.
You'll learn how to implement this code and more in
the related lesson on the topic.
'''

import os
import sys
import logging as log

from openvino.inference_engine import IENetwork, IECore


class Network:
    '''
    Load and store information for working with the Inference Engine,
    and any loaded models.
    '''

    def __init__(self):
        self.plugin = None
        self.network = None
        self.input_blob = None
        self.output_blob = None
        self.exec_network = None
        self.infer_request = None

    def load_model(self, model, device="CPU", cpu_extension=None):
        '''
        Load the model given IR files.
        Defaults to CPU as device for use in the workspace.
        Synchronous requests made within.
        '''
        model_xml = model
        model_bin = os.path.splitext(model_xml)[0] + ".bin"

        # Initialize the plugin
        self.plugin = IECore()

        # Add a CPU extension, if applicable
        if cpu_extension and "CPU" in device:
            self.plugin.add_extension(cpu_extension, device)

        # Read the IR as a IENetwork
        self.network = IENetwork(model=model_xml, weights=model_bin)

        # Load the IENetwork into the plugin
        self.exec_network = self.plugin.load_network(self.network, device)

        # Get the input layer
        self.input_blob = next(iter(self.network.inputs))
        self.output_blob = next(iter(self.network.outputs))

        return

    def get_input_shape(self):
        '''
        Gets the input shape of the network
        '''
        return self.network.inputs[self.input_blob].shape

    def async_inference(self, image):
        '''
        Makes an asynchronous inference request, given an input image.
        '''
        self.exec_network.start_async(request_id=0,
            inputs={self.input_blob: image})
        return

    def wait(self):
        '''
        Checks the status of the inference request.
        '''
        status = self.exec_network.requests[0].wait(-1)
        return status

    def extract_output(self):
        '''
        Returns a list of the results for the output layer of the network.
        '''
        return self.exec_network.requests[0].outputs[self.output_blob]
```

======================================

The second is just a small adjustment to the `cv2.rectangle` function that draws the bounding boxes we found to be above `args.ct`. I actually added a function to map the different potential colors to their BGR values first, due to how I took them in from the command line:

```

def convert_color(color_string):
    '''
    Get the BGR value of the desired bounding box color.
    Defaults to Blue if an invalid color is given.
    '''
    colors = {"BLUE": (255,0,0), "GREEN": (0,255,0), "RED": (0,0,255)}
    out_color = colors.get(color_string)
    if out_color:
        return out_color
    else:
        return colors['BLUE']

```

I can also add the tuple returned from this function as an additional `color` argument to feed to

`draw_boxes`.

Then, the line where the bounding boxes are drawn becomes:

```

cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color, 1)

```

I was able to run my app, if I was using the converted TF model from earlier (and placed in the

current directory), using the below:

```bash

python app.py -m frozen_inference_graph.xml

```

Or, if I added additional customization with a confidence threshold of 0.6 and blue boxes:

```bash

python app.py -m frozen_inference_graph.xml -ct 0.6 -c BLUE

```

[Note that I actually placed my customized app in `app-custom.py`.]