IoT Scholarship Foundation
Face recognition using OpenCV 4.1.1 is fast but not accurate
Image classification with deep learning is very good, with high accuracy
Object detection (MobileNet SSD) runs at around 5 frames per second
Using Coral is very good - needs some modifications to install - pose estimation is very fast and accurate
Using the Intel Movidius Neural Compute Stick 2 is good because we can use OpenCV, Python, and a wide range of deep learning frameworks
=====================================================
OpenCV: A computer vision (CV) library filled with many different computer vision functions and other useful image and video processing and handling capabilities.
MQTT: A publisher-subscriber protocol often used for IoT devices due to its lightweight nature. The paho-mqtt library is a common way of working with MQTT in Python (a minimal sketch follows this glossary).
Publish-Subscribe Architecture: A messaging architecture made up of publishers that send messages to a central broker without knowing about the subscribers themselves. Messages are posted on a given "topic", which subscribers can listen to without needing to know the publisher, only the "topic".
Publisher: In a publish-subscribe architecture, the entity that is sending data to a broker on a certain “topic”.
Subscriber: In a publish-subscribe architecture, the entity that is listening to data on a certain “topic” from a broker.
Topic: In a publish-subscribe architecture, data is published to a given topic, and subscribers to that topic can then receive that data.
FFmpeg: Software that can help convert or stream audio and video. In the course, the related ffserver software is used to stream to a web server, which can then be queried by a Node server for viewing in a web browser.
Flask: A Python framework useful for web development and another potential option for video streaming to a web browser.
Node Server: A web server built with Node.js that can handle HTTP requests and/or serve up a webpage for viewing in a browser.
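As a rough sketch of the publish-subscribe flow described above (assuming a broker is reachable on localhost:1883 and using a made-up "people_count" topic; paho-mqtt must be installed):
```
import json
import paho.mqtt.client as mqtt

# Publisher side: connect to the broker and publish JSON data on a topic
client = mqtt.Client()
client.connect("localhost", 1883, 60)
client.publish("people_count", json.dumps({"count": 3}))
client.disconnect()
```
A subscriber would connect to the same broker, call client.subscribe("people_count"), and receive the message in its on_message callback without ever knowing who published it.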
The "edge" means local (or near local) processing
NOT just anywhere in the cloud
Edge applications are often used where low latency is necessary
Also used where a network may not always be available
Can come from a desire for real-time decision making
Lesson 1: Introduction to AI at the Edge
Lesson 2: Leveraging Pre-Trained Models
Lesson 3: The Model Optimizer
Lesson 4: The Inference Engine
Lesson 5: Deploying an Edge App
No need to send data to the cloud -> more secure, less impact on the network
Which of these are reasons for development of the Edge?
Proliferation of devices, need for low-latency compute, need for disconnected devices.
In this course, we’ll largely focus on AI at the Edge using the Intel® Distribution of OpenVINO™ Toolkit.
First, we’ll start off with pre-trained models available in the OpenVINO™ Open Model Zoo. Even without needing huge amounts of your own data and costly training, you can deploy powerful models already created for many applications.
Next, you’ll learn about the Model Optimizer, which can take a model you trained in frameworks such as TensorFlow, PyTorch, Caffe and more, and create an Intermediate Representation (IR) optimized for inference with OpenVINO™ and Intel® hardware.
Third, you’ll learn about the Inference Engine, where the actual inference is performed on the IR model.
Lastly, we'll hit some more topics on deploying at the edge, including things like handling input streams, processing model outputs, and the lightweight MQTT architecture used to publish data from your edge models to the web.
Very important: listen to the first two videos again.
Classification
Yes/no
Classes (1,000 classes in the ImageNet competition)
ImageNet itself has around 20,000 classes
Detection
Find objects and location
Bounding boxes where object is
Combined with some form of classification
Segmentation
Classify segments of an image (classify each and every pixel)
Semantic segmentation
All objects of the same class are one
Instance segmentation
Each object of a class is separate (two cats will get different colors)
Pose estimation
Text recognition
GANs
sudo ./downloader --name vehicle-attributes-recognition-barrier-0039 --precisions INT8 -o /home/workspace
Pre-processing:
Varies by model
Color channel order matters (RGB vs. BGR)
Image resizing
Normalization
def preprocessing(input_image, height, width):
    # Resize to the model's expected input size (uses cv2)
    image = cv2.resize(input_image, (width, height))
    # Move channels first, then add a batch dimension: HxWxC -> 1x3xHxW
    # (channel reversal or normalization would be added here if a model requires it)
    image = image.transpose((2, 0, 1))
    image = image.reshape(1, 3, height, width)
    return image
python app.py -i "images/blue-car.jpg" -t "CAR_META" -m "/home/workspace/models/vehicle-attributes-recognition-barrier-0039.xml" -c "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"
python app.py -i "images/sitting-on-car.jpg" -t "POSE" -m "/home/workspace/models/human-pose-estimation-0001.xml" -c "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"
python app.py -i "images/sign.jpg" -t "TEXT" -m "/home/workspace/models/text-detection-0004.xml" -c "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"
source /opt/intel/openvino/bin/setupvars.sh -pyver 3.5
python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model frozen_inference_graph.pb --tensorflow_object_detection_api_pipeline_config pipeline.config --reverse_input_channels --tensorflow_use_custom_operations_config /opt/intel/openvino/deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json
export MOD_OPT=/opt/intel/openvino/deployment_tools/model_optimizer
python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model squeezenet_v1.1.caffemodel --input_proto deploy.prototxt
python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model model.onnx
There are two main command-line arguments for cutting a model with the Model Optimizer, intuitively named --input and --output; they are used to feed in the layer names that should become the new entry or exit points of the model.
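For example (a sketch only; the layer names image_tensor and detection_boxes are hypothetical and depend on the model being cut):
python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model frozen_inference_graph.pb --input image_tensor --output detection_boxes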
-l $CLWS/cl_cosh/user_ie_extensions/cpu/build/libcosh_cpu_extension.so
~/inference_engine_samples_build/intel64/Release/classification_sample_async -i $CLT/pics/dog.bmp -m $CLWS/cl_ext_cosh/model.ckpt.xml -d CPU -l $CLWS/cl_cosh/user_ie_extensions/cpu/build/libcosh_cpu_extension.so
import argparse
import time

import cv2

from helpers import load_to_IE, preprocessing

CPU_EXTENSION = "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"


def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Load an IR into the Inference Engine")
    # -- Create the descriptions for the commands
    m_desc = "The location of the model XML file"
    i_desc = "The location of the image input"
    r_desc = "The type of inference request: Async ('A') or Sync ('S')"

    # -- Create the arguments
    parser.add_argument("-m", help=m_desc)
    parser.add_argument("-i", help=i_desc)
    parser.add_argument("-r", help=r_desc)
    args = parser.parse_args()

    return args


def async_inference(exec_net, input_blob, image):
    # Perform asynchronous inference and wait until the request completes
    # Note: returns the exec_net
    exec_net.start_async(request_id=0, inputs={input_blob: image})
    while True:
        status = exec_net.requests[0].wait(-1)
        if status == 0:
            break
        else:
            time.sleep(1)
    return exec_net


def sync_inference(exec_net, input_blob, image):
    # Perform synchronous inference
    # Note: returns the result of inference
    result = exec_net.infer({input_blob: image})
    return result


def perform_inference(exec_net, request_type, input_image, input_shape):
    '''
    Performs inference on an input image, given an ExecutableNetwork
    '''
    # Get input image
    image = cv2.imread(input_image)
    # Extract the input shape
    n, c, h, w = input_shape
    # Preprocess it (applies for the IRs from the Pre-Trained Models lesson)
    preprocessed_image = preprocessing(image, h, w)

    # Get the input blob for the inference request
    input_blob = next(iter(exec_net.inputs))

    # Perform either synchronous or asynchronous inference
    request_type = request_type.lower()
    if request_type == 'a':
        output = async_inference(exec_net, input_blob, preprocessed_image)
    elif request_type == 's':
        output = sync_inference(exec_net, input_blob, preprocessed_image)
    else:
        print("Unknown inference request type, should be 'A' or 'S'.")
        exit(1)

    # Return the exec_net for testing purposes
    return output


def main():
    args = get_args()
    exec_net, input_shape = load_to_IE(args.m, CPU_EXTENSION)
    perform_inference(exec_net, args.r, args.i, input_shape)


if __name__ == "__main__":
    main()
https://docs.openvinotoolkit.org/latest/classInferenceEngine_1_1Blob.html
Note: There is one small change from the code on-screen for running on Linux machines versus Mac. On Mac, cv2.VideoWriter uses cv2.VideoWriter_fourcc('M','J','P','G') to write an .mp4 file, while Linux uses 0x00000021.
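A minimal sketch of handling that difference in one place (assumption: checking platform.system() for 'Darwin' is used here to detect Mac):
```
import platform
import cv2

# Pick the fourcc per the note above: MJPG on Mac, 0x00000021 on Linux
fourcc = cv2.VideoWriter_fourcc('M', 'J', 'P', 'G') if platform.system() == 'Darwin' else 0x00000021
out = cv2.VideoWriter('out.mp4', fourcc, 30, (100, 100))
```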
import argparse

import cv2
import numpy as np


def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Handle an input stream")
    # -- Create the descriptions for the commands
    i_desc = "The location of the input file"

    # -- Create the arguments
    parser.add_argument("-i", help=i_desc)
    args = parser.parse_args()

    return args


def capture_stream(args):
    # Handle image, video or webcam input
    image_flag = False
    if args.i == 'CAM':
        args.i = 0
    elif args.i.endswith('.jpg') or args.i.endswith('.bmp'):
        image_flag = True

    # Get and open the video capture
    capture = cv2.VideoCapture(args.i)
    capture.open(args.i)

    # Only create a video writer for video input (see the fourcc note above)
    if not image_flag:
        out = cv2.VideoWriter('out.mp4', cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), 30, (100, 100))
    else:
        out = None

    while capture.isOpened():
        flag, frame = capture.read()
        if not flag:
            break
        key_pressed = cv2.waitKey(60)
        if key_pressed == 27:
            break

        # Re-size the frame to 100x100
        image = cv2.resize(frame, (100, 100))

        # Add Canny Edge Detection to the frame, with min & max values of 100 and 200,
        # then use np.dstack to make a 3-channel image for writing
        edges = cv2.Canny(image, 100, 200)
        edges = np.dstack((edges, edges, edges))

        # Write out the frame, depending on image or video
        if image_flag:
            cv2.imwrite("out.jpg", edges)
        else:
            out.write(edges)

        # cv2.imshow('display', edges)  # disabled: no display available in the workspace

    # Close the stream and any windows at the end of the application
    if not image_flag:
        out.release()
    capture.release()
    cv2.destroyAllWindows()


def main():
    args = get_args()
    capture_stream(args)


if __name__ == "__main__":
    main()
===============
import argparse
import sys

import cv2
import numpy as np
import socket
import json
from random import randint
from inference import Network
# Libraries for MQTT (frames are piped to FFmpeg via stdout, so no import is needed for it)
import paho.mqtt.client as mqtt

INPUT_STREAM = "test_video.mp4"
CPU_EXTENSION = "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"
ADAS_MODEL = "/home/workspace/models/semantic-segmentation-adas-0001.xml"

CLASSES = ['road', 'sidewalk', 'building', 'wall', 'fence', 'pole',
           'traffic_light', 'traffic_sign', 'vegetation', 'terrain', 'sky', 'person',
           'rider', 'car', 'truck', 'bus', 'train', 'motorcycle', 'bicycle', 'ego-vehicle']

# MQTT server environment variables
HOSTNAME = socket.gethostname()
IPADDRESS = socket.gethostbyname(HOSTNAME)
MQTT_HOST = IPADDRESS
MQTT_PORT = 3004  # The port set for MQTT in this workspace
MQTT_KEEPALIVE_INTERVAL = 60


def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Run inference on an input video")
    # -- Create the descriptions for the commands
    i_desc = "The location of the input file"
    d_desc = "The device name, if not 'CPU'"

    # -- Create the arguments
    parser.add_argument("-i", help=i_desc, default=INPUT_STREAM)
    parser.add_argument("-d", help=d_desc, default='CPU')
    args = parser.parse_args()

    return args


def draw_masks(result, width, height):
    '''
    Draw semantic mask classes onto the frame.
    '''
    # Create a mask with color by class
    classes = cv2.resize(result[0].transpose((1, 2, 0)), (width, height),
                         interpolation=cv2.INTER_NEAREST)
    unique_classes = np.unique(classes)
    out_mask = classes * (255 / 20)

    # Stack the mask so FFmpeg understands it
    out_mask = np.dstack((out_mask, out_mask, out_mask))
    out_mask = np.uint8(out_mask)

    return out_mask, unique_classes


def get_class_names(class_nums):
    class_names = []
    for i in class_nums:
        class_names.append(CLASSES[int(i)])
    return class_names


def infer_on_video(args, model):
    # Connect to the MQTT server
    client = mqtt.Client()
    client.connect(MQTT_HOST, MQTT_PORT, MQTT_KEEPALIVE_INTERVAL)

    # Initialize the Inference Engine
    plugin = Network()

    # Load the network model into the IE
    plugin.load_model(model, args.d, CPU_EXTENSION)
    net_input_shape = plugin.get_input_shape()

    # Get and open video capture
    cap = cv2.VideoCapture(args.i)
    cap.open(args.i)

    # Grab the shape of the input
    width = int(cap.get(3))
    height = int(cap.get(4))

    # Process frames until the video ends, or process is exited
    while cap.isOpened():
        # Read the next frame
        flag, frame = cap.read()
        if not flag:
            break
        key_pressed = cv2.waitKey(60)

        # Pre-process the frame
        p_frame = cv2.resize(frame, (net_input_shape[3], net_input_shape[2]))
        p_frame = p_frame.transpose((2, 0, 1))
        p_frame = p_frame.reshape(1, *p_frame.shape)

        # Perform inference on the frame
        plugin.async_inference(p_frame)

        # Get the output of inference
        if plugin.wait() == 0:
            result = plugin.extract_output()
            # Draw the output mask onto the input
            out_frame, classes = draw_masks(result, width, height)
            class_names = get_class_names(classes)
            speed = randint(50, 70)

            # Send the class names and speed to the MQTT server
            # Hint: The UI web server will check for a "class" and
            # "speedometer" topic. Additionally, it expects "class_names"
            # and "speed" as the json keys of the data, respectively.
            client.publish("class", json.dumps({"class_names": class_names}))
            client.publish("speedometer", json.dumps({"speed": speed}))

        # Send frame to the ffmpeg server
        sys.stdout.buffer.write(frame)
        sys.stdout.flush()

        # Break if escape key pressed
        if key_pressed == 27:
            break

    # Release the capture and destroy any OpenCV windows
    cap.release()
    cv2.destroyAllWindows()

    # Disconnect from MQTT
    client.disconnect()


def main():
    args = get_args()
    model = ADAS_MODEL
    infer_on_video(args, model)


if __name__ == "__main__":
    main()
# 1
# wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz
# 2
# tar -xvf ssd_mobilenet_v2_coco_2018_03_29.tar.gz
# 3
# python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model /home/workspace/ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph.pb --tensorflow_object_detection_api_pipeline_config /home/workspace/ssd_mobilenet_v2_coco_2018_03_29/pipeline.config --reverse_input_channels --tensorflow_use_custom_operations_config /opt/intel/openvino/deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json
# 4
# python app.py -m frozen_inference_graph.xml -ct 0.6 -c BLUE
python app.py | ffmpeg -v warning -f rawvideo -pixel_format bgr24 -video_size 1280x720 -framerate 24 -i - http://0.0.0.0:3004/fac.ffm
E4:
import argparse

import cv2
import numpy as np


def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Handle an input stream")
    # -- Create the descriptions for the commands
    i_desc = "The location of the input file"

    # -- Create the arguments
    parser.add_argument("-i", help=i_desc)
    args = parser.parse_args()

    return args


def capture_stream(args):
    # Handle image, video or webcam input
    image_flag = False
    if args.i == 'CAM':
        args.i = 0
    elif args.i.endswith('.jpg') or args.i.endswith('.bmp'):
        image_flag = True

    # Get and open video capture
    capture = cv2.VideoCapture(args.i)
    capture.open(args.i)

    if not image_flag:
        # out = cv2.VideoWriter('out.mp4', cv2.VideoWriter_fourcc('M','J','P','G'), 30, (100, 100))  # Mac
        # On Linux, use the 0x00000021 fourcc instead
        out = cv2.VideoWriter('out.mp4', 0x00000021, 30, (100, 100))
    else:
        out = None

    while capture.isOpened():
        flag, frame = capture.read()
        if not flag:
            break
        key_pressed = cv2.waitKey(60)
        if key_pressed == 27:
            break

        # Re-size the frame to 100x100
        image = cv2.resize(frame, (100, 100))

        # Add Canny Edge Detection to the frame,
        # with min & max values of 100 and 200
        edges = cv2.Canny(image, 100, 200)
        # Use np.dstack after to make a 3-channel image
        edges = np.dstack((edges, edges, edges))

        # Write out the frame, depending on image or video
        if image_flag:
            cv2.imwrite("out.jpg", edges)
        else:
            out.write(edges)

        # (display:144): Gtk-WARNING **: cannot open display: :1
        # cv2.imshow('display', edges)

    # Close the stream and any windows at the end of the application
    if not image_flag:
        out.release()
    capture.release()
    cv2.destroyAllWindows()


def main():
    args = get_args()
    capture_stream(args)


if __name__ == "__main__":
    main()
Let's say you have a cat and two dogs at your house.
import argparse

import cv2

from inference import Network

INPUT_STREAM = "pets.mp4"
CPU_EXTENSION = "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"


def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Run inference on an input video")
    # -- Create the descriptions for the commands
    m_desc = "The location of the model XML file"
    i_desc = "The location of the input file"
    d_desc = "The device name, if not 'CPU'"

    # -- Add required and optional groups
    parser._action_groups.pop()
    required = parser.add_argument_group('required arguments')
    optional = parser.add_argument_group('optional arguments')

    # -- Create the arguments
    required.add_argument("-m", help=m_desc, required=True)
    optional.add_argument("-i", help=i_desc, default=INPUT_STREAM)
    optional.add_argument("-d", help=d_desc, default='CPU')
    args = parser.parse_args()

    return args


def assess_scene(result, counter, incident_flag):
    '''
    Based on the determined situation, potentially send
    a message to the pets to break it up.
    '''
    if result[0][1] == 1 and not incident_flag:
        timestamp = counter / 30
        print("Log: Incident at {:.3f} seconds.".format(timestamp))
        print("Break it up!")
        incident_flag = True
    elif result[0][1] != 1:
        incident_flag = False

    return incident_flag


def infer_on_video(args):
    # Initialize the Inference Engine
    plugin = Network()

    # Load the network model into the IE
    plugin.load_model(args.m, args.d, CPU_EXTENSION)
    net_input_shape = plugin.get_input_shape()

    # Get and open video capture
    cap = cv2.VideoCapture(args.i)
    cap.open(args.i)

    incident_flag = False
    counter = 0

    # Process frames until the video ends, or process is exited
    while cap.isOpened():
        # Read the next frame
        counter += 1
        flag, frame = cap.read()
        if not flag:
            break
        key_pressed = cv2.waitKey(60)

        # Pre-process the frame
        p_frame = cv2.resize(frame, (net_input_shape[3], net_input_shape[2]))
        p_frame = p_frame.transpose((2, 0, 1))
        p_frame = p_frame.reshape(1, *p_frame.shape)

        # Perform inference on the frame
        plugin.async_inference(p_frame)

        # Get the output of inference
        if plugin.wait() == 0:
            result = plugin.extract_output()
            ### TODO: Process the output
            incident_flag = assess_scene(result, counter, incident_flag)

        # Break if escape key pressed
        if key_pressed == 27:
            break

    # Release the capture and destroy any OpenCV windows
    cap.release()
    cv2.destroyAllWindows()


def main():
    args = get_args()
    infer_on_video(args)


if __name__ == "__main__":
    main()
# Integrate the Inference Engine - Solution
Let's step through the tasks one by one, with a potential approach for each.
> Convert a bounding box model to an IR with the Model Optimizer.
I used the SSD Mobilenet V2 architecture from TensorFlow from the earlier lesson here. Note
that the original was downloaded in a separate workspace, so I needed to download it again
and then convert it.
```
python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model frozen_inference_graph.pb --tensorflow_object_detection_api_pipeline_config pipeline.config --reverse_input_channels --tensorflow_use_custom_operations_config /opt/intel/openvino/deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json
```
> Extract the results from the inference request
```
self.exec_network.requests[0].outputs[self.output_blob]
```
> Add code to make the requests and feed back the results within the application
```
self.exec_network.start_async(request_id=0, inputs={self.input_blob: image})
...
status = self.exec_network.requests[0].wait(-1)
```
> Add a command line argument to allow for different confidence thresholds for the model
I chose to use `-ct` as the argument name here, and added it to the existing arguments.
```
optional.add_argument("-ct", help="The confidence threshold to use with the bounding boxes", default=0.5)
```
I set a default of 0.5, so it does not need to be input by the user every time.
> Add a command line argument to allow for different bounding box colors for the output
Similarly, I added the `-c` argument for inputting a bounding box color.
Note that in my approach, I chose to only allow "RED", "GREEN" and "BLUE", which also
impacts what I'll do in the next step; there are many possible approaches here.
```
optional.add_argument("-c", help="The color of the bounding boxes to draw; RED, GREEN or BLUE", default='BLUE')
```
> Correctly utilize the command line arguments in #3 and #4 within the application
Both of these will come into play within the `draw_boxes` function. For the first, a new line
should be added before extracting the bounding box points that checks whether `box[2]`
(i.e. the probability of a given box) is above `args.ct` - assuming you have added
`args.ct` as an argument passed to the `draw_boxes` function. If it is not above the
threshold, the box should not be drawn. Without this check, every returned box is drawn,
which could mean a ton of very unlikely bounding box detections.
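For instance, the check inside `draw_boxes` could look like the sketch below (assuming `args.ct` has already been converted to a float and that `args` is passed into the function):
```
for box in result[0][0]:  # Output shape is 1x1x100x7
    conf = box[2]
    if conf >= args.ct:
        xmin = int(box[3] * width)
        # ... extract the remaining corners and draw the rectangle only for this box
```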
========== app.py
import argparse

import cv2

from inference import Network

INPUT_STREAM = "test_video.mp4"
CPU_EXTENSION = "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"


def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Run inference on an input video")
    # -- Create the descriptions for the commands
    m_desc = "The location of the model XML file"
    i_desc = "The location of the input file"
    d_desc = "The device name, if not 'CPU'"
    ### TODO: Add additional arguments and descriptions for:
    ###       1) Different confidence thresholds used to draw bounding boxes
    ###       2) The user choosing the color of the bounding boxes

    # -- Add required and optional groups
    parser._action_groups.pop()
    required = parser.add_argument_group('required arguments')
    optional = parser.add_argument_group('optional arguments')

    # -- Create the arguments
    required.add_argument("-m", help=m_desc, required=True)
    optional.add_argument("-i", help=i_desc, default=INPUT_STREAM)
    optional.add_argument("-d", help=d_desc, default='CPU')
    args = parser.parse_args()

    return args


def draw_boxes(frame, result, args, width, height):
    '''
    Draw bounding boxes onto the frame.
    '''
    for box in result[0][0]:  # Output shape is 1x1x100x7
        conf = box[2]
        if conf >= 0.5:
            xmin = int(box[3] * width)
            ymin = int(box[4] * height)
            xmax = int(box[5] * width)
            ymax = int(box[6] * height)
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (0, 0, 255), 1)
    return frame


def infer_on_video(args):
    ### TODO: Initialize the Inference Engine
    plugin = Network()

    ### TODO: Load the network model into the IE
    plugin.load_model(args.m, args.d, CPU_EXTENSION)
    net_input_shape = plugin.get_input_shape()

    # Get and open video capture
    cap = cv2.VideoCapture(args.i)
    cap.open(args.i)

    # Grab the shape of the input
    width = int(cap.get(3))
    height = int(cap.get(4))

    # Create a video writer for the output video
    # The second argument should be `cv2.VideoWriter_fourcc('M','J','P','G')`
    # on Mac, and `0x00000021` on Linux
    out = cv2.VideoWriter('out.mp4', 0x00000021, 30, (width, height))

    # Process frames until the video ends, or process is exited
    while cap.isOpened():
        # Read the next frame
        flag, frame = cap.read()
        if not flag:
            break
        key_pressed = cv2.waitKey(60)

        ### TODO: Pre-process the frame
        p_frame = cv2.resize(frame, (net_input_shape[3], net_input_shape[2]))
        p_frame = p_frame.transpose((2, 0, 1))
        p_frame = p_frame.reshape(1, *p_frame.shape)

        ### TODO: Perform inference on the frame
        plugin.async_inference(p_frame)

        ### TODO: Get the output of inference
        if plugin.wait() == 0:
            result = plugin.extract_output()
            ### TODO: Update the frame to include detected bounding boxes
            frame = draw_boxes(frame, result, args, width, height)
            # Write out the frame
            out.write(frame)

        # Break if escape key pressed
        if key_pressed == 27:
            break

    # Release the out writer, capture, and destroy any OpenCV windows
    out.release()
    cap.release()
    cv2.destroyAllWindows()


def main():
    args = get_args()
    infer_on_video(args)


if __name__ == "__main__":
    main()
========================================================app-custom.py
import argparse

import cv2

from inference import Network

INPUT_STREAM = "test_video.mp4"
CPU_EXTENSION = "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"


def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Run inference on an input video")
    # -- Create the descriptions for the commands
    m_desc = "The location of the model XML file"
    i_desc = "The location of the input file"
    d_desc = "The device name, if not 'CPU'"
    ### TODO: Add additional arguments and descriptions for:
    ###       1) Different confidence thresholds used to draw bounding boxes
    ###       2) The user choosing the color of the bounding boxes
    c_desc = "The color of the bounding boxes to draw; RED, GREEN or BLUE"
    ct_desc = "The confidence threshold to use with the bounding boxes"

    # -- Add required and optional groups
    parser._action_groups.pop()
    required = parser.add_argument_group('required arguments')
    optional = parser.add_argument_group('optional arguments')

    # -- Create the arguments
    required.add_argument("-m", help=m_desc, required=True)
    optional.add_argument("-i", help=i_desc, default=INPUT_STREAM)
    optional.add_argument("-d", help=d_desc, default='CPU')
    optional.add_argument("-c", help=c_desc, default='BLUE')
    optional.add_argument("-ct", help=ct_desc, default=0.5)
    args = parser.parse_args()

    return args


def convert_color(color_string):
    '''
    Get the BGR value of the desired bounding box color.
    Defaults to Blue if an invalid color is given.
    '''
    colors = {"BLUE": (255, 0, 0), "GREEN": (0, 255, 0), "RED": (0, 0, 255)}
    out_color = colors.get(color_string)
    if out_color:
        return out_color
    else:
        return colors['BLUE']


def draw_boxes(frame, result, args, width, height):
    '''
    Draw bounding boxes onto the frame.
    '''
    for box in result[0][0]:  # Output shape is 1x1x100x7
        conf = box[2]
        if conf >= args.ct:
            xmin = int(box[3] * width)
            ymin = int(box[4] * height)
            xmax = int(box[5] * width)
            ymax = int(box[6] * height)
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), args.c, 1)
    return frame


def infer_on_video(args):
    # Convert the args for color and confidence
    args.c = convert_color(args.c)
    args.ct = float(args.ct)

    ### TODO: Initialize the Inference Engine
    plugin = Network()

    ### TODO: Load the network model into the IE
    plugin.load_model(args.m, args.d, CPU_EXTENSION)
    net_input_shape = plugin.get_input_shape()

    # Get and open video capture
    cap = cv2.VideoCapture(args.i)
    cap.open(args.i)

    # Grab the shape of the input
    width = int(cap.get(3))
    height = int(cap.get(4))

    # Create a video writer for the output video
    # The second argument should be `cv2.VideoWriter_fourcc('M','J','P','G')`
    # on Mac, and `0x00000021` on Linux
    out = cv2.VideoWriter('out.mp4', 0x00000021, 30, (width, height))

    # Process frames until the video ends, or process is exited
    while cap.isOpened():
        # Read the next frame
        flag, frame = cap.read()
        if not flag:
            break
        key_pressed = cv2.waitKey(60)

        ### TODO: Pre-process the frame
        p_frame = cv2.resize(frame, (net_input_shape[3], net_input_shape[2]))
        p_frame = p_frame.transpose((2, 0, 1))
        p_frame = p_frame.reshape(1, *p_frame.shape)

        ### TODO: Perform inference on the frame
        plugin.async_inference(p_frame)

        ### TODO: Get the output of inference
        if plugin.wait() == 0:
            result = plugin.extract_output()
            ### TODO: Update the frame to include detected bounding boxes
            frame = draw_boxes(frame, result, args, width, height)
            # Write out the frame
            out.write(frame)

        # Break if escape key pressed
        if key_pressed == 27:
            break

    # Release the out writer, capture, and destroy any OpenCV windows
    out.release()
    cap.release()
    cv2.destroyAllWindows()


def main():
    args = get_args()
    infer_on_video(args)


if __name__ == "__main__":
    main()
=============================================inference.py
'''
Contains code for working with the Inference Engine.
You'll learn how to implement this code and more in
the related lesson on the topic.
'''

import os
import sys
import logging as log

from openvino.inference_engine import IENetwork, IECore


class Network:
    '''
    Load and store information for working with the Inference Engine,
    and any loaded models.
    '''

    def __init__(self):
        self.plugin = None
        self.network = None
        self.input_blob = None
        self.output_blob = None
        self.exec_network = None
        self.infer_request = None

    def load_model(self, model, device="CPU", cpu_extension=None):
        '''
        Load the model given IR files.
        Defaults to CPU as device for use in the workspace.
        Synchronous requests made within.
        '''
        model_xml = model
        model_bin = os.path.splitext(model_xml)[0] + ".bin"

        # Initialize the plugin
        self.plugin = IECore()

        # Add a CPU extension, if applicable
        if cpu_extension and "CPU" in device:
            self.plugin.add_extension(cpu_extension, device)

        # Read the IR as a IENetwork
        self.network = IENetwork(model=model_xml, weights=model_bin)

        # Load the IENetwork into the plugin
        self.exec_network = self.plugin.load_network(self.network, device)

        # Get the input layer
        self.input_blob = next(iter(self.network.inputs))
        self.output_blob = next(iter(self.network.outputs))
        return

    def get_input_shape(self):
        '''
        Gets the input shape of the network
        '''
        return self.network.inputs[self.input_blob].shape

    def async_inference(self, image):
        '''
        Makes an asynchronous inference request, given an input image.
        '''
        self.exec_network.start_async(request_id=0,
                                      inputs={self.input_blob: image})
        return

    def wait(self):
        '''
        Checks the status of the inference request.
        '''
        status = self.exec_network.requests[0].wait(-1)
        return status

    def extract_output(self):
        '''
        Returns a list of the results for the output layer of the network.
        '''
        return self.exec_network.requests[0].outputs[self.output_blob]
======================================
The second is just a small adjustment to the `cv2.rectangle` function that draws the
bounding boxes we found to be above `args.ct`. I actually added a function first to match
the different potential colors up to their BGR values, due to how I took them in from the
command line:
```
def convert_color(color_string):
    '''
    Get the BGR value of the desired bounding box color.
    Defaults to Blue if an invalid color is given.
    '''
    colors = {"BLUE": (255, 0, 0), "GREEN": (0, 255, 0), "RED": (0, 0, 255)}
    out_color = colors.get(color_string)
    if out_color:
        return out_color
    else:
        return colors['BLUE']
```
I can also add the tuple returned from this function as an additional `color` argument to feed to
`draw_boxes`.
Then, the line where the bounding boxes are drawn becomes:
```
cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color, 1)
```
I was able to run my app, using the converted TF model from earlier (placed in the
current directory), with the command below:
```bash
python app.py -m frozen_inference_graph.xml
```
Or, if I added additional customization with a confidence threshold of 0.6 and blue boxes:
```bash
python app.py -m frozen_inference_graph.xml -ct 0.6 -c BLUE
```
[Note that I actually placed my customized app in `app-custom.py`]