OpenMV IDE v2.0.0 is now here! The new IDE release brings OpenMV Cam Firmware v3.0.0 which includes CNN support and we've added the CNNs we've made so far to the IDE (Tools -> Machine Vision -> CNN Library)! We'll be expanding the CNN library in the future with all kinds of classifiers using open training data sources and we'll release our scripts that we use for this so you can train your own classifiers too.
We're very happy that ARM released the CMSIS-NN library. Future major features released for the OpenMV Cam will be done through CNNs rather than traditional machine vision algorithms (however, I will make an effort to port a few more traditional algorithms). In particular, CNN support allows you to build a classifier to do whatever you need without waiting for us to add support to the firmware for an image processing feature.
OpenMV Cam H7 KickStarter
We're finally ready to start working on the OpenMV Cam H7 KickStarter! We've finished getting all drivers working and verifying all functionality for the OpenMV Cam H7! From now until the campaign launches in September we're going to focus on website updates and a media campaign to get the word out.
For the new OpenMV Cam H7 we focused on getting these big new features working:
#1 Global Shutter Support
We've had a lot of inquiries about global shutter support. So, it's now done and working. Cheap global shutter imaging for everyone is coming soon. Also, note that you can indeed use the OpenMV Cam as a webcam for a Pi if you just need the global shutter feature and nothing else. Thanks to the onboard JPEG compression hardware we can stream VGA images at 20 FPS over the 12 Mb/s USB 2.0 bus. If you lower the resolution you can also stream RAW RGB565 or Grayscale images quickly too. Best of all, the OpenMV Cam can expose the image on demand with global shutter trigger control when you call snapshot().
#2 FLIR Thermal Imaging Support
Thermal vision makes some things easy. Need to detect a person in a room? With a FLIR sensor the problem becomes as simple as blob tracking. With the OpenMV Cam H7 you'll now be able to use the FLIR 1/2/3 with your OpenMV Cam. Our drivers work seamlessly with any FLIR sensor, allowing you to focus on writing some very simple blob tracking code to detect movement and the like. And as with global shutter imaging above, you can also use the OpenMV Cam with a Pi.
#3 Re-loadable CNNs
As mentioned at the beginning of this post, the OpenMV Cam M7/H7 now support loading CNNs trained on your PC. The networks are stored in binary files that your OpenMV Cam can easily load off of an SD card or from its internal flash.
We'll be focusing on updating the website first, followed by getting the KickStarter set up and a video produced in the next month. Our goal is to launch the KickStarter in September. In the meantime, please let folks know about what's coming, follow us on Twitter and YouTube, and sign up for our mailing list!
Edge computing helps reduce latency and power consumption by performing signal processing at the source of the data. Recently ARM released the CMSIS-NN library, an efficient neural network library optimized for the Cortex-M based microcontrollers. The CMSIS-NN library (pronounced Sim-Sis) brings deep learning to low-power microcontrollers such as the Cortex-M7 based OpenMV camera. In this blog post we'll go through training a custom neural network using Caffe on a PC and deploying the network on the OpenMV Cam.
The CMSIS-NN library consists of a number of optimized neural network functions using SIMD and DSP instructions, separable convolutions, and so on. Most importantly, it supports an 8-bit fixed-point representation. Using fixed-point avoids costly floating point operations, reduces the memory footprint, and uses less power when running inference. However, this means that models have to be quantized before they can be used with CMSIS-NN.
Quantization, simply put, is the mapping of a range of numbers to a more compact range of numbers, or in this case the mapping of 32-bit floats to 8-bit fixed-point numbers. The hardest part about quantizing models is finding the min and max ranges of layer inputs/outputs to evenly distribute the floating point values across an 8-bit representation. Fortunately, ARM also provides a script to quantize Caffe model weights and activations. If you're interested in more details about the CMSIS-NN library and the quantization process please see this paper published by ARM researchers.
The default CMSIS-NN library comes with a CNN example trained on the CIFAR-10 dataset. However, this example is hard-coded, meaning it must be compiled and linked with the main application. So, we extended the CMSIS-NN library and supporting scripts. Our code allows users to convert Caffe models to a quantized binary format which can be loaded from the file-system (SD Card or internal flash) at run-time. Additionally, our code takes care of preprocessing the input image, subtracting the mean, and scaling the data if required.
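To make the float to fixed-point mapping concrete, here is a minimal sketch in plain Python. It is not ARM's quantization script: the function names are made up for illustration, and it assumes a simple power-of-two scheme in which the largest magnitude in a set of values decides how the 8 bits are split between integer and fractional parts.

```python
import math

def choose_frac_bits(values, total_bits=8):
    """Pick the number of fractional bits so the largest magnitude still fits.
    (Illustrative helper, not part of CMSIS-NN or ARM's scripts.)"""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    # Bits needed for the integer part; negative when all values are small.
    int_bits = math.ceil(math.log2(max_abs))
    return (total_bits - 1) - int_bits

def quantize(values, frac_bits, total_bits=8):
    """Map floats to signed fixed-point integers, rounding and saturating."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    return [max(lo, min(hi, int(round(v * scale)))) for v in values]

def dequantize(q_values, frac_bits):
    """Convert fixed-point integers back to floats to inspect the error."""
    scale = 2.0 ** frac_bits
    return [q / scale for q in q_values]

weights = [0.5, -0.25, 0.9, -1.5]
frac_bits = choose_frac_bits(weights)   # 6 fractional bits for this range
q = quantize(weights, frac_bits)        # [32, -16, 58, -96]
restored = dequantize(q, frac_bits)     # 0.9 comes back as 0.90625
print(frac_bits, q, restored)
```

The trade-off is visible in the example: a wider value range leaves fewer fractional bits, so each layer benefits from having its own format chosen from its own min and max.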
In the next sections of this blog post I'll demonstrate how to use the CMSIS-NN library with the OpenMV camera to train a simple CNN model on a smile detection dataset. The model achieves ~93% accuracy and the camera consumes about 150mA @ 3.3V while running the network.
Training a network with Caffe:
First, if you're just getting started with neural networks and Caffe I highly recommend this tutorial. Keep in mind that the CMSIS-NN library and conversion scripts have limitations on the number and types of supported layers - so your model should be simple.
The smile dataset that we used can be found here. The dataset consists of ~3000 positive images and ~9000 negative images. We need the number of positive and negative images to be close, otherwise the network will be biased towards one class (class imbalance). To fix this we can augment the dataset using this augmentation script on the positive images to increase the number of positive examples by 4x. The image augmentation script can be used like this:
You can use any deep learning library to train the network. However, if you're not using Caffe you need to convert the network output to a Caffe format to work with the ARM scripts. In the future ARM will provide more conversion scripts to accept models from TensorFlow, etc.
Quantizing the model:
The first step after training the network is to use the quantization script provided by ARM to convert the Caffe model weights and activations from floating point to fixed point format. As mentioned before, quantization is performed to reduce the size of the network and avoid floating point computations.
The nn_quantizer script works by testing the network and figuring out the best format for the dynamic fixed point representation. The output of this script is a serialized Python (.pkl) file which includes the network's model, quantized weights and activations, and the quantization format of each layer. Running this command generates the quantized model:
The next step is to use our NN converter script to convert the model into a binary format runnable by the OpenMV Cam. The converter script outputs a code for each layer type followed by the layer's dimensions and weights (if any).
On the OpenMV Cam our firmware reads the binary file and builds the network in memory using a linked-list like structure.
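The idea of a per-layer record (type code, dimensions, weights) that is read back into a linked list can be sketched in a few lines of Python. Everything about the layout below is an assumption made for illustration: the layer codes, field widths, and byte order are hypothetical and are not the actual OpenMV file format.

```python
import struct

# Hypothetical layer type codes; the real converter's values are not given in this post.
LAYER_CODES = {"data": 0, "conv": 1, "relu": 2, "pool": 3, "ip": 4}
CODE_NAMES = {v: k for k, v in LAYER_CODES.items()}

def pack_layer(kind, dims, weights=()):
    """Serialize one layer: type code, dimensions, then int8 weights (if any)."""
    header = struct.pack("<BB", LAYER_CODES[kind], len(dims))
    body = struct.pack("<%dI" % len(dims), *dims)
    body += struct.pack("<I%db" % len(weights), len(weights), *weights)
    return header + body

def load_network(blob):
    """Walk the byte stream and chain the layers together, linked-list style."""
    head = tail = None
    offset = 0
    while offset < len(blob):
        code, ndims = struct.unpack_from("<BB", blob, offset)
        offset += 2
        dims = struct.unpack_from("<%dI" % ndims, blob, offset)
        offset += 4 * ndims
        (count,) = struct.unpack_from("<I", blob, offset)
        offset += 4
        weights = struct.unpack_from("<%db" % count, blob, offset)
        offset += count
        node = {"type": CODE_NAMES[code], "dims": dims,
                "weights": weights, "next": None}
        if head is None:
            head = node
        else:
            tail["next"] = node
        tail = node
    return head

# Build a tiny made-up network and read it back.
blob = (pack_layer("data", (3, 32, 32))
        + pack_layer("conv", (2, 3, 1, 1), (1, -2, 3, -4, 5, -6))
        + pack_layer("relu", ())
        + pack_layer("ip", (2, 2), (7, -8, 9, -10)))

layer = load_network(blob)
while layer is not None:
    print(layer["type"], layer["dims"], len(layer["weights"]))
    layer = layer["next"]
```

A sequential record format like this needs no index or random access, which suits reading a file from an SD card one layer at a time.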
While it's possible to slide the detection window over the entire image, doing so would be very slow. Instead we use the built-in Haar cascade face detector to extract faces from the image, and then pass the region of interest (ROI) to the CNN to detect smiles. The first part of the smile detection code loads the network into memory and loads the built-in face detection Haar Cascade.
# Load Smile Detection network
net = nn.load('/smile.network')

# Load Face Detection Haar Cascade
face_cascade = image.HaarCascade("frontalface", stages=25)
print(face_cascade)
The next step is capturing a snapshot and finding all the faces.
Finally, for each detected face, the region of interest is slightly cropped and passed to the neural network. Note that the smile detection network is trained on tightly cropped faces so we have to reduce the size of the ROI.
# Detect smiles
for r in objects:
    # Resize and center detection area
    r = [r[0]+10, r[1]+25, int(r[2]*0.70), int(r[3]*0.70)]
    out = net.forward(img, roi=r, softmax=True)
    img.draw_string(r[0], r[1], ':)' if (out[0] > 0.8) else ':(', color=0, scale=2)
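The ROI adjustment and the score threshold from the loop above can be tried on a PC without a camera. The sketch below pulls them into two helper functions; the helper names are hypothetical and the sample face box is made up, while the offsets (10, 25), the 0.70 scale, and the 0.8 threshold are the values used in the snippet.

```python
def shrink_roi(face, dx=10, dy=25, scale=0.70):
    """Shift and shrink an (x, y, w, h) face box so it frames the face tightly."""
    x, y, w, h = face
    return [x + dx, y + dy, int(w * scale), int(h * scale)]

def smile_label(out, threshold=0.8):
    """Turn the network's first output score into the string drawn on the image."""
    return ':)' if out[0] > threshold else ':('

# Made-up face box, as a Haar cascade might report it: x, y, width, height.
roi = shrink_roi((40, 30, 100, 100))
print(roi)                        # [50, 55, 70, 70]
print(smile_label([0.93, 0.07]))  # :)
print(smile_label([0.40, 0.60]))  # :(
```

Because the offsets are fixed pixel counts rather than fractions of the box, they probably need retuning if the frame size or the typical face size changes.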
The OpenMV Cam uses the Cortex-M7 processor without any external DRAM attached and only uses the internal SRAM. At any point in time the processor can go into low-power mode drawing about 50 uA while maintaining all state, then wake-up again on an interrupt, take a picture, and run the neural network before turning off again.
For example, on the upcoming OpenMV Cam H7 we're able to run a Lenet-6 CNN trained on the MNIST data set at 50 FPS while using only 3mA @ 3.3V per inference. With a 1Ah 3.7V LiPo battery, you can deploy a CNN in the field that runs every minute and lasts for over a year.
CNN support on Cortex-M7 processors lines up particularly well with deploying smart thermal vision based cameras that are able to detect people accurately from low-resolution thermal vision images. Smart sensors powered by CMSIS-NN and Cortex-M7 processors are coming soon!
For an MNIST demonstration, check out the video below:
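The battery-life claim can be sanity-checked with a short duty-cycle calculation. This is a rough sketch under stated assumptions, not a measurement: it takes 3 mA as the active current, one inference at 50 FPS (20 ms) as the active time per wake-up, the 50 uA low-power figure quoted above as the sleep current, and it ignores wake-up time, image capture, regulator losses, and battery self-discharge.

```python
def battery_life_days(capacity_ah, active_a, active_s, sleep_a, period_s):
    """Estimate battery life for a device that wakes once per period."""
    active_charge = active_a * active_s              # amp-seconds spent awake
    sleep_charge = sleep_a * (period_s - active_s)   # amp-seconds spent asleep
    avg_current = (active_charge + sleep_charge) / period_s
    hours = capacity_ah / avg_current
    return hours / 24.0

days = battery_life_days(
    capacity_ah=1.0,     # 1Ah battery
    active_a=3e-3,       # assumed current while running the network
    active_s=1.0 / 50,   # one inference at 50 FPS
    sleep_a=50e-6,       # low-power mode current
    period_s=60.0,       # one wake-up per minute
)
print("%.0f days" % days)   # 817 days
```

Under these assumptions the average draw is dominated by the sleep current, so the estimate lands a little over two years. A longer wake-up per cycle shortens that quickly, which is why "over a year" is the safer figure.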
You can grab the latest firmware (v3.0.0) from our GitHub here. Example scripts are here. And the models are here. The OpenMV IDE release to package everything up is coming soon.