
Giant TensorFlow Update!

Posted by Kwabena Agyeman

Hi Everyone!

Get ready! Firmware v4.5.6 is out, unlocking tons of new capabilities, alongside OpenMV IDE v4.2.0.

Introducing the new ML Module

At the start of this year we looked into getting YOLO running on the OpenMV Cam H7 and OpenMV Cam RT1062. We got some okay results, with models running at 2-3 FPS on the MCUs. While not impressive, this is actually quite usable for industrial applications that only take one frame a minute or so. However, we realized we had a problem: there are multiple versions of the YOLO network, each requiring slightly different pre-processing and post-processing. And even if we added support in our C code to handle all of them, folks sometimes have to customize their networks to avoid problematic operators, meaning you can't even count on known networks having a known output value range.

This was a conundrum. Moving forward we wanted to support running more networks, but models like YOLOv5 output 6300 detected bounding box tuples of (xmin, ymin, xmax, ymax, score, class[n]). Post-processing this in C was a no-brainer, but if our post-processing code was in C then we'd only ever be able to support a fixed set of networks.

NumPy to the Rescue

The solution to the problem lies in how people do this on the desktop in Python: they use NumPy! Thanks to the hard work of Zoltán Vörös, an implementation of NumPy for MicroPython (ulab) exists. We've actually been building the library into our firmware for quite a while. However, it wasn't until we ran into the problem of wanting to post-process model outputs in Python on an MCU (wild, right?) that we realized we had the solution to our problems at our fingertips.

Without NumPy, processing the above list of tuples in Python takes roughly 0.5 ms per bounding box whose score is above a threshold - 3.15 seconds if all 6300 are valid! But with NumPy you can write code like this:

# Indices of all boxes whose score (column 4) exceeds the 0.5 threshold.
indices = np.nonzero(np.asarray(yolo_output[:, 4] > 0.5))

This creates a new NumPy array of the indices of all valid bounding boxes with a score above 0.5, which can then be used to select from the original array. Because the above code runs in a tight loop in C, it's more than 50x faster than Python and is eligible to be sped up further in the future using SIMD.
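
As a rough sketch of the whole pattern (written against desktop NumPy for clarity - on the camera you'd use ulab's numpy module, and the array and 0.5 threshold here are illustrative):

import numpy as np

# yolo_output: (6300, 5 + N) rows of (xmin, ymin, xmax, ymax, score, classes...).
yolo_output = np.zeros((6300, 6), dtype=np.float32)  # placeholder data

# One vectorized pass in C finds every box above the score threshold...
indices = np.nonzero(yolo_output[:, 4] > 0.5)[0]

# ...so the slow per-box Python loop only runs over the few survivors.
for i in indices:
    xmin, ymin, xmax, ymax, score = yolo_output[i, :5]
    # hand the surviving box off to drawing / NMS here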

What about RAM?

While using NumPy to post-process a YOLO model's output sounds great so far, what about RAM usage? 6300 × (5 + N) float32 values is about 150 KB with a single class - and hundreds of KB as the class count grows - just to store the output Tensor. Our current Heap size of nearly 256 KB won't cut it for this.

However, a while back MicroPython introduced a new feature called Heap Blocks, which lets you build the MicroPython heap out of multiple blocks of RAM that are treated as one. Using this feature we were able to significantly increase the size of the heap on all OpenMV Cams while maintaining speed. We do this by ordering the smaller on-chip heap blocks first, followed by larger off-chip heap blocks in SDRAM (if a board has SDRAM). MicroPython will then allocate objects on-chip if it can and only go off-chip if there's no room.

With this new feature the Arduino Portenta H7 and Arduino Giga have a 2.5MB+ Heap, the OpenMV Cam H7 Plus now has a 4MB+ Heap, and the OpenMV Cam RT1062 and Pure Thermal have an 8MB+ Heap. On the OpenMV Cam H7 the Heap grew 1.2x, and on the Arduino Nicla it doubled. With these changes you have enough Heap on each system to run Classification Models, FOMO Models from Edge Impulse, and more, with post-processing in Python.
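
You can check the enlarged heap yourself from the REPL using standard MicroPython calls (exact numbers will vary by board and firmware):

import gc

gc.collect()            # run a collection first for a stable reading
print(gc.mem_free())    # free heap bytes, spanning all heap blocks
print(gc.mem_alloc())   # heap bytes currently allocated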

Multi-Input and Multi-Output Support

While NumPy and Heap Block support unblock single-input, single-output models with post-processing in Python, lots of object detection models have multiple output tensors. TensorFlow Lite for Microcontrollers has always supported this, but we had never exposed it.

That is until now.

Putting it all Together

So, we've made the following major changes in firmware v4.5.6:

  1. We updated our TensorFlow Lite for Microcontrollers library to the latest version and overhauled our build system.
  2. Thanks to the new build system we were able to significantly reduce the size of the library, allowing us to enable EVERY OPERATOR in TensorFlow Lite for Microcontrollers on all OpenMV Cams.
  3. Thanks to the new larger heaps we now keep the Tensor Arena on the Heap instead of in the Frame Buffer stack. This allows models to maintain state across inference calls.
  4. Now that the Tensor Arena is on the heap, you can safely create NumPy array objects that reference each input and output Tensor. These act as handles that let you manipulate the Model's inputs and outputs easily in Python without any additional giant memory allocations.
  5. Leveraging NumPy arrays for Model inputs and outputs, we were then able to fully vectorize the Model interface so that we can support Multi-Input and Multi-Output Models (see the sketch after this list). So, you could say, we now support Multi-Modal Models o_O.
  6. And finally, we increased the number of dimensions NumPy arrays support in our firmware from 2 to 4. This lets us share Tensor shapes between TensorFlow and NumPy directly, without making any assumptions in our code about input/output shapes.
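
Put together, using the new module looks roughly like this - a minimal sketch where the file name and input shape are illustrative, but predict() taking a list of inputs and returning a list of outputs matches the MicroSpeech code below:

import ml
from ulab import numpy as np

# Load a model; its Tensor Arena now lives on the (much larger) heap.
model = ml.Model("model.tflite")

# Multi-Input/Multi-Output: pass a list of input arrays, get a list back.
input_data = np.zeros((1, 480))        # shape for illustration only
outputs = model.predict([input_data])
print(outputs[0])                      # each output is a NumPy array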

When we started this refactoring work, I didn't think that we'd end up doing what we've done. But, I'm super glad we did. This new framework sets the stage for you to run any TensorFlow model on the OpenMV Cam using MicroPython.

That said, these changes are breaking changes, so we've published a porting guide covering how to deal with them.

What can you do with this power?

Ibrahim refactored MicroSpeech during this update, and the examples now work again. In particular, he was able to move all of the code from C into Python thanks to NumPy.

First, the TensorFlow folks moved all of the audio processing for the keyword spotting network into a TensorFlow Lite Model itself. This new network does the following:

  1. Audio frame input with shape (1, 480)
  2. Apply Hann Window smoothing using SignalWindow
  3. Reshape tensor to match the input of SignalFftAutoScale
  4. Rescale tensor data using SignalFftAutoScale and calculate one of the input parameters to SignalFilterBankSquareRoot
  5. Compute FFT using SignalRfft
  6. Compute power spectrum using SignalEnergy. The tensor data is only updated for elements between [start_index, end_index).
  7. The Cast, StridedSlice, and Concatenation operations are used to fill the tensor data with zeros, for elements outside of [start_index, end_index)
  8. Compress the power spectrum tensor data into just 40 channels (frequency bands) using SignalFilterBank
  9. Scale down the tensor data using SignalFilterBankSquareRoot
  10. Apply noise reduction using SignalFilterBankSpectralSubtraction
  11. Apply gain control using SignalPCAN
  12. Scale down the tensor data using SignalFilterBankLog
  13. The remaining operations perform additional legacy down-scaling and convert the tensor data to int8
  14. Model output has shape (40,)

It just needs to be fed 480 audio samples from a 16 kHz audio source (e.g., a microphone). However, the network must be re-run for every 320 new samples, with the remaining 160 samples of the window carried over from the previous time slice. Thanks to NumPy this is easy to handle:

# Roll the (1, 480) audio buffer left by 320 samples (2 x 160-sample steps),
# then write the 320 new samples into the freed space; the first 160 samples
# are the overlap carried over from the previous window.
self.audio_buffer = np.roll(self.audio_buffer, -(_SAMPLES_PER_STEP * 2), axis=1)
self.audio_buffer[0, _SAMPLES_PER_STEP:] = np.frombuffer(buf, dtype=np.int16)


Where "buf" above is a buffer of 320 int16_t audio samples from the Mic in a callback that's being executed at 50 times a second. Plenty of time for MicroPython to handle things. The preprocessor network is then passed the rolling audio microphone buffer and we collect the preprocessor outputs in another rolling buffer:

# Roll the spectrogram to the left and add the new slice.
self.spectrogram = np.roll(self.spectrogram, -_SLICE_SIZE, axis=1)
self.spectrogram[0, -_SLICE_SIZE:] = self.preprocessor.predict([self.audio_buffer])


Finally, the keyword spotting network itself is then invoked after 49 of these FFT chunks have been assembled. The model is run each time another FFT chunk is available:

# Roll the prediction history and add the new prediction.
self.pred_history = np.roll(self.pred_history, -1, axis=0)
self.pred_history[-1] = self.micro_speech.predict([self.spectrogram])[0]


Now, the output of the model is a vector of class scores for things like "yes", "no", "noise", etc. We collect the Model's guesses over the course of a second and average each class's scores across those guesses to get an idea of what, on average, the model thinks it's hearing for each keyword it's looking for.

average_scores = np.mean(self.pred_history, axis=0)  # per-class mean over ~1 second
max_score_index = np.argmax(average_scores)          # index of the best class
max_score = average_scores[max_score_index]
label = self.labels[max_score_index]

np.mean() here takes the mean across the rows of data, where each row is one set of class scores, producing one averaged score per class. np.argmax() then returns the index of the largest averaged score, which is used to extract the score and label of the keyword the network thinks it heard.

Anyway, congratulations if you followed all of the above. If you didn't, the TL;DR is that what used to require tens of thousands of lines of C code is now doable with TensorFlow networks and a little NumPy code to stitch everything together - this is the power we're talking about.

In Summary

Get excited! We're only going to keep expanding capabilities and improving performance from here. Also, thanks to ChatGPT, writing NumPy code has never been easier - it will write it for you!

Other Stuff

Okay, before I finish: we've now posted cases online for the OpenMV Cam M4, M7, H7, H7 Plus, and RT1062. These cases have everything - wall mounts, battery compartments, GoPro adapters, and tripod mounts.

We've also finished posting case designs for your OpenMV Cam when using the LCD Shield and PoE Shield.

Finally, we're starting to post covers now for all the shields too! Soon, you will be able to 3D print whatever enclosure you need for your OpenMV Cam.

That's all folks! You made it to the end of this massive blog post!