Blog - News

The Path to Performance - Part 1

Hi folks,

Normally I'm more focused on adding new features than blogging. But we made a lot of changes under the hood to enable triple buffering for image capture and I'd like to share! But, first...

PureThermal OpenMV Campaign Updated!

GroupGets updated the PureThermal OpenMV Cam campaign! There's a lot more information on the product available now. The complete feature list, schematic, board outline, and more are posted! We put a lot of love and work into the product and would love it if you could back it! It's the most feature-packed OpenMV Cam to date. Please watch the video of me explaining it below:

Right now the price is a little high, but once the chip shortage eases (no joke, parts are more than 2x more expensive right now) and production ramps up we hope to drive the price of this feature-packed system down. Please back the PureThermal OpenMV today!

Now, time for a deep dive on the technical topic for today.

Part 1 - Memory Alignment

When we first started developing the OpenMV Cam we were working on the STM32F4 architecture. Like other microcontrollers, the architecture features the Cortex-M4 at the heart of the system. The Cortex-M4 is a straightforward processor which can read/write 8/16/32 bits at a time without side effects, making coding with it easy. What you program is what you get. So much so that we developed most of our original code with the assumption that we just needed to maintain 4-byte alignment when allocating memory - or maybe 8 to support 64-bit values.

Enter DMA (Direct Memory Access)

DMA Controllers are tricky beasts. Until recently our firmware avoided using them. If you've been programming microcontroller firmware lately you've probably avoided using them too. They are overkill for most applications - the processor can generally do all that you need to do. But if you've been avoiding them like we were, then you have been leaving a massive amount of performance on the table. Using a DMA controller in your application is not straightforward. There's a big challenge you need to solve first that will keep tripping you up until you do - memory alignment.

On the STM32 line of microcontrollers, DMA Controllers have 16-byte deep FIFOs that can hold four 32-bit values, eight 16-bit values, or 16 bytes. The DMA controllers work by filling their FIFOs with data coming from a peripheral like the camera interface 32 bits at a time before flushing their internal FIFO to memory. Now, the DMA Controllers are not sophisticated. They work at the system bus level of the hardware, meaning they will not automagically abstract away complexity like the Cortex-M4 processor does to make your life easier. In particular, there are two rules you must follow when using the STM32 DMA Controllers:

  1. The AHB bus allows burst transactions of 4, 8, or 16 beats (these are the most efficient types of transactions, as address arbitration per element can cut bus bandwidth in half or more). This maps directly to the four 32-bit values, eight 16-bit values, or 16 bytes that the DMA Controller's FIFO can hold. So, to get the best performance you're going to want to make your data buffer size a multiple of the burst size... which is 16 bytes.
  2. A burst transaction on the AHB bus must not cross a 1KB address boundary. Luckily this is pretty simple to guarantee since we are always transferring 16 bytes at a time and 1024 is a multiple of 16... so we just need to ensure that our memory buffer is 16-byte address aligned and no burst will ever cross a 1KB boundary.
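
To make the two rules concrete, here is a minimal sketch in plain Python that checks them with address arithmetic. The helper names (crosses_boundary, is_dma_friendly) are made up for illustration and are not part of the OpenMV firmware; the example addresses are arbitrary.

```python
BURST_BYTES = 16   # one full FIFO flush: four 32-bit beats
BOUNDARY = 1024    # a burst must stay inside one 1KB block


def crosses_boundary(addr, length=BURST_BYTES, boundary=BOUNDARY):
    """Return True if the byte range [addr, addr + length) spans two blocks."""
    first_block = addr // boundary
    last_block = (addr + length - 1) // boundary
    return first_block != last_block


def is_dma_friendly(addr, size):
    """Both rules: 16-byte aligned start and a size that is a multiple of 16."""
    return addr % BURST_BYTES == 0 and size % BURST_BYTES == 0


# Every 16-byte aligned burst in a 4KB window stays inside a single 1KB block.
aligned_ok = all(not crosses_boundary(a) for a in range(0, 4096, BURST_BYTES))

# A burst that starts 8 bytes before a 1KB boundary straddles it.
unaligned_bad = crosses_boundary(1024 - 8)

print(aligned_ok, unaligned_bad)
print(is_dma_friendly(0x20000010, 320), is_dma_friendly(0x20000004, 320))
```

The takeaway is that alignment alone is what rules out boundary crossings: because 1024 is a multiple of 16, an aligned 16-byte burst can never straddle a 1KB edge.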

If you follow the two rules above, then DMA is easy to use on the STM32. It will work as expected without much fuss. But... this is easier said than done. If you've been developing lots of code without respect for these two rules, then you're going to be in for a lot of work, like we were, when trying to turn DMA on.

Enter the Cortex-M7 and the Cache

But, the OpenMV Cam M7/H7 are powered by the Cortex-M7 which features a cache! The cache automagically makes your code run a lot faster - but using it with DMA is challenging because, while it hides a lot of system complexity from you, it does not play nicely with DMA hardware.

The cache on the Cortex-M7 works by reading/writing cache lines which are 32 bytes in size. Note that it can only read/write whole cache lines. So, anytime it reads/writes main memory it will always be a 32-byte chunk at a 32-byte aligned address.

Additionally, as a cache, by definition it only reads main memory when something is not already in the cache and it only writes to main memory when it has to flush a line (or lines) from the cache. So, DMA updates to main memory are invisible to the processor unless you invalidate the cache lines covering the memory buffer DMA is writing to, which forces the cache to read the updated memory. Similarly, processor writes will be invisible to DMA unless you flush the cache lines covering the memory buffer DMA is about to read. While more complex microprocessors have cache coherency built into the hardware to handle this for you, the Cortex-M7 does not, so you must deal with it yourself.
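
The stale-data problem is easier to see with a toy model. The sketch below is plain Python and only imitates the behavior described above: ToyCache is a made-up class, a bytearray stands in for main memory, and a direct write to that bytearray plays the role of DMA. On real hardware the equivalent operations would be the CMSIS cache maintenance calls (for example SCB_InvalidateDCache_by_Addr and SCB_CleanDCache_by_Addr), not anything shown here.

```python
LINE = 32  # Cortex-M7 data cache line size in bytes


class ToyCache:
    """A deliberately tiny write-back cache model (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory   # bytearray standing in for main RAM
        self.lines = {}        # line base address -> [data, dirty flag]

    def _line(self, addr):
        base = addr - addr % LINE
        if base not in self.lines:  # miss: fetch the whole 32-byte line
            self.lines[base] = [bytearray(self.memory[base:base + LINE]), False]
        return base, self.lines[base]

    def read(self, addr):
        base, line = self._line(addr)
        return line[0][addr - base]

    def write(self, addr, value):
        base, line = self._line(addr)
        line[0][addr - base] = value
        line[1] = True              # dirty: main memory is now out of date

    def flush(self, addr, size):
        """Write dirty lines covering [addr, addr + size) back to memory."""
        for base in range(addr - addr % LINE, addr + size, LINE):
            line = self.lines.get(base)
            if line and line[1]:
                self.memory[base:base + LINE] = line[0]
                line[1] = False

    def invalidate(self, addr, size):
        """Drop cached lines covering [addr, addr + size) without writing back."""
        for base in range(addr - addr % LINE, addr + size, LINE):
            self.lines.pop(base, None)


ram = bytearray(64)
cache = ToyCache(ram)

cache.read(0)              # CPU touches the buffer, so line 0 is now cached
ram[0] = 0xAB              # "DMA" writes straight to main memory
stale = cache.read(0)      # CPU still sees the old cached value
cache.invalidate(0, LINE)
fresh = cache.read(0)      # refetched from main memory

cache.write(32, 0x55)      # CPU write lands in the cache only
before_flush = ram[32]     # what DMA would read right now
cache.flush(32, LINE)
after_flush = ram[32]      # now visible to DMA
```

Note that both maintenance operations work on whole lines, which is why a buffer that shares a line with unrelated data is a problem: invalidating it can throw away someone else's pending write.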

Anyway, given the cache line rule, we must again extend our memory allocation requirements: memory buffers must be multiples of 32 bytes in size and 32-byte address aligned. If you follow this rule then working with the Cortex-M7 and DMA is a breeze. Things will just work!

And... if you don't, you will experience some of the most challenging bugs in your code, created by race conditions between the cache and the DMA Controllers.
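
As a rough illustration of the 32-byte rule, the sketch below rounds a buffer's start address and size up to whole cache lines and then checks which lines it shares with its neighbors. It is plain Python arithmetic, not firmware code; align_up, dma_buffer_layout, and lines_touched are hypothetical helper names and the addresses are arbitrary.

```python
CACHE_LINE = 32


def align_up(value, alignment=CACHE_LINE):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def dma_buffer_layout(heap_ptr, requested_size):
    """Return (start, size) for a buffer that owns whole cache lines."""
    return align_up(heap_ptr), align_up(requested_size)


def lines_touched(addr, size):
    """Set of cache line indices covered by [addr, addr + size)."""
    first = addr // CACHE_LINE
    last = (addr + size - 1) // CACHE_LINE
    return set(range(first, last + 1))


# A 4-byte variable sits at 0x20000000 and the allocator's next free byte
# is 0x20000004. Placing a 100-byte DMA buffer right there shares a line.
shared = lines_touched(0x20000004, 100) & lines_touched(0x20000000, 4)

# Rounding start and size up gives the buffer its own four cache lines.
start, size = dma_buffer_layout(0x20000004, 100)
before = lines_touched(0x20000000, 4)
after = lines_touched(start + size, 4)
isolated = not (lines_touched(start, size) & (before | after))

print(hex(start), size, bool(shared), isolated)
```

The cost is a little wasted RAM per buffer, which is usually a cheap price for never having a cache invalidate or flush touch data that belongs to something else.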

Next Week - DMA Buffer Locality

Did you know the STM32H7 is a SoC (system-on-a-chip)? Next week we'll cover DMA buffer locality and its effect on performance.

Thanks for reading, that's all folks!

Introducing the OpenMV Cam Pure Thermal

Hi everybody,

We've got a lot of news today:

Introducing The OpenMV Cam Pure Thermal

You can now pre-order the OpenMV Cam Pure Thermal. It's our latest OpenMV Cam which pushes the STM32H7 architecture to the max! It's so good Hackster even did a Spotlight video about it!

The OpenMV Cam Pure Thermal allows you to create mixed color and thermal camera applications for hobbyist or professional uses. It's a great product which we've spent a lot of time developing.

If you like what we've done with the OpenMV Cam and are a supporter of the project, please pre-order the OpenMV Cam Pure Thermal. Your support will enable the creation of future products like support for the FLIR Boson, Stereo Thermal Vision, and FPV Thermal Vision.

Firmware Version 4.0.0 is Released!

Alright, this is a big one folks. Our latest firmware is available for download. It brings some big features folks have been waiting years for:

  • Double/Triple/Video Buffering
  • Bayer Image Processing Support
  • Usable WiFi Debug Support

With Double/Triple/Video Buffering Support all OpenMV Cams (M4/M7/H7) can now process and capture images simultaneously. This feature is enabled automatically for all OpenMV Cams, meaning once you install firmware 4.0.0, frame-rate-limited algorithms will automatically experience a massive increase in speed.

Additionally, for video recording we now support allocating an arbitrary number of frame buffers. Just do sensor.set_framebuffers(10) for example to allocate a 10 frame buffer FIFO to handle the random erase delays caused by an SD Card when recording video.
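
To see why a deeper FIFO helps with SD Card stalls, here is a small simulation in plain Python. It is a sketch under simplifying assumptions, not the firmware's logic: the camera delivers one frame per tick, writing a frame normally takes one tick, and a single write stalls for five ticks. The function name dropped_frames and the timing numbers are made up for illustration.

```python
from collections import deque


def dropped_frames(num_buffers, write_ticks):
    """Count frames dropped when the camera delivers one frame per tick.

    write_ticks[i] is how many ticks the i-th recorded frame takes to write.
    """
    free = num_buffers   # buffers available to the frame grabber
    ready = deque()      # captured frames waiting to be written
    busy = 0             # ticks left on the frame currently being written
    written = 0
    dropped = 0
    for tick in range(len(write_ticks)):
        if busy > 0:                 # the writer makes progress
            busy -= 1
            if busy == 0:
                free += 1            # its buffer goes back to the pool
        if free > 0:                 # the camera needs somewhere to store data
            free -= 1
            ready.append(tick)
        else:
            dropped += 1
        if busy == 0 and ready:      # the writer picks up the oldest frame
            ready.popleft()
            busy = write_ticks[written]
            written += 1
    return dropped


# 20 frames, with one 5-tick stall on the fourth write.
ticks = [1] * 20
ticks[3] = 5

for n in (1, 3, 5, 10):
    print(n, "buffers ->", dropped_frames(n, ticks), "dropped")
```

In this model the stall costs four frames with a single buffer, two with three buffers, and none once there are five or more, which is the effect a larger sensor.set_framebuffers() value is meant to buy you.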

By default triple buffering is enabled, which offers the best performance. In this mode the interrupt-driven frame grabber always has a buffer to store data to while your code is processing one of the frame buffers. However, if you don't have enough RAM for this you can downgrade to double buffering, which captures an image in the background while processing the current image. Double buffering only increases performance, though, as long as your algorithm can process images faster than the camera generates new frames. Finally, single buffer mode still works, and your OpenMV Cam will default to it if it can't fit multiple buffers without using up more than half of the RAM onboard.
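
One common way to organize triple buffering is to rotate three buffer indices between a capture slot, a ready slot, and a processing slot. The sketch below shows that idea in plain Python; the TripleBuffer class and its method names are hypothetical and this is not a copy of how the OpenMV firmware implements it.

```python
class TripleBuffer:
    """Rotate three buffer indices between grabber, mailbox, and user code."""

    def __init__(self):
        self.capture = 0     # buffer the frame grabber is writing into
        self.ready = 1       # most recently completed frame
        self.process = 2     # buffer the user code is reading
        self.fresh = False   # True when ready holds an unseen frame

    def frame_done(self):
        """Called from the (simulated) frame interrupt: publish the new frame.

        The grabber immediately gets the old ready buffer to write into,
        so it never has to wait for the user code.
        """
        self.capture, self.ready = self.ready, self.capture
        self.fresh = True

    def snapshot(self):
        """Called by user code: take the newest frame, or None if none yet."""
        if not self.fresh:
            return None
        self.process, self.ready = self.ready, self.process
        self.fresh = False
        return self.process


tb = TripleBuffer()
nothing_yet = tb.snapshot()   # no frame captured so far

tb.frame_done()               # frame lands in buffer 0
tb.frame_done()               # user code was slow: buffer 1 replaces it
newest = tb.snapshot()        # user code gets the latest frame

roles = {tb.capture, tb.ready, tb.process}
```

The three roles always point at three different buffers, so the grabber and your code never touch the same memory at the same time. With only two buffers the grabber would have to stall or drop a frame whenever the processing side falls behind.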

Last, we finally got the Master DMA hardware onboard the STM32H7 working, which 100% offloads the processor from copying image data from camera line buffers to the final image. This further improves the frame rate on the STM32H7 as the processor no longer has to copy the image data. And... we have further performance improvements coming soon. Right now the processor has to handle an interrupt per line from the camera in order to set up the MDMA transfer. But, we should be able to get MDMA working to completely offload the CPU so that we only get an interrupt per image.

And... we also have non-blocking snapshot support finally. You can register an interrupt handler to be called when an image is ready or poll a flag to let you know when there's a frame available.

Bayer Image Processing Support

Now you can do more things than just look at Bayer images in the IDE. We're going to be slowly integrating Bayer Image support into all functions in our API which don't modify the image. We'll also be introducing JPEG image processing soon too - with hardware decompression acceleration on the STM32H7.

WiFi Debug Support

Finally, we've got usable WiFi Debug support. This means you can FINALLY CONTROL YOUR OPENMV CAM WIRELESSLY FROM YOUR PC USING OPENMV IDE WITH A WIFI SHIELD!!! Please watch the how-to video below:

This feature will be coming soon to the Arduino Portenta H7 WiFi/Ethernet once Arduino asks for it.

Manufacturing Update

So, we've run out of OpenMV Cams in stock to sell on our store. Unfortunately, the chip shortage has made it impossible to buy the STM32H743VIT6. However, we managed to get 3K for building more OpenMV Cam H7 units and we've ordered 8K STM32H743II chips, of which we expect 2K to be delivered in a few months and the rest to arrive in October.

In the meantime, we are going to be building OpenMV Cam H7 Plus units using the STM32H743XI package to get something in stock to sell. So, you may see a different hardware variant of the OpenMV Cam H7 Plus sold. Similarly, we are also making a design for the OpenMV Cam H7 with the STM32H743VIH6 (BGA versus LQFP) to make it easier to keep the OpenMV Cam H7 in stock too.

However, while we should be able to get and build OpenMV Cams using different STM32H743 chip variants, the OV7725 was EoLed by OmniVision and is no longer going to be produced. Given this, we will be switching the main camera to the MT9M114. New OpenMV Cam H7 units will be called the OpenMV Cam H7 R2. Similarly, the OV5640 is nearing end-of-life and we will be working on finding a replacement for it. There are a lot of FPC modules on the market for this chip, however, so sourcing it is less of a problem than with the OV7725.

IDE and Docs

We're working on updating the docs for firmware 4.0.0 and we'll be releasing an IDE update soon with new editing features allowing you to finally have multiple Python scripts open at the same time. We hope to have a new IDE release out in a few weeks.

Anyway, that's all folks!

Please back the OpenMV Cam Pure Thermal! We need your support!

OpenMV IDE 2.6.9 Released!

Hi everybody,

OpenMV IDE 2.6.9 is out! It includes several bug fixes to the usability of the IDE and the latest firmware, 3.9.2!

Firmware 3.9.2 Big Features

    • Bilinear/Bicubic scaling is now supported for upscaling all low-resolution sensors (along with color table support, better alpha blending control, etc. - all thanks to our new scaling pipeline).
    • Support for the MLX90641 was added. The OpenMV Cam's firmware now supports the MLX90621, MLX90640, MLX90641, and AMG8833.
    • And... support for the FLIR Lepton 1, 2, 2.5, 3, 3.5 has been added to the FIR module, allowing you to connect an external FLIR Lepton sensor to your OpenMV Cam using a Lepton breakout board.
      • This allows you to do dual thermal and normal vision on any OpenMV Cam (even the old M4!).
    • We also completely overhauled the TV driver. It now supports the 352x240 SIF NTSC resolution at a 60 Hz update rate. Additionally, it now has all the same scaling features as the updated LCD code - like triple buffering and bicubic/bilinear scaling. This makes the TV driver significantly more useful for wireless display output.
    • And various bug fixes.

Here are two videos about the new features:

         

We are also continuing to integrate the new scaling pipeline into more and more functions in the firmware. to_bitmap(), to_grayscale(), to_rgb565(), to_rainbow(), copy(), crop(), scale() were all updated with the new pipeline, bringing bicubic/bilinear scaling to these features along with rgb565 channel extraction and color table support.

What's next?

The LCD, FIR, and TV modules were all redone to support the launch of the OpenMV Pure Thermal coming later this year. We are just waiting on the second revision of the board to come back for testing before launching the funding campaign for it. A teaser is below:

In the meantime, we will be adding support for Embedded Display Port out over the Arduino Portenta's USB connector using the STM32H7's MIPI DSI controller next and then finally adding support for triple buffering in the camera sensor driver. Once triple buffering is integrated you will instantly see a massive boost in your algorithm FPS as the processor will no longer ever have to wait for a frame to be received. On the STM32F4 and STM32F7 based OpenMV Cams this will result in a rather high number of interrupts per second, in the 50K+ range, to handle data... so, it's not going to be advised on older models. However, on the STM32H7, thanks to the MDMA hardware, we will be able to enable triple buffering of the camera data stream with nearly zero CPU load.

Anyway, that's all folks!