Blog - News

Production Update

Hi folks!

We've got some exciting news!

OpenMV Cam H7 Update

Okay, first for the OpenMV Cam H7 we completed a new sensor board for the MT9M114 and a high performance software driver to control it. The new sensor offers much improved image quality compared to the OV7725 along with a higher resolution. We've also changed the lens from 2.8mm to 2.1mm to keep the field of view the same as the original OpenMV Cam H7. The sensor FPS hits about 80 in bright environments and 40 in dark rooms. Higher frame rates up to 120 FPS can be unlocked by setting the readout window (but this crops the field of view).

It was MUCH easier to write the driver for this sensor compared to the OV5640 as we had access to extensive documentation from OnSemi along with a much improved camera sensor driver that's able to handle the data bandwidth and line rate of the sensor with full processor offload. The MT9M114 like the OV5640 has a databus that runs at 80 MHz which is the maximum limit of the STM32H7's DCMI hardware and stresses the STM32H7's memory architecture.

Anyway, production is unblocked for the OpenMV Cam H7. We will be building 3240 units. However, about 2176 have already sold. We will only have about 1K units left in general inventory for folks to buy.

However, our manufacturer managed to secure parts to produce another 2160 OpenMV Cam H7 units last year in anticipation of the chip shortage.

So:

  • If you need 100+ OpenMV Cam H7 units, please send us an email now to pre-order stock. For the remaining 2176 units we need to produce we will not be able to ship them to customers except in bulk orders of 100+ units.
  • We do not have access to any more STM32H743VIT6 chips and have no idea when we can produce any more OpenMV Cam H7 units after the above stock runs out. If you want the OpenMV Cam H7 you should pre-order now. Once the above units sell out we will likely not re-stock until the chip shortage eases.

The new system will be called the OpenMV Cam H7 R2. We will have the product updated on the website shortly. Right now the OpenMV Cam H7 product itself can be pre-ordered which will be switched to the R2. To avoid any functional surprises only the camera module was changed. The base board remains the same.

OpenMV Cam H7 Plus Update

For the H7 Plus we were promised 6K STM32H743II chips in June. These orders have fallen through. Given this, we hope to receive 2K more chips from another distributor in October.

So... the H7 Plus is going to remain out of stock. Note that 600 units of the above 2K have already sold. If you are interested in pre-ordering the H7 Plus you should do so now to lock in your place in line for the 2K supply of chips we hope to receive.

OpenMV Cam H7 Plus Alt

We tried to purchase different packages of the STM32H7 chip to keep producing the OpenMV Cam H7 Plus. However, while we placed a complete down payment on 1K chips with a different package for delivery in June... our order was cancelled and moved to a delivery date next year in 2022. So, I repeat that the above 2K we hope to receive in October is all that we will receive. If you go onto FindChips and search for the STM32H743 you'll see that every package is out of stock.

Firmware Update

Firmware v4.0.2 has now been pushed to the IDE for download! Full MDMA offload support with 0% processor overhead is now working on all H7 based OpenMV Cams. With firmware 4.0.2 you get the benefits of triple buffering in the camera driver along with MDMA doing all the work receiving the frame.

Additionally, we finally added frame rate control via set_framerate(). Now if you want to set the camera FPS to exactly 30 Hz you can. The feature works by selectively dropping frames in the camera driver. And if possible, it will also adjust the camera sensor FPS to save power and improve image quality by increasing the exposure.
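To make the frame dropping idea concrete, here's a minimal pure-Python sketch. It is not the firmware's actual algorithm. The function name frames_to_keep and the accumulator approach are assumptions made up for illustration. It just shows one way to pick which frames to keep so that a sensor running at one rate delivers frames at a lower, exact rate.

```python
def frames_to_keep(sensor_fps, target_fps, n_frames):
    """Return the indices of the frames kept when decimating sensor_fps
    down to target_fps (hypothetical helper, illustration only)."""
    if target_fps >= sensor_fps:
        # Nothing to drop: the sensor is already at or below the target rate.
        return list(range(n_frames))
    kept = []
    acc = 0
    for i in range(n_frames):
        # Integer accumulator: every time it crosses sensor_fps we have
        # "earned" one output frame.
        acc += target_fps
        if acc >= sensor_fps:
            acc -= sensor_fps
            kept.append(i)
    return kept


# One second of frames from a sensor running at 80 FPS, decimated to 30 FPS.
kept = frames_to_keep(80, 30, 80)
print(len(kept), kept[:5])
```

The integer accumulator spreads the kept frames evenly over time instead of keeping them in bursts, which is what you'd want for a steady frame rate.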

IDE Update

We've been trying to release a new version of OpenMV IDE with support for non-stm32 based microcontrollers. Our focus will be on this now. Also! Tabs are coming to OpenMV IDE! Along with a lot of other new features. We're going to focus on trying to get an IDE release out in the next couple of weeks followed by a second release with a lot of new editing features like split panel text editors and code minimap support.

Anyway, that's all folks! And... please pre-order the OpenMV Cam H7 or send us your bulk purchase order requests via email. We do not have a timeline for more stock once the above units run out - which can happen quickly. Our store was mobbed back in March with customers buying up everything.

The Path to Performance - Part 3

Hi Folks,

All right! Three blog posts in three weeks! Who knew I had the time.

Part 3 - The Master DMA Controller

Last week I talked about the importance of DMA memory buffer locality and putting the DMA memory buffers in SRAM in the same AHB matrix region as the DMA controller. However, SRAMs are limited in size. How can you move chunks of data in the SRAM buffer into a larger buffer in SDRAM? Certainly you can use the processor, but with the STM32H7 you have access to the Master DMA controller to do this for you without loading the CPU.

The Master DMA (MDMA) controller is a high performance DMA controller on the STM32H7 with A LOT more functionality for data movement than the standard DMA controllers on STM32 Microcontrollers. In particular, it can trigger off of other DMA controllers when they finish moving data allowing it to act like a mini-processor that's interrupted to memcpy() data from one buffer to another.

How we use it on the STM32H7

Like many pieces of hardware on the STM32H7, explaining what modules can do doesn't really give any insight on how to use them. So, instead, I'll explain how our camera driver achieves 100% image capture offload for the CPU using MDMA. Buckle up!

Line Capture using DMA

There's quite a bit of complexity in our new camera image capture driver. We've really pushed it to the max this last year. But, it's pretty straightforward to explain how we used MDMA here. First, we use a DMA controller to receive lines of pixels coming from the camera. Lines are loaded into the same memory buffer at the same starting address over-and-over again.

DCMI to Line Buffer

As mentioned in previous blog posts, we have to follow all the memory alignment and data size rules for the DMA controllers here. So, the line buffer is 16-byte aligned and we are moving the image in 16-byte chunks to keep the DMA controller happy (however, per-line we only have to be 4-byte aligned but the total image must be 16-byte aligned).
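As a quick sanity check of the size rules above, here's a small Python sketch. The function name dma_sizes_ok is hypothetical and only the two size rules stated above are checked (4-byte multiple per line, 16-byte multiple for the whole image). The real driver has more to worry about, like the buffer address itself.

```python
def dma_sizes_ok(line_bytes, lines):
    """Check the size rules described above (hypothetical helper):
    each line a multiple of 4 bytes, whole image a multiple of 16 bytes."""
    line_ok = line_bytes % 4 == 0
    image_ok = (line_bytes * lines) % 16 == 0
    return line_ok and image_ok


# 320x240 RGB565 (2 bytes per pixel): 640 bytes per line.
print(dma_sizes_ok(320 * 2, 240))
# 322 one-byte pixels per line is not a multiple of 4 bytes.
print(dma_sizes_ok(322, 240))
```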

That said, to enable architecturally efficient cropping when you want to crop more than 4-bytes worth of pixels per line, we tell the DCMI hardware to drop the first 1-3 bytes of each line to ensure that the first pixel of the cropped image is on a 32-bit boundary. Once this is done we can then just change the starting address of where we want to grab pixels from to crop the image with MDMA while being able to maintain 4-byte alignment which is critical for keeping performance up on the 32-bit AHB bus.

The DCMI hardware also takes care of vertically cropping the image by dropping lines before a starting line and after an ending line. So, along with the above trick, image cropping is fully offloaded to the hardware.
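Here's a rough Python sketch of the arithmetic behind the cropping trick. The function plan_crop and the dictionary keys are made up for illustration and are not names from the firmware. The idea is that whatever part of the crop's starting byte offset isn't a multiple of 4 gets dropped by the DCMI, so the offset that MDMA starts reading from in the line buffer always lands on a 32-bit boundary.

```python
def plan_crop(x, y, w, h, bytes_per_pixel=2):
    """Split a crop window into a DCMI part and an MDMA part
    (hypothetical helper, illustration only)."""
    start = x * bytes_per_pixel   # byte offset of the first wanted pixel in a sensor line
    drop = start % 4              # 0-3 bytes the DCMI drops at the start of every line
    return {
        "dcmi_drop_bytes": drop,
        "mdma_src_offset": start - drop,   # offset into the line buffer, always a multiple of 4
        "line_bytes": w * bytes_per_pixel,
        "first_line": y,                   # DCMI drops lines before this one
        "last_line": y + h - 1,            # ...and after this one
    }


plan = plan_crop(x=3, y=10, w=160, h=120)
print(plan)
```

With x=3 and 2 bytes per pixel the crop starts 6 bytes into the line, so the DCMI drops 2 bytes and MDMA reads from offset 4.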

MDMA to Frame Buffer

Next, MDMA moves each line of the image to the frame buffer. As mentioned above, to handle cropping we simply program MDMA with the number of lines it needs to move, the size of each line, and a starting address offset into the line buffer. It then takes care of the rest by triggering off each time DMA2 completes a line transfer of the image. Once the MDMA Controller is done it generates a transfer complete interrupt to let us know it's finished writing the image!
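If it helps to see the data flow, here's a toy Python model of the two stages. It is purely a simulation: the function capture and its parameters are assumptions, and real DMA/MDMA obviously doesn't run in a Python loop. Each incoming line overwrites the same line buffer, and a fixed slice of it (starting offset plus line size) is appended to the frame buffer.

```python
def capture(sensor_lines, src_offset, line_bytes):
    """Toy model of line capture (illustration only).

    sensor_lines: list of equal-length bytes objects, one per line.
    src_offset:   starting offset into the line buffer.
    line_bytes:   number of bytes moved per line.
    """
    line_buffer = bytearray(len(sensor_lines[0]))
    frame_buffer = bytearray()
    for line in sensor_lines:
        # "DMA": every line lands in the same buffer at the same address.
        line_buffer[:] = line
        # "MDMA": triggered once the line is complete, copies the cropped part.
        frame_buffer += line_buffer[src_offset:src_offset + line_bytes]
    return bytes(frame_buffer)


lines = [bytes(range(0, 8)), bytes(range(8, 16))]
print(capture(lines, src_offset=4, line_bytes=2))
```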

Now, the real magic with MDMA is in its memcpy() features. Image data isn't directly usable all the time. In particular, some cameras send us byte reversed RGB565 pixels that the processor would normally have to byte-un-reverse. But, the STM32H7 designers foresaw this issue and gave the MDMA controller the ability to byte-reverse, half-word reverse, and word reverse the data it's moving!
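For reference, this is what the byte un-reversing looks like if the processor has to do it. It's a plain Python sketch with a made-up function name, just to show the transformation that MDMA applies for free while moving the data.

```python
def swap_bytes_in_halfwords(data):
    """Swap the two bytes of every 16-bit pixel (illustration only)."""
    if len(data) % 2:
        raise ValueError("RGB565 data must be an even number of bytes")
    out = bytearray(len(data))
    out[0::2] = data[1::2]   # high bytes move to the even positions
    out[1::2] = data[0::2]   # low bytes move to the odd positions
    return bytes(out)


print(swap_bytes_in_halfwords(b"\x12\x34\xab\xcd").hex())  # 3412cdab
```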

Next, sometimes we have to extract the Y channel from YUV422 images to get a grayscale image out of the camera. This takes quite a bit of processor bandwidth as it can't be done very efficiently. But, MDMA can do this too! It supports flexible source and data size increments allowing us to program it to grab one byte every two bytes to extract the Y channel from YUV422 images (YUV422 images are organized in a repeating YUYV byte pattern).
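The "one byte every two bytes" trick is easy to picture in Python. This sketch (function name is an assumption) does in software what the MDMA increment settings do in hardware, assuming the YUYV byte order described above where Y sits at every even byte.

```python
def y_from_yuyv(data):
    """Extract the Y (grayscale) channel from YUYV-ordered YUV422 bytes
    (illustration only)."""
    return bytes(data[0::2])   # keep bytes 0, 2, 4, ... and skip the U/V bytes


# Four pixels: Y values 10, 20, 30, 40 with U/V bytes of 128 in between.
yuyv = bytes([10, 128, 20, 128, 30, 128, 40, 128])
print(list(y_from_yuyv(yuyv)))  # [10, 20, 30, 40]
```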

Finally, the best part of MDMA is how much smarter it is than the regular STM32 DMA controllers. Based on the line byte offset and width we optimally pick the source/destination data/increment/burst sizes to move data using the system buses as efficiently as possible.

Wrapping it all up

After the image has been fully transferred we enqueue it into our flexible frame buffer architecture (using pointers). The processor has to do this part. Then when a new image is received by the DCMI hardware we start the process all over again to receive the next frame. All this happens in the background while we're running your code. In triple buffer mode (which is the default on the OpenMV Cam H7 Plus, the OpenMV PureThermal, and the Arduino Portenta H7) we're able to constantly receive images in the background and store images to SDRAM with effectively ZERO processor overhead. Then when you call snapshot() you're just setting the frame buffer to point to the latest frame that was captured making sure that you have the most recent image (along with having to invalidate the cache where the image was placed in a 32-byte aligned and 32-byte multiple image buffer).
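Here's a simplified Python model of triple buffering to show why the capture side never has to wait for your code. This is not how the firmware's frame buffer queue is actually written. The class and method names are invented for this sketch. One buffer is being filled, one holds the latest finished frame, and one is held by whoever called snapshot().

```python
class TripleBuffer:
    """Toy triple buffer (illustration only)."""

    def __init__(self):
        self.buffers = [None, None, None]
        self.write_idx = 0      # buffer currently being filled by the "hardware"
        self.ready_idx = None   # most recently completed frame, if any
        self.read_idx = None    # buffer currently held by user code

    def frame_done(self, frame):
        """Called when a full frame has been transferred."""
        self.buffers[self.write_idx] = frame
        self.ready_idx = self.write_idx
        # Pick a buffer that is neither the latest frame nor held by the reader.
        self.write_idx = next(
            i for i in range(3) if i not in (self.ready_idx, self.read_idx)
        )

    def snapshot(self):
        """Return the latest completed frame, or None if there isn't one yet."""
        if self.ready_idx is None:
            return None
        self.read_idx = self.ready_idx
        self.ready_idx = None
        return self.buffers[self.read_idx]


tb = TripleBuffer()
tb.frame_done("frame 1")
tb.frame_done("frame 2")   # frame 1 was never read, so it just gets replaced
print(tb.snapshot())       # frame 2
```

With three buffers there's always a free one to capture into, even while user code is holding a frame and another finished frame is waiting.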

Anyway, thanks for reading! That's all folks!

(What about the OpenMV PureThermal? I don't have any new updates about it this week. Please back it on GroupGets though! The new high performance camera driver architecture is made possible by companies like GroupGets investing in OpenMV. Support us and GroupGets by backing the OpenMV PureThermal).

The Path to Performance - Part 2

Hi everyone,

Time for the next blog post. Going to try to keep doing an update for this series every week. But first:

PureThermal OpenMV Price drop!

The PureThermal OpenMV is now $259.99! We managed to shave $30 off the BOM looking for cost savings (we wanted to get it to $249.99 but couldn't get it that low thanks to the current chip market).

We're going to be building 250 of these things for our first production run and go from there. Please back the campaign and lock in your spot now!

Yes, these blog posts are a vehicle for me to keep blasting the email list about the PureThermal OpenMV. But, I'm also writing genuinely useful content below. Maybe I'll have a demo video next week showing off something cool onboard.

Part 2 - DMA Buffer Locality

Where you put DMA buffers in RAM matters. It determines how data flows around in system buses on chip and determines what resources are under load. For example, with the PureThermal OpenMV we're able to do:

  • Constant Image Capture with the OV5640 at 80 MB/s
  • Constant Display Buffer Update (50 MB/s)
  • Constant Display Update at 1280x720 @ 60 Hz (111 MB/s)
  • Constant SPI Bus Input of the FLIR Lepton 3.5 (2.5 MB/s)
  • Constant SPI Bus Output for a TV Shield (10 MB/s)
  • Constant WiFi Output (1.25 MB/s)
  • Constant USB Output (1.25 MB/s)
  • Constant SDIO Output (12.5 MB/s)

At the same time! When we first tried to do all this at once our code fell over. DMA FIFOs overflowed, things locked up, nothing worked. But, we found the answer when looking at the system bus architecture.
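To get a feel for the total load, the numbers in the list above can simply be added up. The snippet below does that, and also re-derives the display figure under the assumption (mine, not stated above) that the display uses 2 bytes per pixel.

```python
# Bandwidth figures from the list above, in MB/s.
streams_mb_s = {
    "camera capture (OV5640)": 80,
    "display buffer update": 50,
    "display update 1280x720 @ 60 Hz": 111,
    "SPI in (FLIR Lepton 3.5)": 2.5,
    "SPI out (TV Shield)": 10,
    "WiFi out": 1.25,
    "USB out": 1.25,
    "SDIO out": 12.5,
}

total_mb_s = sum(streams_mb_s.values())

# Assumption: 16 bits (2 bytes) per pixel on the display.
display_mb_s = 1280 * 720 * 2 * 60 / 1e6

print(total_mb_s)     # 268.5
print(display_mb_s)   # 110.592, i.e. about 111 MB/s
```

So that's roughly 270 MB/s of constant traffic that has to be routed around the chip without any real-time stream stalling.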

The STM32F4/STM32F7 System Bus

If you dig into the STM32F427 reference manual you'll find the below system bus architecture.

STM32F427 System Bus

When we first started developing our firmware on the STM32F427 we didn't have to worry too much about the location of the DMA buffers in SRAM. The camera was slower, we weren't using SDRAM, and the processor was simpler. So, we made no effort to locate DMA buffers optimally.

Now, here's how to look at the picture above. You'll notice there are three SRAM banks. The reason for this is that it allows three masters to read/write to all SRAM banks at the same time. The bus masters are the devices at the top of the matrix while the bus targets are the devices on the right. The bus matrix on the STM32F427 allows all masters to simultaneously read/write to all targets at the same time as long as multiple masters are not trying to share a target. Finally, the dots above show what targets masters can access. For example, if you look carefully above you'll notice that most bus masters can't access AHB1/2 peripherals - just RAM.

Moving on, even on the STM32F765 ST kept the same type of architecture:

STM32F765 System Bus

Like the STM32F427 System Bus there's one main matrix with three SRAMs available for use. There are two SRAMs on the main system bus matrix along with the DTCM SRAM which all bus masters can access via the AHB slave port on the ARM-Cortex-M7 Processor. Our firmware originally roughly stayed the same between the STM32F4/F7 because of this.

Enter the STM32H7 - A System-on-a-Chip

Now, the STM32H7 is quite a different chip than the STM32F4/F7. While the STM32F4/F7 chips look like very high performance Microcontrollers the STM32H7 is clearly a System-on-a-Chip:

STM32H743 System Bus

It's got three system bus matrices, with a 64-bit AXI bus domain (there are three domains because each can be shut down to save power). What's AXI? Well, it's a split transaction bus architecture that lets masters issue read/write requests in such a way that resource locking of the bus is minimized. On the AHB Bus a master locks the bus exclusively for the time it takes to complete the read/write. If one master is doing a read which may take a long time to complete, another master cannot execute a write while the bus is idle waiting for the read response. With AXI you actually have five transaction channels between each master and target for write requests, write data, write responses, read requests, and read responses. This allows a target to receive multiple read/write requests at the same time, choose how to handle them for the best performance, and respond to the transactions without blocking.

Clearly, with AXI you're not going to have a bandwidth problem on the STM32H7, right? It's running at 240 MHz with a 64-bit databus for 1.92 GB/s of memory bandwidth. But, you'd be wrong, because not all bus masters are in the AXI domain - some are still in the AHB domain.

The Choke Point

To link the AXI Bus domain to the AHB Bus domain ST chose to use AHB buses. There's the D1-to-D2 AHB Bus and the D2-to-D1 AHB Bus which allow bus masters to communicate across domains. These 32-bit buses run at 240 MHz for 960 MB/s of bandwidth. But, unlike AXI, AHB buses are locked when a master performs a read/write. For example, if DMA2 wants to read from SDRAM it must:

  1. Win arbitration access to the D2-to-D1 AHB Bus.
  2. Use the D2-to-D1 AHB Bus to send a transaction to the SDRAM.
  3. Wait for the SDRAM to respond (might be a while - 100s of clocks)
  4. Return the result over the D2-to-D1 AHB Bus

And... during the time above no other bus master may use the D2-to-D1 AHB Bus. If you recall from the previous blog post, DMA engines on the STM32 line of microcontrollers only have 16 bytes of onboard FIFO space. These FIFOs will overflow if a read/write takes a very long time to complete while they are constantly receiving data from a peripheral. So, if you were trying to write image data from a camera to SDRAM while pulling another frame from SDRAM to send to SPI using DMA, things will crash.
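A back-of-the-envelope calculation shows how little slack there is. The sketch below uses the numbers from these posts (240 MHz 32-bit AHB bus, 16-byte DMA FIFO, 80 MB/s camera stream) and makes a simplifying assumption that the FIFO starts empty and nothing drains from it while the bus is blocked. The function name fifo_overflows is made up for illustration.

```python
AHB_CLOCK_HZ = 240_000_000        # cross-domain AHB bus clock
AHB_WIDTH_BYTES = 4               # 32-bit bus
FIFO_BYTES = 16                   # DMA FIFO size
CAMERA_BYTES_PER_S = 80_000_000   # camera stream, 80 MB/s

# Peak bandwidth of the cross-domain bus: 960 MB/s.
ahb_peak_bytes_per_s = AHB_CLOCK_HZ * AHB_WIDTH_BYTES

# Bus clocks it takes the camera to fill an empty FIFO.
fifo_fill_clocks = FIFO_BYTES * AHB_CLOCK_HZ // CAMERA_BYTES_PER_S


def fifo_overflows(stall_clocks):
    """True if the bus is blocked longer than the FIFO can absorb
    (simplified model: FIFO starts empty, no draining during the stall)."""
    return stall_clocks > fifo_fill_clocks


print(ahb_peak_bytes_per_s)   # 960000000
print(fifo_fill_clocks)       # 48
print(fifo_overflows(100))    # True
```

In this simplified model the FIFO buys you about 48 bus clocks (200 ns). A stall of 100s of clocks while another master waits on SDRAM is more than enough to overflow it, even though the bus is nowhere near its peak bandwidth on paper.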

The Solution - Use the Architecture Features!

Back to my original observation, the chip designers at ST put SRAM blocks in different domains. This is on-purpose to solve this exact problem. DMA1/2 are designed to target peripherals and SRAM1/2/3 while BDMA is designed to target SRAM4. By locating DMA buffers in their local SRAM banks you can significantly reduce system bus congestion ensuring that the bandwidth you need is available.

So, the rule is simple. If you've got a real-time DMA transaction that you cannot back-pressure, locate its DMA buffer in the local SRAM near that DMA controller. Do this and things will just work.
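As a rough illustration of the rule, here's a Python sketch that checks whether a buffer address is "local" to a DMA controller. The address ranges are my reading of the STM32H743 memory map and should be treated as assumptions to verify against the reference manual for your exact part. The names SRAM_REGIONS, LOCAL_SRAM, and is_local are invented for this example. The controller-to-SRAM pairing is the one described above (DMA1/2 with SRAM1/2/3, BDMA with SRAM4).

```python
# (name, base address, size in bytes) - assumed STM32H743 values, verify before use.
SRAM_REGIONS = [
    ("SRAM1", 0x30000000, 128 * 1024),
    ("SRAM2", 0x30020000, 128 * 1024),
    ("SRAM3", 0x30040000, 32 * 1024),
    ("SRAM4", 0x38000000, 64 * 1024),
]

# Which SRAM banks each DMA controller is designed to target (from the text above).
LOCAL_SRAM = {
    "DMA1": {"SRAM1", "SRAM2", "SRAM3"},
    "DMA2": {"SRAM1", "SRAM2", "SRAM3"},
    "BDMA": {"SRAM4"},
}


def region_of(address):
    """Return the name of the SRAM bank containing address, or None."""
    for name, base, size in SRAM_REGIONS:
        if base <= address < base + size:
            return name
    return None


def is_local(controller, address):
    """True if address lies in an SRAM bank local to the given controller."""
    return region_of(address) in LOCAL_SRAM[controller]


print(is_local("DMA2", 0x30000000))   # buffer in SRAM1
print(is_local("DMA2", 0xC0000000))   # an address outside the SRAM banks listed here
```

In real firmware you'd get the same effect by placing the buffer in the right linker section. A check like this is just a cheap way to catch a buffer that ended up in the wrong place.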

Next Week - MDMA

There's another DMA controller on the STM32H7. The Master DMA controller. In the next blog post I'll explain how to use it.

Thanks for reading, that's all folks!