Seeing is . . . everything

Posted: 07.13.16

The human brain processes entire images in 13 milliseconds

We are visual animals. Ninety percent of the information we gather about the world and ourselves comes through our visual system, and naturally, we use our vision as a model for all other systems. If you want to build a successful robot, AR device, VR system, or autonomous vehicle, it has to be able to see. 

Our visual system consumes approximately 4 × 10⁶ ATP molecules per second. Adenosine triphosphate (ATP) is a nucleoside triphosphate that cells use for intracellular energy transfer. Each bit of visual information costs between 9 × 10⁵ and 3 × 10⁶ ATP molecules (an average of 1.9 million ATP per bit), and we process about 10 million bits a second (the human retina accepts data at 10 million bits per second), so the visual system uses about 2 × 10¹³ ATP molecules per second. That works out to 11 kJ for 16 hours of non-stop information collection. An alkaline long-life AAA battery stores about 5,000 J of energy, so our visual system uses a little over two AAA batteries a day. 
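As a quick check of the paragraph's arithmetic, here is a short Python sketch that uses only the figures quoted above: 1.9 million ATP per bit, 10 million bits per second, 11 kJ per 16-hour day, and 5,000 J per AAA cell. It does not derive the 11 kJ figure from the ATP count; it simply multiplies and divides the stated numbers. The constant and function names are illustrative, not from any library.

```python
# Figures quoted in the article (taken as given, not independently measured).
ATP_PER_BIT = 1.9e6        # average ATP molecules spent per bit
BITS_PER_SECOND = 10e6     # retinal input rate, bits per second
DAILY_ENERGY_J = 11_000    # article's estimate for 16 hours of seeing
AAA_ENERGY_J = 5_000       # energy in one alkaline long-life AAA cell


def atp_per_second(atp_per_bit=ATP_PER_BIT, bits_per_second=BITS_PER_SECOND):
    """ATP molecules consumed per second = cost per bit x bits per second."""
    return atp_per_bit * bits_per_second


def aaa_cells_per_day(daily_energy_j=DAILY_ENERGY_J, cell_energy_j=AAA_ENERGY_J):
    """Number of AAA cells that hold the visual system's daily energy budget."""
    return daily_energy_j / cell_energy_j


print(f"{atp_per_second():.2e} ATP/s")             # 1.90e+13 ATP/s
print(f"{aaa_cells_per_day():.1f} AAA cells/day")  # 2.2 AAA cells/day
```

The per-second product is where the roughly 2 × 10¹³ figure comes from, and 11,000 J divided by 5,000 J is the "little over two" AAA batteries.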

Replicating such a system is a challenge we will face for a very long time, but we are steadily making improvements. We call them vision or image processing systems, and we integrate them into SoCs and/or separate co-processors. But don’t be confused: an ISP (image signal processor) is not a vision or image processing system, although a robust visual processing system will likely have an ISP in it. An ISP is just a front end for de-mosaicing (getting the RGB right), color and missing-pixel correction (done at the factory), and translating the sensor’s data stream into one that is compatible with the internals of the SoC’s processors. The heavy lifting happens in the sections behind the ISP, which can involve the GPU, the DSP (if one is present), the CPU, plus specialized processors for tracking (pose) and depth. This is where the men are separated from the boys.

Companies that sell separate ISPs (such as DxO), or IP for an ISP (like Imagination Technologies), understand the issues, and their designs push into image processing and control functions, offering exposure control, auto-focus, stabilization, and more. So the line between an ISP and a vision processor (VP) is a gray one. 
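To make "de-mosaicing" concrete, here is a minimal Python sketch of the simplest textbook approach, bilinear interpolation, assuming an RGGB Bayer layout (red at even row and even column, blue at odd row and odd column, green elsewhere). Each output pixel keeps the one color its photosite actually measured and fills in the other two by averaging the neighboring photosites of those colors in a 3 × 3 window. Production ISPs typically use edge-aware methods instead; the function names here are hypothetical.

```python
def bayer_color(row, col):
    """Color channel sampled at (row, col), assuming an RGGB Bayer mosaic."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def demosaic_bilinear(raw):
    """Bilinear demosaic of an RGGB mosaic (list of rows of numbers).

    Returns a list of rows of (R, G, B) float tuples. The image must be at
    least 2 x 2 so that every 3 x 3 window contains all three colors.
    """
    height, width = len(raw), len(raw[0])
    if height < 2 or width < 2:
        raise ValueError("mosaic must be at least 2 x 2")
    out = []
    for r in range(height):
        row_out = []
        for c in range(width):
            own = bayer_color(r, c)
            pixel = {}
            for channel in "RGB":
                if channel == own:
                    # Keep the value this photosite actually measured.
                    pixel[channel] = float(raw[r][c])
                    continue
                # Average same-color neighbors inside the (clipped) 3 x 3 window.
                samples = [
                    raw[rr][cc]
                    for rr in range(max(0, r - 1), min(height, r + 2))
                    for cc in range(max(0, c - 1), min(width, c + 2))
                    if bayer_color(rr, cc) == channel
                ]
                pixel[channel] = sum(samples) / len(samples)
            row_out.append((pixel["R"], pixel["G"], pixel["B"]))
        out.append(row_out)
    return out


# One RGGB tile: R=10, G=20 and 30, B=40.
print(demosaic_bilinear([[10, 20], [30, 40]])[0][0])  # (10.0, 25.0, 40.0)
```

Bilinear interpolation blurs edges and produces color fringing, which is one reason real ISP pipelines spend far more silicon on this step than the few lines above suggest.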

Also, given the wide range of applications and conditions a visual and image signal processing (V/ISP) system has to operate in, one size can’t fit all. Qualcomm has invested heavily in its front end, with dual 16-bit ISPs, a recently expanded DSP, a very worthy GPU, and a gaggle of 64-bit ARM cores. It has also put several special-function image-processing support processors into its new Snapdragon 820, which, BTW, delivers amazing low-light capability in the Samsung Galaxy S7 Edge, but I digress. 

To use an off-the-shelf (OTS) SoC for VR, AR, drones, and other autonomous vehicles and robots, the device has to have a very broad range of functions and capabilities. Here’s where the SoC builders have to make some tough decisions. They have to weigh features that perhaps only 5% of potential users will need against keeping cost and power consumption as low as possible. If you took a Snapdragon 820 and stuck it in a drone (as several people have), you’d probably use only about 50% of that SoC’s capabilities. A smartphone, on the other hand, may use even less, and different parts of the SoC, depending on who the user is and what he or she is doing. Stick a Galaxy S7 in a Samsung Gear VR headset and you use most of it; that may be the biggest workload it gets. 

So the trick to packing in a lot of features that most users won’t take advantage of is to turn those blocks off, putting them to sleep until an app wakes them up. That’s how Qualcomm, MediaTek, Nvidia, and others are able to jam billions of transistors into a tiny chip and keep it from melting itself and the parts around it. 
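The "sleep until an app wakes them up" idea can be illustrated with a toy Python model of reference-counted power domains. This is a sketch under stated assumptions, not how Qualcomm, MediaTek, or Nvidia implement it: real SoCs do power gating in hardware and firmware, and the class, method, and block names below are hypothetical. Each block starts powered off, turns on when its first client asks for it, and goes back to sleep when its last client lets go.

```python
class PowerDomainManager:
    """Hypothetical model of per-block power gating with reference counts."""

    def __init__(self, blocks):
        # Every block starts asleep (zero clients holding it awake).
        self._refcount = {name: 0 for name in blocks}

    def is_on(self, block):
        return self._refcount[block] > 0

    def acquire(self, block):
        """Wake a block (if asleep) on behalf of one more client."""
        if self._refcount[block] == 0:
            print(f"power on: {block}")
        self._refcount[block] += 1

    def release(self, block):
        """Drop one client's claim; put the block to sleep when none remain."""
        if self._refcount[block] == 0:
            raise RuntimeError(f"{block} released more times than acquired")
        self._refcount[block] -= 1
        if self._refcount[block] == 0:
            print(f"power off: {block}")

    def powered_blocks(self):
        return sorted(name for name, count in self._refcount.items() if count > 0)


soc = PowerDomainManager(["cpu", "gpu", "dsp", "isp", "depth"])
soc.acquire("cpu")            # the OS keeps a CPU core awake
soc.acquire("isp")            # a camera app starts...
soc.acquire("dsp")            # ...and asks for DSP help
print(soc.powered_blocks())   # ['cpu', 'dsp', 'isp']
soc.release("dsp")
soc.release("isp")            # camera app closes; ISP and DSP go back to sleep
print(soc.powered_blocks())   # ['cpu']
```

Reference counting lets several apps share one block without any of them switching it off underneath the others; the GPU and depth blocks in this example never draw power because nothing asked for them.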

Today’s smartphones pack a 3000 mAh battery, about 3X what a AAA battery can deliver, which means a smartphone has more than enough battery capacity to power our visual system. But a smartphone, even one with the impressive Snapdragon 820, can’t come close to the processing power and data consumption of our personal wetware. So when your friends start wringing their hands about the rise of the robots, tell them about the human visual processing system. We won’t be replicating that gooey mess for a long, long time.
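Because mAh ratings only compare fairly at the same voltage, it helps to convert the phone battery to joules before setting it against the visual system's 11 kJ per day. The Python sketch below does that, assuming a nominal cell voltage of 3.8 V (a typical lithium-ion figure, not one stated in the article); the names are illustrative.

```python
VISUAL_SYSTEM_DAILY_J = 11_000   # article's estimate for 16 hours of seeing
PHONE_CAPACITY_MAH = 3_000       # battery capacity quoted in the article
PHONE_NOMINAL_V = 3.8            # ASSUMPTION: typical Li-ion nominal cell voltage


def battery_energy_joules(capacity_mah, nominal_volts):
    """Stored energy = charge in amp-hours x volts x 3600 seconds per hour."""
    return capacity_mah / 1000 * nominal_volts * 3600


phone_j = battery_energy_joules(PHONE_CAPACITY_MAH, PHONE_NOMINAL_V)
print(f"Phone battery: {phone_j:,.0f} J")  # Phone battery: 41,040 J
print(f"Visual-system days per charge: {phone_j / VISUAL_SYSTEM_DAILY_J:.1f}")
```

Under that voltage assumption, one charge holds roughly 41 kJ, between three and four days of the visual system's estimated energy budget, which is consistent with the article's point that battery energy is not the bottleneck; processing power is.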