Video cameras have become ubiquitous, whether in doorbells, elevators, airports, sports arenas, or city streets. The smarter these cameras become, the more they can do, from improving home security and enhancing public safety to optimizing traffic flow. Demand for camera vision systems that interpret visual data is clearly growing. According to ABI Research, shipments are projected to reach close to 200 million by 2027, generating $35 billion in sales.

As smarter camera systems become more prevalent, the need for automation grows. That includes monitoring video streams in real time and generating insights more quickly, while also making streaming and storage more efficient and cost-effective. This is where artificial intelligence (AI) enters the equation.

However, even AI-supported camera systems have their limitations. That’s because conventional AI models rely on cloud infrastructure and thus suffer from latency issues and a variety of other challenges. They fall short in providing real-time insights and alerts, and integration with the cloud poses data privacy concerns. Moreover, their dependency on networks introduces reliability issues.

Advanced AI for smart cameras therefore calls for an alternative that doesn’t depend on the cloud. What’s needed is AI at the network edge. And if edge AI cameras are to fulfill their potential, managing a variety of important video functions independently, they can’t just be capable of some AI processing. They need to be able to handle a lot of it.

Edge AI Is Integral

Edge AI, in which AI processing occurs directly in cameras, makes it possible to deliver video analytics, insights, and alerts instantaneously for an increased level of security. Furthermore, edge AI facilitates the streaming of metadata and analysis only — rather than entire video streams — thus reducing the cost of transferring, processing, and storing video in the cloud. Additionally, AI at the edge can bolster privacy and reduce reliance on network connectivity by keeping data out of the cloud.
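To make the metadata-only idea concrete, here is a minimal sketch of what a camera might send instead of pixels. The message layout, the field names, and the `detection_event` helper are assumptions made for illustration, not a format the article or any vendor specifies. The comparison is against an uncompressed frame, so it overstates the saving relative to a compressed video stream.

```python
import json

# Size of one uncompressed 8-bit RGB 1080p frame, for scale only.
RAW_1080P_FRAME_BYTES = 1920 * 1080 * 3


def detection_event(camera_id, timestamp_s, detections):
    """Serialize one frame's analytics results as a compact JSON message.

    The schema here is hypothetical; a real deployment would define its own.
    """
    message = {"camera": camera_id, "t": timestamp_s, "objects": detections}
    return json.dumps(message).encode("utf-8")


event = detection_event(
    "cam-01",
    12.5,
    [{"label": "car", "box": [100, 200, 80, 40], "score": 0.91}],
)
print(len(event), "bytes of metadata vs", RAW_1080P_FRAME_BYTES, "bytes of raw pixels")
```

The metadata for a frame with a handful of detections is orders of magnitude smaller than the frame itself, which is why the cost of transferring and storing it is so much lower.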

Yet, up until now, most intelligent cameras have included only limited compute power for AI processing. Because of this, they have generally lacked the ability to enhance video in real time, which is crucial for accurate analytics.

What sets the next generation of smart camera systems apart is the integration of high compute power and AI processing capacity directly within the cameras themselves. This enables the cameras not only to process advanced video analytics, but also to apply AI for video enhancement, resulting in higher-quality video. Since each of these functions (advanced video analytics and video enhancement) requires its own AI capacity, today’s smart cameras must be equipped with the right amount of AI power.

Improved Video Equals Improved Analytics

Although AI is commonly associated with analytics, it can also be leveraged in smart cameras to enhance image quality, delivering clear and sharp images. In public safety situations, the quality of a video image plays a critical role when assessing risk.

AI can perform a variety of image enhancement functions, such as reducing noise in low-light conditions, processing high dynamic range (HDR) visuals, and even addressing some aspects of the classic 3A functions (auto exposure, auto focus, and auto white balance).

Take low-light conditions as an example; they can limit viewing distance and lower image quality. The resulting video “noise” makes it challenging to differentiate detail while actually increasing data size during compression. This leads to inefficiencies when transmitting and storing video data in the cloud.
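The link between noise and data size can be shown with a small experiment. The sketch below is only an analogy: it uses general-purpose `zlib` compression as a stand-in for a video codec, a synthetic 64x64 gradient as the "image", and uniform random noise as the "low-light noise". All of these are assumptions chosen to keep the example self-contained.

```python
import random
import zlib

WIDTH, HEIGHT = 64, 64


def gradient_image():
    """Return a smooth grayscale gradient as raw bytes (one byte per pixel)."""
    return bytes((x + y) % 256 for y in range(HEIGHT) for x in range(WIDTH))


def add_noise(pixels, amplitude, seed=0):
    """Add uniform noise in [-amplitude, amplitude], clamped to 0..255."""
    rng = random.Random(seed)
    return bytes(
        min(255, max(0, p + rng.randint(-amplitude, amplitude))) for p in pixels
    )


clean = gradient_image()
noisy = add_noise(clean, amplitude=8)

# Compress both versions and compare the resulting sizes.
clean_size = len(zlib.compress(clean))
noisy_size = len(zlib.compress(noisy))
print(f"clean: {clean_size} bytes, noisy: {noisy_size} bytes")
```

Noise is by nature unpredictable, so a compressor cannot find the repeated patterns it relies on, and the noisy image compresses to a larger size than the clean one.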

Although AI is capable of removing noise while preserving important image details, doing so takes significant processing. For instance, noise removal from a 4K video image captured in low-light conditions requires approximately 100 giga (billion) operations (GOP) per frame, which amounts to 3 tera (trillion) operations per second (TOPS) for real-time video streaming at 30 frames per second.
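The throughput figure above is just per-frame work multiplied by frame rate. A minimal sketch of that arithmetic follows; the function name `required_tops` is made up for this example, and the inputs are the article's own numbers.

```python
GIGA = 1e9   # one billion operations
TERA = 1e12  # one trillion operations


def required_tops(giga_ops_per_frame: float, frames_per_second: float) -> float:
    """Convert per-frame work (in giga-operations) to sustained TOPS."""
    ops_per_second = giga_ops_per_frame * GIGA * frames_per_second
    return ops_per_second / TERA


# 100 giga-operations per 4K frame at 30 frames per second.
print(required_tops(100, 30))
```

The same function shows how the requirement scales: doubling the frame rate or the per-frame cost doubles the TOPS needed.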

As higher-resolution video streams become more common in smart camera systems, cameras must handle larger amounts of data, identify smaller and more detailed objects, and execute a wider array of tasks through multi-stage AI processing pipelines.

How Much AI Processing Do You Need?

When a security camera has enough AI capacity, it can execute advanced video analytics alongside AI-driven video enhancement. It can even run multiple AI processes on a single video stream, making it possible to identify smaller and more distant objects with increased accuracy or to perform faster detection at higher resolution. In traffic settings, for example, an intelligent camera equipped with sufficient AI processing can perform multi-step automatic license plate recognition. This requires three stages: object detection to identify every car on the road, license plate detection to locate the plate on each car, and license plate recognition to determine the characters on each plate.
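The three-stage license plate flow described above can be sketched as a simple pipeline. Everything named here is hypothetical: `read_plates`, `PlateRead`, and the three stage callables (`detect_cars`, `detect_plate`, `recognize_text`) are placeholders for whatever models a real camera would run, and are passed in as parameters so the sketch assumes nothing about any particular library.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class PlateRead:
    car_box: Box
    plate_box: Box
    text: str


def read_plates(
    frame,
    detect_cars: Callable[[object], List[Box]],
    detect_plate: Callable[[object, Box], Optional[Box]],
    recognize_text: Callable[[object, Box], str],
) -> List[PlateRead]:
    """Run the three stages in order for every car found in the frame."""
    results = []
    for car_box in detect_cars(frame):            # stage 1: find every car
        plate_box = detect_plate(frame, car_box)  # stage 2: find its plate
        if plate_box is None:
            continue                              # no visible plate on this car
        text = recognize_text(frame, plate_box)   # stage 3: read the characters
        results.append(PlateRead(car_box, plate_box, text))
    return results
```

Because stages 2 and 3 run once per detected car, the processing cost grows with the number of vehicles in view, which is one reason a busy scene needs more AI capacity than a quiet one.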

To achieve accurate analytics on high-quality video footage, smart cameras themselves must have enough AI power to handle both video enhancement and analytic tasks at the same time. For example, to obtain an accurate license plate number from a video stream, the camera’s vision processor needs to apply semantic awareness (understanding what it sees) to selectively enhance and sharpen the parts of the video that contain relevant visual information. In such situations, processing demands add up quickly.

For basic AI vision tasks like noise reduction, a 2-megapixel (1080p) camera could require around 0.5 TOPS of processing power. Then, for basic video analytics pipelines, such as object detection, it would need an additional 1 TOPS. Combine those with advanced video enhancement features, such as HDR or digital zoom, and another 1 TOPS could be needed. Interested in facial recognition? That’ll be another 2 TOPS.

Combined, these processing capabilities, which are standard in the fast-growing security camera market, require at least 4.5 TOPS. To enable edge AI and the immediate processing that can actually make people safer, intelligent cameras should therefore include AI vision processors that deliver at least that much compute. Moreover, as video applications evolve to do even more, such as re-identifying people who appear on multiple cameras across a large area like an airport, smart cameras will need still more processing capacity.
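The budget above can be written out as a small tally. The per-workload figures are the article's estimates for a 1080p camera; the dictionary, the function names, and the 20 TOPS capacity used in the example call are illustrative choices, not a sizing tool.

```python
# Approximate per-workload requirements in TOPS, as estimated in the text.
WORKLOAD_TOPS = {
    "noise reduction (1080p)": 0.5,
    "object detection": 1.0,
    "HDR or digital zoom": 1.0,
    "facial recognition": 2.0,
}


def total_tops(workloads):
    """Sum the TOPS needed when all workloads run at the same time."""
    return sum(workloads.values())


def headroom_tops(capacity_tops, workloads):
    """Return spare capacity; a negative value means the processor falls short."""
    return capacity_tops - total_tops(workloads)


print(total_tops(WORKLOAD_TOPS))           # combined requirement
print(headroom_tops(20.0, WORKLOAD_TOPS))  # spare capacity on a 20 TOPS part
```

Adding a new workload, such as cross-camera re-identification, is a matter of adding one entry and checking whether the headroom stays positive.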

The new generation of camera-attached vision processors, such as the Hailo-15 series of AI vision processors, is designed to answer the growing need for high AI capacity within the camera. This range of vision processors offers up to 20 TOPS of AI compute power, enough to run advanced AI analytics and video enhancement at the same time.

Ultimately, integration of AI into smart cameras has the potential to revolutionize many different industries, especially security. A pivotal shift in how we capture, process, and interpret visual information requires pushing the AI revolution out to the edge of networks, where it can do the most good — and fast. Taken separately, AI and video have already made a substantial impact. Together, they can fundamentally transform everyday life for the better.