Machine Learning is a well-established field of research that has existed for decades and is already present in many products and applications. It is based on collecting large amounts of data specific to a particular problem, training a model on this data and then using the model to process new data. In video analytics, one of the most critical problems affecting accuracy is object classification. Fundamental to improving performance is the ability to teach the algorithm to distinguish between people, animals, different types of vehicles and sources of noise with an extremely high level of accuracy.
Until recently, Machine Learning has seen little use in video analytics products, largely because its high complexity and resource usage made such products too costly for mainstream deployment. However, the last couple of years have seen a tremendous surge in research and advances in a branch of Machine Learning called Deep Learning. Deep Learning is the name given to a family of algorithms based on the concept of neural networks. Very loosely speaking, these algorithms try to emulate the functionality of the brain's neurons, enabling them to learn efficiently from examples and subsequently apply this learning to new data.
The recent increase in interest in Deep Learning is largely due to the availability of graphics processing units (GPUs). GPUs can train and run Deep Learning algorithms efficiently, and have allowed the scientific community to accelerate both research into these algorithms and their application, bringing them to the point where they outperform most traditional Machine Learning algorithms across several categories.
Solving object classification with Deep Learning
This means that Deep Learning can now be used to solve the most crucial problem facing video analytics, object classification. Doing so requires collecting many thousands of images from hundreds of surveillance cameras. A human must first manually label each image, classifying it into one of a range of categories that include person, car, bus, truck, bird, vegetation, dog and many more. To achieve the required accuracy rates, this vast database must be built from actual surveillance footage.
A Deep Learning algorithm trained on images collected from YouTube, Google Search and elsewhere on the Internet will fail completely when analysing images from surveillance cameras, because of the differences in viewing angle, resolution and image quality. Once enough images have been collected, a Deep Learning classifier can be trained and deployed as part of a video analytics solution, enabling it to eliminate most of the existing causes of false alarms.
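As an illustration of how a deployed classifier can suppress false alarms, here is a minimal Python sketch. It is not taken from any particular product: the Detection record, the ALARM_CLASSES set, the filter_alarms function and the 0.8 confidence threshold are all hypothetical, and the labels and scores stand in for the output of a trained classifier.

```python
from dataclasses import dataclass

# Hypothetical choice: only these categories should raise an alarm.
# Everything else (bird, vegetation, dog, ...) is treated as noise.
ALARM_CLASSES = {"person", "car", "bus", "truck"}


@dataclass
class Detection:
    camera_id: str
    label: str         # category assigned by the classifier
    confidence: float  # classifier score between 0 and 1


def filter_alarms(detections, min_confidence=0.8):
    """Keep only detections of an alarm class with a high enough score."""
    return [
        d for d in detections
        if d.label in ALARM_CLASSES and d.confidence >= min_confidence
    ]


# Made-up classifier output for four moving objects.
detections = [
    Detection("cam-1", "person", 0.93),
    Detection("cam-1", "vegetation", 0.88),
    Detection("cam-2", "bird", 0.97),
    Detection("cam-2", "truck", 0.61),
]
alarms = filter_alarms(detections)
```

With this made-up input, only the person seen by cam-1 raises an alarm: the vegetation and the bird are discarded as noise, and the truck falls below the assumed confidence threshold.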
Because the algorithms need GPUs to run efficiently, video analytics solutions using Deep Learning will initially need to run on a server. A few solutions of this nature are already available and are showing a dramatic leap in performance compared with traditional video analytics, with a drastic reduction in false alarm rates and a significant increase in detection accuracy. At the same time, these new solutions do not require manual tweaking by the user and are essentially plug-and-play, making mass deployment a realistic prospect.
Surveillance applications of Deep Learning
Basic classification and false alarm reduction are the first applications of Deep Learning for video analytics, but they are by no means the only ones. In the not-too-distant future, Deep Learning will enable video analytics applications that are not yet possible, such as identifying objects carried by people (a gun, a handbag or a knife, for example) or quickly finding people and vehicles with similar appearances across multiple cameras.
Increasing accuracy with updates
A crucial component in achieving and maintaining the high performance of Deep Learning-based applications is the ability to continuously update the models as more data is collected, so that the models increase in accuracy. This will give an advantage to cloud-based video analytics services, since they can collect vast amounts of data from cameras connected to the service, train new models in the cloud based on this data and then push these new models to cameras at the edge. This continuous improvement cycle will be instrumental in helping video analytics fulfil the promise of improving people's safety and security, by giving surveillance cameras human-level accuracy and a comprehensive understanding of the environment.
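One way to picture a single step of such an update cycle is the Python sketch below. It assumes, purely for illustration, that a newly trained model is pushed to cameras only if it beats the current model on a held-out set of labelled samples. The functions accuracy and select_model_to_deploy are hypothetical, and the two "models" are toy rules over made-up object sizes in pixels rather than real neural networks.

```python
def accuracy(model, labelled_samples):
    """Fraction of held-out samples that the model labels correctly."""
    correct = sum(
        1 for features, label in labelled_samples if model(features) == label
    )
    return correct / len(labelled_samples)


def select_model_to_deploy(current_model, candidate_model, holdout):
    """Return the candidate only if it beats the current model on the holdout."""
    if accuracy(candidate_model, holdout) > accuracy(current_model, holdout):
        return candidate_model
    return current_model


# Toy stand-ins for trained models; features are (height, width) in pixels.
def current_model(features):
    height, width = features
    return "person" if height > width else "car"


def candidate_model(features):
    height, width = features
    if height < 80 and width < 80:
        return "dog"  # class the older model has not learned
    return "person" if height > width else "car"


# Made-up held-out samples labelled by a human.
holdout = [
    ((180, 60), "person"),
    ((150, 400), "car"),
    ((40, 60), "dog"),
    ((170, 55), "person"),
]
deployed = select_model_to_deploy(current_model, candidate_model, holdout)
```

Here the candidate labels all four held-out samples correctly while the current model misses the dog, so the candidate is the one selected for deployment. A real service would evaluate on far more data and would probably apply further checks before pushing a model to cameras.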