Automated video annotation for risk-proof driverless cars
The race to put the first driverless car on the market is on. And we already have a pretty good idea of what it will look like: a car covered in cameras and sensors that record and analyse everything happening around it, in real time. According to experts, that is as much as 10 terabytes of data generated every day for video alone. Future driverless cars are expected to include approximately 10 CMOS cameras as part of their advanced driver-assistance systems (ADAS), and annotating the data they generate for road traffic objects, events and scenes will be key to testing and training the computer vision systems without which the car could not make the right decision at the right time. But there is a catch: there is currently a lack of labelled, realistic video datasets of sufficient size, complexity and comprehensiveness to train the computer vision of future driverless cars.

“Metadata generation or labelling is tedious work. It’s usually done manually by drawing boxes or pixels and labelling them individually, frame by frame. Such human annotation is slow, inconsistent and excessively costly. Moreover, the opportunity to capture this human knowledge when annotating and to roll it back into the training process is not fully exploited,” explains Dr Oihana Otaegui, Head of ITS and Engineering at Vicomtech, a Spanish research centre specialising in computer vision.

With cloud-enabled video analysis technology, along with tools to fuse video with other data sources, these problems could be overcome. And that is what the Cloud-LSVA (Cloud Large Scale Video Analysis) project was all about: creating large training datasets to be used in vision-based detection systems, along with ground-truth scene descriptions based on objects and events to evaluate the performance of the algorithms and systems set up in the car.

“Our Big Data platform can automatically pre-annotate large video datasets and upload them to a cloud infrastructure. There, each recorded scene will be analysed and decomposed to detect and classify relevant objects and events for specific scenarios,” Dr Otaegui explains, continuing: “In the second stage, the annotation tool assists users in refining and augmenting annotations. Finally, online learning techniques are applied to update detection and classification models, and to incorporate human knowledge into the automatic processes. Reasoning mechanisms will also be included in some scenarios to enable automatic annotation of complex concepts not previously trained or labelled by human operators, yielding automatic scene descriptions.”

From there, users and applications can perform semantic queries over video archives via meta-languages, as well as faceted queries to enable rapid results sharing: online Big Data video analytics in the palm of your hand.

Although primarily aimed at ADAS functions for automated vehicles and HD cartography generation, Cloud-LSVA also contemplates using scene catalogues from accident analysis initiatives (GIDAS, the German In-Depth Accident Study) and in-vehicle system quality assessment (Euro NCAP, the European New Car Assessment Programme). Beyond the car industry, applications in robotics and healthcare (which has a similar demand for the annotation of medical images) are also envisaged.
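To make that workflow concrete, the following Python sketch walks through the three stages Dr Otaegui describes, automatic pre-annotation, human refinement and an online-learning update, finishing with a faceted query over the annotated archive. Every name and data structure here is hypothetical: the article does not expose Cloud-LSVA's actual data model or API, so this is only a minimal sketch of the pattern.

```python
# Illustrative sketch only: all names and the data model are hypothetical,
# not Cloud-LSVA's actual API.
from dataclasses import dataclass, field


@dataclass
class Annotation:
    frame: int                        # frame index within the recorded scene
    label: str                        # e.g. "pedestrian", "traffic_sign"
    box: tuple[int, int, int, int]    # (x, y, width, height) in pixels
    source: str = "auto"              # "auto" = machine, "human" = reviewed
    confidence: float = 1.0


@dataclass
class Scene:
    scene_id: str
    annotations: list[Annotation] = field(default_factory=list)


def pre_annotate(scene: Scene, detect) -> None:
    """Stage 1: run a detector over each frame and keep its raw output."""
    for frame_idx, detections in detect(scene.scene_id):
        for label, box, conf in detections:
            scene.annotations.append(
                Annotation(frame_idx, label, box, "auto", conf))


def refine(scene: Scene, corrections: dict[int, str]) -> list[Annotation]:
    """Stage 2: a human reviewer corrects labels; corrected annotations
    are re-tagged so they can feed back into training."""
    reviewed = []
    for idx, new_label in corrections.items():
        ann = scene.annotations[idx]
        ann.label, ann.source, ann.confidence = new_label, "human", 1.0
        reviewed.append(ann)
    return reviewed


def update_model(model, reviewed: list[Annotation]):
    """Stage 3: online-learning step; `model.fit` is a placeholder for an
    incremental training call on human-verified annotations."""
    model.fit(reviewed)
    return model


def query_archive(scenes: list[Scene], label: str, min_conf: float = 0.5) -> list[str]:
    """Faceted query over the annotated archive, e.g. every scene that
    contains a pedestrian detected with confidence >= 0.5."""
    return [s.scene_id for s in scenes
            if any(a.label == label and a.confidence >= min_conf
                   for a in s.annotations)]
```

The hinge in this sketch is the source field: it marks which annotations have been human-verified, so exactly that subset can be rolled back into training, which is the "capture this human knowledge" loop the project describes.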
Future plans
The project will be completed at the end of 2018. By then, the team still needs to fully close the loop between in-vehicle processing capabilities and cloud-level computation, so as to provide a fully recursive processing loop: the cloud learns from the annotations, updates its models and delivers them back to the vehicles, improving performance over time. Beyond that deadline, Dr Otaegui also foresees how, in “a not-so-distant scenario, fleets of test cars, and possibly one day private cars, will be driving and collecting even larger data volumes, which will then require an equivalent increment in the cloud computing and communication capabilities of the platform to ingest and process the data.” Cloud-LSVA is already tackling this problem by adopting a computing architecture in which processing capabilities are brought closer to the data source, that is, to the car. “The participation of Valeo and IBM in the project has offered the possibility of exploring the latest developments in embedded computer vision for in-vehicle use, with the aim of pre-annotating all the data on the fly while it is being recorded,” Dr Otaegui says.
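As a rough illustration of that edge-to-cloud split, the sketch below has the vehicle pre-annotate frames on the fly and uplink only compact metadata, while the cloud updates the model and pushes it back to close the recursive loop. The interfaces model.detect, trainer.update and fleet.deploy are placeholders standing in for the embedded inference, online-learning and deployment components the article mentions, not real project code.

```python
# Hypothetical edge-to-cloud loop: model.detect, trainer.update and
# fleet.deploy are placeholder interfaces, not real project APIs.
import queue


def in_vehicle_loop(frames, model, uplink: queue.Queue) -> None:
    """Runs on the car's embedded hardware: pre-annotate each frame as it
    is recorded, so only compact metadata (not raw video) is uplinked."""
    for idx, frame in enumerate(frames):
        detections = model.detect(frame)          # embedded inference
        uplink.put({"frame": idx, "detections": detections})


def cloud_side(uplink: queue.Queue, trainer, fleet) -> None:
    """Runs in the cloud: ingest pre-annotations (optionally refined by
    human annotators), update the model, push it back to the fleet."""
    batch = []
    while not uplink.empty():
        batch.append(uplink.get())
    new_model = trainer.update(batch)             # online-learning step
    fleet.deploy(new_model)                       # close the recursive loop
```

Uplinking metadata rather than raw footage is what keeps the 10 terabytes a day of video from having to cross the network in full.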
Keywords
Cloud-LSVA, Big Data, video annotation, CMOS camera, ADAS