Project Success Stories - Need footage in a rush?
Broadcasters have thousands of hours of unprocessed video sitting in their archives. Many of these 'rushes' could be used for future work, but they often sit gathering virtual dust because programme producers and journalists, usually working to tight deadlines, cannot quickly assess what footage they contain.

'Programme makers may use only a few seconds or minutes out of hours of footage that they shoot,' says Dr Oliver Schreer from the Fraunhofer Institute for Telecommunications/Heinrich-Hertz-Institute in Berlin. 'At the moment broadcasters may do some manual annotation of unedited footage, but it is a very time-consuming process and much of the footage remains unclassified and therefore unused,' he continues. 'What they really need is automated methods to organise this material.'

Working under the project 'The retrieval of multimedia semantic units for enhanced reusability' (Rushes), Dr Schreer headed a team from European research institutes and the technology industry that looked at how to make the reuse of this type of raw video content much easier. The prototype system they developed automatically analyses and labels video footage, making indexing and cataloguing much simpler. The team also created a user interface to improve the management and searching of large multimedia repositories. 'Current video databases will present individual images from videos, but the user has limited ability to understand and analyse footage. We wanted to create tools that can present video content in a much better way,' he says.

Collaborative development

The team involved professional and general users throughout the development of the system. A first step was to assess the industry's current workflows and technologies, identifying areas for improvement and additional needs. 'We worked closely with Basque broadcaster ETB to build our prototype,' says Dr Schreer. Once the system was designed, users tested and validated it.
'We asked journalists and archivists to investigate the different functionalities of the system,' he adds. 'This feedback was vital, as we really wanted to provide a solution that met the industry's needs.'

The system analyses and categorises raw video using semantic indexing principles. The team first created a series of algorithms that detect certain types of objects or content in a video and then automatically generate metadata to describe it. For instance, the system can detect faces (which indicate the presence of people), regular shapes (which suggest man-made environments), different types of vegetation, or even different types of camera motion. It can also classify different types of audio, such as speech, music, noise or silence; distinguish different types of water, including seas, oceans, rivers or harbours; and identify common objects such as buses, dogs and ships. A flashlight detector can help to indicate press conferences or news interviews.

'We adapted and combined existing technologies used in image search and retrieval for video, but also created some aspects from scratch, such as the camera movement detector and the 3D shape detector,' he says. Another set of algorithms uses the generated metadata to cluster and summarise the rushes' content, creating groupings that aid browsing and further processing. 'The metadata model at the heart of the system was quite novel,' says Dr Schreer.

Fast retrieval

A key consideration of the project was to build a user interface that enables users to access video content much more efficiently. 'Searching through footage takes a lot of time for journalists, so we wanted to give them new tools to enable them to explore video databases more quickly and help them to reuse content,' says Dr Schreer. The Rushes system features a series of browsing and visualisation interfaces, ranging from simple keyword-based text searches to 'semantic' and visual browsing.
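The detector-driven annotation described above can be sketched in a few lines. This is a minimal illustration, not the project's actual software: the detector names, the shot dictionary and the thresholds are all assumptions standing in for real image- and audio-analysis components, but the overall shape (a set of detectors each contributing metadata tags per shot) follows the article's description.

```python
# Hypothetical sketch of a Rushes-style annotation pipeline: each detector
# inspects a shot and contributes metadata tags. The detectors here are
# stubs; real ones would analyse video frames and audio.

def detect_faces(shot):
    # Placeholder: a real detector would run face detection on frames.
    # Detected faces indicate the presence of people.
    return ["person"] if shot.get("faces", 0) > 0 else []

def classify_audio(shot):
    # Placeholder: maps a raw audio label onto the categories the
    # article lists (speech, music, noise, silence).
    label = shot.get("audio")
    return [label] if label in {"speech", "music", "noise", "silence"} else []

def detect_flashlights(shot):
    # Bursts of photographic flashes often indicate press conferences
    # or news interviews; the threshold of 3 is an arbitrary assumption.
    return ["press event"] if shot.get("flash_count", 0) > 3 else []

DETECTORS = [detect_faces, classify_audio, detect_flashlights]

def annotate(shot):
    """Run every detector and collect the generated metadata tags."""
    tags = []
    for detector in DETECTORS:
        tags.extend(detector(shot))
    return tags

shot = {"faces": 2, "audio": "speech", "flash_count": 12}
print(annotate(shot))  # ['person', 'speech', 'press event']
```

The point of the design is that detectors are independent and composable: adding, say, a vegetation or camera-motion detector means appending one more function to the list, and the metadata it emits immediately becomes available for the clustering and search stages downstream.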
The technological concepts behind these tools are the system's grouping of content based on hierarchical clustering, semantic context matching and relevance feedback. 'The system takes the temporal structure of the footage into account, telling the user much more about how it is organised and helping to put it into context. This enables users to find relevant content, and the specific parts of it they want, much more easily and quickly,' he adds.

Future outlook

The team demonstrated the prototype developed under the project, which ran from February 2007 to July 2009, at a number of major technology events across Europe, including the CeBIT trade fair in Hannover. 'The feedback that we got was good, with the industry thinking this could be really helpful,' says Dr Schreer. Individual project partners are now further developing individual aspects of the prototype, with some lasting cooperation continuing. 'The project results will be seen in commercial products which should help broadcast professionals,' he predicts.

Rushes received funding from the EU's Sixth Framework Programme (FP6) for research.
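The grouping idea behind the browsing interfaces can be illustrated with a deliberately simple sketch: shots that share a metadata tag end up in the same cluster, so a journalist can browse by concept rather than scanning a flat list. The project's actual approach was hierarchical and far more sophisticated; the flat tag-based grouping, the shot identifiers and the data layout below are all assumptions made for illustration.

```python
# Illustrative sketch of grouping annotated shots by shared metadata so a
# browsing interface can present clusters instead of a flat shot list.
from collections import defaultdict

def cluster_by_tag(shots):
    """Group shot ids under each metadata tag they carry.

    `shots` maps a shot id to the list of tags produced by the
    annotation stage; the result maps each tag to its shot ids.
    """
    clusters = defaultdict(list)
    for shot_id, tags in shots.items():
        for tag in tags:
            clusters[tag].append(shot_id)
    return dict(clusters)

shots = {
    "shot_001": ["person", "speech"],
    "shot_002": ["harbour", "ships"],
    "shot_003": ["person", "press event"],
}
clusters = cluster_by_tag(shots)
print(clusters["person"])  # ['shot_001', 'shot_003']
```

A hierarchical variant would then merge these tag clusters into coarser groups (for example, 'harbour' and 'ships' under a maritime concept), giving the tree-structured browsing the article describes.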