Final Report Summary - ADABTS (Automatic Detection of Abnormal Behaviour and Threats in crowded Spaces)
The focus of ADABTS (Automatic Detection of Abnormal Behaviour and Threats in crowded Spaces), a project within the 7th Framework Programme, was to facilitate the protection of EU citizens, property and infrastructure against threats of terrorism, crime and riots by the automatic detection of abnormal human behaviour, while respecting privacy and civil liberties. Technical means can help to fulfil these requirements.
Today's surveillance operators often face the challenge of watching the imagery from literally hundreds of surveillance cameras. Naturally, each operator is able to watch the imagery from only one or a few cameras at a time, making it highly likely that incidents pass unnoticed, or are noticed too late. Moreover, to watch and analyze all these videos, where typically nothing interesting happens, is mentally demanding. A human operator is thus unable to keep focus for long periods of time, and needs some automated support.
While current surveillance systems store the footage and mostly use it for post-event analysis, upcoming surveillance systems aim at detecting anomalous and threatening incidents and behaviours while the incident is in progress or, even better, as it unfolds. Such systems would enable pro-active measures to be taken, typically notifying the local security personnel that they should attend a possible incident or crime scene. If the surveillance system could draw attention to events that are otherwise hard to discover, it would reduce the operator’s workload and improve overall situation awareness. User needs and operational requirements, as well as legal and ethical restrictions on the proposed automated detection system, set the boundaries for such a surveillance system.
Current automatic detection systems have limited functionality and struggle to make inferences about the acceptability of human behaviour. ADABTS has addressed one of the key problems, the definition of threatening or anomalous behaviour, by extracting characterizations in realistic security settings based on expert classifications and the analysis of operator behaviour. Furthermore, ADABTS has developed models for certain threats (e.g. violence) and for the typical behaviour in specific contexts, as well as methods for detecting these threats and deviations from the typical behaviour in surveillance data (video and audio). A demonstration system has been implemented, showing a pro-active system that focuses on detecting the presence of potentially threatening as well as anomalous behaviours. The demonstrator established a proof of concept for supporting the operator in focusing on these unusual findings instead of, as in today’s systems, trying to follow all situations all the time in public spaces or at large-scale events. ADABTS explored the possibilities for automated operator support, linking different sensor techniques together so that the system can cope with sensor uncertainties, thus enhancing system performance. This opens the way for new system functionality that simply disregards the vast amount of imagery that contains nothing interesting and presents highlighted events in the footage where something interesting might be going on. Improved event detection would benefit CCTV operators’ effectiveness, leading, for example, to shorter reaction times to terror actions and riots. Furthermore, automated offline abilities, like searching databases, would also facilitate subsequent content-based retrieval in images after an incident. This creates new possibilities for increased security against threats like terror, crime and riots through enhanced warning systems.
The project gathered experts in human factors, signal processing, computer vision, and surveillance technology in an international consortium. The consortium partners are well known in the area of security and protection of infrastructure and consist of FOI (SE), BAE Systems (UK), Detec (NO), Home Office Scientific Development Branch, CAST (UK), Institute of Psychology – Ministry of the Interior (BG), SINTEF (NO), University of Amsterdam (NL) and TNO (NL). The involvement of stakeholders (security system operators and integrators, police organizations, airports, event organizers) has ensured relevance throughout the project.
Project Context and Objectives:
1 INTRODUCTION
1.1 CONTEXT AND MAIN OBJECTIVES
The aim of ADABTS (“Automatic Detection of Abnormal Behaviour and Threats in crowded Spaces”) was to facilitate the protection of EU citizens, property and infrastructure against threats of terrorism, crime and riots by the automatic detection of abnormal human behaviour. The main objective was to develop the necessary sensor processing and inference mechanisms to automatically detect potentially threatening or anomalous behaviour of individuals in large crowded spaces associated with, e.g. public transport or large-scale events.
Today's surveillance operators often face the challenge of watching the imagery from literally hundreds of surveillance cameras. Naturally, each operator is able to watch the imagery from only one or a few cameras at a time, making it highly likely that incidents pass unnoticed, or are noticed too late. A human operator is thus unable to keep focus for long periods of time, and needs some automated support. If the surveillance system could draw attention to events that are otherwise hard to discover, it would reduce the operator’s workload and improve overall situation awareness.
Such a decision support system would help the operator to come to a conclusion faster and more accurately. Decision support systems focus on aiding situation assessment. Important questions to address when developing new systems are:
• In what way can a concept be developed to support the decision process?
• Does the tool make the development of events more observable?
• Will the user get an increased understanding about causal relations as an incident unfolds?
The behaviour of individuals in crowded scenes can be very difficult to analyse. At the most basic level, this is simply because it is hard to keep track of persons. It is even more difficult to determine what activity a detected individual is engaged in, since gestures and poses often cannot be reliably estimated. Many of the activities in the project were concerned with exploring sensor fusion as a means of increasing the robustness of behaviour analysis. The challenges posed by occlusion and cluttered dynamic backgrounds can be mitigated by combining processing results from multiple camera views. Multi-modal fusion is also of interest. Audio, for example, is a data source that can provide highly valuable information in crowded scenes. Acoustic events, e.g. glass breaking, screams and gunshots, are strong indicators of anomalous behaviour that can be difficult to detect in video. By combining audio event detection and sound source localisation with video tracking, the location and behaviour of persons close to the acoustic event can be tracked over time.
The main question is how human behaviour should be represented in order to detect relevant abnormality. In recent years, hierarchical representations, where behaviours are viewed as particular sequences of low-level activities (with some random variations permitted), have become popular. Activities are regarded as a finite set of primitives or basic building blocks that can be combined in different ways into specific behaviours (in analogy with how words are combined into sentences). This model is often well-suited to public environments, where one may, e.g. enter the premises through any of several entries, join one of several queues, wait for company, and exit. Trajectories obtained from person tracking are the main data source for model inference in this case, which requires automatically determining which and how many basic activities there are, and how they are sequenced. Anomalies are detected as activity transitions that do not fit well with any of the learnt behaviour models, or, at a lower level, as basic actions that are inconsistent with the learnt set of activities. Behaviour can also be represented in ways that do not require person tracks for inference, which is particularly attractive for densely crowded scenes. Much abnormal behaviour, e.g. aggression, is revealed by characteristic motion patterns (gestures, etc.) over time spans much shorter than complete person tracks. Offensive behaviour that is difficult to detect directly is often revealed by bystander reactions that can be represented as characteristic spatial distributions of motion.
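The sequence-based view of behaviour can be illustrated with a minimal sketch. It assumes a first-order Markov model over a small set of hypothetical activity labels; the function names, labels, smoothing value and threshold are illustrative assumptions and are not taken from the ADABTS models. Transition probabilities are learnt from normal sequences, and transitions with a low learnt probability are flagged as anomalous.

```python
from collections import defaultdict

def learn_transitions(sequences, smoothing=1e-3):
    """Estimate first-order transition probabilities from activity sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    labels = set()
    for seq in sequences:
        labels.update(seq)
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1.0
    probs = {}
    for a in labels:
        # Additive smoothing keeps unseen transitions at a small non-zero value.
        total = sum(counts[a].values()) + smoothing * len(labels)
        probs[a] = {b: (counts[a][b] + smoothing) / total for b in labels}
    return probs

def anomalous_transitions(seq, probs, threshold=0.05):
    """Return transitions whose learnt probability is below the threshold."""
    flagged = []
    for a, b in zip(seq, seq[1:]):
        p = probs.get(a, {}).get(b, 0.0)
        if p < threshold:
            flagged.append((a, b))
    return flagged

# Hypothetical "normal" sequences for an entrance hall.
normal = [
    ["enter", "queue", "wait", "exit"],
    ["enter", "wait", "queue", "exit"],
    ["enter", "queue", "exit"],
]
model = learn_transitions(normal)
flagged = anomalous_transitions(["enter", "queue", "enter", "exit"], model)
```

In this toy data, the transitions from "queue" back to "enter" and from "enter" directly to "exit" never occur in the normal sequences, so they are the ones flagged.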
1.1.1 Main objectives
The overall objective of ADABTS was to develop visual and acoustical sensor processing and inference mechanisms to automatically detect potentially threatening behaviour of individuals in a group or crowd in large public spaces, e.g. those in relation to public transport or large scale events. The five specific project objectives are listed below.
Objective 1: Address the European need for increased security against deliberate threats (terror, crime, riots) by advancing the capability for automatic detection of abnormal, potentially threatening behaviour of crowds or individuals in crowds while respecting privacy and civil liberties.
Objective 2: Collect user needs with respect to Objective 1 and based on this develop models for describing various targeted abnormal behaviours. These models will be used when specifying surveillance objectives and designing algorithms for automatic detection. Contribute to standards for defining and detecting abnormal behaviour.
Objective 3: New capabilities for automatic detection of targeted behaviours based on new and existing methods and algorithms for signal analysis and fusion of information obtained from video and acoustic sensors, taking into account context information.
Objective 4: Optimization of detection algorithms for real-time processing based on commercial low cost heterogeneous hardware, and integration of a platform for real time on-line demonstration and evaluation.
Objective 5: Disseminate project results to European Stakeholders (in policy, standardization and legislation), System Integrators and Security Providers and exploit project results among consortium end users (system developers) and RTD partners.
By fusing data from heterogeneous sensors in different views, person detection in crowded scenes can be performed with greater robustness and accuracy. In ADABTS, several methods have been investigated for combining foreground segments extracted in multiple views into estimates of scene volume occupancy. The location of individuals can then be determined by processing the volume data. Another class of detection methods, where object hypotheses are generated in each camera view and merged in the 3D scene, has also been investigated. Such methods are applicable to single views, and therefore less dependent on multi-view coverage. Both classes of methods have strengths and weaknesses, and increased robustness should be possible to achieve by combining elements from both.
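As a rough sketch of the first class of methods, the following assumes that the foreground segments from each camera have already been projected onto a common ground-plane grid (that projection step is not shown). The function name and the simple voting rule are illustrative assumptions, not the project's actual occupancy estimator.

```python
def fuse_occupancy(view_grids, min_agreeing_views):
    """Mark a ground cell occupied when enough camera views see foreground there."""
    rows, cols = len(view_grids[0]), len(view_grids[0][0])
    fused = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            votes = sum(grid[r][c] for grid in view_grids)
            fused[r][c] = 1 if votes >= min_agreeing_views else 0
    return fused

# Hypothetical per-camera foreground grids (1 = foreground projects onto cell).
cam1 = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
cam2 = [[0, 1, 0], [0, 1, 0], [0, 0, 1]]
cam3 = [[0, 1, 0], [1, 1, 0], [0, 0, 0]]
fused = fuse_occupancy([cam1, cam2, cam3], min_agreeing_views=3)
```

Requiring agreement between all views suppresses cells that only one camera reports, at the cost of missing persons that are occluded in some views; lowering `min_agreeing_views` trades the other way.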
Target tracks are a rich source of information for behaviour analysis. In crowded scenes, however, it can be very difficult to maintain track of each individual. By modeling the appearance of the tracked person, correct observation-to-track association can be accomplished for closely moving individuals, and in the presence of occlusion. In ADABTS several appearance representations have been explored, including a combined 3D shape and texture model that can also be used for determining the main orientation of the person. A recursive multi-view tracking algorithm combining appearance modeling, volume carving and foreground object detection has been developed and shown to outperform state-of-the-art methods. For approaches where person detections in the different camera views are merged in 3D, head tracking in the image plane makes it possible to replace single frame person detections with reliable track fragments, or tracklets. This leads to a large reduction in the number of observation-to-track hypotheses that must be maintained, while simultaneously improving tracking performance.
Audio events, e.g. gunshots, glass-breaking, screams and abusive chants are associated with behaviours that may be very difficult to detect in video. These events can be detected by audio processing, and, using a microphone array, localised in the scene. By calibrating video cameras and acoustic arrays to a common reference frame, people close to the audio event can be tagged and tracked through the scene, long after the event actually has occurred.
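Once audio and video share a reference frame, tagging the people near an acoustic event reduces to a distance test in space and time. The sketch below assumes tracks and the localised event are both expressed in ground-plane metres and seconds; the function name, radius and time window are hypothetical choices made for illustration.

```python
import math

def tag_tracks_near_event(tracks, event_xy, event_time, radius_m=2.0, window_s=1.0):
    """Return ids of tracks within radius_m of the event around event_time.

    tracks maps a track id to a list of (time_s, x_m, y_m) samples expressed
    in the same ground-plane frame as the localised audio event.
    """
    tagged = set()
    for track_id, samples in tracks.items():
        for t, x, y in samples:
            close_in_time = abs(t - event_time) <= window_s
            close_in_space = math.hypot(x - event_xy[0], y - event_xy[1]) <= radius_m
            if close_in_time and close_in_space:
                tagged.add(track_id)
                break
    return tagged

# Hypothetical tracks; the event is localised at (1.4, 1.1) at t = 10 s.
tracks = {
    "A": [(9.0, 1.0, 1.0), (10.0, 1.5, 1.2), (11.0, 4.0, 3.0)],
    "B": [(9.0, 8.0, 8.0), (10.0, 8.2, 7.9)],
    "C": [(2.0, 1.4, 1.1)],  # near the spot, but long before the event
}
tagged = tag_tracks_near_event(tracks, event_xy=(1.4, 1.1), event_time=10.0)
```

A tagged track id can then be followed by the video tracker for as long as the track is maintained.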
Anomaly detection is a fundamental capability of a surveillance system. In ADABTS, track-based anomaly detection methods have been developed that overcome important limitations of previous approaches. Specifically, these methods bypass an initial quantisation step, in which the scene is typically divided into a regular grid of cells that are either occupied or not by a trajectory.
While multi-camera methods substantially increase the accuracy and robustness of person detection, tracking and pose estimation in crowded scenes, as the crowd density increases, a point is eventually reached where reliable tracking becomes infeasible. A robust, high-performance system for automatic detection of abnormal behaviour must be able to autonomously determine when conditions are unfavorable for person tracking and then fall back to cruder feature-based methods.
Gestures provide valuable information on behaviour, but are difficult to estimate in crowded scenes. However, much of the information available in gestures can be extracted without explicitly tracking the pose of individuals, namely by computing image motion. In ADABTS, a fight detector has been developed using histograms of motion features as input.
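To make the notion of a motion-feature histogram concrete, the sketch below builds a magnitude-weighted histogram of motion directions from a list of per-pixel motion vectors. It is not the ADABTS fight detector: the function names, bin count, magnitude cut-off and the use of entropy as a summary score are assumptions made for illustration, and the motion vectors are assumed to come from some optical-flow step that is not shown.

```python
import math

def motion_orientation_histogram(flow_vectors, bins=8, min_magnitude=0.5):
    """Magnitude-weighted histogram of motion directions, normalised to sum to 1."""
    hist = [0.0] * bins
    for dx, dy in flow_vectors:
        mag = math.hypot(dx, dy)
        if mag < min_magnitude:
            continue  # ignore near-static pixels
        angle = math.atan2(dy, dx) % (2 * math.pi)
        index = min(int(angle / (2 * math.pi) * bins), bins - 1)
        hist[index] += mag
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

def direction_entropy(hist):
    """Shannon entropy (bits) of the histogram; high when motion is erratic."""
    return -sum(p * math.log2(p) for p in hist if p > 0)

# Hypothetical motion vectors: steady walking versus erratic movement.
walking = [(2.0, 0.1), (1.8, -0.1), (2.1, 0.0), (1.9, 0.2)]
erratic = [(2.0, 0.3), (0.3, 2.0), (-2.0, 0.4), (-0.3, -2.0), (1.5, -1.2), (-1.4, 1.6)]
walking_hist = motion_orientation_histogram(walking)
erratic_hist = motion_orientation_histogram(erratic)
```

In practice such histograms would be fed to a trained classifier rather than thresholded directly; the entropy score is only a simple way to show that the two motion patterns yield different features.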
During the project, two major data collections were performed in staged sets with professional and lay actors at a football stadium. The actors signed a consent form prior to the event in order to take part in the data collection. Staged sets were used because collecting threatening behaviour in authentic environments would intrude on people’s privacy. The two datasets, recorded in the main entrance hall of the football club ADO Den Haag’s home arena in October 2010 and June 2012, provided a common test bed for the algorithms developed during ADABTS. The datasets contained up to 30 people moving around in several scenarios tagged “normal”, “terrorist”, “pick-pocket” and “aggression”. Video and audio data were recorded by a dozen sensors from various viewpoints. The resulting database, which documents the formats of the data collected and the systems used to collect it, was also used for system evaluation. A representative subset of data will be made available to the public, after the project finishes, via e.g. the project website.
In conclusion, ADABTS developed new functionality for the surveillance area for single as well as multiple views. The availability of multiple sensors to observe a scene is one area where sensor fusion can help to obtain a high-level representation of people's behaviour. In multi-person tracking, especially in the real-world surveillance settings targeted in the ADABTS project, people may be more clearly visible in some camera views than in others due to occlusions, perspective or challenging environments (e.g. reflections, lighting changes). Fusing information from many camera views directly into tracking can remedy these problems.
In order to cope with moderately dense crowds, several 3D detection and tracking functionalities were developed during the project using multiple overlapping cameras. By tracking multiple persons as they move through the observed space, typical movement patterns can be learned, which in turn can help to identify threatening or anomalous patterns. Tracking is also important for relating observed spatial features (e.g. beamformed audio) to persons in the scene. Detecting loitering and people entering forbidden zones are further applications of tracking techniques. In addition, complementary techniques were developed, such as flow pattern analysis, which is less affected by occlusions. Such techniques were used, for example, for fight detection.
ADABTS has furthermore developed various support tools using audio technology in combination with video. Using a microphone array instead of a single microphone makes it possible to selectively enhance sounds coming from a certain direction and to detect and visualize where a sound came from. This technique proved well suited for detecting abnormal sounds and for classifying specific sounds such as screams, gunshots and breaking glass. Furthermore, the sound source localization technique also proved to work well in combination with video, both for tracking sound events in video and, as an example of attention-based steering of sensors, for directing a Pan-Tilt-Zoom camera.
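The localisation principle can be sketched for the simplest possible array, two microphones and a far-field source. The delay between the two signals is estimated by cross-correlation and converted into an arrival angle. The function names, sample rate and microphone spacing are illustrative assumptions, the speed of sound is taken as 343 m/s, and the project's actual array processing is not described in this report.

```python
import math

def estimate_delay(sig_a, sig_b, max_lag):
    """Lag (in samples) by which sig_b trails sig_a, found by cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def arrival_angle_deg(lag_samples, sample_rate_hz, mic_spacing_m, speed_of_sound=343.0):
    """Far-field direction of arrival relative to broadside, in degrees."""
    delay_s = lag_samples / sample_rate_hz
    # Clamp to the valid asin range in case of noisy delay estimates.
    sine = max(-1.0, min(1.0, speed_of_sound * delay_s / mic_spacing_m))
    return math.degrees(math.asin(sine))

# Synthetic impulse reaching microphone B three samples after microphone A.
mic_a = [0.0] * 20
mic_b = [0.0] * 20
mic_a[5] = 1.0
mic_b[8] = 1.0
lag = estimate_delay(mic_a, mic_b, max_lag=6)
angle = arrival_angle_deg(lag, sample_rate_hz=16000, mic_spacing_m=0.2)
```

With more microphones, the same pairwise delays can be combined to localise a source in the scene, and delaying and summing the channels for a chosen direction gives the selective enhancement mentioned above.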
ADABTS has furthermore developed new, and adapted existing, sensor processing methods and algorithms for detecting and tracking people in complex environments involving groups of people or crowds. Extracted sensor data features (e.g. tracks, voice pitches and body articulations) can be related to identify unwanted behaviours. Combining information from heterogeneous sensors has also proven successful in supporting the detection of unwanted behaviours. Furthermore, ADABTS has adapted algorithms to run on commercially available low-cost hardware architectures consisting of multi-core CPUs combined with several multi-stream GPUs (Graphics Processing Units). Such hardware combined with the new , in rapid development driven by the game industry, represents a huge potential for future high performance surveillance systems.
An Ethical and Dual Use Advisory Board (EDUAB) reviewed the project activities, such as data collection and scenarios, and the implications of a finished surveillance system for society. The EDUAB concluded that hidden bias must be avoided and that face or speech recognition should not be performed, in order to safeguard respect for privacy. The board also concluded that, to respect people’s privacy, the system functionalities should focus on unwanted behaviours rather than on detecting appearances (Appendix 1).
Improved event detection would benefit CCTV operators, leading, for example, to shorter reaction times to terror actions and riots. Furthermore, automated offline abilities, like searching databases, would also facilitate subsequent content-based retrieval in images after an incident. This creates new possibilities for increased security against threats like terror, crime and riots through enhanced warning systems.
Project Results:
2 SUMMARY OF THE MAIN S&T RESULTS/FOREGROUND
The work was subdivided into eight work packages (WP). The work packages were organised so that the objectives were divided into well-defined tasks. The different work packages were interconnected to achieve the overall goal, as described in Figure 1.
Figure 1. ADABTS work packages.
2.1 WP2 USER NEEDS
WP2 investigated the user needs and operational requirements for, as well as the legal and ethical restrictions on, the proposed automated detection system. Within the work package four areas of work were addressed: User needs and Scenarios, Human-Machine-Interface (HMI), Legal aspects and Ethical considerations. These aspects were investigated and assessed for a variety of surveillance tasks in different areas.
User needs and scenarios
In a first stage, focus was set on user needs and human factors in order to define and model threatening behaviours. ADABTS created models of threatening or anomalous behaviors describing how they can be observed. In order to detect behaviors defined by these models, advanced methods for sensor data analysis were required. These methods extracted sensor data features that could be coupled to the defined behaviours, and thus detect the presence of the (potentially) threatening behaviour.
First the operational requirements of various types of operators and users and relevant scenarios were established for different applications (e.g. large scale events, public transport systems). This was done by interviewing domain experts and responsible parties (e.g. police, (local) governments, owners of public facilities). From the goals identified for the various users, new desirable system capabilities were derived. Furthermore, human factors issues were raised where user needs and relevant scenarios were identified. The analysis showed that three locations were important for the ADABTS project to pursue: Airport, Stadium and Town centre. It was also established that the behaviours could be grouped into three types: terrorist behaviour, crowd behaviour and individual/small group behaviour. These locations and behaviours yielded an interesting mix of scenarios relevant to the users and society. These scenarios were used to generate lists of detectable elemental behaviours that together, and in context, may indicate potential threats. The scenarios also implied the functional requirements of the demonstrator surveillance system to be developed.
Next, requirements for user needs support functionality were established. The study was especially focused on the requirements and human factors issues associated with the new support functionality. Requirements and issues (incl. legal) associated with the use of audio were extensively covered, since this functionality is hardly used in current systems.
In many cases operators can only concentrate for up to 40 minutes, after which “video blindness” sets in and detection rates drop severely. Furthermore, the number of cameras that have to be monitored can be extremely large, whereas the number of monitors that can feasibly be handled by a single person ranges between 4 and 16. This leads to an ineffective use of camera surveillance equipment, where only a fraction of the incidents is detected in real time.
HMI
Requirements for HMI support functionality were established through interviews with the end-users and through a literature search. The study focused especially on the requirements and human factors issues associated with the new support functionality supplied by an ADABTS-like system. The requirements and issues (incl. legal) associated with the use of audio were covered extensively, since this functionality is hardly used in current systems. Effective HMI relies on good control design. It was shown that modern surveillance systems allow for a range of decision support tools; however, one must be aware that situational awareness (SA), i.e. a good overview of and insight into the situation, is achieved only in the human operator’s mind. The information available to the operator about the activities being followed, and the way that information is presented, are therefore important. The design of the system interfaces should support the human operator in achieving the best possible SA with an appropriate workload when it is needed. The presented information should support the operator intuitively in performing the task, and the system should also provide ergonomic tools for interacting with the operator.
Since automatic detection algorithms are far from perfect and human processing capability is limited, a support tool should combine the strengths of both. At the same time, false alarms must be kept at a reasonable level so that the operator workload remains appropriate.
Legal aspects
A surveillance system can provide personal data such as video and audio that can be related directly or indirectly to identifiable persons. In order to handle such data a number of conditions must be taken into consideration.
The data must be:
• Fairly and lawfully processed
• Processed for limited purposes
• Adequate, relevant and not excessive
• Accurate
• Not kept longer than is needed for the purpose it was obtained for
• Processed in accordance with the data subject's rights
• Secure
• Not transferred between countries without adequate protection.
Ethical considerations
Ethical restrictions on and implications of the proposed ADABTS system were also addressed in this work package. The results indicated that a system should be designed in such a way that individual personal identity is protected in accordance with the system description in the ADABTS proposal, i.e. not to obtain any personal information of observed individuals. In addition to these system restrictions the collection, distribution and use of database data during the ADABTS development process must also take ethical concerns into consideration. If possible it is preferable to select data sources that can be used without challenging subjects’ privacy. Privacy concerns are reduced by the fact that the ADABTS system does not perform any type of identification (e.g. face recognition), it covers only a local area, and detects potentially harmful activity, independent of person identity or background.
Furthermore, with regard to future ADABTS-like systems: regulations may need to be adjusted to account for automatic detection; automatic filtering may be less subjective (biased, prejudiced); privacy-sensitive information may be shielded from the user; and, finally, interpretation should be left to the user (i.e. not assigned automatically to the system).
In summary, WP2 (User needs) concerned user requirements: relevant scenarios and ethical/legal aspects were investigated and assessed for a variety of surveillance tasks in different areas.
Summary of major achievements from WP2:
• Definition of User requirements and desired capabilities
• Identification of Data usage conditions
• Scenario use cases
• HMI requirements
• Legal aspects requirements
• Ethical considerations
WP2 was completed in 2010 and comprised the deliverable
• D2.1 User Needs
2.2 WP3 ABNORMAL BEHAVIOUR DEFINITION
The purpose of WP3 was two-fold. First, to understand and define abnormal behaviour of individuals, groups and crowds in crowded spaces in general and in the context of the three different locations suggested in WP2 as being of importance in ADABTS: Town Centres, Airports and Sporting Stadiums. Second, to model behaviours and scenarios of high relevance to ADABTS, focusing, as WP2 suggests, on terrorist scenarios in the airport environment, crowd scenarios in the stadium, and individual/small group scenarios in the town centre context.
The work comprised a wide range of sources and methods in order to accurately understand abnormal behaviour and different contexts. The overall purpose of these activities was to establish a list of identified behaviour indicators for threats, and behaviour models for scenarios of high relevance to ADABTS. Distinct and visible behaviour, such as whole-body behaviours including movement about a space or excessive body gestures were identified as well as behaviours that are less obvious such as signs of stress, eye movements, mumbling and sweating. These indicators served as a core input to the development of the automated detection system. The abnormal behaviour indicators were developed from the following activities:
• Literature review and review of the commercial state of the art in abnormal behaviour detection
• Past incidents analysis
• Analysis of CCTV operator behaviour using eye tracking and verbal protocol analysis
• Surveying domain experts (extracting explicit knowledge)
The review of the literature and commercial state of the art showed that abnormal behaviour is a topic of research in many different fields and that there are abnormal behaviours of individuals, crowds and groups which might be categorised in terms of behaviour, body language, movement, and appearance indicators. The review highlighted that context (e.g. location, time, and type of threat) plays a key role in the definition and interpretation of abnormal behaviour and that crucially it is usually a combination of abnormal behaviours, either observed together or sequentially, that is important when deciding on the response to an actual or potential threat. For example, observed nervous behaviour may cause concern, but a CCTV operator will not raise an alarm unless other suspicious signs are observed as well.
The past incidents analysis examined the behavioural aspects of a number of different types of incidents, based upon expert analysis of CCTV footage of actual incidents. The analysed footage totalled about 140 hours. This analysis provided objective data on abnormal behaviour that adds weight to the behaviours elicited using other, more subjective methods. The analysis of past incidents revealed individual, group and crowd abnormal behaviour as well as, in many cases, the appearance of the perpetrators of an incident. The analysis also revealed behaviours leading up to an incident (pre-incident) as well as behaviours occurring during the incident itself. The pre-incident behaviours tended to be appearance indicators and behaviours not involving physical contact (e.g. hand waving, and running), whereas the incident behaviours tended to involve more violent actions (e.g. kicking, pushing, and hitting). These results provided abnormal behaviours related to specific incidents and a guide to the timeline of behaviour occurrence during a particular type of incident. Use of the footage was restricted to the scientific analysis for the Abnormal Behaviour Definition, as part of the ADABTS Project.
Another objective method employed to determine abnormal behaviour was eye tracking in conjunction with verbal protocol analysis. This method was used in two studies to objectively identify what cues CCTV operators working in town centre, airport, and football stadium control rooms find suspicious (abnormal), what cues are associated with various types of incidents and contexts, and importantly what these cues mean. These studies revealed individual, group and crowd abnormal behaviour as well as the appearance of an individual or group as being suspicious and show that some abnormal behaviour indicators are specific to a particular context whereas other behaviour indicators are more universal. Both eye movement studies support the previous literature study that suggests that it is often the combination of behaviour indicators in a particular context that are important in making a decision regarding suspicious activity.
The utility of abnormal behaviours in diagnosing a potential problem was determined quantitatively for the three contexts: airport, town centre and sporting stadium. The list of indicators was separated into five distinct categories: behaviour/body language, movement, appearance, audio and crowd behaviour/movement. The list was then rated by 28 experts, who provided subjective ratings of Importance, Occurrence, Usefulness to detect and Timescale of occurrence for each abnormal behaviour indicator in each context.
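How such ratings might be aggregated can be sketched as follows. The record layout, the function name, the rating scale and all values are invented for illustration only; they do not reproduce the ADABTS questionnaire or its results.

```python
from statistics import mean

def rank_indicators(ratings, context, criterion="importance"):
    """Rank indicators in one context by mean expert rating on a criterion."""
    scores = {}
    for r in ratings:
        if r["context"] == context:
            scores.setdefault(r["indicator"], []).append(r[criterion])
    return sorted(((mean(v), k) for k, v in scores.items()), reverse=True)

# Invented example ratings (one record per expert, indicator and context).
ratings = [
    {"context": "airport", "indicator": "running", "importance": 5},
    {"context": "airport", "indicator": "running", "importance": 4},
    {"context": "airport", "indicator": "hand waving", "importance": 3},
    {"context": "airport", "indicator": "hand waving", "importance": 2},
    {"context": "stadium", "indicator": "running", "importance": 1},
]
ranked = rank_indicators(ratings, "airport")
```

Running the same ranking per context makes it easy to compare which indicators stay near the top across airport, town centre and stadium.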
The set of behaviour indicators rated as important in detecting a potential problem was relatively constant across the three contexts, perhaps indicating that some behaviours are more or less independent of the context in which they are detected. However, the fact that the level of utility of the behaviour is not constant across contexts does indicate that in certain contexts it will be more useful to detect certain behaviours more accurately or more consistently than others.
In line with the areas of application, ten brief scenarios of high relevance to ADABTS were developed, focusing on terrorist scenarios in the airport environment, crowd scenarios in the stadium, and individual/small group scenarios in the town centre context. For each threat scenario type, a number of specific scenarios were modelled in conjunction with domain experts, and these formed the basis for scenario models to be used in later work packages.
The work package outcome also contained the results of a second round of expert interviews where ‘primitive behaviours’ was elicited and rated for a number of relevant complex behaviours such as aggressive or intoxicated behaviour of an individual. An analysis was provided of how related different complex behaviours are, and what kind of primitive behaviours are informative in distinguishing complex behaviours. Furthermore, a probabilistic model was presented and the detectability of primitive behaviours discussed by analysing behaviour occurrences.
The task of establishing the criteria upon which a definition of abnormal behaviour could be based was however proven to be complex and difficult. Researchers in the field of clinical psychology have proposed certain criteria of abnormal behaviour, which are broad enough to be transferred across and applied in other scientific areas. Davison and Neale (2001) state these criteria as follows:
• Statistical infrequency – it is suggested that abnormal behaviour is infrequent as a rate of occurrence in statistical terms. Thus, abnormal behaviours fall within the extremes of the normal-distribution curve and their manifestations are always outlined against the “background” of what is considered as “normal” within given culture.
• Violation of norms – typically, abnormal behaviour manifests itself through the violation of certain social-conventional, moral-ethical and legal norms or threatens/makes anxious those observing this violation.
• Personal distress – abnormal behaviour usually causes great distress and torment to the person experiencing it (although there are certain psychopathological conditions that make an exception, e.g. antisocial personality disorder - psychopathy).
• Disability or dysfunction – it is considered that abnormal behaviour impairs some important area (or areas) in a person’s life (e.g. work or personal relationships).
• Unexpectedness – abnormal behaviour very often is an unexpected response to certain environmental conditions.
Within the context of the ADABTS project, three of the above-mentioned criteria could be taken into consideration in the following work, as comparatively relevant: statistical infrequency, violation of norms and unexpectedness. Abnormal behaviour also differs with the type of location, e.g. public transport, sport stadiums, amusement parks or large scale events, because normal as well as abnormal behaviour is different for each of the types of locations.
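The statistical-infrequency criterion can be illustrated with a small sketch. It assumes, purely for illustration, that a behaviour is summarised by a single numeric feature (here a hypothetical dwell time in seconds) and flags observations that lie far from the mean of a sample of "normal" observations. The function name, the feature and the threshold of three standard deviations are assumptions and are not taken from the ADABTS models.

```python
from statistics import mean, stdev

def is_statistically_infrequent(value, normal_samples, z_threshold=3.0):
    """Flag a value lying in the extremes of the distribution of normal samples.

    Hypothetical helper: the z-score rule and the default threshold are
    illustrative assumptions, not the ADABTS method.
    """
    mu = mean(normal_samples)
    sigma = stdev(normal_samples)
    if sigma == 0:
        # All normal samples are identical: anything different is unusual.
        return value != mu
    z_score = abs(value - mu) / sigma
    return z_score > z_threshold

# Hypothetical dwell times (seconds) observed in one location type.
normal_dwell_times = [30, 42, 35, 50, 38, 45, 33, 41, 37, 48]

typical = is_statistically_infrequent(40, normal_dwell_times)
unusual = is_statistically_infrequent(300, normal_dwell_times)
```

Because normal behaviour differs between location types, a separate sample of normal observations would be needed for each context.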
Throughout the project, video and audio data on normal and threatening behaviour have been captured for the purpose of developing and testing the automatic detection of these behaviours. A selection of these data, captured under controlled conditions and with known content, forms the ADABTS ‘dataset’. A database of recordings of abnormal behaviour was compiled, with video and audio data related to the “normal” and suspicious behaviours defined in the previous work package. The database description covers the formats of the data collected and the systems used to collect it, and includes details of the recording equipment and its calibration. The scenarios and high level descriptions of the behaviours captured in the data are also included in this dataset.
The output of WP3 was finally taken forward to the next stage of the project to inform decisions regarding abnormal behaviours and scenarios. The work defined which indicators would be most beneficial to detect within the detection system, and which indicators may be most useful in a pro-active intervention system. More specifically, the WP3 outcome comprised:
• The State of the Art in abnormal behaviour definition and detection.
• Prioritized lists of abnormal behaviours for airports, town centres and sports stadiums.
• Models of a sub-set of behaviours and scenarios.
• Database of audio and video recordings.
WP3 was completed in 2012 and comprised two deliverables:
• D3.1 Abnormal Behaviour Definition
• D3.2 Behaviour database
2.3 WP4 SYSTEM SPECIFICATION
The objective of WP4 “System Specification” was to specify the system to be implemented in the project. Given the input from WP2 “User Needs” and WP3 “Abnormal Behaviour Definition”, this work package produced the specifications and requirements of the demonstrator system to be built, with the aim of demonstrating important results from ADABTS. The aim for the platform was to manage live video analysis, live acoustic processing, body pose estimation, tracking, and methods related to automatic discovery of potentially violent situations. The objective was to specify a demonstrator system that would be able to show novel concepts for automatic detection of threatening behaviours to be used in tomorrow’s surveillance systems. The focus was to show how complex video analysis can be conducted live in real-time using heterogeneous computing platforms with affordable graphical processing units (GPUs). These GPUs can offer high performance at a very low cost when used for suitable parallel algorithms. The specification included requirements as well as outlines of system functionality, architecture and integration protocol.
During this work an advanced Human Machine Interface (HMI) was specified. The results showed that an effective HMI relies on good control design, and that modern surveillance systems allow for a range of decision support tools. However, one must be aware that situational awareness (SA), i.e. a good overview of and insight into the situation, is only achieved in the human operator’s mind. The information made available to the operator about the surveillance activities being followed, as well as how that information is presented, is therefore important. The design of the system interfaces should also support the human operator in achieving the best possible SA with an appropriate workload and in a timely fashion. The presented information should support the operator intuitively in performing the task, and the system should also provide ergonomic tools for interacting with the operator (see Figure 2).
Figure 2. Generic workplace concept
Scenario definitions were also detailed further in this work package. The work concentrated primarily on the definition of a series of scenarios that would serve as a basis for testing and demonstration of the behaviour detection. These scenarios were defined by taking into account the user needs established in WP2 “User Needs”, the abnormal behaviour definitions and scenario models specified in WP3 “Abnormal Behaviour Definition”, and the current requirements of the ADABTS partners for demonstration.
Summary of major achievements from WP4:
• Requirements of system functionality
• System architecture for advanced real time applications
• System Integration protocol
• HMI
o Alarm rules & context settings
o Alarm handling
o Principles to reduce (false) alarms
o Attention/HMI levels
• Detailed scenarios
WP4 was completed in 2011 with two deliverables
• D4.1 Scenario Definition
• D4.2 System Specification
2.4 WP5 ABNORMAL BEHAVIOUR DETECTION
The main objective of WP5 “Abnormal Behaviour Detection” was to develop the necessary visual and acoustical sensor processing and inference mechanisms to automatically detect the potentially threatening abnormal behaviour defined in previous work packages. WP5 comprised Visual Person Detection and Tracking and Visual Human Pose Recovery; together with the audio and sensor-fusion tasks, these provided the input to the subsequent tasks on threatening/anomalous behaviour detection.
Initially, activities in WP5 aimed at localizing and tracking persons effectively through complex, crowded environments using video sensors. This led to a trajectory-based description of scene activity (e.g. location, time). Furthermore, functionality for estimating body pose, such as the direction in which the body is facing, was developed. The obtained descriptors were subsequently used for abnormal/anomalous behaviour detection.
Detecting and tracking people in crowded scenes is a very challenging problem. The cluttered, dynamic backgrounds make person detection by simple background subtraction methods infeasible. The various detections need to be associated with tracks, possibly creating new tracks and discontinuing old ones. Observable attribute data, such as person motion and appearance, can help to resolve the detection-to-track assignment problem. Knowledge about the spatial layout of the scene, i.e. entry/exit points, can further be used to infer where creations or discontinuations of tracks are especially likely. The main challenge in detecting and tracking multiple people is occlusion. Using a multi-view camera system, the occlusion effects can be mitigated by relying on unobstructed views and an analysis of depth ordering. Fusing spatial and colour information across cameras poses its own challenges when it comes to different object viewpoints, lighting conditions and sensor characteristics.
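The detection-to-track assignment step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ADABTS tracker: it uses greedy nearest-neighbour matching on ground-plane positions with a distance gate, and the function name, the gate value and the data layout are all hypothetical. Unmatched detections are candidates for new tracks and unmatched tracks are candidates for discontinuation.

```python
import math

def associate(tracks, detections, gate=1.0):
    """Greedily match detections to tracks by ground-plane distance.

    tracks: dict mapping track id -> (x, y) predicted position
    detections: list of (x, y) positions
    gate: maximum distance (assumed units: metres) for a valid match
    """
    # All candidate pairs, closest first.
    pairs = sorted(
        ((math.dist(pos, det), tid, j)
         for tid, pos in tracks.items()
         for j, det in enumerate(detections)),
        key=lambda pair: pair[0],
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for distance, tid, j in pairs:
        if distance > gate:
            break  # every remaining pair is even farther apart
        if tid in used_tracks or j in used_dets:
            continue
        matches[tid] = j
        used_tracks.add(tid)
        used_dets.add(j)
    unmatched_tracks = [tid for tid in tracks if tid not in used_tracks]
    new_detections = [j for j in range(len(detections)) if j not in used_dets]
    return matches, unmatched_tracks, new_detections

tracks = {1: (0.0, 0.0), 2: (5.0, 5.0)}
detections = [(5.2, 4.9), (0.1, -0.1), (9.0, 9.0)]
matches, unmatched_tracks, new_detections = associate(tracks, detections)
```

In this example both tracks find a nearby detection, and the third detection, which lies outside the gate of every track, is left over as a possible new track. Attribute data such as appearance could be folded into the cost in place of pure distance.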
The overall detection and tracking problem in crowded scenes was addressed in a multi-faceted way. Single-view 2D person detection under partial occlusion was achieved by a classifier-based method and by face detection. The ADABTS partners developed complementary techniques for mapping 2D person detections in the various images to 3D positions. In terms of tracking, the developed 2D tracker was found to be superior to a commercial product.
The main effort of this work however went into 3D tracking of multiple persons (i.e. determining their locations on the ground plane) using overlapping camera views. Two approaches were investigated: network flow-based methods and the classical methods applied to multi-target tracking. Apart from using detections on the ground plane, it was investigated how attribute data (person appearance) can improve tracking performance.
The dataset recorded in the main hall of football club ADO Den Haag in October 2011 provided a common test bed for the algorithms developed in this Work Package. The dataset contained up to 30 people moving around in several scenarios tagged “normal”, “terrorist”, “pick-pocket” and “aggression”. Video data was recorded by a dozen cameras from various viewpoints.
The experiments showed good track localization performance on the collected data set for small-to-medium sized groups (up to a dozen people). However, the number of identity swaps increases whenever people come into close proximity, especially for the recursive filtering approaches. For large groups, more work is needed on component-based approaches for detection and tracking, to handle significant amounts of occlusion, and on the suitable combination of single-view and multi-view approaches.
Furthermore, the overall person body orientation is an important determining factor for estimating who interacts with whom, which in turn is important for abnormal behaviour detection. Four orientation classes were defined with respect to the camera (i.e. facing front, back, left, or right). A mixture-of-experts approach allowed estimating the body orientation.
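As a rough illustration of how a mixture-of-experts estimate over the four orientation classes might be combined, the sketch below weights the class probabilities of several experts by gating weights and picks the most probable class. The report does not describe the experts or the gating function, so the names, the number of experts and all numeric values here are assumptions.

```python
CLASSES = ("front", "back", "left", "right")

def mixture_of_experts(expert_probs, gate_weights):
    """Combine per-expert class probabilities using gating weights.

    expert_probs: one probability list per expert, ordered as CLASSES
    gate_weights: one non-negative weight per expert (normalised here)
    """
    total = sum(gate_weights)
    weights = [w / total for w in gate_weights]
    combined = [
        sum(w * probs[k] for w, probs in zip(weights, expert_probs))
        for k in range(len(CLASSES))
    ]
    best = max(range(len(CLASSES)), key=combined.__getitem__)
    return CLASSES[best], combined

# Two hypothetical experts; the gate trusts the first one more.
expert_probs = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.2, 0.5, 0.1],
]
orientation, combined = mixture_of_experts(expert_probs, gate_weights=[0.8, 0.2])
```

In a real system the gating weights would typically depend on the input image, so that each expert dominates in the conditions it handles best.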
The subsequent work addressed abnormal behaviour detection in crowded scenes, and how sensor fusion, in different forms, can be used to achieve increased robustness and performance.
By integrating information from multiple views, person detection in crowded scenes can be performed with greater robustness and accuracy. In ADABTS, several methods have been investigated for combining foreground segments extracted in multiple views into estimates of scene volume occupancy. The location of individuals can then be determined by processing the volume data. Another class of detection methods, where objects are generated in each camera view and merged in a 3D scene, has also been investigated. Such methods are applicable to single views, and therefore less dependent on multi-view coverage. Both methods have strengths and weaknesses, and increased robustness should be possible to achieve by combining elements from both.
Target tracks are a rich source of information for behaviour analysis. In crowded scenes, however, it can be very difficult to maintain track of each individual. By modelling the appearance of the tracked person, correct observation-to-track association can be accomplished for closely moving individuals, and in the presence of occlusion. In ADABTS several appearance representations have been explored, including a combined 3D shape and texture model that can be used for determining the main orientation of a person. A multi-view tracking algorithm combining different methods has been developed, and shown to outperform state-of-the-art methods. For approaches where person detections in the different camera views are merged in 3D, head tracking has been proven successful, while simultaneously improving tracking performance.
In summary, successful person tracking is required for detecting loitering, as well as more subtle anomalies in complex behaviours of individuals in large public spaces. Pose and gesture recognition provide additional information useful for detecting aggression, persons falling to the ground, and for recognising suspicious activities. Multi-camera methods substantially increase the accuracy and robustness of person detection, tracking and pose estimation in crowded scenes. As the crowd density increases, however, a point is eventually reached where reliable tracking becomes infeasible. A robust, high-performance system for automatic detection of abnormal behaviour must be able to determine autonomously when conditions are unfavourable for person tracking and then fall back on cruder feature-based methods.
Furthermore, audio events, such as gunshots, glass-breaking, screams and abusive chants are associated with behaviours that may be very difficult to detect in video. These events can be detected by audio processing, and, using a microphone array, localised in the scene. By calibrating video cameras and the microphone array to a common reference, people close to the audio event can be tagged and tracked through the scene, long after the event has occurred. More specifically the approach here was based on ‘scanning’ the environment by applying beamforming on the outputs of an acoustical sensor array and applying classification algorithms for detecting specific sources.
Following this activity a system for real-time sound event detection and labelling was developed. The objective of this audio related work was to show the added value of audio to video-surveillance. In this study, a real-time beamforming system was developed that delivers the sound parameters required for sound classification. For this purpose a number of beamforming methods were considered, differing in performance and computational complexity: conventional methods such as Delay and Sum (DAS), and more advanced methods such as Minimum Variance Distortionless Response (MVDR), which provide improved performance at the cost of increased complexity.
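The idea of 'scanning' with a conventional Delay and Sum beamformer can be sketched as below. This is a simplified illustration and not the ADABTS implementation: it assumes whole-sample delays, a simulated pure tone, a four-microphone array, and three hypothetical steering directions. MVDR is not shown.

```python
import math

def delay_and_sum(channels, delays):
    """Advance channel m by delays[m] samples, then average the channels."""
    usable = len(channels[0]) - max(delays)
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(usable)
    ]

def mean_power(signal):
    return sum(v * v for v in signal) / len(signal)

def scan(channels, candidates):
    """Return the name of the steering direction with the highest output power."""
    return max(
        candidates,
        key=lambda name: mean_power(delay_and_sum(channels, candidates[name])),
    )

def tone(k):
    # Simulated source: a tone with a period of 20 samples (assumption).
    return math.sin(2 * math.pi * k / 20)

# The source reaches microphone m with a delay of 3*m samples (assumption).
true_delays = [0, 3, 6, 9]
channels = [[tone(n - d) for n in range(200)] for d in true_delays]

# Hypothetical steering directions and their per-microphone delays.
candidates = {
    "left": [0, 3, 6, 9],
    "broadside": [0, 0, 0, 0],
    "right": [9, 6, 3, 0],
}
best_direction = scan(channels, candidates)
```

When the steering delays match the true propagation delays the channels add coherently, so the output power peaks in the direction of the source; for other directions the channels partly cancel.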
The developed real-time application takes into account the computationally intensive routines of MVDR. Since the properties of the deployment platform are not known in advance, the application is supplied with a mechanism for adapting to different and changing hardware resources, such as available CPU time, arithmetic units and memory. In this way it delivers the best solution possible with the resources the user makes available. Still, an extensive implementation process has led to a relatively fast execution of the algorithm. The system is supplied with a user interface for controlling a number of parameters and for obtaining the first visual effects. Furthermore, it is provided with a user-friendly mechanism for calibrating the system for each possible deployment environment.
Furthermore a system for real-time sound event detection and labelling was developed. New functionality was developed to detect (interesting) acoustic events (such as gunshots, screams and glass breaks) recorded by a microphone array. The approach to solving this problem was based on the part of statistical hypothesis testing theory known as change point detection. The method was further improved by considering the normalized frequency spectrum. In this way the power dependency of the events was reduced. Spectral differences between different guns or glass panes can also be accounted for.
Several recording sessions were conducted during the project to acquire validation data to test and optimize the various algorithms. The specific approach was based on statistical hypothesis testing theory and proved successful in discriminating between the specified acoustic events (gunshots, screams and glass breaks). The audio related tasks within the ADABTS project showed the added value of audio to video-surveillance.
Finally, gestures provide valuable information on behaviour, but are difficult to estimate in crowded scenes. However, much of the information available in gestures can be extracted without explicitly tracking the pose of individuals, namely by computing image motion from the optical flow of a video sequence. In ADABTS, two fight detectors were successfully developed using motion features (e.g. waving arms) as input.
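A minimal sketch of change point detection on a normalized spectrum is given below. The report does not state which change point test was used; a one-sided CUSUM statistic is assumed here as one common choice, and the band energies, drift and threshold values are hypothetical.

```python
def normalise(spectrum):
    """Scale band energies to sum to one, removing overall loudness."""
    total = sum(spectrum)
    return [v / total for v in spectrum] if total > 0 else list(spectrum)

def spectral_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cusum_alarm(frames, background, drift=0.2, threshold=1.0):
    """Return the index of the first frame where accumulated evidence of a
    spectral change exceeds the threshold, or None if it never does."""
    reference = normalise(background)
    evidence = 0.0
    for index, frame in enumerate(frames):
        deviation = spectral_distance(normalise(frame), reference)
        # Accumulate deviations above the drift; never go below zero.
        evidence = max(0.0, evidence + deviation - drift)
        if evidence > threshold:
            return index
    return None

background = [4.0, 3.0, 2.0, 1.0]   # hypothetical band energies
louder = [8.0, 6.0, 4.0, 2.0]       # same spectral shape, twice the power
event = [1.0, 1.0, 2.0, 16.0]       # energy shifted to the top band

alarm_index = cusum_alarm([louder] * 5 + [event] * 3, background)
no_alarm = cusum_alarm([louder] * 8, background)
```

Because each frame is normalized before comparison, the louder frames with an unchanged spectral shape do not raise an alarm, which illustrates the reduced power dependency mentioned above.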
A key element in aggressive behaviour once it has reached the level of physical violence appears to be rapid motion of the aggressors’ limbs since punches and kicks thrown constitute motion atypical of the normal.
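The observation that rapid limb motion is a cue for physical violence can be turned into a simple motion feature. The sketch below assumes that optical flow vectors for a region of interest have already been computed by some other component; it then measures the fraction of fast-moving vectors. The function names and both thresholds are hypothetical and are not the features used by the ADABTS fight detectors.

```python
import math

def fast_motion_fraction(flow_vectors, speed_threshold=5.0):
    """Fraction of flow vectors (dx, dy), in pixels per frame, whose
    magnitude exceeds the speed threshold."""
    if not flow_vectors:
        return 0.0
    fast = sum(
        1 for dx, dy in flow_vectors if math.hypot(dx, dy) > speed_threshold
    )
    return fast / len(flow_vectors)

def rapid_motion_cue(flow_vectors, speed_threshold=5.0, min_fraction=0.3):
    """True when enough of the region moves atypically fast."""
    return fast_motion_fraction(flow_vectors, speed_threshold) >= min_fraction

# Hypothetical flow fields for a 20-pixel region.
walking = [(1.0, 0.2)] * 20
punching = [(1.0, 0.2)] * 12 + [(8.0, -6.0)] * 8

walking_cue = rapid_motion_cue(walking)
punching_cue = rapid_motion_cue(punching)
```

A cue of this kind only indicates motion that is atypical of normal behaviour; a practical detector would need to combine it with further evidence to separate fights from, for example, running or cheering.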
Summary of major achievements from WP5:
• 2D person detection and tracking algorithms (Single view)
• Multi-person tracking from overlapping cameras
• 3D person detection and tracking algorithms (Multiple view)
• Advancements in human pose recovery in 3D
• A system for real-time sound localisation and sound enhancement by means of a microphone array and beamforming techniques
• A system for real-time sound event detection and labeling
• Advancements in combining sensor information for sensor management
• Improved motion gesture detection (e.g. for fight detection)
• Advancements in detection of anomalous behaviour, such as loitering or entering a sterile zone.
WP5 was completed in 2013 and comprised three deliverables
• D5.1 Vision-based Human Detection and Action Analysis
• D5.2 Sound Source Localization and Analysis
• D5.3 Abnormal behaviour Detection by Sensor Fusion
2.5 WP6 REAL TIME PLATFORM AND SYSTEM INTEGRATION
WP6 “Real Time Platform and System Integration” covered the development of a real-time hardware and software platform integrating the algorithms developed in WP5. Concepts and methods for optimal implementation of low-level image analysis algorithms on heterogeneous hardware were developed, as well as overall guidelines for GPU architecture and for algorithm design.
The overall purpose of the work package was to identify and alleviate processing bottlenecks in the ADABTS project through the use of off-the-shelf graphics cards intended for gaming, so called Graphics Processing Units (GPUs). These GPUs can offer high performance at a very low cost when used for suitable parallel algorithms. Developing algorithms for GPU-based implementations consists of several phases. The first phase involves validating different algorithms for their potential for GPU implementation. Following this phase, a GPU adapted version of the algorithm must be developed, taking into account information of how to achieve high performance on GPUs. Finally, the algorithm is implemented and optimized on the actual GPU.
Specifically, three important challenges were identified for image processing work on GPUs. Firstly, most image processing and analysis algorithms will usually run slower than optimal simply because data transfer is a major hurdle. Secondly, latency is identified as a serious issue, both in the literature and in this work. Finally, while many image processing and analysis tasks can be solved by standard GPU patterns, other algorithms will quickly outgrow the size of the on-chip memory on the GPU, forcing the kernels to access the much slower global memory.
The implementation work for the ADABTS Demonstrator System (ADS) highlighted some challenges, as well as reinforcing the experiences from work discussed in the literature. Specifically, it was shown that one of the major bottlenecks in a practical system is the data transfer. The development hardware system could be shown to handle transfer and decoding of data from around 8 cameras in full HD if video frame rate was of the essence, meaning that the decoding speed constituted a bottleneck, resulting in some analysis algorithms being starved of data. A concern that was raised during the work was frame latency, related to the data transfer. However, the algorithms chosen for acceleration were suitable for adapting to the massively parallel architecture of the GPU, and no major obstacles were met in the implementation work.
The ADABTS Demonstrator System (ADS), described by the WP4 system specification, was built and adopted for a live demonstration of the achievements in the ADABTS project. The system was designed as a network of independent slaves, communicating over Ethernet. Images, video as well as audio and alarms were transferred using suitably defined formats, see Figure 3.
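A back-of-the-envelope calculation shows why data transfer dominates for this kind of system. The figures below are assumptions for illustration only: the report mentions around 8 full HD cameras, while the frame rate of 25 frames per second, the 3 bytes per pixel and the comparison with a nominal Gigabit Ethernet link are not taken from the report.

```python
def raw_video_rate_bytes_per_s(cameras, width, height, bytes_per_pixel, fps):
    """Uncompressed video data rate for a set of identical cameras."""
    return cameras * width * height * bytes_per_pixel * fps

# Assumed parameters: 8 cameras, 1920x1080, 24-bit colour, 25 frames/s.
raw_rate = raw_video_rate_bytes_per_s(8, 1920, 1080, 3, 25)

# Nominal Gigabit Ethernet capacity in bytes per second (assumed link).
gigabit_ethernet = 1_000_000_000 / 8

ratio = raw_rate / gigabit_ethernet
print(f"raw rate: {raw_rate / 1e9:.2f} GB/s, {ratio:.1f}x a gigabit link")
```

Under these assumptions the uncompressed streams amount to roughly 1.24 GB/s, about ten times the nominal capacity of such a link, which is consistent with the observation that video must be compressed for transport and that decoding then becomes the bottleneck.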
Figure 3. Graphical illustration of the ADS network
In summary, three important challenges were highlighted that should be considered when working with image processing and analysis algorithms on GPUs:
• Most image processing and analysis algorithms will usually run slower than optimal simply because data transfer is a major hurdle. This is especially important when analyzing video streams. In order to manage streaming of data on a practical system without saturating the network, compression is fundamental, and thus the bottleneck will be in decompression/ decoding.
• Both in the literature and in this work, latency is identified as an issue. While computing power is abundant relative to data availability and the algorithms implemented, inevitable buffering (especially in frame decoding) leads to rapidly growing delays. As seen in our system, latencies can reach one second or more. The latency in ADS is not mainly due to the algorithms running on the GPU; it is essentially the encoder delay at the camera end and the delay in decoding. According to SINTEF, approximately 70-80% of the latency of the overall system is located in decoding.
• Algorithmically, many image processing and analysis tasks tend to perform (usually necessary) normalizations or non-local weightings or comparisons. While some of these can be solved by standard GPU patterns, other algorithms will quickly outgrow the size of the on-chip (registers and shared) memory on the GPU, forcing the kernels to access the much slower global memory.
Summary of major achievements from WP6:
• Recommendations and guidelines for rapid GPU data processing
o Software architecture for real-time high performance video analysis
• The ADS (ADABTS Demonstrator System)
• Additional selective blurring functionality
WP6 was completed in 2013 and comprised the deliverables
• D6.1 Recommendations and guidelines for image processing on heterogeneous hardware
• D6.2 System software and hardware description
2.6 WP7 SYSTEM EVALUATION
The main objective of WP7 “System evaluation” was to assess the operational effectiveness of the overall system, i.e. the sensor system, the algorithms and the user interface. The work was divided into the design and acquisition of validation data, a system performance evaluation and, finally, an end user evaluation. The evaluation work was performed both objectively and subjectively.
For the objective evaluation, the system’s performance in terms of detection capability was quantitatively compared to ground truth data. For the subjective evaluation, a qualitative evaluation of the overall system was performed determining the added value of the system and the implications on the organization as perceived by the end-users. The data of the second measurement campaign at the Kyocera football stadium was used for this purpose.
An important step towards practical implementation of these techniques was the implementation of various low-level techniques (volume carving, camera synchronization, flow estimation, tracking, and head detection) on GPUs, allowing for real-time processing. ADABTS developed a single-camera technique using perspective and appearance models, which turned out to perform well in crowds that are not too dense. In order to also cope with moderately dense crowds, several 3D detection and tracking methods were developed using multiple overlapping cameras. This multi-camera approach allows for tracking in scenarios with moderate/high person density and dynamic backgrounds, outperforming single camera solutions. The evaluation shows that this tracker works well in moderately dense environments, especially with targets in small groups. Human pose estimation, which has a range of applications (e.g. detecting interactions between people, detecting the focus of interest of a group), was found to work reasonably well for scenes with low and medium person densities; performance degrades when using a single camera or in scenes with high person densities.
ADABTS developed various support tools that use audio. Using a microphone array instead of a single microphone makes it possible to selectively enhance sounds coming from a certain direction and to detect and visualize where a sound came from. The spatial resolution of the sound direction is limited and depends on the reverberation of the environment. Methods to combine sound and video (e.g. tracking information) depend largely on the performance of the underlying (low-level) algorithms. Methods for detecting abnormal sounds and classifying specific sounds (screams, gun shots, breaking glass) rely on precise tuning of the parameters and are impaired by high levels of background noise.
As an example of attention-based steering of sensors, a Pan-Tilt-Zoom camera was implemented in combination with sound source localization, which was shown to work in the live demo during the Final Demonstration Day. The method for detecting aggressive behaviour based on audio and video from multiple overlapping cameras can distinguish calm situations from rowdy situations, but has difficulties in classifying more subtle cases and can (currently) only detect overt aggressive behaviour taking place. Another method, for detecting abnormal behaviour using flow patterns in single camera views, was demonstrated to work during the live ADABTS demonstration. Evaluation of the track-based anomaly detection showed that the tracks identified as most unusual corresponded to the anomalous tracks.
Two datasets recorded in the main entrance hall of football club ADO Den Haag's home arena (see Figure 4) in October 2010 and June 2012 provided a common test bed for the algorithms developed. The datasets contained up to 30 people moving around in several scenarios tagged “normal”, “terrorist”, “pick-pocket” and “aggression”. Video data was recorded by a dozen cameras from various viewpoints. Audio was recorded using two acoustic arrays.
Figure 4. The Kyocera Stadium in The Hague, and its entrance hall.
For the subjective evaluation, the different functionalities were qualitatively evaluated by a number of expert end-users by means of an online questionnaire, with demonstration material used to introduce the functionalities. The functionalities developed within ADABTS were generally regarded as useful by many of the participants. Visual person detection & tracking, directional audio, combining audio & video, attention driven Pan-Tilt-Zoom camera and the use of maps received a (relatively) high overall rating of usefulness. Sound classification, abnormal movement pattern detection, suspicious track detection and alarm handling got a moderate overall rating, while body orientation estimation, aggressive behaviour detection and privacy enhancing features received a relatively low rating. A somewhat lower rating may be related to lower familiarity with a functionality and/or a lack of trust in its maturity. It may also indicate that the functionality is applicable to a specialized market. For many of the functionalities, concerns were raised with regard to the maturity of the (especially more advanced) functionalities, false alarm rates, presentation of information and affordability. Often, the participants expected the false alarm rate to be high, partly due to the nature of the task of discriminating threatening behaviours from normal behaviours, which can be very similar and strongly depend on the context and cultural setting. The participants stated that they were willing to install several of the functionalities (especially in high risk/high priority settings), provided there was a strong business case in favour of it. This was despite the fact that various functionalities rely on the use of specialized equipment (multiple overlapping cameras, microphone arrays), which makes it possible to deal with the challenging context of moderately dense crowds (by using 3-D reconstruction and localization of persons and sounds).
Summary of major results from WP7:
• Objective evaluation ADS
o Performance analysis of ADS detecting, tracking and classifying abnormal video and acoustic events. In general, good performance of the system was observed in the vast majority of the cases tested, which can be regarded as a sign of the stability of the system.
• Subjective evaluation ADS (End user evaluation results)
o Functionalities with (relatively) high end user rating were: visual person detection & tracking, directional audio, combining audio & video, attention driven Pan-Tilt-Zoom camera and the use of maps.
o Functionalities with medium rating are: sound classification, abnormal movement pattern detection, suspicious track detection and alarm handling.
o Functionalities with (relatively) low ratings are: body orientation estimation, aggressive behaviour detection, privacy enhancing features.
• A selection of techniques was shown to work in real-time in a complete system context (ADS).
• The users expressed interest in the techniques that were developed and see the possibilities.
• In many cases they doubt the maturity of the (especially more advanced) techniques.
• The techniques may be advanced by investigating their adaptation to real-life settings.
WP7 was completed in 2013 and comprised the deliverable
• D7.1 System evaluation
2.7 WP8 DISSEMINATION AND EXPLOITATION
WP8 was concerned with all project activities aiming at disseminating and exploiting research results of the developed technologies. This work package included scientific dissemination through publication at workshops, conferences and in journals. The Dissemination included promotional activities and was planned as a process providing information to stakeholders in regard to the project aims, objectives, developments and results. Dissemination activities took place from the very beginning until the end of the project.
Furthermore, IPR management and exploitation plans for commercialization were handled in this work package. The final system demonstration was also part of WP8. The aims were to profile, on a running basis, the ADABTS project as a brand, together with the consortium partners behind it and the results derived from the project: first and foremost to raise awareness among all potential stakeholders, but also to ensure a smooth and affirmative relationship with all known stakeholders, to ensure effective and timely spreading of information to identified target groups, and to uphold a positive project image with the general public.
Dissemination activities concerned transferring results into formats appropriate for industrial or commercial application and for research activities, creating products and/or services. In general, at every contact with the public, media and stakeholders, the partners were encouraged to take the opportunity to spread the project concept and objectives. During the project lifetime ADABTS has communicated results to various kinds of identified actors: security stakeholders such as European and national authorities, police organisations, event organizers, security system operators and security service companies; security system integrators; technology developers; and finally the research communities for psychology, human factors and signal processing.
The Dissemination Plan handled the activities for an effective promotion of the ADABTS project outcomes. The document describes the planned dissemination strategies, channels and actions of the ADABTS project. More specifically dissemination activities included promotional activities planned as a process of providing information to stakeholders in regard to the project aims, objectives, developments and results. The project website (www.adabts-fp7.eu) shows the latest results and the advancements of the project.
In order to demonstrate the outcome of ADABTS a final demonstration was conducted at the Kyocera stadium in The Hague in the Netherlands in June 2013. The demonstration comprised real time detection of a selected set of events based on staged scenes that involved a limited number of people, as well as an offline (not real time) demonstration of more advanced detection capabilities.
The overall objective of the final demonstration was to gather European stakeholders, ranging from operators, end-users, system integrators, manufacturers, consultants, policy makers, political stakeholders and academia, for an in-depth presentation, demonstration and discussion of ADABTS’ results. The final demonstration was held in June 2013 in order to give the consortium enough time to absorb feedback from final demonstration participants into the project’s evaluation and final deliverables. The ADABTS final demonstration presented the project’s scope, approach and major results followed by in-depth presentations and both offline and online demonstrations of functionalities.
These activities were intended to:
• Present scientific results from the ADABTS project
• Disseminate findings to scientific stakeholders, thus increasing their potential impact on future research in the EU
• Disseminate findings to industrial stakeholders, thus increasing their potential for successful deployment and adoption in the commercial marketplace
• Collect feedback from diverse sources such as operators, end-users, system integrators, manufacturers, consultants, policy makers, political stakeholders and academia; if not directly at the final demonstration, then at least by getting visitors to register for the evaluation survey
• Build upon presentations and demonstrations and integrate feedback into the final deliverables of the project, especially deliverable 7.1 System Evaluation, which will contain the survey
The final demonstration gathered European stakeholders, ranging from operators, end-users, system integrators, manufacturers, consultants, policy makers, political stakeholders and academia, for an in-depth presentation and discussion of ADABTS’ results. The turnout was good, with good diversification in types of stakeholders. The final demonstration day was built up to give a general overview of the progress leading to the actual functionalities described above. The demonstration was finalized with a panel discussion to collect feedback on the presentations and demonstrations. The panel discussion mainly concerned the ethical aspects of the project as well as results versus current state of the art and was finished with an outlook on future adoption of developed technologies.
Exploitation activities concerned transferring results into formats (reports, algorithms etc.) appropriate for industrial or commercial application in research activities, creating products and/or services.
An Exploitation Plan was developed using a model that divides different stakeholder actors into categories and suggests that different project members have different motivation for engaging in exploitation activities. The Exploitation Plan presents reasoning for following certain strategies and the potential impact of following these strategies, as well as a more detailed plan on what information ADABTS needs in order to decide what exploitation activities to execute. Stakeholders were divided into industrial and scientific stakeholders, a distinction that influences the type and target audience of an activity, see Figure 5.
Furthermore, exploitation activities were divided into types depending on the result achieved and the time horizon for implementation. So-called technical improvements will have a relatively direct impact within a short time frame, while activities that have a long-term impact are called strategic guidelines.
Figure 5. The Exploitation Activities model
ADABTS made a division between exploitation results that were made internally among actors in the project and externally to stakeholders. The Exploitation Plan pointed out the planned exploitation activities for each partner within the consortium, and also presented categories of stakeholders and the expected impact on these if the objectives of the project are achieved. The categories of stakeholders were further specified and explored throughout the project period.
The industrial stakeholders were mainly focusing their exploitation activities on improving their current product portfolio and/or services, and will try to position themselves strongly in existing markets, while also looking to create and prepare for new markets with the focus of securing a leadership in these new markets.
The exploitation activities and goals of scientific stakeholders (i.e. universities and research institutes) were different from, yet complementary to, those of industrial partners. Scientific stakeholders seek to raise the quality of their own institution to attract highly qualified personnel and students, teach their own findings, collaborate with technology users to prepare them for new markets, and influence future research fields. ADABTS will rely on scientific ADABTS partners to disseminate findings in the project to scientific stakeholders so that these will have an impact on future research in the EU. ADABTS expects several scientific project partners to pursue new projects building upon results from ADABTS.
The Exploitation Plan also addressed the guidelines set up on exploitation of results and intellectual property rights. Any knowledge developed in the project that has potential industrial and/or commercial applications will be protected with due regard to the legitimate interests of the partners concerned.
The success or failure of industrial exploitation activities is dependent on proper monitoring of the state of the art in the market. The Exploitation Plan listed a number of topics that will form the basis for an early commercialization strategy at the end of the project, and will also act as a reference for general input needed from each consortium member to make the best possible strategic choices for commercialization.
During the project, press releases have been written in such a form that general media, and security industry media in particular, would take an interest in them. The intention was that the press releases would be picked up by leading industry media agencies, increasing the potential for successful deployment and adoption of technology in the security industry. The industrial partners, Detec and BAE Systems, will use their ADABTS participation in advertising and highlight the successful demonstration in pursuing increased sales revenue.
Exploitation activities Industrial partners:
DETEC will make direct use of the results from the ADABTS project by extending their current system with the new solutions and new functionalities that were developed in the project. More advanced high security applications are also an important market segment for Detec, for instance surveillance of central governmental buildings, harbors, prisons, and homes of people needing special protection. Detec’s strategy is to move more heavily into this segment, and ADABTS is a direct step towards achieving such a strategy. Detec will, in addition to the tests performed within ADABTS, conduct demonstrations and evaluation tests at its existing and potential new customers and distributors. This will be part of Detec’s business strategy that aims at broadening its market segments and distribution network.
DETEC Main short-term activities
• Launch of new platform «Detec Next» entry level model
• Beta-version of new platform «Detec Next» with integrated ADABTS master – ADABTS Demonstration System
• Commercial launch of full version «Detec Next» with video content analysis and alarm management, fusing ADABTS results with other projects’ results
• Launch of a Software Developer Kit (SDK) and Educational License Software
DETEC Main long-term activities
• Further specialization within the field of Video Content Analysis (VCA)
• Explore sensor-fusion that correlates well with VCA scenarios
• Emphasize further development and implementation on needs identified in WP7 End-User Evaluation
DETEC Other long-term activities
• Tailor project results to other surveillance needs and possibly new projects using video as the main sensor
• Simplify installation of solutions through obtained automation and HMI knowledge
• Continue strategy on focusing on low cost heterogeneous hardware to obtain high performance-to-cost ratio
BAE will exploit the results of the ADABTS project in a number of ways. Firstly, it will directly exploit the project results through its Homelands Defence business, for example to enhance the capability of border patrol systems now being marketed by the company. More widely, the research ideas generated in ADABTS will also provide an important contribution into the surveillance capability being developed by the company for a range of security applications.
The results from the project will also be used by BAE to improve its technical expertise reinforcing the already available experience in the security surveillance and associated fields. The ADABTS knowledge will bring to BAE the possibility to provide the needed integration services and added value surveillance systems to their core customers, to explore new markets, and to initiate joint ventures with complementary companies. Protection of crowded areas is a significant area across the National Security sectors and the project’s focus on Large Scale Events is highly relevant to potential National Security stakeholders.
BAE Systems will ensure that the project can be presented to potential users of the system: public administration, private businesses, at national or international level. BAE Systems will disseminate the project results internally to Business Units that could benefit from the technologies being studied and developed within ADABTS, and externally at national and international surveillance conferences.
BAE Main short-term activities
• Evaluation of potential company applications for the ADABTS outputs: SAD Filter, Suspicious Behaviour Detector and Multi-Camera Tracking
• Development of a prototype product called pTrack, which incorporates both ADABTS’ and other research projects’ technology
BAE Main long-term activities
• Improve technical expertise, reinforcing experience.
• The results of ADABTS are of direct interest to a number of the different entities (sister companies) of BAE, which are exploring areas such as:
• Universal Video Management System and their tactical sensor management systems
• Target recognition capability for future UAV/UCAV
• Security technology for mass transportation systems
• Intelligent systems for the global security market
• New markets
Exploitation activities scientific partners:
FOI: The mission of FOI is to perform applied research in order to enhance the security of society. Thus, FOI will strive to transfer the results of the project to the security stakeholders in its network, including but not limited to the type of security consumers directly involved in the project (football stadium and international airport). Also included are critical infrastructures like the national power grid, power plants (nuclear and hydro), ports, and military installations.
FOI Main short-term activities:
• New research project focusing on threat detection for protecting power plants. (3D detection & tracking, attention driven pan-tilt-zoom, aggressive behaviour detection, abnormal movement pattern detection, suspicious track detection and privacy enhancement features will be used)
• Articles on event detection emerging from ADABTS are expected.
• Arena security development
FOI main Long-term activities:
FOI will exploit results by its own marketing channels, i.e. through its close contacts with Swedish and European defence and security industry. FOI undertakes research and development assignments from government as well as from private industry, and aims through this project to strengthen its market position as a provider of new knowledge and new technology.
FOI has, naturally, a strong collaboration with the Swedish Armed Forces (SwAF), which is funding FOI’s participation in ADABTS through its Research & Development Programme. The SwAF’s main interest is in evaluating scenarios and possibilities for ADABTS technology to be used in peacekeeping operations (force protection, camp protection, and urban surveillance), which also makes it a potential customer of the industrial partners in ADABTS.
SINTEF has a history of more than fifteen years collaboration with Detec A/S on developing automatic video surveillance systems. It is expected that this collaboration will continue also in the future, and that this will directly build on the new knowledge, methods and technology developed in this project.
As an independent research organization SINTEF is working in a wide spectrum of application areas for video, acoustic and other sensing technologies (www.sintef.no\omd). SINTEF has also, since 2003 when programmable GPUs first became available, been exploring the possibilities that lie in this technology for general purpose computing (www.sintef.no\gpgpu).
SINTEF main short-term activities:
Results from this project will certainly be utilized in other application areas, such as:
• Real time monitoring of fish in fish farming facilities (within an RTD project running from 2007 to 2017)
• Detection of unwanted traffic and floating objects in the surroundings of offshore oil installations.
• Security and safety monitoring onboard unmanned offshore installations
• Monitoring harbors and airports in remote locations in Norway
• Heterogeneous hardware used for processing of large 3D images, e.g. seismic data for oil exploration and medical images.
Main short-term activities in collaboration with Detec AS:
• New research project focusing on onshore/offshore oil installation safety
• 3D detection & tracking, object orientation estimation, object characteristics, prioritized alarm handling, use of maps, use of thermal cameras and GPU implementation will be used
• Project application in 2013 – earliest start in spring 2014 – estimated project end 2017
SINTEF Long-term activities:
• SINTEF aims to attract new assignments in applications for video and other sensors, as well as to further explore GPU possibilities
TNO will disseminate research results obtained within the ADABTS project through scientific publications in leading international conferences and journals. Furthermore, TNO will exploit the results of the project in the design of an operator support system that can be integrated in the novel crowd control system of the new ADO football stadium (The Hague, Netherlands), and to design proactive surveillance systems for the Dutch police force and the Dutch Armed Forces.
TNO Main short-term activities:
• Expert knowledge + eye movements
• Implement expert knowledge in (operator) training programs
• Apply method of eye-movement recording for expert knowledge elicitation to other areas
• Design an operator support system that can be integrated at ADO football stadium
• Sound processing
• Exploit real-time beam forming in collaboration with industrial partners (e.g. AVEQ)
• Explore market for attentive hearing (PTZ steering by sound)
• Real-time incident detection implementation
TNO Long-term activities:
• Design proactive multi-sensor surveillance systems for the Dutch police force and the Dutch Armed Forces
UvA will disseminate research results obtained within the ADABTS project through scientific publications in leading international conferences and journals, as described earlier in this section. UvA will furthermore pursue activities that relate to benchmarking of abnormal behaviour detection (test procedures, algorithm performance on common datasets) in a wider EU context.
UvA main short-term activities:
• At least 4 publications planned – ongoing submissions
• One Master of Science Thesis for Mihai Morariu
• Two Ph.D.s for Martijn Liem and Julian Kooij
• Expected to be finalized in 2014
• Focus on 3D detection & tracking, combining audio and video sensors, body orientation estimation, aggressive behaviour detection, abnormal movement pattern detection
Exploitation activities Internal reference group:
HOSDB(CAST) and IPMI
It is expected that the ADABTS partners representing end-user interests and setting requirements, CAST and IPMI, will support ADABTS partners in the future by:
• Helping to enforce new policies
• New research project participation
• Product evaluation and testing/benchmarking
• Distribution/marketing aid
In conclusion, the exploitation activities were undertaken by all partners, each with its specific interests and missions aiming at different stakeholders. Based on the results from the project, all partners are expected to exploit results, creating a competitive advantage and new market opportunities for themselves. Which strategies will be followed, and what impact they will have, depends heavily on the results derived from the project. Given the structure of the consortium, unique synergy effects and close cooperation between industrial and scientific actors are expected, enabling exploitation of the project results at different levels covering all kinds of stakeholders including RTD policy makers and RTD organizations, system suppliers and integrators, service providers, end-users, standardization and legislation makers and the European society as a whole.
Summary of major achievements from WP8 Dissemination and Exploitation:
• Provided information to potentially interested parties and the public about the goals, work and achievements of the ADABTS project e.g. the ADABTS Final demonstration.
• Raised awareness about the benefits of the targeted applications.
• Facilitated the development of intelligent surveillance also outside the consortium by providing an ADABTS data set.
• Final demonstration of the ADABTS demonstrator system.
WP8 was completed in 2013 and comprised the following deliverables:
• D8.1 Public website
• D8.2 Dissemination plan
• D8.3 Exploitation plan
• D8.4 Final demonstration
Potential Impact:
3 POTENTIAL IMPACT FROM THE PROJECT
To improve the safety and security in everyday life for all European citizens, research and development of new methods for surveillance is a prerequisite. Automatic interpretation and recognition of different human behaviour is an important key for future safe and secure surveillance systems. ADABTS has developed such supporting technology with applications, helping people to live a normal life, as well as for supporting society in securing safety in public places. At the same time personal integrity is of crucial importance, and surveillance data shall only be made available after an alarm, not distributed at all times. Unattended surveillance seeks the right balance between detection of undesirable situations and violation of the private life of citizens. The ADABTS project provides knowledge about how to handle these issues as well. When, as a result of this project, new products with new surveillance capability eventually arrive on the market, this will have an impact on most aspects related to security in the European society and industry.
Impacts for the European society, human factors, and societal implications
The need for protection of the European citizens and critical infrastructure against deliberate crime, terrorism and riot actions is increasing and has resulted in extensive use of video and other surveillance systems in public areas. The deployment of such systems in itself raises several ethical issues with respect to civil liberties and citizens’ rights of privacy. Systems that automatically monitor people’s behaviour and detect and alert about abnormal behaviour may raise additional issues that need to be considered, but may also eliminate or diminish some other issues related to manual monitoring of people. The ADABTS project is expected to have the following impacts related to the European society, human factors, and societal implications:
• Higher level of security. Improved protection of citizens and critical infrastructure against deliberate crime, terrorism and riot actions by earlier and more reliable detection and alerting of critical events to the police or security guards. The earlier and more reliable detection is due to the possibilities that automatic detection gives for continuous monitoring of all cameras and other sensors in a surveillance area. With manual operation one operator has to focus on several screens simultaneously, often switching between several cameras on each monitor. It is thus not possible to obtain 100% surveillance of all cameras and acoustic sensors 24 hours a day. Today many cameras are not manned at all; they only record for post analysis, giving a false sense of security.
• Higher security per cost of operation. Automatic systems require much less personnel for achieving the same level of security as compared to manual surveillance. The cost of operation is thus reduced.
• Better compliance with civil liberties and citizens’ rights of privacy. With automatic surveillance individuals are in principle not observed by personnel until an abnormal and unwanted event is detected. The public may thus feel less observed and thus be more comfortable with knowing that the surveillance system is based on automatic detection. On the other hand one can argue that a system for automatic abnormal behaviour detection will put more focus and attention to people that behave differently, such as people that are out partying, singing or jogging. It can for instance be a questionable ethical issue if a person is detected and footage stored in a database of suspicious events every time she or he is jogging in the park.
• Objective to gender, ethnic and cultural belonging. An automatic system is (or can be made) objective with regard to gender, ethnic and cultural belonging. People belonging to a minority or a “more frequently harassed” group will feel more at ease trusting that an automatic system is objective and treats all people as equal, since it is harmful behaviours rather than appearances that are detected. Hence, an automatic surveillance system can possibly get higher acceptance among those groups.
Impact on policy, standardization and legislation makers
The introduction of automatic detection in surveillance systems introduces several new issues compared to manual surveillance. In particular, surveillance of public areas for detection of abnormal behaviour raises several new issues that may have an impact on policy and legislation makers:
• Storage and usage of footage from detected events. Assuming that the objective of efficient detection of abnormal behaviour is achieved, then, the system will collect a large number of abnormal events that are not necessarily security critical. These events are stored in a database for fast and easy retrieval. The current legislation of footage storage and usage of such material may thus not suffice for automatic systems.
• Usage of automatic detection. With the new functionality that abnormal behaviour detection introduces in surveillance such systems will probably be relevant for use in many areas and applications where such surveillance was earlier not feasible, efficient or economical. This may imply that policy and legislation makers need to reconsider practices for how, where and when such systems are to be used.
• User needs and definitions of abnormal behaviour. The project has aggregated user needs related to abnormal behaviour detection from a diversity of European stakeholders and application areas. This, together with the definitions of various abnormal behaviour categories, will be valuable input to standardization authorities and system developers.
Impact on end users and service providers
If the project objective of developing technology that enables automatic detection of abnormal behaviour is accomplished this will obviously have impact on the operations for end users and service providers:
• New capability. Today’s automatic surveillance systems have only limited capability for discriminating between “normal” and “threatening abnormal behaviour” of humans in open spaces and in crowds. We realize that this is a very demanding task, but still we aim at developing technology in this project that advances such capability a significant step. This means that service providers and end users will be able to provide new services in public areas where many people move, but only certain behaviours are of interest. This is of particular relevance for large scale events where special high security levels are required, but can also be used on a regular basis around critical infrastructures such as mass transportation terminals or government buildings.
• New organisation of operations and personnel. New technology of this kind will most likely have an impact on how the security service is organised and operated. In particular it will have an impact on how to set up the surveillance system, with number and placement of sensors, and on layout and organisation in the control centre. But it may also affect the deployment of personnel, the command structure, etc. It is thus anticipated that such technology needs to be introduced gradually and that both the technology and adjustments to operational modes need to be verified over time. (Most likely this will happen continuously as new technological capability is gradually introduced into the market.)
• New competence required. The introduction of new technology of this kind will require new competences among the users, both in how to operate the system and in its possibilities and limitations in practical use. A training programme for new surveillance applications is thus expected among the end users.
Impact on system suppliers and integrators
One of the main results from this project is the development of new methods and technology for automatic abnormal behaviour detection. This is of course of major interest for system suppliers and integrators, and will be exploited by the industrial partners of the consortium. However, some results are of more general interest to the industry, and will be made available also outside the consortium:
• User needs. The output from WP2: “User needs” will most certainly contain confidential information. But it is expected that some generic elements can be extracted from this work that are of general interest for the industry, the RTD community and the security community in general.
• Evaluation report. The system evaluation report will be made publicly available, and this will certainly be of interest for the industry, the RTD community and the security community in general.
• New system capabilities. Results from this project will be exploited and implemented in commercial products by the industrial consortium partners.
• A software developer kit. Detec have made a software developer kit (SDK), which system integrators can use in order to implement key features resulting from this project.
Impact on RTD policy makers and RTD organisations
The project expects to advance the RTD frontiers in several respects:
• User needs. The generic, non-classified results from WP3: “Behaviour definition” will give interesting information to the RTD policy makers on where the European stakeholders want the technological and operational development to go. It will also be valuable input to RTD organisations on where more RTD is needed.
• Evaluation report. The system evaluation report will be made publicly available. This will show where the frontiers on capability and technology are today, and represent a benchmark for other projects.
• Methodology and algorithms for abnormal behaviour detection. General results on methods and algorithms have been published in articles and at conferences.
• Heterogeneous CPU/GPU hardware for image processing. Several RTD organisations are today starting to pursue the possibilities that lie in this rapidly developing technology for general purpose processing and real-time image processing in particular. The general methodology, recommendations and experience gained in this project will be published in articles and conferences and be beneficial for future utilization of such hardware.
Strategic impact:
The main impact of the ADABTS project is expected to be on the technological level, advancing the capability and application range for automatic surveillance systems used for detecting abnormal and threatening behaviour of crowds or individuals in crowds. These advancements will be in many directions:
• New understanding of the user needs for automatic detection of abnormal behaviour in crowds, and new definitions of and methods for describing such behaviour. This knowledge can also be applied to design new operator training tools.
• New and adapted methods and algorithms for abnormal behaviour detection based on video and acoustic sensors.
• Real time optimization for commercially available low-cost heterogeneous hardware architectures that integrate CPUs and GPUs (Graphics Processing Units).
• Legal and ethical considerations on the applicability of new surveillance technology.
This technological development will give the end users in the project (system suppliers and integrators) a competitive advantage and create new market opportunities.
List of Websites:
3.1 THE PROJECT PUBLIC WEBSITE
The ADABTS public website is located at www.adabts-fp7.eu and provides a description of the project and a brief presentation of the consortium. Furthermore, the main results/foreground are presented.
3.1.1 Logo
To create a strong, easily recognizable image of the project, a logo was designed early on. This logo was used for all ADABTS reports and presentations.
Contact information to the Coordinator:
FOI (Swedish Defence Research Agency)
58111 Linköping, Sweden
Project manager
Henrik Allberg
Tel: +46 13 378 162
Fax: +46 8 555 031 00
henrik.allberg@foi.se