Periodic Reporting for period 2 - TOREADOR (TrustwOrthy model-awaRE Analytics Data platfORm)
Reporting period: 2017-07-01 to 2018-12-31
In the last few years, European enterprises of all sizes have become keenly aware of the opportunities that Big Data Analytics (BDA) offers their organizations, but many of them now realize that a lack of information and skills may lead to disappointment and even costly failures. Other major factors hindering adoption are the difficulty of evaluating whether BDA brings real advantages and the so-called "regulatory barrier", that is, concerns about possible violations of data access, sharing and custody regulations when using BDA, together with the high cost of obtaining legal clearance. This is discouraging companies, particularly SMEs, from taking up BDA.
TOREADOR tackled these issues by using declarative models to streamline Big Data Analytics processes. To achieve this, TOREADOR carried out a research programme with the following scientific and technological objectives: (i) specification of a fully declarative framework and a model set supporting Big Data analytics, (ii) development of a Big Data Analytics-as-a-Service approach supporting Big Data adoption, (iii) development of SLA and assurance approaches to guarantee the contractual quality, performance, and security of BDA, (iv) support of legal and compliance aspects, (v) design and development of automatic deployment of TOREADOR analytics solutions, (vi) pilot-based development and validation of the technical soundness, deployment feasibility, and industrial applicability of the TOREADOR infrastructure.
The project consortium then worked on the TOREADOR model-based methodology and vocabulary. The following solutions have been provided or specified: a preliminary vocabulary and syntax for defining the declarative, procedural, and deployment models at the basis of the Big Data Analytics-as-a-Service (BDaaS) framework; a methodology specifying early mechanisms for model design and development; and a first Model Driven Architecture (MDA)-based approach to model transformation, supporting BDaaS.
TOREADOR has also defined ready-to-be-executed deployment models that include all executable artefacts for the semi-automatic deployment of the Big Data campaigns of TOREADOR's customers, and has shown how a procedural model can be transformed into a vendor-specific deployment model by means of a binding process. TOREADOR has also described two complete case studies based on the clickstream analysis pilot and the security log pilot.
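The binding step described above can be illustrated with a minimal sketch. It is not the TOREADOR model syntax or tooling: the `Step` class, the `BINDINGS` catalogue, the vendor names and the artefact identifiers are all hypothetical, chosen only to show how an abstract procedural model might be mapped onto vendor-specific artefacts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One platform-independent step of a (hypothetical) procedural model."""
    name: str
    service: str  # abstract service type, e.g. "ingest"


# Hypothetical binding catalogue: abstract service type -> vendor artefact.
BINDINGS = {
    "vendor_a": {"ingest": "vendor_a/stream-reader",
                 "classify": "vendor_a/ml-classifier"},
    "vendor_b": {"ingest": "vendor_b/batch-loader",
                 "classify": "vendor_b/classifier-job"},
}


def bind(procedural_model, vendor):
    """Turn a procedural model into a vendor-specific deployment model.

    Raises ValueError if the vendor offers no artefact for some step,
    so an incomplete binding is never silently deployed.
    """
    catalogue = BINDINGS[vendor]
    missing = [s.service for s in procedural_model if s.service not in catalogue]
    if missing:
        raise ValueError(f"no binding on {vendor} for: {missing}")
    return [{"step": s.name, "artefact": catalogue[s.service]}
            for s in procedural_model]


# The same procedural model can be bound to different vendors.
model = [Step("load clicks", "ingest"), Step("label sessions", "classify")]
deployment_a = bind(model, "vendor_a")
deployment_b = bind(model, "vendor_b")
```

Keeping the catalogue separate from the procedural model is what lets one model be redeployed on another platform without being rewritten.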
Several project meetings were held to consolidate the outcomes of the project.
The outreach activities during this period consisted of participation and presentations in conferences and meetings with the scientific community, with policy makers and with industry. The general public has also been reached through coverage in newspapers, and news has been broadcast on web channels. Furthermore, newsletters have been published as a dissemination tool for the project activities.
As regards the legal aspects, a first general legal framework and quick reference guide were delivered at M6 to provide an overview of the major legal issues to be considered throughout the TOREADOR project by all the project partners, WPs and pilots. At M12, TOREADOR produced a comprehensive deliverable looking into the ownership and intellectual property aspects of data management in a big data context. Its results are aimed not only at the partners of the TOREADOR project but also at policymakers, to provide additional evidence on the emerging issues of data ownership. Finally, at M18, the privacy and security aspects of data management in a big data context were examined in a final legal research deliverable. This deliverable is aimed not only at the TOREADOR consortium members but can also provide useful insights to anyone working on similar big data projects.
A first audit deliverable was delivered at M6, providing a preliminary review of the legal management and compliance of the project pilots in light of the Legal Aspects Specifications to be drafted for each pilot at M12. At M12, a second audit deliverable was provided, focusing on the legal management and compliance of the TOREADOR architecture. A third audit deliverable was delivered at M18, providing an overview of the replies to public consultations given on behalf of TOREADOR, as well as the main legal issues related to data ownership, privacy and security, which are to be covered in legal SLAs and contractual arrangements.
In the second reporting period, we mainly focused on the implementation of the TOREADOR framework (GUI + platform) and on its validation on internal and external pilots and use cases. In particular, we implemented one complete workflow for each of the pilots, evaluating our methodology in two different instantiations: service-based and code-based. Two of the pilots (Energy Production Data Analysis and Clickstream Analysis) were then deployed on the TOREADOR platform, while the remaining two (Application Log Analysis and Aerospace Products Manufacturing Analysis) were executed on the premises of the pilot partners for privacy reasons. In addition, some external pilots and industrial use cases were conducted to test our approaches outside the TOREADOR consortium. As an example, we considered an infrastructure for pollution monitoring managed by Lombardia Informatica, the main ICT agency of the Lombardy region in Italy, with the aim of defining and deploying a Decision Support System (DSS) for pollution data labeling.
To ensure its industrial applicability, the TOREADOR platform was evaluated against the pilot scenarios involved in the project. The industrial partners of the consortium assessed their experience as positive and useful, which indicates that they benefit from integrating Model-based Big Data Analytics-as-a-Service (MBDAaaS) into their decision-making.
TOREADOR is one of the first attempts to streamline BDA processes through declarative procedures.
TOREADOR contributes to the state of the art along three lines. First, our methodology supports both stream and batch computations. The computations are implemented as service compositions, where connectors automatically guarantee that data flows correctly and consistently between services. Second, building on service connectors, our methodology implements smart compilers supporting the deployment of interconnected computations residing on different platforms. Finally, our methodology supports end-to-end verification of specifications (from declarative to deployment models).
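The idea of composing services through connectors that guard the data flow can be sketched as follows. This is an illustrative assumption, not the TOREADOR implementation: the `Service` class and the `connect` and `compose` functions are hypothetical, and the connector here checks only that the fields a service requires are present, a much weaker guarantee than the one described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Record = Dict[str, object]


@dataclass(frozen=True)
class Service:
    """A (hypothetical) analytics service with a declared input contract."""
    name: str
    requires: FrozenSet[str]          # fields the service needs as input
    run: Callable[[Record], Record]   # the computation itself


def connect(record: Record, downstream: Service) -> Record:
    """Connector: let a record through only if it satisfies the contract."""
    missing = downstream.requires - set(record)
    if missing:
        raise ValueError(f"{downstream.name} is missing fields: {sorted(missing)}")
    return record


def compose(services: List[Service]) -> Callable[[Record], Record]:
    """Chain services, placing a connector in front of each one."""
    def pipeline(record: Record) -> Record:
        for service in services:
            record = service.run(connect(record, service))
        return record
    return pipeline


# Two toy services: parse a raw clickstream line, then tag home-page visits.
parse = Service(
    "parse", frozenset({"raw"}),
    lambda r: dict(zip(("user", "page"), str(r["raw"]).split(",", 1))),
)
tag = Service(
    "tag", frozenset({"user", "page"}),
    lambda r: {**r, "is_home": r["page"] == "/"},
)

result = compose([parse, tag])({"raw": "u1,/"})
```

Because every hand-over passes through `connect`, a mismatch between two services surfaces as an explicit error at the boundary rather than as a failure deep inside the downstream computation.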
We also assessed the benefits of adopting a model-driven approach through a quantitative evaluation of the software development effort saved.
We first measured the average management effort, in person-months (PM), required by our four pilots plus the Lombardia Informatica scenario.