CAse studies on the Development and Application of in-Silico Techniques for Environmental hazard and Risk assessment

Final Report Summary - CADASTER (Case studies on the Development and Application of in-Silico Techniques for Environmental hazard and Risk assessment)

Executive summary:

The EU regulation on the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) requires demonstration of the safe manufacture of chemicals and their safe use throughout the supply chain. REACH is based on the precautionary principle, but aims to achieve a proper balance between societal, economic and environmental objectives. Both new and existing chemicals will be evaluated within REACH, with the aim, among others, of making efficient use of the scarce and scattered information available on the environmental fate and effects of chemicals. REACH thus aims to close large gaps in knowledge on the physicochemical properties and adverse effects of large numbers of chemicals. In addition, REACH aims to reduce animal testing through optimized use of qualitative and quantitative information on related compounds. The REACH proposals advocate the use of non-animal testing methods, but guidance is needed on how these methods should be used.

Project Context and Objectives:

The EU regulation on the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) requires demonstration of the safe manufacture of chemicals and their safe use throughout the supply chain. REACH is based on the precautionary principle, but aims to achieve a proper balance between societal, economic and environmental objectives. Both new and existing chemicals will be evaluated within REACH, with the aim, among others, of making efficient use of the scarce and scattered information available on the environmental fate and effects of chemicals. REACH thus aims to close large gaps in knowledge on the physicochemical properties and adverse effects of large numbers of chemicals. In addition, REACH aims to reduce animal testing through optimized use of qualitative and quantitative information on related compounds.

The main goal of CADASTER is to exemplify, for the new categories of risk assessors within REACH, the integration of information, models and strategies for carrying out safety, hazard and risk assessments for large numbers of substances. Real risk estimates are delivered according to the basic philosophy of REACH of minimizing animal testing, costs, and time. CADASTER thus shows how to increase the use of non-testing information for regulatory decisions whilst meeting the main challenge of quantifying and reducing the level of uncertainty. By fusing the research findings with other ongoing research and regulatory developments, recommendations are provided on a viable management strategy for optimized testing and in-silico modeling of hazardous organic chemicals. The focus of the activities was on assessing and quantifying uncertainty and variability in probabilistic risk assessment, as introduced by the use of non-testing information.

To achieve the main goals set for CADASTER, four objectives were identified and operationalized within four workpackages:

Objective 1 (workpackage 2): Collection of data and models
Sub-activities within this objective included:
1) Collection of existing experimental data on the most common regulatory endpoints considered in the Screening Information Data Set (SIDS) dossier (internationally agreed data on the intrinsic hazards of a chemical) for the four classes of chemicals selected.
2) Collection of existing (Q)SARs for the endpoints considered in the SIDS for the four classes of chemicals selected.
3) Generation of new data on endpoints and chemicals for which, as identified in workpackage 3, insufficient data are available for model validation and proper hazard/risk assessment.
4) Development of a database on experimental data and (Q)SAR models for dissemination of the results of the activities 1 - 3 to all project partners and to other interested bodies.

Recommendations for future activities

- Support workshops on the assessment and consideration of uncertainty in testing alternatives with practical training.
- Support the development of more case-studies to show the impact and usefulness of considering uncertainty in data generated by means of testing alternatives, in the regulatory decision context under REACH.
- Support more research on the QSAR integrated assessments and development of user-friendly tools for uncertainty analysis and evaluation of quality assessment output with respect to quality in available background knowledge.

Project Results:

General introduction

The REACH proposals advocate the use of non-animal testing methods, but guidance is needed on how these methods should be used. As an example: the REACH system requires that non-animal methods should be used for the majority of tests in the 1-10 tonne band, even though such methods are not yet available for most of the endpoints relevant at this tonnage.

In an attempt to resolve the issue of lack of guidance, the European Commission made suggestions on how reduction, refinement and replacement strategies could be applied to animal use in the REACH system:

1 - Encouragement of the use of validated in silico techniques such as (Q)SAR models.
2 - Encouragement of the development of new in vitro test methods.
3 - Minimization of the actual numbers of animals used in the required tests, and replacement of animal tests wherever possible by alternative methods.
4 - Formation of Substance Information Exchange Forums (SIEFs) for the obligatory provision of data and cost sharing.
5 - Requirement of official sanctioning of proposals for tests for compounds with production volumes of above 100 tonnes to minimize animal testing.

General overview of activities within CADASTER

The current regulatory developments within REACH have provided important considerations in the initial design of the CADASTER project. CADASTER is aimed at exemplifying the integration of the various alternatives to experimental testing and the consequences for environmental risk assessment, explicitly considering uncertainties associated with the use of alternatives to replace experimental data. In line with its acronym, the project is designed to provide case studies on the development and application of in-silico techniques for environmental hazard and risk assessment, and to use the project results as an illustration of how to deal with the major limitations and uncertainties related to the implementation of alternatives to testing in hazard and risk assessment.

To this end, the following activities, among others, were performed:

- Collection of existing data and predictive models on the endpoints that are essential for performing hazard and risk assessment of chemicals within REACH (Aim of this activity: filling the research gap of lack of data for endpoints relevant for Environmental Risk Assessment (ERA) and for in silico model building);
- Exploring and making available (eco)toxicological data, among others for the purpose of in silico model development (Aim of this activity: filling the research gap of making data available for in silico model building);
- Assessing the quality of available toxicity data (Aim of this activity: filling the research gap of low or difficult to ascertain quality of available toxicity data, and need of characterization and validation of existing toxicity data);
- Supplementing existing experimental data to allow for in silico model development for endpoints essential for risk assessment, and to allow for validation of existing and newly developed models (Aim of this activity: filling the research gap of lack of data and models for endpoints relevant for ERA and for in silico model building);
- Collection of existing alternatives to experimental testing, with a focus on QSAR models on the endpoints that are essential for performing hazard and risk assessment of chemicals within REACH (Aim of this activity: filling the research gap of lack of models for endpoints relevant for ERA);
- Development of new (QSAR) models and validation of existing and newly developed (QSAR) models, including development of consensus models (Aim of this activity: filling the research gap of lack of models for endpoints relevant for ERA);
- Implementation of tools to estimate the applicability domain of models and to optimize experimental design. This activity includes characterisation of the variability and uncertainty of models and underlying data, and sensitivity analysis of individual models (Aim of this activity: filling the research gap of lack of definition of the applicability domain of models);
- Development of a computational framework for QSAR based probabilistic risk assessment, including uncertainty analysis of the risk characterisation ratios (Aim of this activity: filling the research gap of need of probabilistic risk assessment);
- Development and public release of the QSPR-THESAURUS Website and associated databases containing all data and models collected and generated within the CADASTER project (Aim of this activity: filling the research gap of lack of robustness of different database entries for toxicity, consequently leading to different QSAR models and results);
- Improvement and validation of individual QSAR models, and preparation of standardized reporting formats for the models (such as QMRF, the QSAR Model Reporting Format developed by JRC and implemented in the OECD QSAR Toolbox) (Aim of this activity: filling the research gap of lack of validation of individual QSAR models which are incorporated in complex existing tools such as ECOSAR);
- Training of risk assessors, national chemicals authorities (in particular from Eastern European countries), industry and SMEs on the use of alternative tools for risk assessment in REACH, among others demonstrating how the tools developed within CADASTER as well as the models available in the OECD QSAR Toolbox can be used to estimate REACH endpoints for chemical compounds and thus decrease the number of animal tests. This activity included training on how to develop new models for the assessment of REACH endpoints (in particular for new scaffolds of compounds for which there are no reliable QSAR models) and how to use the software developed by the CADASTER project participants (Aim of this activity: filling the gap of need of appropriate training and understanding of personnel, particularly of those dealing with registration dossiers).
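
As an illustration of the probabilistic risk assessment activity listed above, the following minimal sketch propagates uncertainty into a risk characterisation ratio (RCR) by Monte Carlo sampling. It assumes the usual definition RCR = PEC / PNEC (predicted environmental concentration divided by predicted no-effect concentration) and assumes, purely for illustration, that both quantities are lognormally distributed and described by a median and a geometric standard deviation. The function names and all input values are hypothetical and do not reproduce the CADASTER framework.

```python
import math
import random


def simulate_rcr(pec_median, pec_gsd, pnec_median, pnec_gsd, n=10000, seed=0):
    """Return a sorted Monte Carlo sample of RCR = PEC / PNEC.

    PEC and PNEC are drawn from lognormal distributions (an assumption),
    each given by its median and geometric standard deviation (gsd > 1).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        pec = rng.lognormvariate(math.log(pec_median), math.log(pec_gsd))
        pnec = rng.lognormvariate(math.log(pnec_median), math.log(pnec_gsd))
        samples.append(pec / pnec)
    samples.sort()
    return samples


def summarize(sorted_samples):
    """Summarize a sorted RCR sample: median, 95th percentile, P(RCR > 1)."""
    n = len(sorted_samples)
    return {
        "median": sorted_samples[n // 2],
        "p95": sorted_samples[int(0.95 * (n - 1))],
        "prob_exceed_1": sum(s > 1.0 for s in sorted_samples) / n,
    }


if __name__ == "__main__":
    # Hypothetical inputs: the PNEC is given a wider spread, as might be
    # the case when it is derived from a QSAR prediction.
    print(summarize(simulate_rcr(0.5, 2.0, 1.0, 3.0)))
```

Reporting the probability that the RCR exceeds 1, rather than a single point estimate, is one way to make the uncertainty introduced by non-testing data visible in the risk characterisation.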

The core of the activities was directed towards the following topics:
- Collection and generation of fate and effect data essential for risk assessment;
- Collection and development of predictive models for endpoints essential for risk assessment;
- Development of methodologies for assessment of the applicability domain of models;
- Making data and models available to any outside user via the project websites http://www.cadaster.eu and http://www.qspr-thesaurus.eu;
- Characterisation of uncertainty, variability, model sensitivity;
- Training of (future) risk assessors and outreach of project results.
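
The notion of an applicability domain mentioned in the topics above can be illustrated with a very simple range-based check. The sketch below is an assumption chosen for brevity, not the methodology developed in CADASTER: a query chemical is considered inside the domain if each of its descriptor values lies within the range spanned by the training set, optionally widened by a tolerance. The function names and descriptor values are hypothetical.

```python
def descriptor_ranges(training_rows):
    """Return the (min, max) of each descriptor over the training set."""
    columns = list(zip(*training_rows))
    return [(min(col), max(col)) for col in columns]


def in_domain(query_row, ranges, tolerance=0.0):
    """Return True if every descriptor of the query lies within the
    training range, widened on both sides by tolerance * range width."""
    for value, (low, high) in zip(query_row, ranges):
        margin = tolerance * (high - low)
        if value < low - margin or value > high + margin:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical training set with two descriptors per chemical.
    training = [(1.0, 10.0), (2.0, 30.0), (3.0, 20.0)]
    ranges = descriptor_ranges(training)
    print(in_domain((2.5, 15.0), ranges))  # inside both ranges
    print(in_domain((3.5, 15.0), ranges))  # first descriptor out of range
```

A prediction for a chemical outside the domain is an extrapolation and would normally be flagged as less reliable; more refined approaches (for example distance- or leverage-based) follow the same idea.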

WORKPACKAGE 2 COLLECTION OF DATA AND MODELS

Within WP2, various milestones and deliverables have been identified to warrant proper monitoring and timely execution of the project:

Deliverables WP2

D2.1 (Month 12) Overview of data on physicochemical properties, fate and environmental effects of chemicals within the four classes of chemicals selected (report)
D2.2 (Month 12) Overview of (Q)SAR models and their specific features for assessing fate and effects (report)
D2.3 (Month 18) Overview of non-testing approaches available for implementation in REACH (report)
D2.4 (Month 36) Establishment of a database on properties and fate/effect parameters of chemicals within the four classes of chemicals selected (report).
D2.5 (Month 38) Overview of new data generated (report).

Milestones WP2

M2.1 Prototype of a user-friendly database on properties and fate/effect parameters operational.

The work within WP2 was subdivided along 4 tasks:
Task 2.1 Collection of existing experimental data.
Task 2.2 Collection of (Q)SAR models and non-testing approaches.
Task 2.3 Generation of new data.
Task 2.4 Establishment of database.

Task 2.1 Collection of existing experimental data

D2.1 (Month 12) Overview of data on physicochemical properties, fate and environmental effects of chemicals within the four classes of chemicals selected

Overview

A data search was performed on all endpoints of relevance for the environmental risk and hazard assessment of the groups of chemicals included in the case studies. Physicochemical properties, environmental fate parameters, and aquatic and terrestrial ecological effect parameters are included, among other available toxicity data. This task was carried out by means of a literature search, supplemented with searches of existing databases on risk and hazard assessment parameters, such as IUCLID and AQUIRE. In addition, data were collected from industry sources and regulatory agencies (Dupont, RIFM).

Activities performed

Existing experimental data on polybrominated diphenylethers (PBDE), perfluoroalkylated substances and their transformation products, substituted musks/fragrances, and triazoles/benzotriazoles were collected from the literature and from existing databases on physico-chemical properties, environmental fate parameters, and aquatic and terrestrial ecological effects parameters. In addition, tools for automatic querying of on-line databases were applied, and US EPA docket databases with information about four related classes of chemicals (more than 5,000 documents) were uploaded and made searchable on the web at http://www.cadaster.eu/DocSearch/ using natural language search tools. This tool helps users find relevant information in the dockets without needing to open each of them.

Task 2.2 Collection of (Q)SAR models and non-testing approaches
D2.2 (Month 12) Overview of (Q)SAR models and their specific features for assessing fate and effects

Overview
A survey of the existing QSAR/QSPR models for the four CADASTER classes of chemicals was completed. The analysis of these models according to the requirements of the 'OECD principles for QSAR validation' for regulatory applicability was the topic of Deliverable 3.2. Publicly available EPI Suite models were also taken into consideration; for these, it was assessed which models are reliably applicable to the four classes of chemicals.

Fragrances

To our knowledge, no ad hoc QSAR/QSPR models have so far been developed for the prediction of the physicochemical properties and environmental toxicity of fragrances. Nevertheless, different QSARs exist for skin sensitization, an endpoint related to human toxicity but not included in SIDS.

Perfluorinated chemicals (PFCs)

For QSPRs on SIDS physico-chemical properties, models exist for boiling point and for fluorophilicity (fluorous partition coefficient) (Rucker et al., 2005; Kiss et al., 2001). In addition, commercial software was used to derive 'polyparameter linear free energy relationships' for various endpoints. EPI Suite models were also considered. Their performance was compared with that of some preliminary models developed by UI for MP, BP and VP (presented at Conferentia Chemometrica 2009, Siofok, Hungary).

Triazoles and Benzotriazoles (TAZs and BTAZs)
QSPR models specifically for TAZs and BTAZs have not been found in the literature. Only logP has been modeled, in studies where a few TAZs are part of a larger dataset. Regarding EPI Suite models, their predictions for MP, VP, LogKow and WS do not show large deviations from the available experimental data. However, the preliminary ad hoc QSPRs developed by UI for triazoles and benzotriazoles have RMSE values that are consistently lower than those calculated for the EPI Suite models, the main exception being the LogKow model.

Task 2.2 Collection of (Q)SAR models and non-testing approaches
D2.3 (Month 18) Overview of non-testing approaches available for implementation in REACH

Overview
In this task, an overview is provided of the non-testing options given under REACH to either replace experimental testing, or to strengthen confidence in experimental results. The latter is needed as the (in general scarcely available) experimental data for specific (SIDS) endpoints and for specific chemicals, might on their own not be sufficiently convincing as a proper reflection of the actual value of specific endpoints. The non-testing options available under REACH are: Quantitative Structure Activity Relationships (QSARs), read-across, category approaches, and exposure based waiving.

Task 2.3 Generation of new data
D2.5 (Month 38) Overview of new data generated

Overview
This task was carried out by means of experimental testing of chemicals and the report provides an overview of the CADASTER testing of chemicals. New data were generated on endpoints and chemicals for which, as identified in WP3, insufficient data were available for model validation and proper hazard/risk assessment. The following testing of toxicity and fate and behaviour was performed:
1 - Polybrominated diphenylethers (PBDE)
28-day sediment bioaccumulation testing of PBDEs was performed with the aquatic oligochaete Tubifex tubifex by PHI.
2 - Perfluoroalkylated substances and their transformation products
Toxicity testing of fluorinated compounds was performed with lettuce (Lactuca sativa) and green algae (Pseudokirchneriella subcapitata) at the RIVM. In addition, testing was performed with two cladoceran species (Daphnia magna and Chydorus sphaericus), as well as with embryos of the zebrafish (Danio rerio), also at the RIVM.
3 - Substituted musks/fragrances
Toxicity testing of fragrances was performed with green algae (Pseudokirchneriella subcapitata) and with Daphnia magna at the PHI. Substituted musks/fragrances were also tested for ready biodegradability at the PHI.
4 - Triazoles/benzotriazoles
Toxicity testing of substituted (benzo)triazoles was performed with Daphnia magna and with embryos of the zebrafish (Danio rerio) at the RIVM, and with green algae (Pseudokirchneriella subcapitata) at the PHI. Substituted (benzo)triazoles were also tested for ready biodegradability at the PHI.

Task 2.4 Development of a database on experimental data and (Q)SAR models
D2.4 (Month 36) Establishment of a database on properties and fate/effect parameters of chemicals within the four classes of chemicals selected

The CADASTER QSPR-THESAURUS database has been developed within the framework of the CADASTER project. The database is based on the On-line Chemical Modeling Environment (QSPR THESAURUS), http://www.qspr-thesaurus.eu, which was developed by Dr Tetko's group at HMGU and is currently offered as commercial software by eADMET GmbH (http://www.eadmet.com). The database was further developed according to the requests of the CADASTER project partners and database users. It is the main repository for storing and handling endpoint data collected and measured during the CADASTER project.

Overview of the database structure

The front page of the QSPR THESAURUS database provides access to the four classes of chemicals which are the focus of the CADASTER project. After selecting one of the classes, the user accesses the database of experimental and calculated properties. The database contains experimentally measured biological and physicochemical properties of molecules belonging to the four classes, together with the conditions under which the experiments were conducted and references to the sources where the data were published. These data were collected or measured by CADASTER partners during the project.

WORKPACKAGE 3 DEVELOPMENT AND VALIDATION OF QSARs

Paola Gramatica1, Ester Papa1, Simona Kovarich1, Stefano Cassani1, Barun Bhhatarai1, Magnus Rahmberg2, Sara Nilsson2, Tomas Öberg3, Ullrika Sahlin3, Igor Tetko4, Stefan Brandmaier4, Nina Jeliazkova5, Nikolay Kochev6, Ognyan Pukalov6

1. QSAR Research Unit in Environmental Chemistry and Ecotoxicology, University of Insubria, Via J.H. Dunant 3 - 21100 Varese, Italy
2. IVL Swedish Environmental Research Institute, Box 210 60, SE-100 31 Stockholm, Sweden
3. Linnaeus University, School of Natural Sciences, 391 82 Kalmar, Sweden.
4. Helmholtz Zentrum Muenchen - German Research Center for Environmental Health, Ingolstaedter Landstrasse 1, D-85764 Neuherberg, Germany
5. IdeaConsult Ltd., 4 A.Kanchev str. Sofia 1000, Bulgaria
6. University of Plovdiv, Department of Analytical Chemistry and Computer Chemistry Plovdiv, Bulgaria

Major outcomes for WP3
The major outcomes and recommendations related to future risk assessment within the REACH framework, for WP3 activities can be summarized in the following points:
- A strong effort was made to populate the QSPR-THESAURUS web-database, created by HMGU, with chemical structures, molecular descriptors and experimental data, and to upload the QSAR models developed by WP3 partners. This work helped to improve the web-database and to adapt it to the needs of QSAR developers and public users. It also highlighted that the lack of experimental data for the SIDS endpoints of interest for studying the CADASTER chemicals in the environment is the biggest obstacle to developing new, ad hoc QSARs specific for these chemicals. For this reason, the available experimental data on physicochemical properties and toxicity (i.e. beyond the SIDS endpoints) were collected and modelled, in order to support the prioritization exercises with additional information.
- The evaluation of existing QSARs according to the 'OECD principles for QSAR validation and application in regulation' demonstrated that only a small number of models for SIDS endpoints are currently available for the CADASTER chemicals. Additionally, the majority of the existing QSARs were shown to have no external validation, nor a definition of the applicability domain. Thus they do not fulfill the OECD principles for use of QSAR models in regulatory assessment of chemicals. Therefore these models are of limited utility for the specific classes of compounds studied under the project CADASTER.
- The upload of new QSAR and QSPR models in the QSPR-THESAURUS web-database was particularly challenging since their reproducibility in the database was dependent on the calculation of molecular descriptors which were derived by different WP3 partners, from the molecular structures, by different tools. The application of this procedure raised a number of issues related to the need of harmonization for structural design and SMILES writing, as well as for the calculation of molecular descriptors starting from different structural inputs and using different software. These problems have been addressed in a tutorial paper (Gramatica et al. Mol.Inform. 2012).
- The limited availability of experimental data made external validation and evaluation of the applicability domain of the models challenging. Different techniques were applied to verify the external predictivity of the models and to evaluate their applicability domain. This led to an in-depth study of the behavior of the most widely used statistical validation parameters and to the proposal of the Concordance Correlation Coefficient (CCC) as an additional index of model quality. These parameters were implemented in the QSAR-INS software developed by UI.
- Similarity analysis and multivariate ranking were applied for the identification of priority chemicals in the four chemical classes of interest. Various priority lists were generated for all four CADASTER classes and used to focus the experimental tests, performed in WP2, on the prioritized chemicals. The prioritization was performed:
A) on the basis of toxicological and physicochemical profiles defined for the studied classes of chemicals, obtained from the experimental data available in the CADASTER database and also considering data predicted by the new 'ad hoc' QSAR models developed in WP3;
B) on the basis of chemical structure, in order to select representative chemicals in the structural space of each studied chemical class.
- The predictions obtained from the individual QSAR models, developed by different modeling approaches and different molecular descriptors, were averaged in a consensus approach to propose predicted data that are not biased by any specific model.
- A comparison of prediction accuracy of the QSAR/QSPR models developed ad hoc for the CADASTER classes with some freely available tools was performed. In particular,
1) QSPR models developed for the prediction of physicochemical properties of BFRs, PFCs and TAZs/BTAZs were compared with EPI Suite estimation program,
2) QSAR models developed for aquatic toxicity of TAZs/BTAZs were compared with the ECOSAR tool,
3) classification models for the prediction of ready biodegradability of fragrances were compared with BioWIN software (implemented in EPI Suite v 4.1) and START estimation tool (implemented in Toxtree v. 2.5.1).
This study highlighted the importance of ad hoc models, which proved more accurate than general models in predicting specific classes of chemicals such as those studied in the CADASTER project.
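
The Concordance Correlation Coefficient mentioned among the outcomes above can be computed in a few lines. The sketch below assumes the standard definition by Lin, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), with population (1/n) variances and covariance; it is an illustration and not the QSAR-INS implementation. The function name and the example values are hypothetical.

```python
def concordance_correlation_coefficient(observed, predicted):
    """Lin's concordance correlation coefficient between two series.

    Equals 1 only when predictions match observations exactly; it is
    lowered both by scatter and by systematic shifts in location or scale.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n
    denominator = var_o + var_p + (mean_o - mean_p) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined for two identical constant series")
    return 2 * cov / denominator


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0]
    shifted = [2.0, 3.0, 4.0]  # perfectly correlated but biased by +1
    print(concordance_correlation_coefficient(observed, shifted))
```

In the example the two series are perfectly correlated, yet the constant bias pulls the CCC well below 1, which is why the index is informative for external validation where a correlation coefficient alone would not be.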

Task 3.1- Chemical Structures and molecular descriptors database

The design of the database was completed in June 2009 (i.e. within the first 6 months of the project). The database was made available via the project website CADASTER.eu and filled with the relevant information on molecular descriptors for chemicals belonging to the four classes of interest for the CADASTER project.

The basic database published in 2009 included:

- Flame retardants, including PBDEs: 240 structures and 1403 molecular descriptors
- PolyFluorinated chemicals, including their transformation products: 366 structures and 1862 molecular descriptors
- Fragrances: 79 structures and 1429 molecular descriptors
- Triazoles and Benzotriazoles: 279 structures and 1879 molecular descriptors

Additional chemicals were introduced in the database during the course of the project.

DRAGON descriptors (ver. 5.5 for Windows, Talete srl., 2007) were calculated for all the chemicals reported above, starting from the x,y,z coordinates of the chemical structure.

Related Documents and Papers:
1) Deliverable 3.1 Chemical Structures and molecular descriptors database
2) Igor Tetko, Pantelis Sopasakis, Prakash Kunwar, Stefan Brandmaier, Sergii Novotarskyi, Larisa Charochkina, Volodymyr Prokopenko, Willie Peijnenberg. Prioritization of PolyBrominated Diphenyl Ethers (PBDEs) using the QSPR-THESAURUS webtool. ATLA submitted (2013)

Task 3.2 Evaluation of existing QSARs according to OECD principles

The existing QSARs (collected in WP2, Deliverable 2.2) were evaluated according to the OECD principles for QSAR model validation for regulatory application: 1) a defined endpoint; 2) an unambiguous algorithm; 3) a defined domain of applicability; 4) appropriate measures of goodness-of-fit, robustness and predictivity; 5) a mechanistic interpretation, if possible.
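
A screening of models against these five principles can be organised as a simple checklist. The sketch below is only an illustration of that bookkeeping: the record structure, field names and example model are hypothetical, and the fifth principle is treated as optional because it applies "if possible".

```python
from dataclasses import dataclass


@dataclass
class QsarModelRecord:
    """Hypothetical record of how a model fares against the OECD principles."""
    name: str
    defined_endpoint: bool
    unambiguous_algorithm: bool
    defined_applicability_domain: bool
    validated_fit_robustness_predictivity: bool
    mechanistic_interpretation: bool  # principle 5: "if possible"


def unmet_principles(record):
    """Return the labels of the mandatory principles (1-4) the model fails."""
    checks = [
        ("1) defined endpoint", record.defined_endpoint),
        ("2) unambiguous algorithm", record.unambiguous_algorithm),
        ("3) defined domain of applicability",
         record.defined_applicability_domain),
        ("4) goodness-of-fit, robustness and predictivity",
         record.validated_fit_robustness_predictivity),
    ]
    return [label for label, satisfied in checks if not satisfied]


if __name__ == "__main__":
    # Hypothetical literature model lacking an applicability domain
    # definition and external validation.
    model = QsarModelRecord("example model", True, True, False, False, False)
    for label in unmet_principles(model):
        print("not fulfilled:", label)
```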

Task 3.3 Gap analysis and plan for QSAR development work
The CADASTER project description defines this task as follows: identification of gaps in the model library with input from Task 3.2. Gaps can be either a lack of models for endpoints or models with insufficient predictive performance. Both ecotoxicological and fate-related endpoints are considered, and existing QSARs will be applied as much as possible to fill data gaps.

Task 3.4 Prioritization through similarity analysis and ranking methods
The activity on prioritization to optimize the experimental testing (WP2) was performed during the first 24 months of the Project. Similarity analysis and multivariate ranking methods were applied for experimental design and the identification of priority chemicals in the four chemical classes of interest. The prioritization was performed:
- on the basis of toxicological and physicochemical profiles defined for the studied classes of chemicals. The basic idea is that the priority chemicals are those that are more hazardous sensu lato, i.e. those with any demonstrated kind of toxicity. Owing to the general paucity of experimental data, in particular for SIDS endpoints, for chemicals belonging to the four CADASTER classes, the available experimental data on toxicity endpoints (including mammalian toxicity and endocrine disruption activity) were used for the characterization of the toxicological profile.
- on the basis of chemical structure, in order to select representative chemicals in the structural space of each studied chemical class.

1) Brominated Flame Retardants (BFRs): A priority list of BFRs was provided by UI on the basis of their toxicological profile obtained from some preliminary UI-QSAR models developed for several endpoints related to dioxin-like activity (AhR RBA, EROD induction, AhR agonism) and endocrine disruption (ED) potency (ER agonism, PR antagonism, T4-TTR competition, E2SULT inhibition). The PBDE congeners predicted to have higher activity for both dioxin-like activity and endocrine disruption potency were listed and suggested as priority compounds for the experimental tests. In addition to PBDEs, the most active OH-PBDE congeners, found to be even more active than the parent compounds by both experimental evidence and QSAR predictions, were proposed for testing.
2) Perfluorinated compounds (PFCs): A priority list of PFCs was provided by UI on the basis of their toxicological profile. QSAR models developed for mouse and rat oral toxicity (Bhhatarai et al. 2011) and inhalation toxicity (Bhhatarai et al. 2010) were applied to predict the activity of 376 PFCs (including some in the ECHA list).
3) Triazoles and Benzotriazoles (TAZs and BTAZs): A priority list of (B)TAZs was provided by UI on the basis of their toxicological profile and structural similarity. Available experimental data for several eco-toxicity and environmental behavior endpoints (i.e. EC50 Algae, EC50 Daphnia, LC50 fish, LogP, BCF) collected from the Footprint Database were analysed in order to identify the most active compounds.

Development of active learning approaches

The investigation of active learning approaches (i.e. methods that build datasets by selecting the compounds expected to most increase the accuracy of the models to be developed, also known as optimal experimental design) was performed as part of the planned activities in WP3 (Task 3.4). In a collaboration between LNU and HMGU, a new approach for experimental design was developed (Brandmaier et al. 2012a, Brandmaier et al. 2012b). The proposed approach extends standard approaches such as full or fractional factorial design or D-optimal design to a stepwise procedure that uses the D-optimal design and combines it with partial least squares (PLS) techniques to iteratively refine the descriptor space for compound selection.
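
To make the D-optimal criterion concrete, the sketch below selects compounds greedily so that each added compound maximizes the determinant of the information matrix X^T X built from the descriptor rows chosen so far. This is a simplified sequential approximation written for illustration only: it does not include the PLS-based refinement of the descriptor space described above, and the small ridge term, the function names and the candidate values are assumptions.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def information_matrix(rows, ridge=1e-6):
    """X^T X plus a small ridge on the diagonal (an assumption), so the
    determinant is non-zero even when fewer rows than columns are chosen."""
    p = len(rows[0])
    m = [[ridge if i == j else 0.0 for j in range(p)] for i in range(p)]
    for row in rows:
        for i in range(p):
            for j in range(p):
                m[i][j] += row[i] * row[j]
    return m


def greedy_d_optimal(candidates, k):
    """Pick k candidate indices; each step adds the row that gives the
    largest determinant of the information matrix."""
    selected = []
    for _ in range(k):
        best_index, best_det = None, -1.0
        for index, row in enumerate(candidates):
            if index in selected:
                continue
            rows = [candidates[i] for i in selected] + [row]
            det = determinant(information_matrix(rows))
            if det > best_det:
                best_index, best_det = index, det
        selected.append(best_index)
    return selected


if __name__ == "__main__":
    # Hypothetical candidates: an intercept column plus one scaled descriptor.
    candidates = [(1.0, 0.0), (1.0, 0.1), (1.0, 1.0), (1.0, 0.5)]
    print(greedy_d_optimal(candidates, 2))
```

With one descriptor the criterion favours compounds at opposite ends of the descriptor range, which matches the intuition that well-spread training compounds constrain a linear model best.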

Task 3.5 Development of new QSARs
The partners involved in WP3 developed models for the endpoints and chemical classes by applying different modelling approaches. The models were developed taking into account the OECD principles for validation and acceptability of QSARs for regulatory purposes, in particular external validation and checking of the applicability domain. The models developed in the project have been documented in peer-reviewed international (ISI) journals and in meeting presentations, listed below, as well as in the CADASTER database (QSPR-THESAURUS) and on the CADASTER website (http://www.cadaster.eu).

Task 3.6 Development of multi-model approaches

The present task provides a summary of all the consensus models developed by WP3 partners for the melting and boiling points of PFCs, the aquatic toxicity of (B)TAZs and the biodegradation of fragrances. These models were chosen for the consensus analysis, among those reported in Task 3.5, because they are the only QSA(P)Rs developed on SIDS endpoints among the models created within the CADASTER project by all the WP3 Partners. In brief, local models for the endpoints of interest were individually developed by different WP3 Partners for the four chemical classes studied within CADASTER (Task 3.5) using different approaches and methodologies. Finally, the consensus approach was used to combine the individual models developed for the same endpoint and chemical class.

1) Consensus models for Perfluorinated compounds (PFCs):

Within the CADASTER Project, WP3 Partners developed local QSPR models for the prediction of Melting Point (MP) and Boiling Point (BP) of per- and poly-fluorinated chemicals (PFCs). MP and BP are important physicochemical properties that indirectly affect the solubility, and hence the transport, distribution and environmental fate of these compounds. All these models are statistically robust, internally and externally validated, and with a verified applicability domain.

Consensus model for MP

Consensus predictions were derived by averaging the predictions obtained from the individual models. The highest prediction accuracy was obtained with the consensus approach: the Consensus Model was characterized by the highest R2 values and the lowest RMSE (root mean square error). The individual WP3 models and the consensus model were also compared with the EPI Suite model for the prediction of MP on the basis of RMSE. As expected, the local WP3 models and the consensus model, which are specific for PFCs, predicted MP more accurately than the general EPI Suite model.
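The averaging and the RMSE comparison described above can be sketched as follows. This is a minimal illustration assuming an unweighted mean over models; the function names and the numerical values are hypothetical and are not project data.

```python
import math


def consensus(predictions_by_model):
    """Unweighted mean of the per-compound predictions of several models.

    predictions_by_model holds one list per model, all in the same compound order.
    """
    n_models = len(predictions_by_model)
    return [sum(values) / n_models for values in zip(*predictions_by_model)]


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative melting points in degrees Celsius (made-up numbers).
observed = [10.0, 55.0, 120.0]
model_a = [14.0, 50.0, 128.0]
model_b = [8.0, 61.0, 115.0]

averaged = consensus([model_a, model_b])
for name, predicted in [("model A", model_a), ("model B", model_b), ("consensus", averaged)]:
    print(name, round(rmse(observed, predicted), 2))
```

Averaging tends to help when the errors of the individual models partly cancel, which is one reason for combining models built with different methods and descriptors.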

Consensus model for BP

As for the MP models, consensus predictions were derived by averaging the predictions obtained from the individual models. The comparison of the WP3 local models (Individual Full Models and Consensus Model) with the EPI Suite estimates of BP again showed that local models developed specifically for PFCs predict more accurately than general models.

2) Consensus models for Triazoles and Benzotriazoles (BTAZs):

Within the CADASTER project, several QSAR models were developed by the WP3 Partners UI, LnU, IVL, IDEA and HMGU for the prediction of the aquatic toxicity of (B)TAZs. The QSARs were built using different modeling approaches (e.g. MLR-OLS, PLSR, Kohonen Neural Network), starting from theoretical molecular descriptors calculated with commercial and freely available software (DRAGON, PaDEL-Descriptor, QSPR-THESAURUS web).

The considered end-points were:

- EC50 (72h) in Pseudokirchneriella subcapitata
- EC50 (48h) in Daphnia magna
- LC50 (96h) in Oncorhynchus mykiss

3) Consensus models for Fragrances

Within the CADASTER project, several classification QSAR models were developed by the WP3 Partners UI, IDEA and HMGU for the prediction of biodegradation (Deliverable 3.7; Kovarich 2013 (PhD thesis)). The models were built using different modeling approaches and were based on different datasets, either specific for fragrances (UI) or general (IDEA, HMGU). Among the models developed by the different WP3 partners, only those with the best classification performance and based on the most diverse approaches (modelling methods and molecular descriptors) were selected to derive consensus predictions for the 45 fragrances included in the validation set.
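The report does not state how the class predictions of the selected models were combined. One common option for classification models is a majority vote, sketched below purely as an assumption; the function name majority_vote, the class labels and the example predictions are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent class label, or None when the top classes tie."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no majority, the compound is left unclassified
    return counts[0][0]


# Hypothetical predictions of three models for three compounds
# ("RB" = readily biodegradable, "NRB" = not readily biodegradable).
per_compound = [
    ["RB", "RB", "NRB"],
    ["NRB", "NRB", "NRB"],
    ["RB", "NRB", "NRB"],
]
print([majority_vote(votes) for votes in per_compound])
```

Using an odd number of models avoids ties for a two-class endpoint; with an even number, tied compounds need a separate rule.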

Task 3.7 External validation of QSAR models

The aim of this task was to validate the QSAR models developed within the CADASTER project, using the new experimental data obtained from WP2. This was done by comparing predicted and actually measured fate and effect endpoints for (B)TAZs and Fragrances.

1) External validation of QSARs developed for aquatic toxicity of Triazoles and Benzotriazoles
The QSAR regression models for the toxicity of triazoles and benzotriazoles to algae (EC50 72h in Pseudokirchneriella subcapitata), daphnids (EC50 48h in Daphnia magna) and fish (LC50 96h in Oncorhynchus mykiss) were developed by five partners in WP3 (UI, LnU, IVL, IDEA and HMGU) and reported in Deliverables 3.5 and 3.6. The models were developed by different methods (MLR-Ordinary Least Squares (OLS), PLSR, Bayesian Lasso on PLS components (BLASSO/PLS), and Associative Neural Network (ASNN)), using various molecular descriptors (DRAGON, PaDEL-Descriptor and QSPR-THESAURUS web) and different procedures for variable selection, validation and applicability domain inspection. The predictions of the developed models, as well as the consensus predictions obtained by averaging the values predicted by each model, were compared with the results of experimental tests performed by two CADASTER partners (PHI and RIVM in WP2).

2) Consensus models and External validation of QSARs developed for ready biodegradability of fragrance materials
The consensus model developed within the CADASTER project for the prediction of the ready biodegradability of fragrance materials was externally validated on a dataset of 45 fragrances, 11 of which were tested by the WP2 Partner PHI. These 11 chemicals had previously been selected by UI as priority compounds on the basis of the available information on potential toxicity (cytotoxicity and mammalian toxicity) and structural representativeness (Task 3.4 report, Deliverable 3.4).

WORKPACKAGE 5 OUTREACH VIA DEVELOPMENT OF WEBSITE, NEWSLETTERS/WORKSHOPS AND STAND ALONE TOOLS FOR DISSEMINATION OF PROJECT RESULTS

The major outcomes of WP5 are the development of a public database of molecules and their properties and dissemination of models, tools and results developed within the project to the external users. Below, we briefly overview them:
Dissemination of information by the project website: A web site for the CADASTER project, http://www.cadaster.eu, was established. The site disseminated information about the activities of the project, publications, and presentations at conferences and meetings, and contains all deliverables of the project (public deliverables are open to all users). The web site was continuously supported during the 4 years of the project and served as the central dissemination point. The CADASTER publications, announcements of events, agendas and materials of workshops, and deliverables were uploaded there promptly and continuously by the HMGU team. The project web site attracted more than 2000 visitors per month. It was also used to organize the 'Environmental Toxicity Prediction Challenge' http://www.cadaster.eu/node/65 which was co-organized with the International Conference on Neural Networks (ICANN) http://www.kios.org.cy/ICANN09. The challenge attracted 518 submissions from 108 participants from more than 25 countries world-wide. The website also disseminated the newsletters and the materials and tutorials of the two CADASTER Workshops (in Maribor and Munich), which provide comprehensive coverage of the tools available at the web site.

Dissemination of information by publications and newsletters: The publication of project materials in peer-reviewed scientific journals was an essential part of the dissemination of the project results. To date, 49 articles with project results have been published or submitted for publication in the scientific literature (a couple of articles are in preparation). The project results were presented as 64 posters and 65 oral presentations at 51 conferences, meetings and workshops. The project results were also summarized in newsletters, which were sent to more than 6,000 registered users. The project has been highlighted in national publications, including newsletters of RIVM, HMGU and other partners.

Dissemination of information by on-line database: An on-line QSPR-THESAURUS database, http://www.qspr-thesaurus.eu, which stores properties of chemical compounds and QSAR/QSPR models, was developed. It has been used by the CADASTER partners to upload, verify and curate experimental data and to upload models developed during the project. The database currently contains 5.5k data points collected for more than 800 molecules from the four analyzed classes. More than 95% of the data points in the database were collected by the HMGU, PHI and UI groups, which contributed 2.9k, 1.6k and 0.9k data points respectively. All this information is publicly available to external users, who can access, search, upload and download data available in the database for the four analyzed classes of molecules. Users can also apply the models contributed by the individual partners to new chemical structures and use the predictions of these models for risk assessment. Currently the database contains 30 models, the majority of which (more than 85%) were contributed by the UI and HMGU groups. Users can access the developed tools either through the web interface or with the help of web services and standalone tools, which provide remote calculations on the QSPR-THESAURUS web site (developed by HMGU) or through the http://toxpredict.org web site of the OpenTox project (contributed by IDEA). Users can also upload and publish their own models. The (Q)SAR Model Reporting Format (QMRF) was integrated as part of the developed tools. This feature allows the CADASTER partners and external users to report information about published models, which is required for the use of models in REACH assessment according to the 'OECD principles for QSAR validation and application in regulation'.

Task 5.1 Development of a prototype of the http://www site (experimental database).
This task was to develop a prototype of the QSPR Thesaurus database suitable for the download and storage of data required for the project. The QSPR Thesaurus was based on the OCHEM platform developed by the HMGU group, which was adapted to the requirements of the CADASTER project. The HMGU team also provided training and continuous support to other project members in the use of the web site during the project. The database allowed the CADASTER users to submit, store and annotate molecular records collected from the literature. It stored data in their original units, tracked users and the modifications they made to the data, and allowed new units and new properties to be introduced. The database automatically checked for duplicates, allowed single or multiple records to be edited simultaneously, supported batch upload of data as Excel and/or SDF files, and allowed data to be exported as Excel files.

Task 5.2 Development and testing of a prototype of the http://www site (models).

The goal of this task was to develop an infrastructure and a prototype of the database of models that could be used by CADASTER participants to upload their previously developed models, publish them and make them available on-line to CADASTER participants and, later on, to external participants. The task was fully achieved. The QSPR Thesaurus also incorporated tools for linear model upload, which were validated through the upload of models by the CADASTER participants. Reproducing models that require 3D structures was found to be a challenging task.

Task 5.3 Implementation of tools to estimate the Applicability Domain of Models and Experimental Design.

The main goal of this task was to implement tools to estimate the Applicability Domain (AD) of Models and Experimental Design, including both the implementation of the new approaches for AD estimation developed in WP3 and the integration of the available AMBIT tools for estimating the AD of models. All these goals were achieved. Moreover, the (Q)SAR Model Reporting Format (QMRF) was extended to enable the display of validation results for published models. Users can add several predefined parameters (R2, RMSE, Q2Loo, Q2ext, etc.) for both internal and external validation of their models. The possibility to add any kind of graphical representation (graphs, tables, images) to support or explain the validation results as well as the applicability domain of models was also implemented.
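For readers unfamiliar with the validation parameters named above, the sketch below computes R2, RMSE and an external Q2 from observed and predicted values. It is a minimal illustration under stated assumptions: the function names are hypothetical, and the external Q2 uses one common definition (often written Q2F1, referenced to the training-set mean); the report does not say which variant the project tools use.

```python
import math


def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def q2_ext_f1(observed_ext, predicted_ext, train_mean):
    """External Q2 (F1 variant): 1 - PRESS_ext / SS_ext about the training mean."""
    press = sum((o - p) ** 2 for o, p in zip(observed_ext, predicted_ext))
    ss = sum((o - train_mean) ** 2 for o in observed_ext)
    return 1.0 - press / ss


# Made-up external-set values, for illustration only.
obs_ext = [1.2, 2.5, 3.1, 4.0]
pred_ext = [1.0, 2.7, 3.0, 4.3]
print(round(r2(obs_ext, pred_ext), 3),
      round(rmse(obs_ext, pred_ext), 3),
      round(q2_ext_f1(obs_ext, pred_ext, train_mean=2.6), 3))
```

Q2Loo follows the same pattern, with each predicted value coming from a model refitted without the corresponding compound.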

Task 5.4 Public QSPR-THESAURUS site

The goal of this task was to make the database and tools developed within the CADASTER project publicly available to web users. The deliverable of this task contained an overview of the main features of the database and of the additional tools that were developed and made available to external users. The upload of data to the database and the development of models continued during the project. The functionality of the QSPR Thesaurus was demonstrated to the participants of the two workshops organized by the CADASTER project.

Task 5.5 Development of a stand-alone version of the tools

The goal of this task was to provide external users with tools for remote access to the models developed and published within the CADASTER project on the QSPR-THESAURUS web site. While most access to the models and results was expected to take place through the web interface, the remote standalone tools give external users an easy way to integrate the CADASTER models into other software packages. Within this task a number of developments requested by CADASTER users were carried out, e.g. incorporation of the Dragon 5.4 and 5.5 packages (in addition to Dragon 6.0); authorization to download descriptors for users that hold the respective license (however, all users can use the developed models for free on the web site); storage of predicted values for molecules with 3D descriptors; and development of an enhanced web interface and SOAP web services to allow remote calls to the models developed by CADASTER project participants.

Task 5.6 Workshop on the use of QSARs models in REACH

The main goal of this task was to organize a workshop, http://www.cadaster.eu/node/116, which would disseminate the project developments and results and assist risk assessors and national chemicals authorities, particularly in Eastern European countries, with the use of QSAR tools for environmental risk assessments in REACH. The workshop involved 35 participants, including invited speakers from JRC - Institute for Health and Consumer Protection (Italy), ECHA Evaluation Unit (Finland), Douglas Connect (Switzerland) and University of North Carolina (USA). It was organized by the Public Health Institute Maribor (PHI) in Maribor, Slovenia and took place from September 1st to September 2nd 2011.

Task 5.7 Final workshop and guidelines for model development

The second CADASTER workshop (see http://www.cadaster.eu/workshop online) was organized by HMGU and took place on its premises from October 7th to 9th 2012. The main goal of this workshop was to provide a tutorial for all interested parties, including industry and SMEs, on how to develop new models for the assessment of REACH endpoints (in particular for new scaffolds of compounds for which there are no reliable QSAR models) and how to use the software developed by the project participants. The workshop was attended by 52 participants, including invited speakers from JRC - Institute for Health and Consumer Protection (Italy), EPA Environmental Protection Agency (USA), Umwelt Bundesamt (Germany) and coordinators of the Seventh Framework Programme (FP7) funded projects OSIRIS (Prof. G. Schüürmann) and COSMOS (Prof. M. Cronin). The workshop had two main thematic areas: 1) Data collection and QSAR Model development for REACH (October 8th) and 2) Case studies and use of QSARs in risk assessment (October 9th).

Potential Impact:

Potential impact and main dissemination activities

The REACH legislation requires chemical industries to demonstrate the safe manufacture and use of their chemicals throughout the supply chain. The major innovation of CADASTER concerns the REACH-targeted assessment of environmental effects and exposure of chemicals belonging to four classes of emerging compounds, whilst considering the European diversity from the viewpoint of the stakeholders who are primarily responsible for carrying out safety, hazard and risk assessment, and who face the task of actually integrating all the tools made available for application within REACH for 'their' classes of compounds. This is to be done whilst minimizing animal testing and the costs of assessment. CADASTER brought together high-cost endpoints of environmental toxicology, intelligent combinations of in silico techniques (which in turn were combined with other alternatives to animal testing, such as read-across), risk-targeted decision support systems and the economic valuation of substitution of chemicals within chemical classes, and it integrated probabilistic precautionary approaches to chemical risk assessment with science-based assessment and management. CADASTER thus provided the tools and the scientific and pragmatic insights to enable Europe to become a stronger partner within the international arena of regulating industrial chemicals.

A brief overview of the major dissemination activities and their outcomes is given below:
Dissemination of information via the project website: A web site for the CADASTER project, http://www.cadaster.eu, was established. The site disseminated information about the activities of the project, publications, and presentations at conferences and meetings, and contains all deliverables of the project (public deliverables are open to all users; non-public deliverables are made available to the project partners via a user-restricted part of the website). The web site was continuously supported during the 4 years of the project and served as the central dissemination point. The CADASTER publications, announcements of events, agendas and materials of workshops, and deliverables were uploaded there promptly and continuously by the HMGU team. The project web site attracted more than 2000 visitors per month. It was also used to organize the 'Environmental Toxicity Prediction Challenge' http://www.cadaster.eu/node/65 which was co-organized with the International Conference on Neural Networks (ICANN) http://www.kios.org.cy/ICANN09. The challenge attracted 518 submissions from 108 participants from more than 25 countries world-wide. The website also disseminated the newsletters and the materials and tutorials of the two CADASTER Workshops (in Maribor and Munich), which provide comprehensive coverage of the tools available at the web site.

Dissemination of information by publications and newsletters: The publication of project materials in peer-reviewed scientific journals was an essential part of the dissemination of the project results. To date, 49 articles with project results have been published or submitted for publication in the scientific literature (a couple of articles are in preparation). The project results were presented as 64 posters and 65 oral presentations at 51 conferences, meetings and workshops.

Dissemination of information by on-line database: An on-line QSPR-THESAURUS database, http://www.qspr-thesaurus.eu, which stores properties of chemical compounds and QSAR/QSPR models, was developed. It has been used by the CADASTER partners to upload, verify and curate experimental data and to upload models developed during the project. The database currently contains 5.5k data points collected for more than 800 molecules from the four analyzed classes. More than 95% of the data points in the database were collected by the HMGU, PHI and UI groups, which contributed 2.9k, 1.6k and 0.9k data points respectively. All this information is publicly available to external users, who can access, search, upload and download data available in the database for the four analyzed classes of molecules. Users can also apply the models contributed by the individual partners to new chemical structures and use the predictions of these models for risk assessment. Currently the database contains 30 models, the majority of which (more than 85%) were contributed by the UI and HMGU groups. Users can access the developed tools either through the web interface or with the help of web services and standalone tools, which provide remote calculations on the QSPR-THESAURUS web site (developed by HMGU) or through the http://toxpredict.org web site of the OpenTox project (contributed by IDEA).

Task 5.1 Development of a prototype of the http://www site (experimental database).

This task was to develop a prototype of the QSPR Thesaurus database suitable for the downloading and storage of data required for the project. The QSPR Thesaurus was based on the OCHEM platform developed by the HMGU group, which was adapted to the requirements of the CADASTER project. The HMGU team also provided training and continuous support to other project members in the use of the web site during the project. The database allowed the CADASTER users to submit, store and annotate molecular records collected from the literature. It stored data in their original units, tracked users and the modifications they made to the data, and allowed new units and new properties to be introduced. The database automatically checked for duplicates, allowed single or multiple records to be edited simultaneously, supported batch upload of data as Excel and/or SDF files, and allowed data to be exported as Excel files. Substructure search was provided by integrating AMBIT functionality.

Task 5.2 Development and testing of a prototype of the http://www site (models).

The goal of this task was to develop an infrastructure and a prototype of the database of models that could be used by CADASTER participants to upload their previously developed models, publish them and make them available on-line to CADASTER participants and, later on, to external participants. The task was fully achieved. The QSPR Thesaurus also incorporated tools for linear model upload, which were validated through the upload of models by the CADASTER participants. Reproducing models that require 3D structures was found to be a challenging task. To provide a sustainable platform for such models on the CADASTER web site, a pipeline for the optimization of molecules using the MOPAC 7 program was developed and integrated into the web site. A graphical interface for submitting and accessing optimized conformations of molecules was also developed (http://www.cadaster.eu/mopac).

Task 5.3 Implementation of tools to estimate the Applicability Domain of Models and Experimental Design.

The main goal of this task was to implement tools to estimate the Applicability Domain (AD) of Models and Experimental Design, including both the implementation of the new approaches for AD estimation developed in WP3 and the integration of the available AMBIT tools for estimating the AD of models. All these goals were achieved. Moreover, the (Q)SAR Model Reporting Format (QMRF) was extended to enable the display of validation results for published models. Users can add several predefined parameters (R2, RMSE, Q2Loo, Q2ext, etc.) for both internal and external validation of their models. The possibility to add any kind of graphical representation (graphs, tables, images) to support or explain the validation results as well as the applicability domain of models was also implemented. Thus, the models could be published on the CADASTER web site together with their QMRF, as required for their use by regulators. Experimental design methods were developed and made available on the web site of the project.

ADVISORY BOARD

In the design of the CADASTER project it was judged essential that an Advisory Board consisting of representatives of the major stakeholders in REACH would supervise the progress of the project and the optimal dissemination of the project results. The Advisory Board was chaired by a senior scientist with long-term industry experience in chemical risk assessment, and included representatives from SMEs, regulatory agencies, JRC, and research institutes. As the members of the Advisory Board had ample experience in risk assessment and in the exploitation and dissemination of risk assessment related activities, the input of the Advisory Board was continuously used to steer the activities within the various workpackages and to ensure optimal output of the project. To illustrate the opinions of the Advisory Board on the progress of the project and the project results, a brief overview of the input received is given below, arranged per individual Workpackage:

ADVISORY PANEL INPUT TO CADASTER

The Advisory Panel, chaired by Mike Comber, comprised Dr Barry Hardy, Professor Gerrit Schüürmann, Dr Andrew Worth, Dr Theo Traas, Dr Chris Watts and Dr Ian Doyle. Although the involvement of the Advisory Panel was limited for various reasons, chief among them REACH, the following input was received during the project.

Work Package 2: Collection of Data and Models
The Advisory Panel were asked for their opinion on the approach used as part of WP2 for selecting the chemicals to develop the QSARs and whether they had any comments on processes used for selecting chemicals in such testing programs. The Advisory Panel felt that the selection of substances for study is always a difficult task as there are many thousands of substances that could be chosen, and only limited resources for the research project are available. A balance has to be struck between regulatory interest, academic interest, availability of data, use and occurrence [in the environment], available resources and many other factors.

WP 3: Development and validation of QSAR models

One of the problems with developing and validating QSARs for specific groups of chemicals is variability in the experimental data combined with small data sets. The Advisory Panel commented that small datasets and variable test data quality have been a problem for the development of QSARs since this area of research started, and that there has not been a significant improvement in the availability of quality datasets over that period. The release of test data [produced for regulatory purposes] to the wider scientific community would have been one major way to address this problem. Of course, regulatory testing has to be paid for [by the company registering the chemical] and provides information that may be commercially sensitive, so there is an understandable reluctance to make it available in the public domain. Hopefully, the approach adopted by ECHA in relation to test data supplied for REACH registration will enable more test data to be made widely available.

WP 4: Integration of QSARs within hazard and risk assessment
With respect to uncertainty, the problem considered by the Advisory Panel was that regulators need a clear outcome from a risk assessment: either that a chemical is safe [for the environment, workers and consumers] for use in the amounts and for the specific purposes assessed, or that it is not safe, with an indication of the areas where use needs to be limited or banned to ensure safe use. Unfortunately, the ability to provide such 'black and white' outcomes is limited by the quality [uncertainty] of the input data and the uncertainty of the risk assessment model. When reporting uncertainty, it is useful to break it down into the uncertainty of the input data and the uncertainty of the model prediction, as this allows the user to determine whether more effort put into improving input data quality will yield a sufficiently low level of uncertainty for the overall risk assessment.

WP 5: Outreach via development of website, newsletters/workshop(s) and standalone tools for dissemination of project results

The Advisory Panel requested that all of the CADASTER meeting reports be made readily available on the website together with all progress reports. Copies of all scientific publications resulting from the CADASTER project should be available in full on the website.

The availability, ease of use and transparency of the models developed will be the key to their widespread application, so the CADASTER website needs to be user friendly and structured in a way that makes its use logical. The incorporation of the QSAR models into the QSAR Toolbox is a must and does not yet appear to have been done. The CADASTER website could usefully offer more information on how to use the models and what the outputs mean; this would encourage use by less experienced potential users in industry and government.

List of Websites:

http://www.cadaster.eu
