CORDIS - EU research results

Soft biometrics and data labelling for privacy preservation

An EU-funded Innovative Training Network (ITN) was established to train the next generation of researchers and develop effective solutions to tackle the complex issue of privacy preservation in the 21st century.

As the use of advanced technologies like AI, sensors and biometrics continues to grow, the preservation of an individual’s right to privacy has become increasingly challenging. With the support of the Marie Skłodowska-Curie Actions programme, the PriMa project brings together industrial and academic experts to prepare young researchers to tackle the complex issue of privacy preservation in the 21st century. Through inter-related PhD programmes, academic partners from all over Europe are sharing knowledge and expertise, providing researchers with the tools to make a meaningful impact in the field.

One area of focus for the PriMa ITN is the use of soft biometrics. These are physical or behavioural characteristics that are more subjective and less discriminative than traditional biometric identifiers like faces, fingerprints and iris scans. Examples of soft biometric attributes include hair colour, height, gait, typing rhythm and voice.

During the project, the need to rethink how to approach the labelling of demographic attributes became apparent. The current categories can be limited and not reflective of the diversity of human identities, and inconsistencies are present in the labelling of soft biometric attributes in facial image data sets. The project team has found that achieving a high agreement rate between annotators is crucial to ensuring the reliability and consistency of labelled data.

Overall, the PriMa project has provided valuable insights into the challenges and opportunities of using subjective and less discriminative biometric identifiers to identify and describe individuals. Project work highlights the need for transparency in the labelling process and for a more inclusive approach to defining demographic attributes.

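As an illustration of what an agreement rate between annotators can look like in practice, the sketch below computes two common measures for a pair of annotators: raw percent agreement and Cohen's kappa, which corrects for agreement expected by chance. The article does not say which measure the PriMa team used, so the choice of Cohen's kappa, the function names and the hair-colour labels are all assumptions made for this example.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same
    # label if each labelled at random according to their own label frequencies.
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical hair-colour labels from two annotators for eight face images.
annotator_1 = ["black", "brown", "brown", "blond", "black", "brown", "blond", "black"]
annotator_2 = ["black", "brown", "black", "blond", "black", "blond", "blond", "brown"]

agreement = percent_agreement(annotator_1, annotator_2)  # 5 of 8 items match
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"percent agreement = {agreement:.3f}, Cohen's kappa = {kappa:.3f}")
```

In this made-up example the annotators agree on five of eight images, yet kappa is noticeably lower than the raw agreement because some of those matches would be expected by chance. For more than two annotators, a measure such as Fleiss' kappa would be a more natural fit.
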
“The PriMa contribution is that we identified the issues with the labelling of soft biometric data illustrated in Zohra Rezgui’s blog post, which led to the conclusion that labelling demographic data is a complex task that requires careful consideration and attention to detail,” said Prof. Raymond Veldhuis, coordinator of PriMa. “The way in which data is labelled can have a significant impact on the reliability and accuracy of machine learning algorithms,” he explained. “It is important to monitor the quality of labels, particularly in data sets related to soft biometrics such as facial images, and to take into account the subjectivity and diversity of human identities when defining demographic attributes. PriMa has not contributed by, for instance, proposing better protocols for labelling.”

One contribution is the work on gender concealment from facial images and facial templates (biometric data stored for recognition). Another is the work on privacy-preserving gait recognition from smartphone sensors, which maintains authentication performance without revealing demographic data.

PriMa has also provided an in-depth analysis of the personal and sensitive data that can be extracted from mobile background sensors, and of the automated methods used to extract it. The analysis covers demographics, activity and behaviour, health parameters and body features, mood and emotion, location tracking and keystroke logging, and includes a summary of the metrics proposed in the literature for quantifying privacy from the perspective of sensitive data.

Keywords

PriMa, privacy, privacy preservation, soft biometrics, biometric attributes, labelled data, demographic attributes, sensitive data