Periodic Reporting for period 2 - FashionBrain (Understanding Europe’s Fashion Data Universe)
Reporting period: 2018-07-01 to 2019-12-31
The action concluded successfully, with novel online shopping experiences, the detection of influencers, and the prediction of upcoming fashion trends. The main innovations of the FashionBrain project are:
- MonetDB with improved support for time series and unstructured data processing
- Flair: A New State-of-the-Art Library For Natural Language Processing
- End-to-End Text-to-Image Search with Neural Information Retrieval
- Human-in-the-loop Fashion Influencer Discovery
- PredTS: prediction of fashion trends from multiple incomplete fashion time series
- Developed a fashion taxonomy, which aggregates various sources, such as the Fashwell taxonomy complemented with publicly available sources.
- Produced software requirements for time series analysis and developed a Probabilistic RNN for sequential data with missing values.
- Extended work on the FashionBrain taxonomy visualisation tools.
Secondly, we have worked on semantic data integration from three different perspectives:
- Developed techniques for entity extraction from text and images. An important outcome is FashionNLP: a natural language processing tool for fashion-related text.
- Provided initial solutions to store and share the FashionBrain taxonomy, common datasets, and extracted entities and links, as well as in-database methods and solutions.
- Provided the FashionBrain integrated architecture for the integration of fashion data.
- Provided a demo paper entitled “RecovDB: accurate and efficient missing blocks recovery for large time series”.
- SQL window functions have been released in MonetDB.
- Flair was extended from handling only text to also handling images.
Thirdly, we have developed human computation and crowdsourcing tools to improve the quality of training data and perform annotation at scale:
- Improved crowdsourcing agreement measures, with a publication at HCOMP-17, live demo and open source code.
- Released the open source ModOp browser plugin to improve crowdsourcing interfaces.
- Analysed the vulnerabilities of crowdsourcing interfaces and related potential biases, with a publication at HCOMP-18 (winning the Best Paper Award).
- Performed a study on perceived bias in crowdsourcing, with a publication at SIGIR 2018 and a publicly available dataset.
- Presented a poster paper on rating systems in crowdsourcing at HCOMP-16.
- Presented a WWW 2020 paper on OpenCrowd: A Human-AI Collaborative Approach for Finding Social Influencers via Open-Ended Answers Aggregation.
- Built a new annotated dataset (“FashionTweets”).
- Provided a new methodology to measure difficulty of a crowdsourcing task, tested on the FashionTweets dataset.
- Developed a new innovation, “Human-in-the-loop Fashion Influencer Discovery”, based on UNIFR OpenCrowd and on Flair trained on the FashionTweets dataset.
- Organised the first Symposium on Biases in Human Computation and Crowdsourcing, in synergy with another H2020 partner (Qrowd).
- Conducted a new set of experiments on biases in crowdsourcing for fashion data, published in the Journal of Artificial Intelligence Research (JAIR).
- Published a study on payment biases in crowdsourcing, “Platform-related Factors in Repeatability and Reproducibility of Crowdsourcing Tasks”, at HCOMP 2019.
- Published two papers on task abandonment in crowdsourcing, at WSDM 2019 and in IEEE Transactions on Knowledge and Data Engineering (TKDE).
Fourthly, we have developed In-Database-Mining and Deep Learning methods:
- Performed a study on integrating entity linkage into a main-memory database system (IDEL), with integration into MonetDB.
- Undertaken work on Neural Paragraph Retrieval (SMART-MD).
- Realised an in-database machine learning approach in MonetDB.
- Collected and annotated a new Fashion Corpus, in collaboration with the Hong Kong Polytechnic, to be used in relation extraction experiments.
- Published a paper, “Analysing Errors of Open Information Extraction Systems”, at the “Workshop on Building Linguistically Generalizable NLP Systems” at EMNLP 2017.
- Published a paper on layer-wise analysis of Transformer representations at CIKM 2019.
- Published a demonstrator for layer-wise Transformer representations at ACM WWW 2020.
- Published a paper on contextualized document representations at ACM WWW 2020.
- Published work on neural models for topic segmentation and classification in TACL 2019.
Fifthly, we have worked on social media streams:
- Developed a tool for the recovery of missing values, implemented within MonetDB, with a demo paper published at ICDE’19.
- Created a method for the prediction of user preferences in fashion data.
- Developed a tool for prediction of trends in time series data implemented on top of MonetDB.
- Completed the integration of MonetDB and Flair for advanced analysis of terabytes of Twitter data.
- Written a research paper that has been accepted at the Very Large Data Bases conference (VLDB’20).
Finally, we have worked on text-to-product and image-to-product search:
- Developed an image-to-product entity linkage data model.
- Developed a state-of-the-art general NLP framework, Flair.
- Published an NLDL 2019 paper on multilingual language modeling and its application to sequence labeling.
- Published an EMNLP 2019 paper on evolving word representations.
- Published an EMNLP 2019 demonstration paper on the Flair framework.
- Improved the Flair framework, which now includes support for (1) a two-tower search architecture, (2) image and text embeddings, and (3) the FEIDEGGER dataset.
- Built a live demonstration on the FashionBrain website showcasing text-image search capabilities.
The image entity linkage data model outperforms Google’s state of the art on the academic DeepFashion consumer-to-shop benchmark datasets: Google (Song et al. 2017) 39.2%, Fashwell 40.1%. These achievements significantly improve the quality of existing technologies.