Final Report Summary - ACQDIV (Acquisition processes in maximally diverse languages: Min(d)ing the ambient language)
How can infants learn any human language in the first few years of their life? This is one of the great unanswered questions in cognitive science. While considerable progress has been made with regard to specific mechanisms in specific languages, one of the biggest remaining challenges is to account for the extreme flexibility that children show when acquiring any one of the approximately 7000 languages of the world. Each of these languages places unique and widely differing demands on the learner, from different sound inventories and grammatical categories to different syntactic constructions and patterns in the lexicon, and each is embedded in substantially varied cultural and social settings. How is acquisition possible nevertheless? A popular answer is to assume grammar universals, i.e. structural features that are found in all languages, even if only latently. This assumption, however, has proven difficult to defend in the face of the great diversity across various aspects of language and language development. In this project we explored an alternative and asked: what common patterns can children rely on in the speech they hear, despite structural differences?
To answer this question, we developed a database of naturalistic longitudinal data in thirteen maximally diverse languages. These languages were selected to vary as extremely as possible in a set of structural features that are relevant for learning. The languages are Chintang (a Sino-Tibetan language spoken in Nepal), Cree (an Algonquian language spoken in Canada), Indonesian (Austronesian), Inuktitut (an Eskimo-Aleut language spoken in Canada), Japanese, Russian, Sesotho (a Bantu language spoken in Lesotho), Turkish, Yucatec (a Mayan language spoken in Mexico), Nungon (a Finisterre-Huon language spoken in northeast Papua New Guinea), Qaqet (a Baining language spoken in New Britain, Papua New Guinea), and Ku Waru (a Trans-New Guinea language spoken in the Highlands of Papua New Guinea). For the thirteenth language, Dene Suline (an Athapaskan language spoken in Canada), widely considered to be one of the most complex languages in the world, we conducted field work and collected new data to study the acquisition process. The underlying rationale of this approach is as follows: if we find similar statistical patterns in the input that children receive across languages that are as diverse as possible, we can propose shared learning mechanisms that rely on these patterns. This allows us to address both the requirements for learning any human language and the underlying mechanisms that enable this learning process. Our results revealed many statistical patterns that are fundamentally similar across all languages in the study. We also showed that children rely on these patterns when learning, using well-established mechanisms of distributional learning. For example, children hear repetitions of the same word at very short intervals, tied to interactional units with their interlocutors. This helps them to segment words from the speech stream and to learn their meaning and structural use.
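The repetition pattern described above can be made concrete with a small sketch. It is purely illustrative and is not the project's own analysis: the function name `repetition_rate`, the window size, and the toy utterances are all assumptions introduced here. It computes the share of word tokens that repeat a word heard in the few immediately preceding utterances.

```python
def repetition_rate(utterances, window=3):
    """Share of word tokens that repeat a word heard in the previous `window` utterances.

    `utterances` is a list of tokenized utterances (lists of strings).
    This is a hypothetical measure for illustration only.
    """
    repeated = 0
    total = 0
    for i, utterance in enumerate(utterances):
        # Collect every word heard in the preceding `window` utterances.
        recent = set()
        for previous in utterances[max(0, i - window):i]:
            recent.update(previous)
        for word in utterance:
            total += 1
            if word in recent:
                repeated += 1
    return repeated / total if total else 0.0


# Invented toy input (not taken from the ACQDIV database).
toy_input = [
    ["look", "a", "ball"],
    ["the", "ball"],
    ["where", "is", "the", "ball"],
    ["nice"],
]
rate = repetition_rate(toy_input, window=2)
print(rate)
```

A measure of this kind, computed over child-directed speech in each language, is one simple way to compare how densely words recur within short stretches of interaction.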
Our findings suggest that the main underlying cognitive mechanism responsible for the learning of linguistic units is statistical learning. We illustrate how statistical learning allows children to learn words, morphology, and even individual sounds in these maximally diverse languages. We have disseminated our research results through peer-reviewed publications in professional journals (e.g. Cognition, PLOS Biology, Language), in conference proceedings (e.g. BUCLD, LREC, Cognitive Science Society), and in edited volumes on specific aspects of language acquisition. We presented our results at many international conferences and at events such as the ERC Ideas Lab at the World Economic Forum Meeting of the New Champions in Tianjin in 2018. The ACQDIV database will become open access for those languages that are not subject to ethics and privacy restrictions.