Final Report Summary - ACQDIV (Acquisition processes in maximally diverse languages: Min(d)ing the ambient language)
To answer this question, we developed a database of naturalistic longitudinal data in thirteen maximally diverse languages. These languages were selected to vary as widely as possible in a set of structural features that are relevant for learning. The languages are Chintang (a Sino-Tibetan language spoken in Nepal), Cree (an Algonquian language spoken in Canada), Indonesian (Austronesian), Inuktitut (an Eskimo-Aleut language spoken in Canada), Japanese, Russian, Sesotho (a Bantu language spoken in Lesotho), Turkish, Yucatec (a Mayan language spoken in Mexico), Nungon (a Finisterre-Huon language spoken in northeastern Papua New Guinea), Qaqet (a Baining language spoken in New Britain, Papua New Guinea), and Ku Waru (a Trans-New Guinea language spoken in the Highlands of Papua New Guinea). For one language, Dene Suline (an Athapaskan language spoken in Canada), which is widely considered one of the most complex languages in the world, we conducted fieldwork and collected new data to study the acquisition process.
The rationale behind this approach is that if we find similar statistical patterns in the input that children receive across languages that are as diverse as possible, we can propose shared learning mechanisms that rely on these patterns. This in turn allows us to address both the requirements for learning any human language and the mechanisms that enable this learning process. Our results revealed many statistical patterns that are fundamentally similar across all languages in the study. We also showed that children rely on these patterns when learning, drawing on well-established mechanisms of distributional learning. For example, children hear repetitions of the same word at very short intervals, tied to interactional units with their interlocutors. These repetitions help them segment words from the speech stream and learn their meaning and structural use.
Our findings suggest that the main cognitive mechanism underlying the learning of linguistic units is statistical learning. We illustrate how statistical learning allows children to learn words, morphology, and even individual sounds in these maximally diverse languages. We have disseminated our research results through peer-reviewed publications in professional journals (e.g. Cognition, PLOS Biology, Language), in conference proceedings (e.g. BUCLD, LREC, Cognitive Science Society), and in edited volumes on specific aspects of language acquisition. We presented our results at many international conferences, including events such as the ERC Ideas Lab at the World Economic Forum Meeting of the New Champions in Tianjin in 2018. The ACQDIV database will be made open access for those languages whose data are not subject to ethics and privacy restrictions.