This OO-L1_Readme.txt file was generated on 2022-02-24 by Qianting Yuan and Raquel Fernández Fuertes


INDEX OF THE OO-L1 DATASET (OBJECT OVERTNESS IN L1 ACQUISITION)

1. GENERAL INFORMATION
   1.1. Title of dataset
   1.2. Author information
      1.2.1. PI and co-PI
      1.2.2. Lab
   1.3. Objectives
   1.4. Funding sources
   1.5. Citing information
2. ACCESS INFORMATION
   2.1. Licenses or restrictions
   2.2. Publications
3. METHODOLOGICAL INFORMATION
   3.1. Data selection procedure from CHILDES
      3.1.1. Bilingual participants
      3.1.2. Monolingual participants
   3.2. Data extraction procedure
   3.3. Data classification procedure: variables
4. DATA
   4.1. Database
   4.2. Last update
5. RELATED DATASETS


1. GENERAL INFORMATION

1.1. Title of dataset: OO-L1_Dataset

1.2. Author Information

1.2.1. PI and co-PI:

Name: Qianting Yuan
Institution: University of Valladolid
Address: Facultad de Filosofía y Letras, Paseo del Cauce s/n 47011, Valladolid (Spain)
Email:

Name: Raquel Fernández Fuertes
Institution: University of Valladolid
Address: Facultad de Filosofía y Letras, Paseo del Cauce s/n 47011, Valladolid (Spain)
Email:

1.2.2. Lab:

Name of the lab: UVALAL (University of Valladolid Language Acquisition Lab)
Institution: University of Valladolid
Address:
Email:

1.3. Objectives

This investigation is focused on the crosslinguistic influence between the two first languages (L1s) of bilingual children in the domain of direct objects. We examine the oral production of bilingual children with two different language pairs (i.e., Cantonese-English and Spanish-English) and compare it with that of the monolingual English and monolingual Cantonese children. All the data have been taken from the CHILDES project (Child Language Data Exchange System; MacWhinney 2000), i.e., Cantonese-English bilingual corpus: YipMatthews; Spanish-English bilingual corpus: FerFuLice; English monolingual corpora: Sachs, Bloom, Demetras-Trevor; Cantonese monolingual corpus: Lee/Wong/Leung.
These corpora comprise spontaneous data where the children interact with adults in a natural context (e.g., at home). While null objects are possible and pervasive in Cantonese, their occurrence in languages like English and Spanish is rather restricted. Analyzing how Cantonese-English bilingual children's production of direct objects differs, quantitatively and qualitatively, from that of their Spanish-English bilingual and English monolingual counterparts provides valuable information about the nature and the directionality of crosslinguistic influence between bilingual children’s two L1s; it also presents new empirical evidence for the claim that the development of the two L1s in bilingual children is interdependent.

1.4. Funding sources

o 2018-2022: Spanish Ministry of Science, Innovation and Universities and European Regional Development Fund (ERDF) [PGC2018-097693-B-I00], Linguistic competence indicators in heritage and non-native languages: linguistic, psycholinguistic and social aspects of English-Spanish bilingualism, PRINCIPAL INVESTIGATOR: R. Fernández Fuertes (University of Valladolid, Spain)
o 2017-2019: Regional Government of Castile and León (Spain) and ERDF [VA009P17], Aspectos de la dimensión internacional del contacto de lenguas: diagnósticos de la competencia lingüística bilingüe inglés-español, PRINCIPAL INVESTIGATOR: R. Fernández Fuertes (University of Valladolid, Spain)

1.5. Citing information

Publications using this dataset (or any part of it) should cite this dataset as follows:

Yuan, Q., & Fernández Fuertes, R. (2016). An Analysis of Interlinguistic Influence from Chinese into English in Direct Object Realization in Chinese-English Bilingual Children. ES: Revista de Filología Inglesa, 37, 33–55.


2. ACCESS INFORMATION

2.1. Licenses or restrictions: There are no licenses/restrictions placed on the data from the corpora in CHILDES, as they are freely available at the CHILDES project (MacWhinney 2000).
However, to run the CLAN programs (Computerized Language ANalysis), which perform automatic searches and calculations on the data from the corpora, the CLAN software needs to be downloaded and installed. The CLAN software is freely available in CHILDES, and there are Windows, Mac and Unix versions.

2.2. Publications: Partial or full access to the information contained in the database is available at the UVALAL webpage (publications section).


3. METHODOLOGICAL INFORMATION

3.1. Data selection procedure from CHILDES

All the data included in the present investigation have been taken from the CHILDES project (MacWhinney 2000). The information below is organized as follows: name of the corpus; selected children; age range of children; language or language pair; region/state (country); date (if available)

3.1.1. Bilingual participants:

o YipMatthews; Timmy, Sophie, Alicia, Llywelyn and Charlotte; 1;03-4;06; Cantonese-English; Hong Kong (China); 1995-2008
o FerFuLice; Leo and Simon; 1;01-6;11; Spanish-English; Salamanca (Spain); 1998-2004

3.1.2. Monolingual participants:

o Sachs; Naomi; 1;01-5;01; American English; North America (USA); 1969-1973
o Bloom; Peter; 1;09-3;02; American English; New York (USA); 1971-1973
o Demetras-Trevor; Trevor; 2;00-2;11; American English; North America (USA); 1985-1987
o Lee/Wong/Leung; Chunyat, Gakei and Kingstun; 1;05-3;08; Cantonese; Hong Kong (China); 1991-1994

3.2. Data extraction procedure

All transitive verbs, whether taking overt or null objects, were manually extracted from each data source and compiled in the following csv documents:

- OO-L1_English.csv
- OO-L1_Cantonese.csv

3.3. Data classification procedure: variables

- Identifying variables: name of the corpus; names of the files included in the investigation
- Demographic variables: data type (i.e., monolingual / bilingual); age of the child (years; months); MLUw (mean length of utterance in words, per month); participant’s name; language pair in the case of bilingual participants (i.e., Cantonese-English / Spanish-English); dominant language in the case of bilingual participants (if applicable)
- Linguistic variables for English data:
  o VERB TYPE
      Pure transitive verb
      Mixed verb
  o DIRECT OBJECT OVERTNESS AND GRAMMATICALITY
      Grammatical overt object
      Grammatical null object
      Ungrammatical overt object
      Ungrammatical null object
  o OVERT DIRECT OBJECT TYPE
      DP
      Pronoun
      VP
      CP
- Linguistic variables for Cantonese data:
  o DIRECT OBJECT OVERTNESS AND GRAMMATICALITY
      Grammatical overt object
      Grammatical null object
      Ungrammatical overt object
      Ungrammatical null object
  o LINGUISTIC CONTEXT
      Context open to either an overt or a null direct object
      Context requiring an overt direct object
  o OVERT DIRECT OBJECT TYPE
      DP
      Pronoun
      VP
      CP


4. DATA

4.1. Database

The database contains the raw data with all the information related to the dataset, organized according to the three types of variables described in section 3.3, namely, identifying variables, demographic variables and linguistic variables.

- OO-L1_English.csv: it contains all the direct object cases in English (either overt or null, and either grammatical or ungrammatical) produced by the selected participants during the investigation period in the selected corpora in the CHILDES project (MacWhinney 2000). Number of variables = 11; number of rows = 2809.
  o the five selected Cantonese-English bilingual participants, namely, Timmy, Sophie, Alicia, Llywelyn and Charlotte, from the YipMatthews corpus
  o the two selected Spanish-English bilingual participants, namely, Leo and Simon, from the FerFuLice corpus
  o the three selected English monolingual participants, namely, Naomi from the Sachs corpus, Peter from the Bloom corpus and Trevor from the Demetras-Trevor corpus

- OO-L1_Cantonese.csv: it contains all the direct object cases in Cantonese (either overt or null, and either grammatical or ungrammatical) produced by the selected participants during the investigation period in the selected corpora in the CHILDES project (MacWhinney 2000). Number of variables = 11; number of rows = 3290.
  o the five selected Cantonese-English bilingual participants, namely, Timmy, Sophie, Alicia, Llywelyn and Charlotte, from the YipMatthews corpus
  o the three selected Cantonese monolingual participants, namely, Chunyat, Gakei and Kingstun, from the Lee/Wong/Leung corpus

4.2. Last update: 2021


5. RELATED DATASETS

- Bilingual acquisition data: longitudinal corpus_FerFuLice dataset:
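
Note on checking the files described in section 4.1: the sketch below is one possible way to confirm that a downloaded copy of the two csv documents matches the stated number of variables (11) and rows (2809 and 3290). It is not part of the original dataset materials. The function names summarize_csv and check_dataset are hypothetical, and the sketch assumes things this readme does not state: that the files are comma-delimited, that they are UTF-8 encoded, that the first row holds the variable names, and that the stated row counts exclude that first row. If any of these assumptions is wrong, the reported counts will differ.

```python
import csv
from pathlib import Path

# Counts stated in section 4.1 of this readme.
EXPECTED_ROWS = {
    "OO-L1_English.csv": 2809,
    "OO-L1_Cantonese.csv": 3290,
}
EXPECTED_VARIABLES = 11


def summarize_csv(path, delimiter=",", encoding="utf-8-sig"):
    """Return (columns in the first row, number of rows after the first row).

    Assumption: the first row is a header with one name per variable.
    """
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, [])
        n_rows = sum(1 for _ in reader)
    return len(header), n_rows


def check_dataset(folder="."):
    """Compare each file found in `folder` with the counts in section 4.1."""
    report = {}
    for name, expected_rows in EXPECTED_ROWS.items():
        path = Path(folder) / name
        if not path.exists():
            report[name] = "missing"
            continue
        n_cols, n_rows = summarize_csv(path)
        matches = n_cols == EXPECTED_VARIABLES and n_rows == expected_rows
        report[name] = "ok" if matches else f"{n_cols} columns, {n_rows} rows"
    return report


if __name__ == "__main__":
    # Looks for the two files in the current working directory.
    for name, status in check_dataset().items():
        print(f"{name}: {status}")
```

A mismatch does not necessarily indicate a corrupted file: it may only mean that the header row is counted differently or that the delimiter is not a comma, in which case the delimiter and encoding arguments of summarize_csv can be adjusted.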