This FerFuLice_Readme.txt file was generated on 2021-12-13 by Raquel Fernández Fuertes

INDEX OF THE FerFuLice DATASET

1. GENERAL INFORMATION
   1.1. Title of dataset
   1.2. Author information
      1.2.1. PI and co-PI
      1.2.2. Labs
      1.2.3. People involved in the data collection
   1.3. Corpus description
   1.4. Funding sources
   1.5. Citing information
2. ACCESS INFORMATION
   2.1. Licenses or restrictions
   2.2. Publications
3. METHODOLOGICAL INFORMATION
   3.1. Data elicitation procedure
   3.2. Data transcription procedure
4. DATA
   4.1. Inventory of data files
   4.2. Database
   4.3. Last update
5. RELATED DATASETS

1. GENERAL INFORMATION

1.1. Title of dataset: the FerFuLice corpus

1.2. Author information

1.2.1. PI and co-PI:

Name: Raquel Fernández Fuertes
Institution: University of Valladolid (Spain)
Address: Facultad de Filosofía y Letras, Paseo del Cauce s/n 47011, Valladolid (Spain)
Email: raquelff@uva.es

Name: Juana M. Liceras
Institution: University of Ottawa (Canada)
Address: Faculty of Arts, University of Ottawa, Dept. of Modern Languages, 70 Laurier East, Ottawa, Ontario, K1N 6N5 (Canada)
Email: Juana.Munoz-Liceras@uottawa.ca

1.2.2. Labs:

Name of the lab: UVALAL (University of Valladolid Language Acquisition Lab)
Institution: University of Valladolid (Spain)
Address: https://uvalal.uva.es
Email: gir.uvalal@uva.es

Name of the lab: LarLab (Language Acquisition Research Lab)
Institution: University of Ottawa (Canada)
Address: https://www.uottawalarlab.com
Email: larlab.uottawa@gmail.com

1.2.3. People involved in the data collection

Videotaping was done by Raquel Fernández Fuertes, K. Todd Spradlin, and Esther Álvarez de la Fuente.

Transcription in Valladolid was done by Esther Álvarez de la Fuente, Susana Muñiz Fernández, Isabel Parrado Román, K. Todd Spradlin, Elisa Rosado Villegas, Israel de la Fuente Velasco, and Alfonso Martínez Pérez.

Transcription in Ottawa was done by Rocío Pérez-Tattam, Tamara Vardomskaya, Anahí Alba de la Fuente, K.
Todd Spradlin, Marco Llamazares, Melissa Grimes, Shauna Flynn, and Deidre Butters.

Both transcription teams were coordinated by Raquel Fernández Fuertes.

1.3. Corpus description

This corpus contains spontaneous productions from a longitudinal study of two English/Spanish bilingual identical twins with the pseudonyms of Simon and Leo. They were born on 28-DEC-1998 into a middle-class family in Spain. The father is a native speaker of Peninsular Spanish, and the mother is a native speaker of American English. The father always speaks to the children in Spanish and the mother always addresses them in English. The parents generally communicate in Spanish with each other, except in the summers, when they travel to the United States for approximately two months, or when a monolingual English speaker is present. Therefore, we are dealing with bilingual English/Spanish first language acquisition in a monolingual-Spanish social context, a type of bilingualism that is referred to in the literature as individual bilingualism (Bhatia and Ritchie, 2004).

During the first year, the mother was the primary caretaker of the twins. The father was present all day on weekends and less on weekdays. At age 1;10, the twins started going to day care for 3 hours a day on weekdays, where the language of the staff and other children was Spanish. Apart from the mother, additional contact with English was provided by occasional visits by the maternal grandparents and during the two-month visits to the United States every summer.

The twins and participants other than the investigators have pseudonyms to protect their privacy. If names of children or participants other than the investigators appear in the recordings, only the first initial or the first and second initials are transcribed. When it was not clear in the recording which of the twins was speaking (mainly because they were off-screen), we used SOL (Simon or Leo) instead of S (Simon) or L (Leo).

1.4. Funding sources

- 2007-2010.
Spanish Ministry of Science and Technology and ERDF (European Regional Development Fund) [HUM2007-62213], Elaboración y análisis de un corpus de datos de adquisición del inglés y del español como L1 y L2 de niños y adultos: aprendizaje formal, naturaleza del input y factor edad. PRINCIPAL INVESTIGATOR: Raquel Fernández Fuertes (University of Valladolid Language Acquisition Lab).
- 2006-2008. Castile and León Regional Government (Spain) [VA046A06], Lenguas en contacto [inglés/español] en el contexto de Castilla y León: adquisición de L1 y L2. PRINCIPAL INVESTIGATOR: Raquel Fernández Fuertes (University of Valladolid Language Acquisition Lab).
- 2004-2007. Social Sciences and Humanities Research Council of Canada [RE/C: 410-2004-2034], Two perspectives on optionality and parametric values in second language acquisition: primary language development and diachronic change. PRINCIPAL INVESTIGATOR: Juana M. Liceras (University of Ottawa).
- 2002-2005. Spanish Ministry of Science and Technology and ERDF [BFF2002-00442], La teoría lingüística y el análisis de los sistemas bilingües simultáneos del inglés y del español. PRINCIPAL INVESTIGATOR: Raquel Fernández Fuertes (University of Valladolid Language Acquisition Lab).
- 2002-2003. Castile and León Regional Government (Spain) [UV 30/02], Estrategias para la enseñanza de lenguas y la formación del profesorado: estudio teórico y práctico de la producción lingüística de gemelos bilingües inglés/español. PRINCIPAL INVESTIGATOR: Raquel Fernández Fuertes (University of Valladolid Language Acquisition Lab).
- 2000-2004. Faculty of Arts research funds, University of Ottawa (Canada), Bilingualism (English/Spanish) as a first language: a case study of identical twins. PRINCIPAL INVESTIGATOR: Juana M. Liceras (University of Ottawa).

1.5. Citing information

Publications using this dataset (or any part of it) should cite this dataset as follows:

Fernández Fuertes, R. and J.M. Liceras. 2010.
Copula omission in the English developing grammar of English/Spanish bilingual children. International Journal of Bilingual Education and Bilingualism 13 (5): 525-551. DOI: https://doi.org/10.1080/13670050.2010.488285.

2. ACCESS INFORMATION

2.1. Licenses or restrictions:

There are no licenses or restrictions placed on the data from the corpora in CHILDES (Child Language Data Exchange System), as they are freely available at the CHILDES project (https://childes.talkbank.org/) (MacWhinney 2000). However, running the CLAN (Computerized Language ANalysis) programs to perform automatic searches and calculations on the data from the FerFuLice corpus requires downloading and installing the CLAN software. The CLAN software is freely available in CHILDES, and there are Windows, Mac and Unix versions (https://dali.talkbank.org/clan/).

2.2. Publications:

Partial or full access to the information contained in the database is available on the UVALAL webpage (publications section, http://uvalal.uva.es/index.php/results/publications-2/).

3. METHODOLOGICAL INFORMATION

3.1. Data elicitation procedure

The data we have collected cover the age range of 1;01 to 6;11. A total of 178 sessions were recorded on videotape and DVD, of which 117 are in an English context (i.e., with an English interlocutor such as the interviewer or their mother) and 61 in a Spanish context (i.e., with a Spanish interlocutor such as the interviewer or their father). The Spanish recordings were made at intervals of 2-3 weeks until age 3;00 (with some interruptions during the summer holidays), and once a month after that. The English recordings were sometimes made more frequently, but the sessions are usually much shorter and recorded on consecutive days. The children were recorded in naturalistic settings, usually at home, and appear together in most of the sessions. They were mostly engaged in normal play activities with the interlocutor.

3.2.
Data transcription procedure

All recorded video files were transcribed using the CHAT (Codes for the Human Analysis of Transcripts) transcription system from the CHILDES project (MacWhinney 2000). Everyone involved in the transcription of the FerFuLice corpus was trained in the CHAT transcription system beforehand. To ensure that the different transcribers followed a uniform procedure, a transcribing-in-chat document was drawn up when transcription started in 2004 and was updated frequently. This document was based on and in agreement with the CHAT transcription manual in CHILDES (https://talkbank.org/manuals/CHAT.pdf). Spanish recordings were transcribed by the University of Valladolid team, while English sessions were transcribed by the University of Ottawa team. All transcribers were English-Spanish bilinguals, and the language they were transcribing in was their first language.

4. DATA

4.1. Inventory of data files

Data transcripts appear in three folders (English, Spanish and bilingual). The first two correspond to the recordings made in English and in Spanish respectively. The bilingual folder includes recordings in which we used different experimental tests that involved a combination of English and Spanish: 1 file involves code switching and 3 involve natural interpreting. The code-switching task is a repetition task consisting of a series of English-Spanish functional-lexical mixings that appeared in the studies by Fantini (1985) and Deuchar and Quay (2000), as well as ones we created ourselves. The translation tasks are experimental tasks in which the children were asked to act as interpreters between two monolingual participants.

A full inventory of files in the FerFuLice corpus appears in the following CSV file: FerFuLice_files_inventory.csv. Four sections appear in this file (English, Spanish, bilingual and totals).
Each of the first three sections (English, Spanish and bilingual) contains information about the file, the age of the children and the duration of the recording. Additionally, several calculations have been performed for each file and for each child: number of utterances and number of words produced, the Mean Length of Utterance value measured in words (MLUw), the MLUw rate per month and the MLUw rate per year. The adult input that appears in the recordings and transcripts has also been quantified. An estimate of the children's primary input in English and in Spanish, in terms of number of utterances and number of words, is also provided. Calculations involving the entire corpus appear in the last section of the CSV file (totals).

Some clarifications about the information in FerFuLice_files_inventory.csv follow:

- MLUw calculations (session, month & year): weighted mean taking as a point of reference the number of utterances.
- Primary input provides information about the approximate adult input recorded for the children:
   * primary input givers in English: MEL (Melanie, the mother), TOD (Todd, the researcher), SAF (Safta, the grandmother), JEF (Jeff, visitor), EMM (Emma, visitor)
   * primary input givers in Spanish: IVO (Ivo, the father), RAQ (Raquel, the researcher), EST (Esther, the researcher), JUA (Juana, the researcher)
   * primary input givers in the bilingual sessions: RAQ (Raquel, the researcher), MEL (Melanie, the mother), IVO (Ivo, the father), SUS (Susana, the researcher), EST (Esther, the researcher), TOD (Todd, the researcher)
- SOL stands for "Simon or Leo" and was used when it was not clear who was speaking. In this case, no MLUw is calculated.

4.2. Database

The FerFuLice corpus is fully available at https://childes.talkbank.org/access/Biling/FerFuLice.html.
Data include the following:

- original oral data (audio files of all the recordings)
- transcribed data (CHAT transcription files of all the recordings)

Age range covered: 1;01.22-6;11.00
Number of files: 177
Number of words: 537,171
Number of utterances: 123,075
Number of hours recorded: 80:04:17

4.3. Last update: 2020

5. RELATED DATASETS

- Bilingual acquisition data: Natural Interpreting_NI dataset: https://uvadoc.uva.es/handle/10324/50963
- Bilingual acquisition data: Object Overtness_OO-L1 dataset: https://uvadoc.uva.es/handle/10324/52715
- Bilingual acquisition data: Dative Alternation_DA-L1 dataset: https://uvadoc.uva.es/handle/10324/52646
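
NOTE ON THE MLUw CALCULATIONS IN SECTION 4.1 (illustrative sketch only)

Section 4.1 describes the monthly and yearly MLUw values as a weighted mean that takes the number of utterances as the point of reference. The Python sketch below shows one plausible reading of that description: each session's MLUw (words divided by utterances) is weighted by that session's utterance count. The function name weighted_mluw and the session figures are hypothetical and are not taken from FerFuLice_files_inventory.csv; the procedure actually used for the inventory may differ in detail.

```python
from typing import Iterable, Tuple


def weighted_mluw(sessions: Iterable[Tuple[int, int]]) -> float:
    """Combine (n_utterances, n_words) pairs into an utterance-weighted MLUw.

    Hypothetical helper: it assumes each session contributes its own MLUw
    (words / utterances), weighted by its number of utterances.
    """
    sessions = list(sessions)
    total_utterances = sum(utterances for utterances, _ in sessions)
    if total_utterances == 0:
        raise ValueError("no utterances to average over")
    # Sessions with zero utterances have no MLUw and carry zero weight.
    weighted_sum = sum(
        utterances * (words / utterances)
        for utterances, words in sessions
        if utterances > 0
    )
    return weighted_sum / total_utterances


# Invented figures for two sessions in the same month:
# 100 utterances / 150 words (MLUw 1.5) and 300 utterances / 600 words (MLUw 2.0).
example_month = [(100, 150), (300, 600)]
print(weighted_mluw(example_month))  # 1.875 (a plain average of 1.5 and 2.0 would give 1.75)
```

Under this reading, the weighted mean equals the pooled number of words divided by the pooled number of utterances, so longer sessions count for more than shorter ones.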