This SO-L2_Readme.txt file was generated on 2022-06-15 by Sonja Mujcinovic & Raquel Fernández Fuertes


INDEX OF THE SO-L2 DATASET (SUBJECT OVERTNESS IN L2 ACQUISITION)

1. GENERAL INFORMATION
1.1. Title of dataset
1.2. Author information
1.2.1. PI and co-PI
1.2.2. Lab
1.2.3. People involved in the data collection
1.3. Objectives
1.4. Funding sources
1.5. Citing information
2. ACCESS INFORMATION
2.1. Licenses or restrictions
2.2. Publications
3. METHODOLOGICAL INFORMATION
3.1. Data elicitation
3.1.1. Oral data
3.1.2. Written data
3.2. Data transcription procedure
3.3. Data extraction procedure
3.4. Data classification procedure: variables
4. DATA
4.1. Database
4.2. Last update
5. RELATED DATASETS


1. GENERAL INFORMATION

1.1. Title of dataset: SO-L2_Dataset

1.2. Author Information

1.2.1. PI and co-PI:

Name: Sonja Mujcinovic
Institution: University of Valladolid
Address: Facultad de Filosofía y Letras, Paseo del Cauce s/n 47011, Valladolid (Spain)
Email: sonja.mujcinovic@uva.es

Name: Raquel Fernández Fuertes
Institution: University of Valladolid
Address: Facultad de Filosofía y Letras, Paseo del Cauce s/n 47011, Valladolid (Spain)
Email: raquelff@uva.es

1.2.2. Lab:

Name of the lab: UVALAL (University of Valladolid Language Acquisition Lab)
Institution: University of Valladolid
Address: https://uvalal.uva.es
Email: gir.uvalal@uva.es

1.2.3. People involved in the data collection and data transcription

The collection of both the oral and the written data as well as the corresponding transcriptions of both data sets were done by Sonja Mujcinovic, Tamara Gómez Carrero and Luis Miguel Toquero Pérez.

1.3. Objectives

This investigation is focused on the contact between [+null subject] and [-null subject] languages. More specifically, it aims at characterizing the nature of crosslinguistic influence from the L1 into the L2 in the specific case of sentential subjects.
The target language is L2 English (a [-null subject] language) and three different L1s are considered (i.e., Bosnian and Spanish as [+null subject] languages; and Danish as a [-null subject] language). The study addresses three issues: (i) the role of typology in terms of whether subjects in the L2 share the same parametric option as that in each participant's L1; (ii) the role of the amount of L2 English exposure in institutional contexts (i.e., 2 and 4 years); and (iii) the role of modality in the data collection process (i.e., oral and written production data). A total of 78 sequential bilingual children with different language pairs participated: 26 L1 Spanish-L2 English, 26 L1 Bosnian-L2 English and 26 L1 Danish-L2 English. In addition, 13 L1 English children participated as a control group. Oral and written production data were elicited via a free production task and a storytelling task, respectively.

1.4. Funding sources

- 2018-2022: Spanish Ministry of Science, Innovation and Universities and European Regional Development Fund (ERDF) [PGC2018-097693-B-I00], Linguistic competence indicators in heritage and non-native languages: linguistic, psycholinguistic and social aspects of English-Spanish bilingualism, PRINCIPAL INVESTIGATOR: R. Fernández Fuertes (University of Valladolid, Spain)
- 2017-2019: Regional Government of Castile and León (Spain) and ERDF [VA009P17], Aspectos de la dimensión internacional del contacto de lenguas: diagnósticos de la competencia lingüística bilingüe inglés-español, PRINCIPAL INVESTIGATOR: R. Fernández Fuertes (University of Valladolid, Spain)

1.5. Citing information

Publications using this dataset (or any part of it) should cite this dataset as follows:

Mujcinovic, S. (2015). The analysis of subjects in the oral and written production of L2 English learners: transfer and language typology. In Pedro A. Fuertes-Olivera et al. (eds.), Current Work in Corpus Linguistics: Working with Traditionally-conceived Corpora and Beyond. Selected Papers from the 7th International Conference on Corpus Linguistics (CILC2015). Procedia Social and Behavioral Sciences. Amsterdam: Elsevier.


2. ACCESS INFORMATION

2.1. Licenses or restrictions: There are no licenses/restrictions placed on this data set.

2.2. Publications: Partial or total access to the information contained in the database is available at the UVALAL webpage (publications section, http://uvalal.uva.es/index.php/results/publications-2/)


3. METHODOLOGICAL INFORMATION

3.1. Data elicitation

All the data included in the present investigation were collected from the child participants in the schools they attended in the country where they lived (i.e., Spain, Denmark or Bosnia). Two types of tasks were used: an oral semi-guided interview and a written story.

3.1.1. Oral data

The data from the oral task were elicited via a semi-guided interview. The questions were formulated so that the participants would answer with complete sentences. If they did not, they were encouraged to do so. The participants were interviewed individually and voice-recorded at their schools. Each participant's interview lasted 8 to 16 minutes. Although a protocol with suggested topics was established (e.g., family, hobbies, interests, school, preferences, music, friends, etc.), the participants were encouraged to talk about any topic they wished.

3.1.2. Written data

The data from the written task were elicited via a wordless picture sequence task adapted from the A1-ball story from the Edmonton Narrative Norms Instrument (ENNI) (Schneider et al. 2005; http://www.rehabresearch.ualberta.ca/enni/). The story consists of five pictures that show an elephant and a giraffe playing with a ball.

3.2. Data transcription procedure

All the oral interviews and all the written narratives were transcribed using the CHAT (Codes for the Human Analysis of Transcripts) transcription system from the CHILDES project (Child Language Data Exchange System) (MacWhinney 2000).

3.3. Data extraction procedure

All the sentential subjects, either overt or null, produced by the participants were manually extracted from each file and compiled in the following csv files:

- SO-L2_Spanish.csv
- SO-L2_Bosnian.csv
- SO-L2_Danish.csv
- SO-L1_English.csv

3.4. Data classification procedure: variables

- Identifying variables: participant code
- Demographic variables: group (i.e., group 1 has been exposed to L2 English for 2 years and group 2 has been exposed to L2 English for 4 years); age of the child (years); language (L1 of the participant); MLUw (mean length of utterance in words).
- Linguistic variables:
  * Modality
      Oral
      Written
  * Grammaticality:
    o SUBJECT TYPE: grammatical
        DPs [STDP]
        Proper nouns [STPropernoun]
        Overt pronouns [STOvert Pron]
        Null pronouns [STNullgram]
    o SUBJECT TYPE: ungrammatical
        Null subjects [STNullgram]
  * Adequacy:
    o ADEQUATE SUBJECTS
        DP for reference introduction [ADE1]
        DP for reference re-introduction [ADE2]
        Null subject for reference maintenance [ADE3]
        Pronouns for 1st person singular [ADE4]
        Pronouns for reference maintenance [ADE5]
    o NON-ADEQUATE SUBJECTS
        DP for reference maintenance [ADE6]
        Pronouns for reference introduction [ADE7]
        Pronouns for reference re-introduction [ADE8]


4. DATA

4.1. Database

The database contains the raw data with all the information related to the dataset, organized according to the three types of variables (identifying, demographic and linguistic) described in section 3.4.

- SO-L2_Spanish.csv: contains all the sentential subjects in English (either overt or null, and either grammatical or ungrammatical) produced by the L1 Spanish participants in the oral and in the written task. Number of variables = 34; number of rows = 26.
- SO-L2_Bosnian.csv: contains all the sentential subjects in English (either overt or null, and either grammatical or ungrammatical) produced by the L1 Bosnian participants in the oral and in the written task. Number of variables = 34; number of rows = 26.
- SO-L2_Danish.csv: contains all the sentential subjects in English (either overt or null, and either grammatical or ungrammatical) produced by the L1 Danish participants in the oral and in the written task. Number of variables = 34; number of rows = 26.
- SO-L2_English.csv: contains all the sentential subjects in English (either overt or null, and either grammatical or ungrammatical) produced by the L1 English participants in the oral and in the written task. Number of variables = 34; number of rows = 13.

4.2. Last update: 2022


5. RELATED DATASETS

- Bilingual acquisition data: soraUVALAL dataset: https://uvadoc.uva.es/handle/10324/53750
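

USAGE NOTE (illustrative sketch, not part of the original dataset)

The row and variable counts given in section 4.1 can be used as a quick integrity check after downloading the files. The Python sketch below is only an illustration under several assumptions that this readme does not confirm: the files are comma-delimited, UTF-8 encoded, have a single header row, and sit together in one folder. The function names (csv_shape, check_dataset) are hypothetical helpers, not part of the dataset. The English control file is named as in section 4.1 (section 3.3 lists it as SO-L1_English.csv), so the name may need adjusting to match the downloaded file.

```python
import csv
from pathlib import Path

# Expected shapes as stated in section 4.1: (data rows, variables).
# File names are copied from section 4.1; adjust them if the download differs.
EXPECTED_SHAPES = {
    "SO-L2_Spanish.csv": (26, 34),
    "SO-L2_Bosnian.csv": (26, 34),
    "SO-L2_Danish.csv": (26, 34),
    "SO-L2_English.csv": (13, 34),
}


def csv_shape(path, delimiter=","):
    """Return (number of data rows, number of columns) for a CSV file.

    Assumes the first row is a header and the file is UTF-8 encoded;
    neither is stated in the readme.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        # Skip completely empty lines so a trailing newline is not counted.
        rows = [row for row in csv.reader(handle, delimiter=delimiter) if row]
    if not rows:
        return (0, 0)
    return (len(rows) - 1, len(rows[0]))


def check_dataset(data_dir="."):
    """Compare each file's shape with the documented one; return mismatches."""
    mismatches = {}
    for name, expected in EXPECTED_SHAPES.items():
        path = Path(data_dir) / name
        if not path.exists():
            mismatches[name] = "file not found"
            continue
        found = csv_shape(path)
        if found != expected:
            mismatches[name] = f"expected {expected}, found {found}"
    return mismatches


if __name__ == "__main__":
    # Prints one line per file that is missing or has an unexpected shape.
    for name, problem in check_dataset().items():
        print(f"{name}: {problem}")
```

An empty result from check_dataset means every file was found with the documented number of rows and variables. If the files use a different delimiter (for example a semicolon), csv_shape can be called directly with the delimiter argument.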