This CS_DP_gender_readme.txt file was generated on 2022-07-15 by Raquel Fernández Fuertes & Tamara Gómez Carrero.

INDEX OF THE CS_DP_gender_dataset (Codeswitching experimental data: grammatical gender in DPs)

1. GENERAL INFORMATION
   1.1. Title of dataset
   1.2. Author information
      1.2.1. PI and co-PI
      1.2.2. Lab
      1.2.3. People involved in the data collection
   1.3. Objectives
   1.4. Funding sources
   1.5. Citing information
2. ACCESS INFORMATION
   2.1. Licenses or restrictions
   2.2. Publications
3. METHODOLOGICAL INFORMATION
   3.1. Data elicitation and participant groups
      3.1.1. Offline data
      3.1.2. Online data
   3.2. Data codification procedure
   3.3. Data extraction procedure
   3.4. Data classification procedure: variables
      3.4.1. Offline dataset variables
      3.4.2. Online database variables
4. DATA
   4.1. Database
   4.2. Last update
5. RELATED DATASETS


1. GENERAL INFORMATION

1.1. Title of dataset: CS_DP_gender_dataset

1.2. Author information

1.2.1. PI and co-PI:

Name: Raquel Fernández Fuertes
Institution: University of Valladolid
Address: Facultad de Filosofía y Letras, Paseo del Cauce s/n 47011, Valladolid (Spain)
Email: raquelff@uva.es

Name: Tamara Gómez Carrero
Institution: University of Valladolid
Address: Facultad de Filosofía y Letras, Paseo del Cauce s/n 47011, Valladolid (Spain)
Email: tamara.gomez.carrero@uva.es

1.2.2. Lab:

Name of the lab: UVALAL (University of Valladolid Language Acquisition Lab)
Institution: University of Valladolid
Address: https://uvalal.uva.es
Email: gir.uvalal@uva.es

1.2.3. People involved in the data collection

The offline data were collected and codified by Esther Álvarez de la Fuente, Patricia Carro García, Raquel Fernández Fuertes, Tamara Gómez Carrero, Juliana Naleiro, Noelia Recio Fernández and Pablo Sánchez Martín.
The online data were collected by Esther Álvarez de la Fuente, Raquel Fernández Fuertes, Tamara Gómez Carrero and Sonja Mujcinovic.

1.3. Objectives

This investigation focuses on the role of Spanish grammatical gender in English-Spanish switched determiner phrases (DPs). In particular, the aim of this study is to determine (i) which directionality is preferred and easier to process (e.g., English determiner switches (1) vs. Spanish determiner switches (2)) and (ii) which gender agreement mechanisms are preferred and easier to process in the case of Spanish determiner switches. Regarding the latter, two mechanisms are under consideration: the analogical criterion (AC), i.e., whether there is gender agreement between the Spanish determiner and the Spanish translation equivalent of the English noun ([+AC] switches), as in (2), or a lack of it ([-AC] switches), as in (3) and (4); and the masculine as default, i.e., whether masculine determiners are preferred and easier to process regardless of the gender of the translation equivalent of the English noun, as in (5).

(1) The casa (“the house”)
(2) La house / el book → la-FEM. house-SP FEM. / el-MASC. book-SP MASC.
(3) El house → el-MASC. house-SP FEM.
(4) La book (“the book”) → la-FEM. book-SP MASC.
(5) El house / book → el-DEFAULT house-SP FEM. / book-SP MASC.
[Note. FEM.=feminine; MASC.=masculine; SP=Spanish]

1.4. Funding sources

- 2018-2022: Spanish Ministry of Science, Innovation and Universities and European Regional Development Fund (ERDF) [PGC2018-097693-B-I00], Linguistic competence indicators in heritage and non-native languages: linguistic, psycholinguistic and social aspects of English-Spanish bilingualism, principal investigator: R. Fernández Fuertes (University of Valladolid, Spain)
- 2019-2022: Predoctoral fellowship, Regional Government of Castile and León (Spain) and European Social Fund (ESF) (ORDEN EDU/556/2019, 5 junio), El género gramatical en la alternancia de código entre inglés y español: los sintagmas determinantes y las construcciones copulativas, recipient: T. Gómez Carrero (University of Valladolid, Spain), supervisor: R. Fernández Fuertes (University of Valladolid, Spain)
- 2019: University of Valladolid & Banco Santander, UVa predoctoral fellowship, El género gramatical en la alternancia de código entre inglés y español: los sintagmas determinantes y las construcciones copulativas, recipient: T. Gómez Carrero (University of Valladolid, Spain), supervisor: R. Fernández Fuertes (University of Valladolid, Spain)
- 2017-2019: Regional Government of Castile and León (Spain) and ERDF [VA009P17], Aspectos de la dimensión internacional del contacto de lenguas: diagnósticos de la competencia lingüística bilingüe inglés-español, principal investigator: R. Fernández Fuertes (University of Valladolid, Spain)
- 2010-2012: International Council for Canadian Studies & Department of Foreign Affairs and International Trade [Canada-Europe Award], Minority and majority languages in Canada and Spain: English, French and Spanish as first, second and heritage languages, principal investigator: R. Fernández Fuertes (University of Valladolid, Spain)
- 2006-2008: Regional Government of Castile and León (Spain) [VA046A06], Lenguas en contacto [inglés/español] en el contexto de Castilla y León: adquisición de L1 y L2, principal investigator: R. Fernández Fuertes (University of Valladolid, Spain)
- 2002-2005: Spanish Ministry of Science and Technology and ERDF [BFF2002-00442], La teoría lingüística y el análisis de los sistemas bilingües simultáneos del inglés y del español, principal investigator: R. Fernández Fuertes (University of Valladolid, Spain)

1.5. Citing information

Publications using this dataset (or any part of it) should cite this dataset as follows:
Fernández Fuertes, R. and J.M. Liceras. 2018. Bilingualism as a first language: language dominance and crosslinguistic influence. In Cuza, A. and P. Guijarro-Fuentes (eds.). Language Acquisition and Contact in the Iberian Peninsula. De Gruyter.


2. ACCESS INFORMATION

2.1. Licenses or restrictions: There are no licenses/restrictions placed on this dataset.

2.2. Publications: Partial or full access to the information contained in the database is available through the UVALAL webpage (publications section, http://uvalal.uva.es/index.php/results/publications/)


3. METHODOLOGICAL INFORMATION

3.1. Data elicitation and participant groups

Two types of data were collected for the present investigation.
The offline data were collected at the University of Valladolid and at the International School of Valladolid (Spain); at the Bayside Comprehensive School and at the Westside School in Gibraltar (UK); and at the University of Florida (USA).
The online data were collected at the University of Valladolid, at the International School of Valladolid and at the Francisco Pino Primary School (Spain); and at the University of Gibraltar and at the John Mackintosh Hall in Gibraltar (UK).
In the case of young children, warm-up sessions were conducted so that they became familiar with both the researchers and the tasks.

3.1.1. Offline data

The offline data were elicited via two tasks. The first task was an acceptability judgment task in which participants were presented with an image representing a scene and a dialogue (a question with an answer containing a switched DP). Each participant had to rate the answer to the question by choosing an emoticon representing a value (very bad, bad, good and excellent). The second task was a sentence completion task in which participants were presented with a scene and a sentence with an English noun preceded by a gap. Each participant had to complete the sentence by producing a determiner in Spanish. Once finished, participants performed a post-task: they were presented with the images from the sentence completion task again and had to name each element represented in each scene by using a full Spanish DP. The tasks were designed in collaboration with Juana M. Liceras from the LAR-LAB of the University of Ottawa (Canada).
These were pen-and-paper tasks. Participants were tested in groups in a room where they were shown a PowerPoint presentation with the images and the dialogues. Each participant wrote down their ratings on an answer sheet. Participants who were too young to do so completed the tasks individually and orally with the help of a researcher.

A total of 340 participants took part in the offline tasks, organized into the following groups:
[Note. L1=first language or mother tongue; L2=second language; L3=third language; HL=heritage language]
- L1 Spanish – L2 English (114 adults, 56 children)
- L1 Spanish – HL English (8 adults, 32 children)
- L1 English – HL Spanish (31 adults and 11 children from Florida; 14 adults and 34 children from Gibraltar)
- L1 English – L2 Spanish (8 adults, 24 children)
- L1 English – L3 Spanish (1 adult)
- L2 English participants whose L1 was different from Spanish (3 adults, 2 children)

3.1.2. Online data

The online data were elicited via an eyetracking during reading task. Participants were asked to read sentences on a computer screen while their eye movements were recorded. An EyeLink Portable Duo was used, sampling eye movements at 1000 Hz (the degrees of visual angle were 0.67 horizontally and 0.44 vertically at a viewing distance of 600 mm). Some of the sentences were followed by comprehension questions, which participants answered yes or no by pressing a gamepad button. Participants were tested individually in a quiet room, and at least one researcher was present during the testing. The task was organized into four blocks so that participants could take breaks when needed.

A total of 116 participants carried out the online experimental task, organized into the following groups:
- L1 Spanish – L2 English (32 adults, 33 children)
- L1 Spanish – HL English (9 adults, 7 children)
- L1 English – HL Spanish (15 adults, 1 child)
- L1 English – L2 Spanish (19 adults)

3.2. Data codification procedure

The codification of the offline data depended on the type of task. In the acceptability judgment task, the answers were codified with numbers from 1 to 4 (1=very bad; 2=bad; 3=good; 4=excellent). In the sentence completion task and its post-task, answers were codified as expected (1) or as non-expected (0). Any other answer was given the value 555 in the acceptability judgment task and the value 2 in the sentence completion task and its post-task.
In the case of the online data, the eyetracking measures used in the analysis were selected by using the DataViewer software (SR-Research). The fixations in each region or interest area (pre-target region, the two target regions and the post-target region) were extracted, organized and cleaned using the DataViewer. After that, the data were transferred to an Excel document.

3.3. Data extraction procedure

All the data collected and codified are compiled in the following CSV files:
- CS_DP_gender_offline.csv
- CS_DP_gender_online.csv

3.4. Data classification procedure: variables

Both databases are in long format (one answer per row):

3.4.1. Offline dataset variables:

- Identifying variables: participant code; item number; list (A, B); expected answer (only for PRT and PRT POST-TASK).
- Demographic variables: group; L1 (ES=Spanish; EN=English); HL (ES=Spanish; EN=English); L2; L3 (FR=French; CA=Catalan; PT=Portuguese; DA=Danish; DE=German; IT=Italian; GL=Galician; HI=Hindi; HU=Hungarian; EU=Basque; ZH=Chinese; AR=Arabic; NL=Dutch; GA=Irish; BG=Bulgarian; RO=Romanian); age of the participant (years); sex (F=female; M=male); place of present residence (Florida, Gibraltar, UK, Spain).
- Linguistic variables:
  - Task:
      AJT_conc = acceptability judgment task
      PRT = sentence completion task (production task)
      PRT POST-TASK = production post-task
  - Structure:
      Conc = experimental item
      Filler = filler item
  - Condition:
    - AJT_conc:
        DEF = English determiner, Spanish feminine noun (e.g., the lluvia – “the rain”).
        DEM = English determiner, Spanish masculine noun (e.g., the suelo – “the floor”).
        DFF = Spanish feminine determiner, Spanish feminine translation equivalent of English noun (e.g., la chair – “the chair”).
        DMM = Spanish masculine determiner, Spanish masculine translation equivalent of English noun (e.g., el plane – “the plane”).
        DMF = Spanish masculine determiner, Spanish feminine translation equivalent of English noun (e.g., el window – “the window”).
        DFM = Spanish feminine determiner, Spanish masculine translation equivalent of English noun (e.g., la clock – “the clock”).
        X = filler
    - PRT:
        DFO = Spanish determiner, Spanish feminine canonical translation equivalent of English noun (e.g., la/el- ____ chair-SP silla).
        DFX = Spanish determiner, Spanish feminine non-canonical translation equivalent of English noun (e.g., la/el- ____ skin-SP piel).
        DMO = Spanish determiner, Spanish masculine canonical translation equivalent of English noun (e.g., el/la- ____ sky-SP cielo).
        DMX = Spanish determiner, Spanish masculine non-canonical translation equivalent of English noun (e.g., el/la- ____ plane-SP avión).
  - Answer:
      1 = very bad (AJT_conc)
      2 = bad (AJT_conc)
      3 = good (AJT_conc)
      4 = excellent (AJT_conc)
      555 = no answer or more than one answer (AJT_conc)
      0 = non-expected answer (PRT and PRT POST-TASK)
      1 = expected answer (PRT and PRT POST-TASK)
      2 = other (PRT and PRT POST-TASK)
  - RESPONSE = when PRT and PRT POST-TASK answers are codified as 2, the actual answer the participant provided is included here.

3.4.2. Online database variables:

- Identifying variables: participant code; trial_index (trial number); category (1 = pre-target region; 2 = determiner region; 3 = noun region; 4 = post-target region); IA_label (Interest Area label).
- Demographic variables: group; proficiency level (beginner, intermediate, advanced, starters, movers, flyers); age of the participant (years); sex (F=female; M=male).
- Linguistic variables:
  - Conditions:
      DM = English determiner, Spanish masculine noun (e.g., the libro – “the book”)
      DF = English determiner, Spanish feminine noun (e.g., the ventana – “the window”)
      FF = Spanish feminine determiner, Spanish feminine translation equivalent of English noun (e.g., la window – “the window”)
      MM = Spanish masculine determiner, Spanish masculine translation equivalent of English noun (e.g., el book – “the book”)
      FM = Spanish feminine determiner, Spanish masculine translation equivalent of English noun (e.g., la book – “the book”)
      MF = Spanish masculine determiner, Spanish feminine translation equivalent of English noun (e.g., el window – “the window”)
  - Eyetracking measures in milliseconds:
      IA_first_fix_progressive (0/1 flag, not a duration): 1 = no higher interest area was fixated before the first fixation in the current interest area; 0 = a higher interest area was fixated before the first fixation in the current interest area.
      First fixation duration = requires IA_first_fix_progressive to be 1.
      Gaze duration = requires IA_first_fix_progressive to be 1.
      Regression path duration = requires IA_first_fix_progressive to be 1.
      Total fixation duration


4. DATA

4.1. Database

The databases contain the raw data in long format with all the information related to the dataset. Each database is organized according to the variables described in section 3.4.
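
The coding conventions in section 3.4 can be illustrated with a minimal Python sketch. It is not part of the dataset. The row keys used here ("IA_first_fix_progressive", "first_fixation_duration") are assumed names chosen for illustration and may differ from the actual column headers in the csv files, and the sample rows are made up. The sketch shows two points: the same answer code means different things depending on the task, and first-pass eyetracking measures are only kept when IA_first_fix_progressive is 1.

```python
# Illustrative sketch only. Row keys are ASSUMED names, not the documented
# column headers of the csv files; the sample rows below are invented.

# Answer codes for the acceptability judgment task (section 3.4.1).
AJT_LABELS = {
    1: "very bad",
    2: "bad",
    3: "good",
    4: "excellent",
    555: "no answer or more than one answer",
}

# Answer codes for the sentence completion task and its post-task.
PRT_LABELS = {0: "non-expected", 1: "expected", 2: "other"}


def decode_answer(task, code):
    """Return the label for an answer code; the meaning depends on the task."""
    if task == "AJT_conc":
        return AJT_LABELS[int(code)]
    if task in ("PRT", "PRT POST-TASK"):
        return PRT_LABELS[int(code)]
    raise ValueError(f"unknown task: {task!r}")


def first_pass_rows(rows):
    """Keep only rows where IA_first_fix_progressive is 1, as required for
    first fixation, gaze and regression path durations."""
    return [r for r in rows if int(r["IA_first_fix_progressive"]) == 1]


# Invented sample rows, with values as strings as csv.DictReader would give.
sample = [
    {"IA_first_fix_progressive": "1", "first_fixation_duration": "212"},
    {"IA_first_fix_progressive": "0", "first_fixation_duration": "180"},
]

print(decode_answer("AJT_conc", 1))  # very bad
print(decode_answer("PRT", 1))       # expected
print(len(first_pass_rows(sample)))  # 1
```

Total fixation duration carries no such requirement in section 3.4.2, so it would be analysed on the unfiltered rows.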

CS_DP_gender_offline.csv: it contains the offline data elicited via the acceptability judgment task, the sentence completion task and its post-task from 340 participants organized into diverse groups according to their linguistic backgrounds. Number of variables = 17; number of rows = 19538.

CS_DP_gender_online.csv: it contains the online data elicited via the eyetracking during reading task from 116 participants organized into diverse groups according to their linguistic backgrounds. The data were arranged according to the fixations in each region (pre-target, the two target regions and the post-target region). Number of variables = 14; number of rows = 15285.

4.2. Last update: 2022


5. RELATED DATASETS

- Bilingual acquisition data:
    CS_copula_gender_dataset: https://uvadoc.uva.es/handle/10324/54597
    CS_person_dataset: https://uvadoc.uva.es/handle/10324/54599
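
As a quick integrity check after downloading the files described in section 4.1, the number of columns and data rows can be compared with the figures given there. The following Python sketch is an illustration and not part of the dataset. It assumes a comma-delimited file with a single header row (the delimiter and encoding of the actual files are not stated in this readme), and the inline sample is made up.

```python
import csv
import io


def csv_shape(handle, delimiter=","):
    """Return (number of columns, number of data rows) for an open csv file.

    Assumes the first row is a header; the delimiter is a guess that may
    need changing (e.g. to ';') for the actual files.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)             # first row: variable names
    n_rows = sum(1 for _ in reader)   # remaining rows: one answer per row
    return len(header), n_rows


# Invented three-column sample standing in for a real file.
demo = io.StringIO("participant,item,answer\nP01,1,3\nP01,2,555\n")
print(csv_shape(demo))  # (3, 2)
```

For the real data, the same function can be given a handle from open("CS_DP_gender_offline.csv", newline="") and the result compared with the 17 variables and 19538 rows reported in section 4.1. Whether the reported row count includes the header row is not stated, so a difference of one would not necessarily indicate a problem.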