This CS_copula_gender_readme.txt file was generated on 2022-07-15 by Raquel Fernández Fuertes & Tamara Gómez Carrero.

INDEX OF THE CS_copula_gender_dataset
(Codeswitching experimental data: grammatical gender in copula constructions with codeswitching)

1. GENERAL INFORMATION
   1.1. Title of dataset
   1.2. Author information
      1.2.1. PI and co-PI
      1.2.2. Lab
      1.2.3. People involved in the data collection
   1.3. Objectives
   1.4. Funding sources
   1.5. Citing information
2. ACCESS INFORMATION
   2.1. Licenses or restrictions
   2.2. Publications
3. METHODOLOGICAL INFORMATION
   3.1. Data elicitation
      3.1.1. Offline data
      3.1.2. Online data
         3.1.2.1. Visual world paradigm task
         3.1.2.2. Reaction time task in Gorilla
   3.2. Data codification procedure
   3.3. Data extraction procedure
   3.4. Data classification procedure: variables
      3.4.1. Offline dataset variables
      3.4.2. Online dataset variables (visual world paradigm task)
      3.4.3. Online dataset variables (reaction time task in Gorilla)
4. DATA
   4.1. Database
   4.2. Last update
5. RELATED DATASETS

1. GENERAL INFORMATION

1.1. Title of dataset: CS_copula_gender_dataset

1.2. Author information

1.2.1. PI and co-PI:
   Name: Raquel Fernández Fuertes
   Institution: University of Valladolid
   Address: Facultad de Filosofía y Letras, Paseo del Cauce s/n 47011, Valladolid (Spain)
   Email: raquelff@uva.es

   Name: Tamara Gómez Carrero
   Institution: University of Valladolid
   Address: Facultad de Filosofía y Letras, Paseo del Cauce s/n 47011, Valladolid (Spain)
   Email: tamara.gomez.carrero@uva.es

1.2.2. Lab:
   Name of the lab: UVALAL (University of Valladolid Language Acquisition Lab)
   Institution: University of Valladolid
   Address: https://uvalal.uva.es
   Email: gir.uvalal@uva.es

1.2.3. People involved in the data collection
The offline data were collected and codified by Esther Álvarez de la Fuente, Patricia Carro García, Raquel Fernández Fuertes, Tamara Gómez Carrero, Juliana Naleiro, Noelia Recio Fernández and Pablo Sánchez Martín.
The online data were collected by Esther Álvarez de la Fuente, Raquel Fernández Fuertes, Tamara Gómez Carrero and Sonja Mujcinovic.

1.3. Objectives
This investigation focuses on the role of Spanish grammatical gender in English-Spanish switched copulative constructions with an adjective as subject complement (1). In particular, the aim of this study is to determine (i) which directionality is preferred and easier to process, i.e., switches in which Spanish provides the subject determiner phrase (DP) and English provides the adjective (1) or switches in which the subject DP is in English while the adjective is in Spanish (2); and (ii) which gender agreement mechanisms are preferred and easier to process in copulative switches in which Spanish provides the adjective. Regarding the latter, two mechanisms are under consideration: the analogical criterion (AC), i.e., whether there is gender agreement between the Spanish adjective and the Spanish translation equivalent of the English DP subject ([+AC] switches; 2) or not ([-AC] switches; 3); and the masculine as default, i.e., whether masculine adjectives are preferred and easier to process regardless of the gender of the translation equivalent of the English DP (5 and 6).

   (1) la casa is beautiful (“the house is beautiful”)
   (2) the house es bonita → the house-SP FEM. es bonita-FEM.
   (3) the house es bonito → the house-SP FEM. es bonito-MASC.
   (4) the book es bonita (“the book is beautiful”) → the book-SP MASC. es bonita-FEM.
   (5) the book es bonito → the book-SP MASC. es bonito-DEFAULT
   (6) the house es bonito → the house-SP FEM. es bonito-DEFAULT
   [Note. FEM.=feminine; MASC.=masculine; SP=Spanish]

1.4.
Funding sources
- 2018-2022: Spanish Ministry of Science, Innovation and Universities and European Regional Development Fund (ERDF) [PGC2018-097693-B-I00], Linguistic competence indicators in heritage and non-native languages: linguistic, psycholinguistic and social aspects of English-Spanish bilingualism, PRINCIPAL INVESTIGATOR: R. Fernández Fuertes (University of Valladolid, Spain)
- 2019-2022: Predoctoral fellowship, Regional Government of Castile and León (Spain) and European Social Fund (ESF) (ORDEN EDU/556/2019, 5 junio), El género gramatical en la alternancia de código entre inglés y español: los sintagmas determinantes y las construcciones copulativas, recipient: T. Gómez Carrero (University of Valladolid, Spain), supervisor: R. Fernández Fuertes (University of Valladolid, Spain)
- 2019: University of Valladolid & Banco Santander, UVa predoctoral fellowship, El género gramatical en la alternancia de código entre inglés y español: los sintagmas determinantes y las construcciones copulativas, recipient: T. Gómez Carrero (University of Valladolid, Spain), supervisor: R. Fernández Fuertes (University of Valladolid, Spain)
- 2017-2019: Regional Government of Castile and León (Spain) and ERDF [VA009P17], Aspectos de la dimensión internacional del contacto de lenguas: diagnósticos de la competencia lingüística bilingüe inglés-español, PRINCIPAL INVESTIGATOR: R. Fernández Fuertes (University of Valladolid, Spain)
- 2010-2012: International Council for Canadian Studies & Department of Foreign Affairs and International Trade [Canada-Europe Award], Minority and majority languages in Canada and Spain: English, French and Spanish as first, second and heritage languages, PRINCIPAL INVESTIGATOR: R. Fernández Fuertes (University of Valladolid, Spain)
- 2006-2008: Regional Government of Castile and León (Spain) [VA046A06], Lenguas en contacto [inglés/español] en el contexto de Castilla y León: adquisición de L1 y L2, PRINCIPAL INVESTIGATOR: R.
Fernández Fuertes (University of Valladolid, Spain)
- 2002-2005: Spanish Ministry of Science and Technology and ERDF [BFF2002-00442], La teoría lingüística y el análisis de los sistemas bilingües simultáneos del inglés y del español, PRINCIPAL INVESTIGATOR: R. Fernández Fuertes (University of Valladolid, Spain)

1.5. Citing information
Publications using this dataset (or any part of it) should cite this dataset as follows:
Liceras, J.M., R. Fernández Fuertes and R. Klassen. 2016. Language dominance and language nativeness: the view from English-Spanish code-switching. In Guzzardo Tamargo, R.E., C.M. Mazak and M.C. Parafita Couto (eds.). Spanish-English Codeswitching in the Caribbean and the US. John Benjamins.

2. ACCESS INFORMATION

2.1. Licenses or restrictions: There are no licenses/restrictions placed on this dataset.

2.2. Publications: Partial or total access to the information contained in the database is available through the UVALAL webpage (publications section, http://uvalal.uva.es/index.php/results/publications)

3. METHODOLOGICAL INFORMATION

3.1. Data elicitation and participant groups
Two types of data were collected for the present investigation.

The offline data were collected at the University of Valladolid and at the International School of Valladolid (Spain); at the Bayside Comprehensive School and at the Westside School in Gibraltar (UK); and at the University of Florida (USA).

The online data were collected in different ways depending on the specific online task. The visual world paradigm task was carried out by participants at the University of Valladolid, at the International School of Valladolid and at the Francisco Pino Primary School (Spain); and at the University of Gibraltar and at the John Mackintosh Hall in Gibraltar (UK). The reaction time task in Gorilla was completed online at home by the adult participants and at the International School of Valladolid (Spain) by the child participants.
In the case of young children, warm-up sessions were conducted so that they were familiar with both the researchers and the tasks.

3.1.1. Offline data
The offline data were elicited via two tasks. The first task was an acceptability judgment task in which participants were presented with an image representing a scene and a dialogue (a question with an answer containing a copulative construction with an adjective phrase as subject complement). Each participant had to rate the answer to the question by choosing an emoticon representing a value (very bad, bad, good and excellent). The second task was a sentence completion task in which participants were presented with a scene and a sentence with an English DP followed by a Spanish copulative verb and a gap. Each participant had to complete the sentence by producing an adjective in Spanish. Once participants finished the sentence completion task, they performed a post-task. They were presented with the images from the sentence completion task again and they had to name each element represented in each scene by using a full Spanish DP.

The tasks were designed in collaboration with Juana M. Liceras from the LAR-LAB of the University of Ottawa (Canada).

These were pen-and-paper tasks. Participants were tested in groups in a room where they were shown a PowerPoint presentation with the images and the dialogues. Each participant wrote down their ratings on an answer sheet. If the participant was too young, the tasks were completed individually in oral mode with the help of the researcher.

In the case of the offline data, 337 participants took part in this study, organized into the following groups:
[Note.
L1=first language or mother tongue; L2=second language; L3=third language; HL=heritage language]
- L1 Spanish – L2 English (117 adults, 50 children)
- L1 Spanish – HL English (8 adults, 32 children)
- L1 English – HL Spanish (31 adults and 11 children from Florida; and 14 adults and 34 children from Gibraltar)
- L1 English – L2 Spanish (8 adults, 24 children)
- L1 English – L3 Spanish (1 adult, 2 children)
- L2 English participants whose L1 was different from Spanish (3 adults, 2 children)

3.1.2. Online data
The online data were elicited via two tasks: a visual world paradigm task and a reaction time task with Gorilla.

3.1.2.1. Visual world paradigm task
The visual world paradigm task is an eyetracking task which combines auditory and visual stimuli. Participants were asked to observe an image with an English DP on it. After that, they had to fixate on a cross in the middle of the screen in order to move to the following screen, where they found four words in Spanish displayed on a grid (target, competitor, distractor1 and distractor2). While they were reading the words, they listened to a question mixing English and Spanish which was related to the image they had just seen. During this reading phase, their eye movements were recorded with an EyeLink Portable Duo eyetracker at a sampling rate of 1000 Hz (degrees of visual angle were at 0.67 horizontally and 0.44 vertically, at 600 mm of viewing distance). When they knew the answer to the question they had heard, they had to select one of the words from the grid by clicking a gamepad button.

Participants were tested individually in a quiet room and at least one researcher was present during the testing. The task was organized into three blocks so they could take breaks when needed.
The visual world paradigm task was completed by 101 participants organized into the following groups:
- L1 Spanish – L2 English (32 adults, 41 children)
- L1 Spanish – HL English (7 adults, 8 children)
- L1 English – HL Spanish (13 adults from Gibraltar)

3.1.2.2. Reaction time task in Gorilla
The reaction time task in Gorilla was the second experiment carried out to test codeswitching within copulative constructions with an adjective phrase as a subject complement. The task was created in the Gorilla platform (https://gorilla.sc/), which is a website for the creation and development of psycholinguistic experiments. Due to the online nature of the platform, participants could complete the task at home on a computer with an internet connection. The child participants completed the task online at the International School of Valladolid (Spain), supervised by a researcher. In the task, participants read a sentence and rated it by choosing an emoticon representing a value (very bad, bad, good, excellent). Reaction times and judgments were recorded.

The reaction time task in Gorilla was completed by 67 participants organized into the following groups:
- L1 Spanish – L2 English (35 adults, 27 children)
- L1 Spanish – HL English (5 children)

3.2. Data codification procedure
The codification of the offline data depended on the type of task. In the case of the acceptability judgment task, the answers were codified by using numbers from 1 to 4 (1=very bad; 2=bad; 3=good; 4=excellent). In the case of the sentence completion task and its post-task, answers were codified as expected (1) and as non-expected (0). Any other answers were given a 555 value in the acceptability judgment task and a 2 value in the sentence completion task.

In the case of the online experiments, the codification of the data depended on the task, too.
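
As a minimal sketch of the offline codes described above, the following Python helpers map the numeric values back to their labels. Only the numeric codes and the task labels (AJT_agr, PRT, PRT POST-TASK) come from this readme; the function and constant names are hypothetical and are not part of the dataset.

```python
# Hypothetical helper names; only the numeric codes are taken from this readme.
AJT_LABELS = {1: "very bad", 2: "bad", 3: "good", 4: "excellent"}
AJT_OTHER = 555  # placeholder value for other answers in the judgment task
PRT_LABELS = {0: "non-expected", 1: "expected", 2: "other"}


def decode_answer(task, code):
    """Return a readable label for a coded offline answer.

    Returns None for the 555 placeholder of the acceptability judgment task.
    """
    code = int(code)
    if task.upper().startswith("AJT"):
        if code == AJT_OTHER:
            return None
        return AJT_LABELS[code]
    # Sentence completion task (PRT) and its post-task share the same codes.
    return PRT_LABELS[code]


def mean_rating(codes):
    """Mean judgment on the 1-4 scale, leaving out the 555 placeholder."""
    valid = [int(c) for c in codes if int(c) != AJT_OTHER]
    return sum(valid) / len(valid) if valid else None
```

The point of the second helper is that the 555 value is a placeholder rather than a rating, so it would presumably need to be filtered out before any averaging of the acceptability judgments.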
Regarding the visual world paradigm task, the eyetracking measures used in the analysis of the data were selected by using the DataViewer software (SR-Research). In this case, the total fixation duration measure was used on each area (target, competitor, distractor1 and distractor2). At the same time, the region clicked (i.e., the word selected) and accuracy (i.e., whether the region clicked corresponded to the target region) were also extracted. All the data were transferred to an Excel document.

Regarding the reaction time task, Gorilla creates an Excel document which includes the answer given by the participant and the reaction times recorded for each item. The answers were codified with a number from 1 to 4 (1=very bad; 2=bad; 3=good; 4=excellent).

3.3. Data extraction procedure
All the data collected and codified are compiled in the following csv files:
- CS_copula_gender_offline.csv
- CS_copula_gender_vwp.csv
- CS_copula_gender_gorilla.csv

3.4. Data classification procedure: variables
The three databases are in long format (one answer per row).

3.4.1. Offline dataset variables:
- Identifying variables: participant code; item number; list (A, B); expected answer (only for PRT and PRT POST-TASK).
- Demographic variables: group; L1 (ES=Spanish; EN=English); HL (ES=Spanish; EN=English); L2; L3 (FR=French; CA=Catalan; PT=Portuguese; DA=Danish; DE=German; IT=Italian; GL=Galician; HI=Hindi; HU=Hungarian; EU=Basque; ZH=Chinese; AR=Arabic; NL=Dutch; GA=Irish; BG=Bulgarian; RO=Romanian); age of the participant (years); sex (F=Female; M=Male); place of present residence (Florida, Gibraltar, UK, Spain).
- Linguistic variables:
   - Task:
      AJT_agr = acceptability judgment task
      PRT = sentence completion task (production task)
      PRT POST-TASK = production post-task
   - Structure:
      Agr = experimental item
      Filler = filler item
   - Condition:
      - AJT_Agr:
         CEFF = English DP subject with a feminine Spanish translation equivalent, Spanish feminine AdjP (e.g., the house es pequeña – “the house is small”).
         CEMM = English DP subject with a masculine Spanish translation equivalent, Spanish masculine AdjP (e.g., the book es grueso – “the book is thick”).
         CEMF = English DP subject with a masculine Spanish translation equivalent, Spanish feminine AdjP (e.g., the castle es preciosa – “the castle is beautiful”).
         CEFM = English DP subject with a feminine Spanish translation equivalent, Spanish masculine AdjP (e.g., the door es blanco – “the door is white”).
         CSF = Spanish feminine DP subject, English AdjP (e.g., la mesa is round – “the table is round”).
         CSM = Spanish masculine DP subject, English AdjP (e.g., el puente is long – “the bridge is long”).
         X = filler
      - PRT & POST-TASK:
         AFO = Spanish adjective, Spanish feminine canonical translation equivalent of the English noun (e.g., the door es ___ - negra/negro – “the door is black”)
         AFX = Spanish adjective, Spanish feminine non-canonical translation equivalent of the English noun (e.g., the milk es ___ - blanca/blanco – “the milk is white”)
         AMO = Spanish adjective, Spanish masculine canonical translation equivalent of the English noun (e.g., the book es ___ - amarillo/amarilla – “the book is yellow”)
         AMX = Spanish adjective, Spanish masculine non-canonical translation equivalent of the English noun (e.g., the tree es ___ - amarillo/amarilla – “the tree is yellow”)
   - Answer:
      1 = very bad (AJT_agr)
      2 = bad (AJT_agr)
      3 = good (AJT_agr)
      4 = excellent (AJT_agr)
      555 = no answer or more than one answer (AJT_agr)
      0 = non-expected answer (PRT and PRT POST-TASK)
      1 = expected answer (PRT and PRT POST-TASK)
      2 = other (PRT and PRT POST-TASK)
   - RESPONSE = when PRT and POST-TASK answers are codified as 2, the actual answer the participant provided is included here.

3.4.2.
Online dataset variables (visual world paradigm):
- Identifying variables: participant code; trial number; interest area (1=target_area; 2=competitor_area; 3=distractor1_area; 4=distractor2_area); IA_label (label of the interest area: target_area, competitor_area, distractor1_area, distractor2_area).
- Demographic variables: group; proficiency level (beginner, intermediate, advanced, starters, movers, flyers); age of the participant (years); sex (F=female; M=male).
- Linguistic variables:
   - Conditions (related to the Spanish translation equivalent of the noun forming the English DP):
      FO = Spanish feminine canonical translation equivalent of the English noun (e.g., the window-SP ‘la ventana’)
      MO = Spanish masculine canonical translation equivalent of the English noun (e.g., the book-SP ‘el libro’)
      FX = Spanish feminine non-canonical translation equivalent of the English noun (e.g., the street-SP ‘la calle’)
      MX = Spanish masculine non-canonical translation equivalent of the English noun (e.g., the pencil-SP ‘el lápiz’)
   - Canonicity: Canonical, Non-canonical
   - Gender: Feminine, Masculine
- Answer variables:
   - Accuracy:
      1 = the answer is the target word
      0 = the answer is different from the target word
   - Region_clicked: Among the four interest areas (target, competitor, distractor1 and distractor2), the region selected by the participant.
   - RT: Total amount of time in milliseconds from the moment the words are shown on the grid until the participant selects one of them with the gamepad button.
   - Total_fixation_duration: The eyetracking measure used to analyze the total amount of time in milliseconds spent on each interest area.

3.4.3. Online dataset variables (reaction time task in Gorilla):
- Identifying variables: participant code; item_number.
- Demographic variables: group; proficiency level (beginner, intermediate, advanced); age of the participant (years); sex (F=female; M=male).
- Linguistic variables:
   - Condition:
      - Experimental items:
         CSF_0 = Spanish DP subject with a feminine canonical noun, English adjective (e.g., la silla is square – “the chair is square”).
         CSF_X = Spanish DP subject with a feminine non-canonical noun, English adjective (e.g., la fuente is noisy – “the fountain is noisy”).
         CSM_0 = Spanish DP subject with a masculine canonical noun, English adjective (e.g., el edificio is new – “the building is new”).
         CSM_X = Spanish DP subject with a masculine non-canonical noun, English adjective (e.g., el diente is yellow – “the tooth is yellow”).
         CEFF_0 = English DP subject with a Spanish feminine canonical translation equivalent, Spanish feminine adjective (e.g., the chair-SP ‘la silla’ es cómoda – “the chair is comfortable”).
         CEFF_X = English DP subject with a Spanish feminine non-canonical translation equivalent, Spanish feminine adjective (e.g., the milk-SP ‘la leche’ es sana – “the milk is healthy”).
         CEMM_0 = English DP subject with a Spanish masculine canonical translation equivalent, Spanish masculine adjective (e.g., the book-SP ‘el libro’ es grueso – “the book is thick”).
         CEMM_X = English DP subject with a Spanish masculine non-canonical translation equivalent, Spanish masculine adjective (e.g., the bridge-SP ‘el puente’ es ancho – “the bridge is wide”).
         CEMF_0 = English DP subject with a Spanish masculine canonical translation equivalent, Spanish feminine adjective (e.g., the castle-SP ‘el castillo’ es segura – “the castle is safe”).
         CEMF_X = English DP subject with a Spanish masculine non-canonical translation equivalent, Spanish feminine adjective (e.g., the sun-SP ‘el sol’ es amarilla – “the sun is yellow”).
         CEFM_0 = English DP subject with a Spanish feminine canonical translation equivalent, Spanish masculine adjective (e.g., the door-SP ‘la puerta’ es blanco – “the door is white”).
         CEFM_X = English DP subject with a Spanish feminine non-canonical translation equivalent, Spanish masculine adjective (e.g., the blood-SP ‘la sangre’ es rojo – “the blood is red”).
      - Filler and distractor items:
         D_GR_SP = Grammatical distractor starting in Spanish (e.g., mi hermano compró two shirts – “my brother bought two shirts”)
         D_UN_SP = Ungrammatical distractor starting in Spanish (e.g., madre mi tiene a grey dress – “my mother has a grey dress”)
         D_GR_EN = Grammatical distractor starting in English (e.g., your father writes poesía romántica – “your father writes romantic poetry”)
         D_UN_EN = Ungrammatical distractor starting in English (e.g., sister my drew estos cuadros – “my sister drew these paintings”)
         F_NN_GR_SP = Grammatical filler in Spanish with an N-N compound (e.g., mi vecino tiene un perro salchicha – “my neighbor has a dachshund”)
         F_NN_UN_SP = Ungrammatical filler in Spanish with an N-N compound (e.g., mi padre diseñó mi nido cama – “my father designed my trundle bed”)
         F_NN_GR_EN = Grammatical filler in English with an N-N compound (e.g., my father needs his toolbox)
         F_NN_UN_EN = Ungrammatical filler in English with an N-N compound (e.g., I ate pasta with balls meat)
         F_DEV_GR_SP = Grammatical filler in Spanish with a deverbal compound (e.g., ayer compré un abrelatas – “yesterday I bought a can opener”)
         F_DEV_UN_SP = Ungrammatical filler in Spanish with a deverbal compound (e.g., necesito comprar un uñascorta – “I need to buy a nail clippers”)
         F_DEV_GR_EN = Grammatical filler in English with a deverbal compound (e.g., they bought a hair dryer)
         F_DEV_UN_EN = Ungrammatical filler in English with a deverbal compound (e.g., my aunt has a pen ball)
   - Canonicity: Canonical, Non-canonical, Filler, Distractor
   - Lang_adj: Spanish, English, Filler, Distractor
   - Gender_N: Masculine, Feminine, Filler, Distractor
   - Item_type: Experimental, Filler, Distractor
- Answer variables:
   - Response:
      1 = very bad
      2 = bad
      3 = good
      4 = excellent
   - RT: Time in milliseconds from
the moment the participant starts reading the sentence until the participant judges the sentence by clicking on one of the emoticons.

4. DATA

4.1. Database
The databases contain the raw data in long form with all the information related to the dataset. Each database is organized according to the variables described in section 3.4.

CS_copula_gender_offline.csv: it contains the offline data elicited via the acceptability judgment task, the sentence completion task and its post-task from 337 participants organized into diverse groups according to their linguistic backgrounds. Number of variables = 17; number of rows = 20604.

CS_copula_gender_vwp.csv: it contains the online data elicited via the visual world paradigm task from 101 participants organized into diverse groups according to their linguistic backgrounds. Number of variables = 15; number of rows = 9697.

CS_copula_gender_gorilla.csv: it contains the online data elicited via the reaction time task in Gorilla from 67 participants organized into diverse groups according to their linguistic backgrounds. Number of variables = 13; number of rows = 7508.

4.2. Last update: 2022

5. RELATED DATASETS
- Bilingual acquisition data:
   CS_DP_gender_dataset: https://uvadoc.uva.es/handle/10324/54598
   CS_person_dataset: https://uvadoc.uva.es/handle/10324/54599
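
As a quick integrity check on the three files described in section 4.1, the sketch below compares the shape of a CSV file with the number of variables and rows listed there. The file names and counts are taken from this readme. The comma delimiter, the presence of a header row and the reading of "number of rows" as data rows excluding the header are assumptions, and the function names are hypothetical.

```python
import csv
import io

# From section 4.1: file name -> (number of variables, number of rows).
DOCUMENTED_SHAPES = {
    "CS_copula_gender_offline.csv": (17, 20604),
    "CS_copula_gender_vwp.csv": (15, 9697),
    "CS_copula_gender_gorilla.csv": (13, 7508),
}


def csv_shape(handle, delimiter=","):
    """Return (n_columns, n_data_rows) for an open CSV file.

    Assumes the first line is a header; blank lines are not counted.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    n_rows = sum(1 for row in reader if row)
    return len(header), n_rows


def matches_documentation(name, handle, delimiter=","):
    """True if the file's shape equals the shape listed in section 4.1."""
    return csv_shape(handle, delimiter) == DOCUMENTED_SHAPES[name]


# Tiny in-memory stand-in with made-up column names, for illustration only.
sample = io.StringIO("participant,item,answer\nP01,1,3\nP01,2,555\n")
print(csv_shape(sample))  # (3, 2)
```

With a downloaded copy, a file would be opened with open(name, newline="", encoding="utf-8") and passed to matches_documentation. If a file turns out to use semicolons as separators, the delimiter argument can be changed accordingly.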