A comparability study of two Italian language proficiency tests for adult migrants


1 A comparability study of two Italian language proficiency tests for adult migrants Policy and Practice in Language Testing and Assessment Paola MASILLO, PhD

2 Scope of the study Study on the Italian language test for non-EU citizens applying for a long-term residence permit in Italy, as provided by law: Legislative Decree 286/1998, art. 9 (Immigration Act), as modified by Law 94/2009; Ministerial Decree of 4 June 2010 on the procedures of the Italian language test. Proficiency level required: CEFR A2

3 Research context I. Integration (Salt, 2005; Blommaert, Van Avermaet, 2008; Krumm, Plutzar, 2008; Extra, Spotti, 2009; Extra, Spotti, Van Avermaet, 2009b; Niessen, Huddleston, 2010; Van Avermaet, 2010; Extramiana, Van Avermaet, 2011; Joppke, 2012). Resolution 1437 (2005), Migration and integration: a challenge and an opportunity for Europe; White paper on intercultural dialogue: Living together as equals in dignity (CoE, 2008); European Parliament Resolution, 14 March 2013, On the integration of migrants, its effects on the labour market and the external dimension of social security coordination (2012/2131(INI)); Resolution 68 (18) on the teaching of languages to migrant workers. Host country language ability = index of integration

4 Research context II. Testing regimes (Foucault, 1979; Spolsky, 1997; McNamara, 1998, 2000, 2004, 2005, 2008; Shohamy, 2001a, 2005, 2009; Davies, 2003; Blackledge, 2009; Extra, Spotti, Van Avermaet, 2009a; Hogan-Brun, Mar-Molinero, Stevenson, 2009a; Saville, 2009; Shohamy, McNamara, 2009; Karami, 2013). The Shibboleth test (Book of Judges, 12:6). Language test = instrument of political and social control

5 Research context III. The role of the Common European Framework of Reference for Languages in the linguistic integration policies (Trim, 1996; Mar-Molinero, Stevenson, 2006; Krumm, 2007; Shohamy, 2007; Barni, 2008, 2010a, 2010b, 2014; Van Avermaet, 2009, 2010; Extra, Spotti, Van Avermaet, 2009a, 2009b; Fulcher, 2010; Thalgott, 2010; McNamara, 2011; Blommaert, Leppänen, Spotti, 2012). CEFR = operational tool in language policies of integration and immigration

6 Research context: language requirements. Figure 1: Language requirements (chart categories: Before entering, Residence, Citizenship, No legislation, Other purposes; Italy labelled). SOURCE: C. Extramiana, R. Pulinx, P. Van Avermaet (2014) Linguistic Integration of adult migrants: Policy and practice. Draft Report on the 3rd Council of Europe Survey

7 Research context: assessment tool. Figure 2: Assessment tool prior to entry, for residence and for citizenship (chart categories: Language test; Language test or official certificate of language proficiency; Informal or alternative assessment; No further details; Italy labelled). SOURCE: C. Extramiana, R. Pulinx, P. Van Avermaet (2014) Linguistic Integration of adult migrants: Policy and practice. Draft Report on the 3rd Council of Europe Survey

8 Overview of the study 1. (Italian) language ability = instrument to determine access to residency. Test = diagnostic tool and powerful tool of discrimination and social exclusion. 2. Decentralised assessment procedures: issues of fairness, reliability and ethics; Ministerial guidelines (MIUR, 2010). 3. Test construct: listening, reading, written interaction. Issues of validity: low degree of conformity to the test criterion

9 Aims and objectives Test usefulness (Bachman & Palmer, 1996): construct validity; reliability; impact. Comparability study (Bachman, Kunnan, Vanniarajan, Lynch, 1988; DeMauro, 1992; Bachman, Davidson, Foulkes, 1993; Bachman, Davidson, Ryan, Choi, 1995; Shin, 2005; Weir, Wu, 2006; Wu, Wu, 2012; Weir, Chan, Nakatsuhara, 2013): test content validity and fairness; test scores reliability and comparability. Validity and reliability of the assessment scale (Huot, 1990; Shohamy, Gordon, Kraemer, 1992; Lumley, McNamara, 1995; Kroll, 1998; Bacha, 2001; Lumley, 2002; Weigle, 2002; Brown, 2003; Fulcher, Reiter, 2003; Sawaki, 2007; Wang, 2010; Wu, Ma, 2013)

10 Research question (1): The geography of the Italian test. Geographical factors; socio-political implications. Can decentralised assessment procedures affect test fairness, test validity and test reliability? Figure: Pass rate by location. SOURCE: Ministry of the Interior, Department for Civil Liberties and Immigration, Central Directorate for immigration and asylum policies

11 Research question (2): Score distribution at the national level. Can we demonstrate the comparability of the different tests designed and administered in the various examination centers? Figure 3: Score distribution at the national level (failed tests). SOURCE: Ministry of the Interior, Department for Civil Liberties and Immigration, Central Directorate for immigration and asylum policies

12 Research question (3): Assessment criteria: consistency and appropriateness. Test criterion: communicative language competence (consistency, appropriateness). Do the test specifications for CEFR level A2 (i.e. assessment criteria) lead to valid tests and reliable scores? Assessment criteria: Test is performed in a complete and correct way (answers are provided consistently and appropriately to the information required, or the form is filled in in all its parts) points. Test is performed in a partial way (answers are not always provided consistently and appropriately to the information required, or the form is filled in partially) 1-28 points. Test is not evaluable (answers are not provided or the form is not completed) 0 points.

13 Study procedures and sample characteristics First year: data collection. Database implementation. Outcomes survey: 83 Italian language tests; Piemonte 90.2%, Veneto 71.6%. Selection of two examples of tests

14 Study procedures and sample characteristics Second year: trial. Test administration; sample of test takers (no. 157); sociolinguistic survey; information questionnaire. Test administration: the two tests are normally given to an appropriate single group of learners with a short period between the two administrations (Weir, Wu, 2006). Procedures according to the Ministerial guidelines (Ministry of Education, 2010)

15 Study procedures and sample characteristics Third year: data analysis. Stage I, Standard setting: content analysis through the correlation to the CEFR descriptors. Stage II, Statistical analyses: measurement of test reliability (equivalent forms) following the model of Classical Test Theory. Stage III, Validation of the assessment scale: assessment of a sample of written performances (no. 20/test) and judgment analysis

16 Study procedures and sample characteristics Stage I, Standard setting: 11 judges; linking the test (items) to the relevant CEFR level; level of appropriateness of the test to the CEFR level A2 (Likert scale 0-3). Stage II, Statistical analyses: item difficulty and discrimination; reliability: Cronbach's alpha; descriptive statistics: mean, mode, median (group), range, standard deviation (dispersion), skewness and kurtosis (distribution); correlations: Spearman rank correlation coefficient; t-tests: paired sample t-tests, Wilcoxon Signed Rank Test. Stage III, Validation of the assessment scale: task analysis (clarity, appropriateness, relevance for the test taker); assessment of a sample of written performances (no. 20/test); validation of the assessment criteria (consistency and appropriateness)
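Two of the Stage II measures, item difficulty and Cronbach's alpha, can be sketched in a few lines. This is only an illustration of the standard Classical Test Theory formulas, not the author's analysis: the response matrix is invented, the function names are hypothetical, and items are assumed to be scored 0/1.

```python
from statistics import pvariance


def item_difficulty(scores):
    """Facility value per item: proportion of test takers who answered it correctly."""
    n_takers = len(scores)
    n_items = len(scores[0])
    return [sum(row[i] for row in scores) / n_takers for i in range(n_items)]


def cronbach_alpha(scores):
    """Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data (not from the study): 5 test takers x 4 dichotomous items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

difficulty = item_difficulty(responses)
alpha = cronbach_alpha(responses)
print(difficulty, round(alpha, 3))
```

Population variance is used for both the items and the totals; using sample variance throughout would give the same alpha, since the correction factor cancels in the ratio.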

17 Findings and Discussion Research question (1): the unfair testing. 1. Standard setting: specification mean values (A1 = 1, A2 = 2, B1 = 3) for Listening and Reading in Test 1 and Test 2 (values not preserved). 2. Frequency distribution, maximum score: Test 1 Listening 63.7%, Reading 28.7%; Test 2 Listening 28.7%, Reading 17.8%. 3. Descriptive statistics, mean (out of 10 points): Test 1 Listening 9.31, Reading 8.20; Test 2 Listening 8.46, Reading 7.13

18 Findings and Discussion Research question (2): the non-equivalent testing. 1. Reliability (internal consistency): Cronbach's alpha for Test 1 and Test 2 (values not preserved). 2. Correlations between Test 1 and Test 2: Listening .390 (shared variance 15%); Reading .483 (shared variance 23%). T-tests (paired sample t-tests; Wilcoxon Signed Rank Test): Listening, Sig. (2-tailed) .000; Reading, Sig. (2-tailed) .000 (t values not preserved)
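The shared-variance figures follow from the reported correlations: shared variance is conventionally the squared correlation coefficient, expressed as a percentage. A minimal check, using only the two coefficients given on the slide (the function and variable names are hypothetical):

```python
def shared_variance_pct(r):
    """Shared variance between two measures, as a percentage: r squared times 100."""
    return r ** 2 * 100


# Correlations between Test 1 and Test 2 as reported on the slide.
reported = {"Listening": 0.390, "Reading": 0.483}

shared = {skill: round(shared_variance_pct(r)) for skill, r in reported.items()}
print(shared)
```

Rounded to whole percentages, the results match the 15% (Listening) and 23% (Reading) stated on the slide.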

19 Findings and Discussion Research question (3): the low validity and low reliability of the assessment scale. 1. Assessment of a sample of written performances: rating process statistics for Written Interaction, Test 1 and Test 2 (N valid; missing 0 and 0; mean, mode, standard deviation, range; values not preserved). Inter-rater reliability: 16 out of 55 are good in both cases (.7 or upwards). 2. Validation of the assessment scale: 11 out of 11 raters, slightly acceptable. Supplementary assessment criteria: effective communication
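The figure of 55 is consistent with the number of distinct rater pairs among 11 raters, although the slide does not say so explicitly; the sketch below rests on that assumption. It enumerates the pairs and computes the share reported as reaching .7 or upwards (variable names are hypothetical).

```python
from itertools import combinations

N_RATERS = 11   # number of raters, as stated on the slides
GOOD_PAIRS = 16  # pairs at .7 or upwards, as reported on the slide

# Assumption: inter-rater reliability was computed for every distinct pair of raters.
rater_pairs = list(combinations(range(1, N_RATERS + 1), 2))
n_pairs = len(rater_pairs)

share_good = GOOD_PAIRS / n_pairs
print(n_pairs, f"{share_good:.1%}")
```

Under this reading, fewer than a third of the rater pairs reach the .7 threshold.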

20 Final considerations Lack of culture in Language Testing and Assessment: low test validity (construct); lack of fairness and formalization of assessment procedures; decentralised assessment procedures; test developers not properly trained. Test as an instrument of power, (seemingly) for integrative purposes: no reflection on the educational needs. Need for monitoring and validation studies of the assessment tools; collaboration with a guarantee of quality to check assessment procedures; promotion of studies on the impact of language policies on the social integration of adult migrants

21 Thank you! Policy and Practice in Language Testing and Assessment Paola MASILLO, PhD