Comparing Reliability and Item Difficulty of Multiple-Choice and Translation Questions on a Test of Grammatical Knowledge

Document Type: Research Article

Authors

1 Ph.D. in TEFL, English Department, Faculty of Foreign Languages and Literatures, University of Tehran, & Institute for Advanced Studies in Basic Sciences, Tehran, I.R. Iran

2 Associate Professor, English Language and Literature, Faculty of Foreign Languages and Literatures, University of Tehran, Tehran, I.R. Iran

Abstract

Multiple-choice questions are commonly employed for testing language knowledge. However, in the recently proposed systematic approach to item writing, using a variety of item formats is desirable, and there is very little research on the possibility of using translation items to add variety to a test. The present study compares the reliability and item difficulty of translation and multiple-choice item formats on a test of grammatical knowledge. First, specifications were written for a test composed of two sections, one with translation items and the other with multiple-choice items, and were reviewed by eight English teachers with experience in language testing. The specifications were then revised according to their feedback, and the resulting test was prepared and administered to 158 English learners of mixed proficiency levels. The data were analyzed using both classical test theory and the Rasch model. The results indicated that both item formats had good reliability (translation section r = 0.88; multiple-choice section r = 0.84), and the items generally showed good fit to the Rasch model. Furthermore, an independent-samples t-test found no significant difference between the two formats in item difficulty (t = 1.696, df = 58, p > .05). Based on these findings, it is suggested that, in a systematic approach to item writing, the translation item format can be employed alongside the multiple-choice format for measuring grammatical knowledge.
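
A minimal sketch of the classical-test-theory reliability check reported above: Cronbach's alpha computed for each section from a dichotomously scored (0/1) response matrix. The data, section sizes, and function names here are illustrative assumptions, not the study's materials; only the sample size (158 learners) comes from the abstract.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a persons-by-items matrix of 0/1 scores."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Simulated data: 158 learners, 30 items per section (hypothetical sizes).
rng = np.random.default_rng(0)
ability = rng.normal(size=(158, 1))
translation = (rng.normal(size=(158, 30)) < ability).astype(int)
multiple_choice = (rng.normal(size=(158, 30)) < ability).astype(int)
print(f"alpha, translation section:     {cronbach_alpha(translation):.2f}")
print(f"alpha, multiple-choice section: {cronbach_alpha(multiple_choice):.2f}")
```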
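
Rasch fit of the kind reported in the abstract is conventionally judged with infit and outfit mean-square statistics, where values near 1.0 indicate good fit. The sketch below approximates these statistics under simplifying assumptions: ability and difficulty are estimated with simple logits of raw proportions rather than the joint maximum-likelihood estimation a dedicated Rasch program would use, and all names and data are hypothetical.

```python
import numpy as np

def rasch_fit(scores: np.ndarray):
    """Approximate item infit/outfit mean-squares for 0/1 scores."""
    # Crude logit estimates (a PROX-style shortcut, not full JMLE);
    # proportions are clipped so perfect scores do not produce infinities.
    p_person = np.clip(scores.mean(axis=1), 0.01, 0.99)
    p_item = np.clip(scores.mean(axis=0), 0.01, 0.99)
    theta = np.log(p_person / (1 - p_person))        # person abilities
    b = -np.log(p_item / (1 - p_item))               # item difficulties
    b -= b.mean()                                    # centre the difficulty scale
    prob = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
    var = prob * (1 - prob)
    resid2 = (scores - prob) ** 2
    outfit = (resid2 / var).mean(axis=0)             # unweighted mean-square
    infit = resid2.sum(axis=0) / var.sum(axis=0)     # information-weighted
    return b, infit, outfit

rng = np.random.default_rng(0)
ability = rng.normal(size=(158, 1))
scores = (rng.normal(size=(158, 30)) < ability).astype(int)
b, infit, outfit = rasch_fit(scores)
print(f"infit range:  {infit.min():.2f} to {infit.max():.2f}")   # ~1.0 = good fit
print(f"outfit range: {outfit.min():.2f} to {outfit.max():.2f}")
```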
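
Finally, the reported difficulty comparison (t = 1.696, df = 58) is consistent with an independent-samples t-test over roughly 30 Rasch difficulty estimates per format. A re-creation with simulated difficulties, assuming equal variances as the classic test does:

```python
import numpy as np
from scipy import stats

# Simulated logit difficulties, 30 items per format (df = 30 + 30 - 2 = 58,
# matching the abstract); the values are illustrative, not the study's estimates.
rng = np.random.default_rng(1)
diff_translation = rng.normal(0.0, 1.0, size=30)
diff_multiple_choice = rng.normal(0.3, 1.0, size=30)

t, p = stats.ttest_ind(diff_translation, diff_multiple_choice)  # equal variances
df = diff_translation.size + diff_multiple_choice.size - 2
print(f"t = {t:.3f}, df = {df}, p = {p:.3f}")
```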
