Alderson, J.C., Clapham, C. and Wall, D. (1995). Language Test Construction and Evaluation. Cambridge University Press: Cambridge.
Assessment Reform Group (2002). Assessment for learning: 10 principles. Retrieved November 2012 from http://assessmentreformgroup.files.wordpress.com/2012/01/10principles_english.pdf. Cambridge, UK: University of Cambridge. School of Education.
Bachman, L. F., and Palmer, A. S. (2010). Language Testing in Practice: Oxford: Oxford University Press.
------------ (1996). Language Testing in Practice: Oxford: Oxford University Press.
Black, P., Harrison, C., Lee, C., Marshall, B., and William, D. (2004). Working inside the black box: Assessment for learning in the classroom. Phi Delta Kappan, 86(1), 8–21.
Brown J.D. and Hudson, T.D. (1998). The alternatives in language assessment. TEOSL Quarterly, 32 653-75.
Buck, G. (2001). Assessing listening. Cambridge, UK: Cambridge University Press.
Cook, G. (2010). Translation in Language Teaching. Oxford: Oxford University Press.
Davidson, F. (2012). Releasability of Language Test Specifications. Japan Language Testing Association (JLTA) Journal, 15, 1-23.
Davidson, F. and Lynch, B.K. (2002). Testcraft: A teacher’s guide to writing and using language test specification. New Haven: Yale University Press.
DeMars, C. (2010). Item Response Theory (Understanding statistics: measurement). Oxford University Press: Oxford.
Farhady, H. (2007). Teaching and testing EFL in Iran: global trends and local dilemmas. TELL, 2 75-98.
Fulcher, G. and Davidson, F. (2007). Language Testing and Assessment: An Advanced Resource Book. Oxford: Routledge.
Johnson, K. and Johnson, H. (1999). Encyclopedic dictionary of applied linguistics. Oxford: Blackwell Publishers.
Kennedy, C.A., Wilson, M., Draney, K., Tutunciyan S., and Vorp, R. (2006). ConstructMap software. Berkeley Evaluation and Assessment Research (BEAR) Center. Berkeley, CA.
Li, J. (2006). Introducing audit trails to the world of language testing. Unpublished master’s thesis. University of Illinois at Urbana-Champaign, USA: Division of English as an International Language.
Linacre, J.M. (2010). A User's Guide to WINSTEPS®. Retrieved May 2, 2010 from http://www.winsteps.com/
Linacre, J.M. (2010b) Winsteps® (Version 3.70.0) [Computer Software]. Beaverton, Oregon:Winsteps.com.
------------ (2002). What do infit and outfit, mean-square and standardized mean? Rasch Measurement Transactions 16:2 p.878.
------------ (1994). Sample Size and Item Calibration Stability. Rasch Measurement Transactions 1994 7:4.
McNamara, T.F. (1996). Measuring second language performance. Longman: Harlow.
McNamara, T. and Roever, C. (2006). Language testing: the social dimension. London, UK: Blackwell Publishing.
Morizot, J., Ainsworth, A.T. and Reise, S.P. (2007). Toward modern psychometrics: Application of item response theory models in personality research. In R. W. Robins, R. C. Fraley, & R. F. Krueger (Eds.), Handbook of Research Methods in Personality Psychology (pp. 407-423). New York: Guilford.
Purpura, J. E. (2004). Assessing grammar. Cambridge: Cambridge University Press.
Shin, D. (2012). Item writing and writers. In G, Fulcher and F, Davidson (Eds.): Routledge handbook of language testing. London: Routledge.
Spolsky, B. (2010). Language testing in historical and future perspective. In E., Shohamy and N., Hornberger (eds). Encyclopedia of language and education (2nd edition) Volume 7: Language testing and assessment: Springer: New York.
-------------- (1998). Sociolinguistics. Oxford: Oxford university press.
Turner, C.E. (2012). Classroom assessment. In G, Fulcher and F, Davidson (Eds.): Routledge handbook of language testing. London: Routledge.
Widdowson, H.G. (2003). Defining issues in English language teaching. Oxford: Oxford University Press.
William, D. (2011). What is assessment for learning? Studies in Educational Evaluation (37), 3-14.