Test Fairness Analysis of Reading Comprehension Items in the Nationwide PhD Admission Test under CDA

Document Type: Research Article

Authors

1 PhD Candidate in Applied Linguistics, Islamic Azad University, Central Tehran Branch, Tehran, Iran

2 Assistant Professor of Applied Linguistics, Islamic Azad University, Science and Research Branch, Tehran, Iran

3 Associate Professor of Applied Linguistics, Islamic Azad University, Central Tehran Branch, Tehran, Iran

4 Assistant Professor of Assessment and Measurement, Kharazmi University, Tehran, Iran

Abstract

During the past few decades, test takers' proficiency has increasingly been documented through large-scale assessments, most notably Cognitive Diagnostic Assessment (CDA), which provides fine-grained skill-mastery profiles of test takers. Accordingly, the present study scrutinizes the reading comprehension items of a high-stakes test under CDA. To this end, differential attribute functioning (DAF) analysis was used to detect differences in the probability of attribute mastery among test takers, and differential item functioning (DIF) analysis was applied to compare item performance across candidates by gender. The participants were 3,220 female and male candidates taking the nationwide PhD admission test in Iran. Adopting a sequential exploratory mixed-methods design, the G-DINA model was estimated in R (RStudio). The results revealed items suspected of DIF against female candidates. Finally, the findings are discussed in light of their implications for the language testing community, particularly the potential social harm arising from biased items in nationwide PhD admission exams.
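Since the abstract names the model (G-DINA), the grouping variable (gender), and the DIF procedure only in passing, the following minimal R sketch illustrates how such an analysis is typically set up. It assumes the open-source GDINA package and uses that package's bundled simulated data (sim10GDINA), plus a randomly generated gender vector, as stand-ins for the actual admission-test responses, Q-matrix, and candidate groups; none of this is the authors' own code or data.

# Minimal sketch of the pipeline described in the abstract, assuming the
# GDINA R package; responses, Q-matrix, and grouping are placeholders.
library(GDINA)

set.seed(1)
Q     <- sim10GDINA$simQ      # example item-by-attribute Q-matrix shipped with the package
resp  <- sim10GDINA$simdat    # example dichotomous response matrix (persons x items)
group <- sample(c("female", "male"), nrow(resp), replace = TRUE)  # illustrative gender vector

# 1) Fit the G-DINA model to obtain attribute-mastery information
fit <- GDINA(dat = resp, Q = Q, model = "GDINA")
head(personparm(fit, what = "mp"))   # marginal attribute-mastery probabilities per person

# 2) Flag items for DIF across the two gender groups (Wald test)
dif.out <- dif(dat = resp, Q = Q, group = group, method = "wald")
dif.out                              # items with significant statistics are DIF candidates

In a study like the one summarized above, the flagged items would then be inspected qualitatively (e.g., by content experts) in the follow-up phase of the sequential exploratory design to judge whether statistical DIF reflects genuine bias.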

Keywords

