Gender DIF in the Reading Comprehension Section of the University Entrance Examination for Applicants into English Programs at Iranian Universities: A Diagnostic Classification Modeling Approach

Document Type: Research Article

Author

English Department, Vali-e-Asr University of Rafsanjan, Iran

Abstract

The present study investigates differential item functioning (DIF) in the reading comprehension items of the university entrance examination for applicants into the English programs at Iranian universities. A diagnostic classification model (DCM)-based method of DIF detection was used, and a comparison was made between the DCM-based method and the traditional Mantel-Haenszel method, in which the matching variable is total score. To this end, the item responses of 10,000 test takers who took the test in 2014 were analyzed using the CDM and difR packages in R. Based on the DCM approach, one item was flagged for moderate DIF, whereas the Mantel-Haenszel approach flagged two items. As to the construct validity of the test under study, it can be concluded that the results are generalizable across gender. A methodological implication of the present study is that when the matching variable in DIF detection is total score, more items are likely to be flagged for DIF than when the attribute profiles of the test takers serve as the matching variable.
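For readers who wish to reproduce the two procedures, the following minimal R sketch shows how a DCM-based Wald test and a total-score Mantel-Haenszel analysis can be run with the CDM and difR packages cited below. The objects `resp` (a 0/1 item-response matrix), `Q` (the item-by-attribute Q-matrix), and `gender` (a grouping vector) are hypothetical placeholders; the study's actual data and Q-matrix are not reproduced here.

```r
# Minimal sketch under assumed inputs:
#   resp   - N x J matrix of dichotomous (0/1) item responses
#   Q      - J x K Q-matrix mapping items to reading attributes
#   gender - length-N vector of group labels, e.g., "male"/"female"
library(CDM)    # Robitzsch, Kiefer, George, & Uenlue (2017)
library(difR)

# 1) DCM-based DIF: fit a multiple-group DINA model, then test each item with
#    the Wald procedure of Hou, de la Torre, and Nandakumar (2014), so that
#    test takers are matched on attribute profiles rather than total score.
fit_dina <- CDM::gdina(data = resp, q.matrix = Q, group = gender, rule = "DINA")
wald_dif <- CDM::gdina.dif(fit_dina)
wald_dif    # per-item Wald statistics, p values, and effect sizes

# 2) Traditional Mantel-Haenszel DIF with total score as the matching variable:
mh_dif <- difR::difMH(Data = resp, group = gender, focal.name = "female")
mh_dif      # per-item MH chi-square plus the ETS A/B/C DIF classification
```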

Keywords

References

Ahmadi, A., & Darabi, A. (2016). Gender differential item functioning on a national field-specific test: The case of PhD entrance exam of TEFL in Iran. Iranian Journal of Language Teaching Research, 4(1), 63-82.
Alavi, S. M., Rezaee, A., & Amirian, S. M. R. (2011). Academic discipline DIF in an English language proficiency test. Journal of English Language Teaching and Learning, 5(7), 39-65.
Alderman, D. L., & Holland, P. W. (1981). Item performance across native language groups on the Test of English as a Foreign Language (TOEFL Research Report No. 9). Princeton, NJ: Educational Testing Service.
Amirian, S. M. R., Alavi, S. M., & Fidalgo, A. M. (2014). Detecting gender DIF with an English proficiency test in EFL context. Iranian Journal of Language Testing, 4(2), 187-203.
Angoff, W. H. (1993). Perspectives on differential item functioning methodology. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 3-23). Hillsdale, NJ: Lawrence Erlbaum Associates.
Bailey, K. (1999). Washback in language testing (TOEFL Monograph Series 15). Princeton, NJ: Educational Testing Service.
Barati, H., & Ahmadi, A. R. (2010). Gender-based DIF across the subject area: A study of the Iranian national university entrance exam. Journal of Teaching Language Skills, 2(3), 1-26.
Barati, H., Ketabi, S., & Ahmadi, A. (2006). Differential item functioning in high-stakes tests: The effect of field of study. IJAL, 19(2), 27-42.
Breland, H., Lee, Y.-W., Najarian, M., & Muraki, E. (2004). An analysis of the TOEFL CBT writing prompt difficulty and comparability of different gender groups (TOEFL Research Report No. 76). Princeton, NJ: Educational Testing Service.
Bridgeman, B., & Wendler, C. (1991). Gender differences in predictors of college mathematics performance and in college mathematics classes. Journal of Educational Psychology, 83(2), 275-284.
Carlton, S. T., & Harris, A. M. (1992). Patterns of gender differences on mathematics items on the Scholastic Aptitude Test. Applied Measurement in Education, 6(2), 137-151. doi:10.1207/s15324818ame0602_3
Curley, W. E., & Schmitt, A. P. (1993). Revising SAT-Verbal items to eliminate differential item functioning (College Board Report No. 93-2). New York: College Entrance Examination Board.
Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 35-66). Hillsdale, NJ: Erlbaum.
Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item functioning on the Scholastic Aptitude Test. Journal of Educational Measurement, 23, 355-368.
Fidalgo, A. M., Alavi, S. M., & Amirian, S. M. R. (2014). Strategies for testing statistical and practical significance in detecting DIF with logistic regression models. Language Testing. Advance online publication. doi:10.1177/0265532214526748
Hemmati, Baghaei, & Bemani (2016). Cognitive diagnostic modeling of L2 reading comprehension ability: Providing feedback on the reading performance of Iranian candidates for the university entrance examination. International Journal of Language Testing, 6, 92-100.
Hou, L., de la Torre, J., & Nandakumar, R. (2014). Differential item functioning assessment in cognitive diagnostic modeling: Application of the Wald test to investigate DIF in the DINA model. Journal of Educational Measurement, 51(1), 98-125.
Jang, E. E. (2005). A validity narrative: Effects of reading skills diagnosis on teaching and learning in the context of NG TOEFL (Doctoral dissertation). University of Illinois at Urbana–Champaign.
Jang, E. E. (2009). Cognitive diagnostic assessment of L2 reading comprehension ability: Validity arguments for fusion model application to LanguEdge assessment. Language Testing, 26, 31-73.
Jodoin, M. G., & Gierl, M. J. (2001). Evaluating type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14(4), 329-349.
Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25, 258-272.
Kanarek, E. A. (1988, October). Gender differences in freshman performance and their relationship to use of the SAT in admissions. Paper presented at the annual meeting of the Regional Association for Institutional Research, Providence, RI.
Karami, H. (2011). Detecting gender bias in a language proficiency test. International Journal of Language Studies, 5(2).
Lawrence, I. M., & Curley, W. E. (1989). Differential item functioning for males and females on SAT-Verbal reading subscore items: Follow-up study (ETS Research Report No. 89-22). Princeton, NJ: Educational Testing Service.
Lawrence, I. M., Curley, W. E., & McHale, F. J. (1988). Differential item functioning for males and females on SAT-Verbal reading subscore items (College Board Report No. 88-4). New York: College Entrance Examination Board.
Lee, Y.-W., Breland, H., & Muraki, E. (2005). Comparability of TOEFL CBT writing prompts for different native language groups. International Journal of Testing, 5, 131-158.
Li, F. (2008). A modified higher-order DINA model for detecting differential item functioning and differential attribute functioning (Doctoral dissertation). University of Georgia, Athens.
Li, H. (2011). A cognitive diagnostic analysis of the MELAB reading test. Spaan Fellow Working Papers in Second or Foreign Language Assessment, 9, 17-46.
Liu, O. L., Schedl, M., Malloy, J., & Kong, N. (2009). Does content knowledge affect TOEFL iBT™ reading performance? A confirmatory approach to differential item functioning (Research Report No. RR-09-29). Princeton, NJ: Educational Testing Service.
Ma, W., & de la Torre, J. (2017). GDINA: The generalized DINA model framework. R package version 1.4.2. Retrieved from https://CRAN.R-project.org/package=GDINA
Messick, S. (1996). Validity of performance assessments. In G. W. Phillips (Ed.), Technical issues in large-scale performance assessment (pp. 1-18). Washington, DC: U.S. Department of Education, National Center for Education Statistics.
Milewski, G. B., & Baron, P. A. (2002, April). Extending DIF methods to inform aggregate reports on cognitive skills. Paper presented at the meeting of the National Council on Measurement in Education, New Orleans, LA.
Ravand, H. (2016). Application of a cognitive diagnostic model to a high-stakes reading comprehension test. Journal of Psychoeducational Assessment, 34(8), 782-799. http://dx.doi.org/10.1177/0734282915623053
Ravand, H., & Firoozi, T. (2016). Examining construct validity of the master’s UEE using the Rasch model and the six aspects of the Messick's framework. International Journal of Language Testing, 6(1).
Ravand, H., & Robitzsch, A. (2015). Cognitive diagnostic modeling using R. Practical Assessment, Research & Evaluation, 20(11), 1–12.
Rezaee, A., & Shabani, E. (2010). Gender differential item functioning analysis of the University of Tehran English Proficiency Test. Pazhuhesh-e Zabanha-ye Khareji, 56, 89-108.
Robitzsch, A., Kiefer, T., George, A. C., & Uenlue, A. (2017). CDM: Cognitive diagnosis modeling. R package version 6.0-101. https://CRAN.R-project.org/package=CDM
Ryan, K., & Bachman, L. (1992). Differential item functioning on two tests of EFL proficiency. Language Testing, 9, 12-29.
Salehi, M., & Tayebi, A. (2012). Differential item functioning (DIF) in terms of gender in the reading comprehension subtest of a high-stakes test. Iranian Journal of Applied Language Studies, 4(1), 135-168.
Schmitt, A. P., & Dorans, N. J. (1990). Differential item functioning for minority examinees on the SAT. Journal of Educational Measurement, 27, 67-81.
Shealy, R., & Stout, W. F. (1993a). An item response theory model for test bias. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 197-329). Hillsdale, NJ: Lawrence Erlbaum.
Tatsuoka, K. K. (1983). Rule-space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20, 345-354.
Wall, D., & Horák, T. (2008). The impact of changes in the TOEFL examination on teaching and learning in Central and Eastern Europe—Phase 2, coping with change. Princeton, NJ: Educational Testing Service.
Zhang, W. (2006). Detecting differential item functioning using the DINA model (Doctoral dissertation). University of North Carolina, Greensboro.