Ahmadi, A., & Darabi, A. (2016). Gender differential item functioning on a national field-specific test: The case of PhD entrance exam of TEFL in Iran. Iranian Journal of Language Teaching Research, 4(1), 63-82.
Alavi, S. M., Rezaee, A., & Amirian, S. M. R. (2011). Academic discipline DIF in an English language proficiency test. Journal of English Language Teaching and Learning, 5(7), 39-
Alderman, D. L., & Holland, P. W. (1981). Item performance across native language groups on the Test of English as a Foreign Language (TOEFL Research Report No. 9). Princeton, NJ: Educational Testing Service.
Amirian, S. M. R., Alavi, S. M., & Fidalgo, A. M. (2014). Detecting gender DIF with an English
proficiency test in EFL context. Iranian Journal of Language Testing, 4(2), 187-203.
Angoff, W. H. (1993). Perspectives on differential item functioning methodology. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 3-23). Hillsdale, NJ: Lawrence Erlbaum Associates.
Bailey, K. (1999). Washback in language testing (TOEFL Monograph Series 15). Princeton, NJ: Educational Testing Service.
Barati, H., & Ahmadi, A. R. (2010). Gender-based DIF across the subject area: A study of the Iranian national university entrance exam. Journal of Teaching Language Skills, 2(3), 1-26.
Barati, H., Ketabi, S., & Ahmadi, A. (2006). Differential item functioning in high stakes tests: The effect of field of study. IJAL, 19(2), 27-42.
Breland, H., Lee, Y.-W., Najarian, M., & Muraki, E. (2004). An analysis of the TOEFL CBT writing prompt difficulty and comparability of different gender groups (TOEFL Research Report No. 76). Princeton, NJ: Educational Testing Service.
Bridgeman, B., & Wendler, C. (1991). Gender differences in predictors of college mathematics performance and in college mathematics classes. Journal of Educational Psychology, 83(2), 275-284.
Carlton, S. T., & Harris, A. M. (1992). Patterns of gender differences on mathematics items on the Scholastic Aptitude Test. Applied Measurement in Education, 6(2), 137-151. doi: 10.1207/s15324818ame0602_3
Curley, W., & Schmitt, A. P. (1993). Revising SAT-Verbal items to eliminate differential item functioning (College Board Report No. 93-2). New York: College Entrance Examination Board.
Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 35-66). Hillsdale, NJ: Erlbaum.
Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item functioning on the Scholastic Aptitude Test. Journal of Educational Measurement, 23, 355-368.
Fidalgo, A. M., Alavi, S. M., & Amirian, S. M. R. (2014). Strategies for testing statistical and practical significance in detecting DIF with logistic regression models. Language Testing. Advance online publication. doi: 10.1177/0265532214526748
Hemmati, Baghaei, & Bemani (2016). Cognitive diagnostic modeling of L2 reading comprehension ability: Providing feedback on the reading performance of Iranian candidates for the university entrance examination. International Journal of Language Testing, 6, 92-100.
Hou, L., de la Torre, J., & Nandakumar, R. (2014). Differential item functioning assessment in cognitive diagnostic modeling: Application of the Wald test to investigate DIF in the DINA model. Journal of Educational Measurement, 51(1), 98-125.
Jang, E. E. (2005). A validity narrative: Effects of reading skills diagnosis on teaching and learning in the context of NG TOEFL (Doctoral dissertation). University of Illinois at Urbana–Champaign.
Jang, E. E. (2009). Cognitive diagnostic assessment of L2 reading comprehension ability: Validity arguments for fusion model application to LanguEdge assessment. Language Testing, 26, 31-73.
Jodoin, M. G., & Gierl, M. J. (2001). Evaluating Type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14(4), 329-349.
Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25, 258-272.
Kanarek, E. A. (1988, October). Gender differences in freshman performance and their relationship to use of the SAT in admissions. Paper presented at the annual meeting of the Regional Association for Institutional Research, Providence, RI.
Karami, H. (2011). Detecting gender bias in a language proficiency test. International Journal of Language Studies, 5(2).
Lawrence, I. M., & Curley, W. E. (1989). Differential item functioning for males and females on SAT-Verbal reading subscore items: Follow-up study (Research Report No. 89-22). Princeton, NJ: Educational Testing Service.
Lawrence, I. M., Curley, W. E., & McHale, F. J. (1988). Differential item functioning for males and females on SAT-Verbal reading subscore items (Report No. 88-4). New York: College Entrance Examination Board.
Lee, Y.-W., Breland, H., & Muraki, E. (2005). Comparability of TOEFL CBT writing prompts for different native language groups. International Journal of Testing, 5, 131-158.
Li, F. (2008). A modified higher-order DINA model for detecting differential item functioning and differential attribute functioning (Doctoral dissertation). University of Georgia, Athens.
Li, H. (2011). A cognitive diagnostic analysis of the MELAB reading test. Spaan Fellow, 9, 17-46.
Liu, O. L., Schedl, M., Malloy, J., & Kong, N. (2009). Does content knowledge affect TOEFL iBT™ reading performance? A confirmatory approach to differential item functioning (Research Report No. RR-09-29). Princeton, NJ: Educational Testing Service.
Messick, S. (1996). Validity of performance assessments. In G. W. Phillips (Ed.), Technical issues in large-scale performance assessment (pp. 1-18). Washington, DC: U.S. Department of Education, National Center for Education Statistics.
Milewski, G. B., & Baron, P. A. (2002, April). Extending DIF methods to inform aggregate reports on cognitive skills. Paper presented at the meeting of the National Council on Measurement in Education, New Orleans, LA.
Ravand, H., & Firoozi, T. (2016). Examining construct validity of the master's UEE using the Rasch model and the six aspects of Messick's framework. International Journal of Language Testing, 6(1).
Ravand, H., & Robitzsch, A. (2015). Cognitive diagnostic modeling using R. Practical Assessment, Research & Evaluation, 20(11), 1–12.
Rezaee, A., & Shabani, E. (2010). Gender differential item functioning analysis of the University of Tehran English Proficiency Test. Pazhuhesh-e Zabanha-ye Khareji, 56, 89-108.
Ryan, K., & Bachman, L. (1992). Differential item functioning on two tests of EFL proficiency. Language Testing, 9, 12-29.
Salehi, M., & Tayebi, A. (2012). Differential item functioning (DIF) in terms of gender in the reading comprehension subtest of a high stakes test. Iranian Journal of Applied Language Studies, 4(1), 135-168.
Schmitt, A. P., & Dorans, N. J. (1990). Differential item functioning for minority examinees on the SAT. Journal of Educational Measurement, 27, 67-81.
Shealy, R., & Stout, W. F. (1993a). An item response theory model for test bias. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 197-329). Hillsdale, NJ: Erlbaum.
Tatsuoka, K. K. (1983). Rule-space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20, 345-354.
Wall, D., & Horák, T. (2008). The impact of changes in the TOEFL examination on teaching and learning in Central and Eastern Europe—Phase 2, coping with change. Princeton, NJ: Educational Testing Service.
Zhang, W. (2006). Detecting differential item functioning using the DINA model (Doctoral dissertation). University of North Carolina, Greensboro.