OPTIMISING THE ASSESSMENT SYSTEM IN THE ESP COURSE THROUGH THE USE OF THE METHODS OF DIFFERENTIAL ITEM FUNCTIONING AND DIFFERENTIAL TEST FUNCTIONING IN FINAL TEST DESIGN

Authors

DOI:

https://doi.org/10.35433/pedagogy.2(101).2020.156-165

Keywords:

higher education, academic performance, English for Specific Purposes, assessment system, test design, language testing, differential item functioning method, differential test functioning method.

Abstract

The purpose of the research was to examine how the use of the Differential Item Functioning (DIF) and Differential Test Functioning (DTF) methods contributes to the quality of the final test (FT-ESP) in the English for Specific Purposes (ESP) course delivered to graduate students at tertiary institutions. The study relies on two interventions intended to identify the correlation between test design and the students' academic performance in the ESP course, using Pearson's correlation coefficient for answered versus unanswered questions. The first intervention test was structured in the same way as the second and consisted of the same number of items. In the first intervention, a regular final ESP test was administered. In the second intervention, an originally designed test, whose validity and reliability were analyzed using the DIF and DTF methods, was administered. The test covered three sub-domains: reading comprehension (15 items), structure (15 items), and compositional analysis (15 items). It was found that the use of the DIF and DTF methods improves the quality of the assessment system in the ESP course delivered to graduate students at tertiary institutions. It is advisable that the first step in a DIF analysis be the use of statistical methods to detect DIF items. It is also advisable to examine the effects of other potential factors on DIF, such as item order and mother tongue, along with unintended content-specific factors, in order to explain the DIF effect in the context of language testing. The findings also imply that neither method fully addresses the issue of measurement bias that may occur in tests, because bias is complex and cannot be addressed adequately using simple statistical or classical test theory methods. Further studies are needed to identify ways of improving the assessment of the speaking skills of graduates of tertiary institutions.
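The abstract does not reproduce the authors' computations, but the "statistical methods to detect DIF items" it recommends as a first step are commonly instantiated with the Mantel-Haenszel procedure. Below is a minimal sketch of such a screen for one dichotomously scored item, assuming 0/1 responses, a total-score matching variable, and two examinee groups; the simulated data, group labels, and thresholds are illustrative assumptions, not the authors' design.

```python
# A minimal sketch of a first-step statistical DIF screen (Mantel-Haenszel),
# assuming dichotomously scored (0/1) items and two examinee groups.
# This is an illustration, not the study's own code.
import numpy as np

def mantel_haenszel_dif(item, total, group):
    """Screen one item for DIF.

    item  : 0/1 responses to the studied item
    total : matching variable (total test score) per examinee
    group : 0 = reference group, 1 = focal group
    """
    or_num = or_den = 0.0   # common odds-ratio components
    dev = var = 0.0         # chi-square components
    for k in np.unique(total):
        s = total == k
        a = np.sum(s & (group == 0) & (item == 1))  # reference, correct
        b = np.sum(s & (group == 0) & (item == 0))  # reference, incorrect
        c = np.sum(s & (group == 1) & (item == 1))  # focal, correct
        d = np.sum(s & (group == 1) & (item == 0))  # focal, incorrect
        n = a + b + c + d
        if n < 2 or min(a + b, c + d, a + c, b + d) == 0:
            continue                                 # uninformative stratum
        or_num += a * d / n
        or_den += b * c / n
        dev += a - (a + b) * (a + c) / n             # observed minus expected
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    alpha = or_num / or_den                          # MH common odds ratio
    delta = -2.35 * np.log(alpha)                    # ETS delta scale
    chi2 = (abs(dev) - 0.5) ** 2 / var               # MH chi-square, df = 1
    return alpha, delta, chi2

# Illustrative use on simulated data for one 15-item sub-domain,
# with DIF deliberately injected into item 7.
rng = np.random.default_rng(1)
n = 400
group = rng.integers(0, 2, n)
theta = rng.normal(0.0, 1.0, n)                      # latent ability
diff = rng.normal(0.0, 1.0, 15)                      # item difficulties
diff_focal = diff.copy()
diff_focal[7] += 0.8                                 # item 7 harder for focal group
b_ij = np.where(group[:, None] == 1, diff_focal, diff)
p = 1 / (1 + np.exp(-(theta[:, None] - b_ij)))
items = (rng.random((n, 15)) < p).astype(int)
total = items.sum(axis=1)
alpha, delta, chi2 = mantel_haenszel_dif(items[:, 7], total, group)
print(f"alpha={alpha:.2f}, ETS delta={delta:.2f}, MH chi2={chi2:.2f}")
```

By the common ETS convention, items with |Δ| of roughly 1.5 or more and a significant chi-square are flagged as large ("C"-level) DIF; items flagged this way would then be inspected for item-order, mother-tongue, or content-specific explanations, as the abstract suggests. Item-level DIF effects can also be aggregated into a test-level DTF summary, in the spirit of the variance estimators discussed by Camilli and Penfield (1997) and Penfield and Algina (2006).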

References

Camilli, G. (2006). Test fairness. In Educational Measurement (4th ed., pp. 221-256). Westport: American Council on Education & Praeger Publishers [in English].

Camilli, G., & Penfield, D. (1997). Variance estimation for differential test functioning based on Mantel-Haenszel statistics. Journal of Educational Measurement, 34, 123-139. DOI: https://doi.org/10.1111/j.1745-3984.1997.tb00510.x [in English].

Gierl, M., Bisanz, J., Bisanz, G., Boughton, K., & Khaliq, S. (2001). Illustrating the utility of differential bundle functioning analyses to identify and interpret group differences in achievement tests. Educational Measurement: Issues and Practice, 20 (2), 26-36. DOI: https://doi.org/10.1111/j.1745-3992.2001.tb00060.x [in English].

Guo, H., Robin, F., & Dorans, N. (2017). Detecting Item Drift in Large Scale Testing. Journal of Educational Measurement, 54 (3), 265-284. DOI: https://doi.org/10.1111/jedm.12144 [in English].

Hunter, C. (2014). A simulation study comparing two methods of evaluating differential test functioning (DTF): DFIT and the Mantel-Haenszel/Liu-Agresti variance. Doctoral Dissertation, Georgia State University, Atlanta, GA, United States. Retrieved from: https://scholarworks.gsu.edu/cgi/viewcontent.cgi?article=1132&context=eps_diss [in English].

Jang, E.E., & Roussos, L. (2009). Integrative analytic approach to detecting and interpreting L2 vocabulary DIF. International Journal of Testing, 9 (3), 238-259. DOI: https://doi.org/10.1080/15305050903107022 [in English].

Kim, S.-H., & Cohen, A.S. (1995). A Comparison of Lord's Chi-Square, Raju's Area Measures, and the Likelihood Ratio Test on Detection of Differential Item Functioning. Applied Measurement in Education, 8 (4), 291-312. DOI: https://doi.org/10.1207/s15324818ame0804_2 [in English].

Martinková, P., Drabinová, A., Liaw, Y., Sanders, E.A., McFarland, J.L., & Price, R.M. (2017). Checking Equity: Why Differential Item Functioning Analysis Should Be a Routine Part of Developing Conceptual Assessments. CBE-Life Sciences Education, 16 (2), 1-13. DOI: https://doi.org/10.1187/cbe.16-10-0307 [in English].

Penfield, R., & Algina, J. (2006). A generalized DIF effect variance estimator for measuring unsigned differential test functioning in mixed-format tests. Journal of Educational Measurement, 43, 295-312. DOI: https://doi.org/10.1111/j.1745-3984.2006.00018.x [in English].

Raju, N., Van der Linden, W., & Fleer, P. (1995). IRT-based internal measures of differential functioning of items and tests. Applied Psychological Measurement, 19 (4), 353-368. DOI: https://doi.org/10.1177/014662169501900405 [in English].

Stark, S., Chernyshenko, O. S., & Drasgow, F. (2004). Examining the Effects of Differential Item (Functioning and Differential) Test Functioning on Selection Decisions: When Are Statistically Significant Effects Practically Important? Journal of Applied Psychology, 89 (3), 497-508. DOI: https://doi.org/10.1037/0021-9010.89.3.497 [in English].

Wedman, J. (2018). Reasons for Gender-Related Differential Item Functioning in a College Admissions Test. Scandinavian Journal of Educational Research, 62 (6), 959-970. DOI: https://doi.org/10.1080/00313831.2017.1402365 [in English].

Zhu, X., & Aryadoust, V. (2020). An investigation of mother tongue differential item functioning in a high-stakes computerized academic reading test. Computer Assisted Language Learning, 33, 1-24. DOI: https://doi.org/10.1080/09588221.2019.1704788 [in English].