Paper 1: LXPER Index: A Curriculum-specific Text Readability Assessment Model for EFL Students in Korea
Abstract: Automatic readability assessment is one of the most important applications of Natural Language Processing (NLP) in education. Because it allows appropriate reading material to be selected quickly for readers at every level of proficiency, automatic readability assessment can be particularly useful for teaching English to students of English as a Foreign Language (EFL) around the world. However, most readability assessment models are developed for native English readers and have low accuracy on texts from non-native English Language Training (ELT) curricula. We introduce LXPER Index, a readability assessment model for non-native EFL readers in the ELT curriculum of Korea. To measure LXPER Index, we use a mixture of 22 features that we show to be significant in text readability assessment. We also introduce the Text Corpus of the Korean ELT Curriculum (CoKEC-text), the first collection of English texts from a non-native country’s ELT curriculum in which each text is labeled with its target grade level. In addition, we assemble the Word Corpus of the Korean ELT Curriculum (CoKEC-word), a collection of words from the Korean ELT curriculum labeled with word difficulty. Our experiments show that our new model, trained on CoKEC-text, significantly improves the accuracy of automatic readability assessment for texts in the Korean ELT curriculum. The methodology used in this research can be applied to other ELT curricula around the world.
Keywords: Natural language processing; machine learning; text readability assessment; EFL (English as a Foreign Language) education