- English title
- Lexical Features in Automated Assessment of Korean Writing Proficiency: A Corpus-Based Machine Learning Approach
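- Publisher
- International Association for Korean Language Education (국제한국어교육학회)
- Author
- Hyunsoo Ahn (안현수)
- Publication information
- 『한국어교육』 (Journal of Korean Language Education), Vol. 36, No. 2, pp. 129–164 (36 pages in total)
- Subject classification
- Language and Literature > Korean Language and Literature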
- Publication date
- 2025.05.31

Abstract
This study investigates and validates lexical features for the automated assessment of Korean learners' writing proficiency. Using a Korean learner corpus (n = 14,992), features were extracted in five categories: lexical diversity, lexical density, lexical difficulty, morpheme usage, and text length. Pearson correlation analysis showed statistically significant correlations between these features and learners' proficiency levels. Random Forest and XGBoost classifiers were then trained to evaluate the features' predictive performance. Lexical difficulty and morpheme usage emerged as the most effective predictors, with features such as the number of Level 4 vocabulary types, the ratio of Level 1 words, and sentence-final endings showing high importance. The integrated XGBoost model, which combined all feature types, achieved 87.9% accuracy, outperforming the models built on individual feature categories and confirming lexical difficulty and grammatical complexity as key indicators for automated writing assessment. This research highlights the potential of linguistic features for developing reliable, high-accuracy machine learning models to assess second language writing proficiency.
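To make the feature categories and the correlation step concrete, here is a minimal sketch of how such features might be computed for one essay and then correlated with proficiency. It assumes each essay has already been morphologically analyzed into (morpheme, tag) pairs using Sejong-style tags (e.g., NNG, VV, EF). The content-word tag set, the hypothetical `vocab_level` lookup, and the exact feature definitions are illustrative assumptions, not the author's implementation. The Random Forest and XGBoost classification stage is omitted.

```python
import math

# Assumption: Sejong-style POS tags; the study's analyzer and tag set are not given here.
CONTENT_TAGS = {"NNG", "NNP", "VV", "VA", "MAG"}  # hypothetical content-word tags
FINAL_ENDING_TAG = "EF"  # sentence-final ending in the Sejong tag set

Token = tuple[str, str]  # (morpheme, POS tag)


def lexical_features(tokens: list[Token], vocab_level: dict[str, int]) -> dict[str, float]:
    """Compute illustrative lexical features for one tagged essay.

    vocab_level is a hypothetical lookup from a morpheme to its difficulty
    level (e.g., 1-6); morphemes missing from it are treated as unleveled.
    """
    n_tokens = len(tokens)
    content = [m for m, tag in tokens if tag in CONTENT_TAGS]
    n_content = len(content)
    levels = [vocab_level.get(m) for m in content]
    return {
        # Text length: total number of morpheme tokens.
        "length_tokens": n_tokens,
        # Lexical diversity: type-token ratio over (morpheme, tag) pairs.
        "ttr": len(set(tokens)) / n_tokens if n_tokens else 0.0,
        # Lexical density: share of tokens that are content words.
        "lexical_density": n_content / n_tokens if n_tokens else 0.0,
        # Lexical difficulty: ratio of Level 1 words among content words ...
        "level1_ratio": levels.count(1) / n_content if n_content else 0.0,
        # ... and number of distinct Level 4 content-word types.
        "level4_types": len({m for m in content if vocab_level.get(m) == 4}),
        # Morpheme usage: number of sentence-final endings.
        "final_endings": sum(1 for _, tag in tokens if tag == FINAL_ENDING_TAG),
    }


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy data invented for illustration: three tagged essays and proficiency levels.
    vocab_level = {"학교": 1, "가": 1, "어제": 1, "친구": 1, "만나": 1,
                   "경제": 4, "발전": 4, "영향": 3, "미치": 2}
    essays = [
        [("나", "NP"), ("는", "JX"), ("학교", "NNG"), ("에", "JKB"), ("가", "VV"), ("ㅂ니다", "EF")],
        [("어제", "NNG"), ("친구", "NNG"), ("를", "JKO"), ("만나", "VV"), ("었", "EP"), ("다", "EF")],
        [("경제", "NNG"), ("발전", "NNG"), ("이", "JKS"), ("사회", "NNG"), ("에", "JKB"),
         ("영향", "NNG"), ("을", "JKO"), ("미치", "VV"), ("ㄴ다", "EF")],
    ]
    proficiency = [1, 2, 5]
    rows = [lexical_features(e, vocab_level) for e in essays]
    for name in rows[0]:
        column = [row[name] for row in rows]
        if len(set(column)) > 1:  # Pearson's r is undefined for constant features
            print(f"{name}: r = {pearson(column, proficiency):.3f}")
```

In a full pipeline, the per-essay feature dictionaries would form the rows of the training matrix for the classifiers. The correlation step serves only as a first screen for features that vary with proficiency level.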
Table of contents
1. Introduction
2. Literature review
3. Research method
4. Feature importance analysis
5. Conclusion
References
