Improving Scoring and Feedback Alignment Between Large Language Models and Teachers in Constructed-Response and Essay Assessment through In-Context Learning


English title: Enhancing Alignment Between Large Language Models and Teacher in Open-Ended Assessment through In-Context Learning
Publisher: 한국교원대학교 뇌·AI기반교육연구소 (Korea National University of Education)
Authors: 이주영, 이은아, 김강래
Journal: Brain, Digital, & Learning, Vol. 15, No. 3, pp. 375-401 (27 pages)
Subject classification: Social Sciences > Education
File format: PDF
Publication date: 2025-09-30


Abstract

This study investigates whether in-context learning (ICL) improves agreement between human teachers and large language models (LLMs) in open-ended assessment. The dataset comprised 485 student responses to six open-ended questions from Korean, Technology, and Social Studies subjects administered in 2024. Teacher-generated scores and feedback were collected alongside LLM-generated outputs under varying ICL conditions. Specifically, GPT-4.1 was given 0 to 20 examples in its prompts to examine whether a larger example count improves agreement between the model and human raters. Quadratic Weighted Kappa (QWK) was used to assess score alignment, and BERTScore measured semantic similarity between teacher and model feedback. Regression and mixed-effects analyses showed that increasing the number of examples generally improved alignment up to a certain threshold. The strongest improvements occurred with fewer than six examples, beyond which the benefits plateaued or even declined. Additionally, prompt length negatively moderated the effect of example count, suggesting that longer prompts may reduce the model's capacity to focus on relevant information. These results provide practical guidance for teachers using LLMs in open-ended assessments. Including teacher-generated examples in prompts helps models align more closely with human scoring and feedback. However, the optimal number of examples depends on the type of question and the expected answer length: more examples benefit shorter responses, while fewer examples (five or fewer) are more effective for longer or more complex answers.
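The abstract names Quadratic Weighted Kappa as the score-alignment metric but does not show how it is computed. The following is a minimal sketch of the standard QWK formula in plain Python, not the authors' implementation. The function name, the `min_score` and `max_score` parameters, and the example scores are all assumptions made for illustration.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Quadratic Weighted Kappa between two equal-length lists of integer scores.

    min_score and max_score bound the rubric scale (assumed, not from the paper).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n_cat = max_score - min_score + 1
    if n_cat < 2:
        raise ValueError("the score scale needs at least two categories")
    n = len(rater_a)

    # Observed confusion matrix: rows are rater A, columns are rater B.
    observed = [[0] * n_cat for _ in range(n_cat)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal score counts for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_cat)) for j in range(n_cat)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            # Quadratic penalty grows with the squared score distance.
            weight = (i - j) ** 2 / (n_cat - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n  # count expected by chance
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Both raters assigned one identical score to every response.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical teacher and model scores on a 0-3 rubric.
teacher_scores = [0, 1, 2, 3, 2, 1]
model_scores = [0, 1, 2, 3, 3, 1]
qwk = quadratic_weighted_kappa(teacher_scores, model_scores, 0, 3)
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. Because the penalty is quadratic, a model score that misses the teacher's score by two points costs four times as much as a one-point miss.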

Keywords

In-context learning, few-shot learning, few-shot prompting, large language models, automated scoring, feedback alignment, AI-assisted assessment, reliability in educational assessment
