Exploring Conditions for GPT-Based Creativity Assessment: Focusing on Comparison with Expert Evaluation




English title: GPT-based Creativity Assessment: Focusing on Comparison with Human Experts
Publisher: 한국창의력교육학회
Authors: Haeju Lee (이해주), Jin Min Chung (정진민), Sungyeun Kim (김성연)
Publication: 『창의력교육연구』 Vol. 25, No. 3, pp. 1-24 (24 pages in total)
Subject classification: Social Sciences > Education
File format: PDF
Publication date: 2025.09.30


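Korean Abstract (translated)

Advances in generative artificial intelligence (AI) are bringing new changes to creativity research, and studies on the potential of AI to evaluate creative products effectively are increasing. To date, creative products have mostly been evaluated with the expert-based Consensual Assessment Technique (CAT). Although CAT secures high reliability, its evaluation process consumes considerable time and cost. An exploration of AI-based evaluation that can complement or replace expert evaluation is therefore needed. This study compared evaluations by GPT-4.1 and GPT-4o, both multimodal large language models (LLMs), of a creative title-generation task completed by 99 middle school students with CAT evaluations by six creativity experts. Evaluations were repeated under varying model, prompt type, and temperature conditions. Agreement among GPT evaluations was measured by percentage agreement and intraclass correlation (ICC), and consistency with CAT was reported using the Pearson correlation coefficient, the Spearman rank correlation coefficient, and root mean square error (RMSE) as indices. Based on the results, the study derived the model, prompt type, and temperature condition most consistent with CAT, and provided practical guidelines for using GPT-based evaluation effectively in creativity education and assessment settings. The study can also serve as a foundational resource for designing and applying AI-based evaluation that reflects the evaluation principles of CAT.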

English Abstract

The development of generative artificial intelligence (AI) is reshaping creativity research, particularly with regard to its potential for assessing creative products effectively. Evaluations of creative outputs have traditionally relied on the expert-based Consensual Assessment Technique (CAT); however, CAT demands substantial time and resources to achieve high reliability. It has therefore become necessary to investigate AI-driven evaluation methods capable of complementing or substituting for expert assessments. This study compared creativity evaluations produced by GPT-4.1 and GPT-4o, both multi-modal large language models (LLMs), with CAT evaluations provided by six creativity experts, using a creative title-generation task completed by 99 middle school students. Evaluations were conducted repeatedly under varying conditions of model, prompt type, and temperature. Agreement among GPT evaluations was measured by percentage agreement and intraclass correlation (ICC), whereas consistency with CAT evaluations was examined using Pearson’s r, Spearman’s rho, and root mean square error (RMSE). The analyses identified the GPT model, prompt type, and temperature settings most consistent with CAT, providing practical guidelines for GPT-based creativity assessment. This study contributes foundational insights for designing and implementing AI-based evaluations that align with CAT principles.
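The indices named in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the 1-5 score scale, and all score values below are hypothetical and made up for illustration only. It shows percentage agreement between two repeated GPT runs, and Pearson's r, Spearman's rho, and RMSE between averaged GPT scores and mean expert (CAT) scores. ICC is omitted because the abstract does not state which ICC form was used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(x, y):
    """Root mean square error between two equal-length score lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def percent_agreement(run_a, run_b):
    """Percentage of items that received the identical score in two runs."""
    same = sum(1 for a, b in zip(run_a, run_b) if a == b)
    return 100.0 * same / len(run_a)


# Hypothetical data: five student titles scored on an assumed 1-5 scale.
cat_mean = [2.0, 3.5, 4.0, 1.5, 5.0]  # mean of expert (CAT) ratings per title
gpt_run1 = [2, 3, 4, 2, 5]            # first GPT run under one condition
gpt_run2 = [2, 4, 4, 2, 5]            # repeated run under the same condition
gpt_mean = [(a + b) / 2 for a, b in zip(gpt_run1, gpt_run2)]

print("agreement between runs (%):", percent_agreement(gpt_run1, gpt_run2))
print("Pearson r vs CAT:", pearson(gpt_mean, cat_mean))
print("Spearman rho vs CAT:", spearman(gpt_mean, cat_mean))
print("RMSE vs CAT:", rmse(gpt_mean, cat_mean))
```

In a design like the one described, these indices would be computed once per combination of model, prompt type, and temperature, so that conditions can be ranked by how closely they track the expert ratings.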

Keywords

Multi-modal Large Language Model; Generative Artificial Intelligence; Creativity; Consensual Assessment Technique (CAT); GPT-based Assessment
