Korean Unethical Text Filtering Model for Enhancing the Ethicality of Large Language Models (LLMs)

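Title: LLM의 윤리성 강화를 위한 한국어 비윤리 텍스트 필터링 모델 (English: Korean Unethical Text Filtering Model for Enhancing the Ethicality of Large Language Models (LLMs))
Publisher: The Korea Institute of Electronic Communication Sciences (한국전자통신학회)
Authors: Yu-Nam Cheong (정유남), Jong-Chan Kim (김종찬), Kwang-Seong Shin (신광성)
Journal: The Journal of the Korea Institute of Electronic Communication Sciences (한국전자통신학회 논문지), Vol. 20, No. 5, pp. 1061-1070 (10 pages)
Subject: Engineering > Electronics / Information and Communication Engineering
File format: PDF
Publication date: 2025.10.30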

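Korean Abstract (English translation)

As the ethics of large language models (LLMs) come under growing scrutiny, this study developed a model for filtering unethical expressions that is suited to the Korean-language environment. Using roughly 360,000 Korean dialogue samples, KoBERT and KoELECTRA models were applied to binary and multi-label classification of seven types of unethical expression (discrimination, hate, censure, violence, crime, sexual content, and profanity). Applying the LoRA technique improved performance while training only 0.1% of all parameters. The KoELECTRA + LoRA model performed best in binary classification, with an accuracy of 93.1% and an F1-score of 0.930, and reached a Micro F1 of 0.858 and a Macro F1 of 0.816 in multi-label classification. The model can contribute to the safety of Korean LLMs and to the quality of online communication; class imbalance and real-time processing remain open areas for improvement.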
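To make the "0.1% of all parameters" figure more concrete, the following is a minimal sketch of how the trainable share under LoRA can be estimated. Every number in it is an assumption chosen for illustration (a base encoder of about 110 million parameters, hidden size 768, 12 layers, rank 4, two adapted projection matrices per layer); the abstract does not state the configuration actually used, and the sketch ignores the classification head, which is usually trained as well.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter on a d_out x d_in weight matrix.

    LoRA keeps the original weight frozen and learns a low-rank update B @ A,
    where A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


def trainable_fraction(base_params: int, hidden: int, num_layers: int,
                       matrices_per_layer: int, rank: int) -> float:
    """Share of all parameters (frozen base + adapters) that is trainable."""
    adapter_params = (num_layers * matrices_per_layer
                      * lora_param_count(hidden, hidden, rank))
    return adapter_params / (base_params + adapter_params)


# Hypothetical configuration, NOT taken from the paper.
fraction = trainable_fraction(
    base_params=110_000_000,  # assumed size of a BERT-base-scale encoder
    hidden=768,               # assumed hidden dimension
    num_layers=12,            # assumed number of transformer layers
    matrices_per_layer=2,     # assumed: adapters on two projections per layer
    rank=4,                   # assumed LoRA rank
)
print(f"trainable share: {fraction:.3%}")
```

Under these assumed values the trainable share comes out on the order of a tenth of a percent, the same order of magnitude as the figure reported in the abstract.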

English Abstract

As ethical concerns surrounding large language models (LLMs) grow, this study presents a filtering model tailored to detecting unethical expressions in the Korean language environment. Using approximately 360,000 Korean conversational data samples, we developed binary and multi-label classifiers for seven categories of unethical content: discrimination, hate, censure, violence, crime, sexually explicit content, and profanity. By applying the Low-Rank Adaptation (LoRA) technique, we achieved parameter-efficient training, updating only 0.1% of the total parameters while enhancing performance. As a result, the KoELECTRA + LoRA model achieved the highest performance in binary classification, with an accuracy of 93.1% and an F1-score of 0.930. In multi-label classification, the KoELECTRA model reached a Micro F1-score of 0.858 and a Macro F1-score of 0.816. This model contributes to enhancing the safety of Korean LLMs and promoting healthier online communication. Future improvements may focus on addressing class imbalance and optimizing real-time processing.
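The abstract reports both Micro F1 and Macro F1 for the multi-label task. As a rough illustration of how the two differ, the sketch below computes both from binary indicator rows. The two-label toy data is invented for this example and has nothing to do with the paper's dataset; it is built so that the rarer label is predicted less accurately, which is one common reason a Macro F1 falls below the Micro F1.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for multi-label indicator rows."""
    n_labels = len(y_true[0])
    counts = [[0, 0, 0] for _ in range(n_labels)]  # per label: [tp, fp, fn]
    for true_row, pred_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t and p:
                counts[j][0] += 1
            elif p:
                counts[j][1] += 1
            elif t:
                counts[j][2] += 1
    # Macro: average the per-label F1 scores, so every label weighs the same.
    macro = sum(f1(*c) for c in counts) / n_labels
    # Micro: pool the counts over all labels first, so frequent labels dominate.
    tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
    return f1(tp, fp, fn), macro


# Toy data (hypothetical): label 0 is frequent, label 1 is rare.
y_true = [[1, 0], [1, 0], [1, 0], [1, 0], [0, 1], [0, 1]]
y_pred = [[1, 0], [1, 0], [1, 0], [1, 0], [0, 1], [0, 0]]

micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

Here the frequent label is predicted perfectly while one of the two rare-label examples is missed, so the pooled (micro) score is 10/11 and the per-label average (macro) is 5/6.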

Keywords

Unethical Text Detection, Large Language Models, KoELECTRA, LoRA (Low-Rank Adaptation)
