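Offensive Sentence Detection Using Hidden Representations of Pretrained Language Models: A Comparative Experiment Based on Frequency Characteristics and Semantic Vectors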

Scholarly article

English title: Leveraging LLM Hidden States for Offensive Text Detection: Frequency-Domain Signals versus Semantic Vectors
Publisher: 한국산업기술융합학회 (formerly 산업기술교육훈련학회)
Authors: Won-Yong Hwang (황원용), Hyo-Kwan Kim (김효관)
Journal: 『산업기술연구논문지』 Vol. 30, No. 3, pp. 69-78 (10 pages)
Subject: Engineering > Industrial Engineering
File format: PDF
Publication date: 2025.09.30

Korean abstract

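Recent progress in large language models (LLMs) has produced notable gains in text generation and understanding. How these models internally represent and distinguish profanity and aggressive language, however, has not yet been sufficiently explored. This study compares two approaches to test whether profane sentences can be distinguished from ordinary sentences using only the internal hidden states, without the model's output. The first is a frequency-based method that applies a Fourier transform (Fast Fourier Transform, FFT) to the internal hidden states and analyzes their high-frequency characteristics. The second is a semantics-based method that takes the mean of all token hidden states as a vector and classifies sentences by cosine similarity. In the experiments, the frequency-based method showed low classification performance, whereas the semantics-based approach showed clear separation, suggesting that information about aggressiveness is embedded in the internal representation space of LLMs. The study explores the point of contact between signal processing and semantic representation analysis and offers a new interpretive possibility for sociolinguistic classification problems.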
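
The semantics-based method in the abstract (mean of all token hidden states, then cosine similarity) can be illustrated with a minimal sketch. The abstract does not say what the sentence vector is compared against or how the decision is made, so the two reference vectors and the "closer reference wins" rule below are assumptions for illustration, as are the function names and the toy numbers, which are not model output.

```python
import math

def mean_pool(hidden_states):
    """Average per-token hidden-state vectors into one sentence vector."""
    n_tokens = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(tok[d] for tok in hidden_states) / n_tokens for d in range(dim)]

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def classify(hidden_states, offensive_ref, neutral_ref):
    """Assumed rule: pick the label whose reference vector is more similar."""
    vec = mean_pool(hidden_states)
    sim_off = cosine_similarity(vec, offensive_ref)
    sim_neu = cosine_similarity(vec, neutral_ref)
    return "offensive" if sim_off > sim_neu else "neutral"

# Toy 3-token, 4-dimensional hidden states (made-up numbers, not model output).
toy_sentence = [
    [0.9, 0.1, 0.0, 0.2],
    [0.8, 0.2, 0.1, 0.1],
    [1.0, 0.0, 0.1, 0.3],
]
# Hypothetical reference vectors; in practice these might be class means.
offensive_ref = [1.0, 0.0, 0.0, 0.0]
neutral_ref = [0.0, 1.0, 0.0, 0.0]
label = classify(toy_sentence, offensive_ref, neutral_ref)
```

With real data, `hidden_states` would be the per-token vectors taken from one layer of the model, and the reference vectors would have to be built from labeled sentences; neither choice is specified in the abstract.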

English abstract

Recent advances in large language models (LLMs) have demonstrated remarkable capabilities in text generation and understanding. However, how these models internally represent and distinguish offensive or abusive language remains underexplored. This study investigated whether such language can be detected using only the internal hidden states of LLMs, without relying on the model's output. We compared two approaches: (1) a frequency-based method that applied fast Fourier transform (FFT) to the hidden state to extract high-frequency features, and (2) a semantics-based method that averaged all token hidden states into a single vector and classified sentences via cosine similarity. The frequency-based method yielded low classification performance; however, the semantics-based approach exhibited clear separation between classes, suggesting that LLMs encode implicit signals of verbal aggression in their internal representations. This study highlights the intersection between signal processing and semantic representation analysis, providing new perspectives for socially sensitive language classification.
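
The frequency-based method in the abstract (an FFT over the hidden state, followed by high-frequency features) can be sketched as follows. The abstract does not state which axis the transform runs along or which feature is extracted, so this sketch assumes a 1-D sequence (for example, one hidden dimension traced across token positions) and uses the share of spectral energy above a cutoff as a hypothetical feature. The pure-Python radix-2 FFT keeps the example dependency-free; a library routine such as `numpy.fft.rfft` would be the usual choice in practice.

```python
import cmath
import math

def fft(signal):
    """Recursive radix-2 Cooley-Tukey FFT; length must be a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(signal[0])]
    even = fft(signal[0::2])
    odd = fft(signal[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def high_freq_ratio(signal, cutoff_fraction=0.5):
    """Share of spectral energy in the upper part of bins 1..n/2 (DC excluded).

    The cutoff and the feature itself are assumptions for illustration.
    """
    n = len(signal)
    spectrum = fft(signal)
    power = [abs(c) ** 2 for c in spectrum[1 : n // 2 + 1]]
    total = sum(power)
    if total == 0.0:
        return 0.0
    cutoff = int(len(power) * cutoff_fraction)
    return sum(power[cutoff:]) / total

# Toy sequences of length 8 (made-up, not model output).
alternating = [1.0, -1.0] * 4                               # fastest oscillation
slow = [math.cos(2 * math.pi * k / 8) for k in range(8)]    # one slow cycle
ratio_fast = high_freq_ratio(alternating)
ratio_slow = high_freq_ratio(slow)
```

A rapidly alternating sequence puts nearly all of its energy in the top bins, while a single slow cycle puts almost none there. Whether real hidden states separate along such a feature is an empirical question, and the abstract reports low classification performance for the frequency-based method.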

Keywords

Large Language Model (LLM), Hidden State Representations, Abusive Language Detection, Frequency-Based Analysis, Semantic Similarity Classification
