A Case Study on the Application of Embedding Methods and Machine Learning Models to Determine Whether Korean News Data is AI-Generated

Academic paper

English title: A Case Study on the Application of Embedding Methods and Machine Learning Models to Determine Whether Korean News Data is AI-Generated
Publisher: The Korean Data Analysis Society
Authors: In-Gyu Lee, Hyuncheol Kang
Journal: Journal of The Korean Data Analysis Society (JKDAS), Vol. 27, No. 3, pp. 797-807 (11 pages)
Subject: Natural Sciences > Statistics
File format: PDF
Publication date: 2025.06.30

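Abstract (translated from Korean)

With the onset of the Fourth Industrial Revolution, advanced IT technologies including AI and robotics are developing rapidly, and the share of people who have used AI services has risen sharply in recent years. As AI becomes more closely woven into daily life, the influence of generative AI is also growing, and one consequence is the spread of AI-generated news content. AI-written news offers convenience to readers, but it can also give rise to problems such as fake news and information manipulation, so detecting it has become an important task. To find an effective way to detect AI-generated news, this study applied a range of machine learning techniques. Embedding methods such as TF-IDF, Doc2Vec, and RoBERTa were used, and classification models including logistic regression, support vector machine, decision tree, XGBoost, and random forest were compared. Real Korean news data provided by AI-Hub were used for the analysis, and the AI-generated news data were produced by the authors using the KULLM model. The RoBERTa-based model achieved the highest accuracy, indicating that it is effective for detecting AI-generated news. By analyzing the characteristics of AI-generated news and proposing an effective detection method, this study is expected to contribute to addressing fake news and information theft.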

English abstract

As we enter the era of the Fourth Industrial Revolution, cutting-edge IT technologies including AI and robotics are developing rapidly, and the rate at which people experience AI services has recently risen sharply. As AI technology becomes more closely tied to our lives, the influence of generative AI is also increasing, and one example is the spread of AI-generated news content. News written by AI is convenient for readers, but it can also cause problems such as fake news and information manipulation, so identifying it has become an important task. This study applied various machine learning techniques to find an effective method for identifying AI-generated news data. We used embedding techniques such as TF-IDF, Doc2Vec, and RoBERTa, and compared classification models such as logistic regression, support vector machine, decision tree, XGBoost, and random forest. For the analysis, we used real Korean news data provided by AI-Hub and generated the AI-written news data ourselves using the KULLM model. The results showed that the RoBERTa-based model recorded the highest accuracy and was effective in identifying AI-generated news. This study is expected to contribute to solving the problems of fake news and information theft by analyzing the characteristics of AI-generated news and suggesting an effective identification method.
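To make the embedding-plus-classifier setup concrete, here is a minimal sketch of one pairing named in the abstract, TF-IDF features fed to logistic regression, written in dependency-free Python. It is not the authors' implementation. The toy English sentences, the labels, the whitespace tokenizer, the learning rate, and every function name are assumptions made for illustration. The listing does not describe the paper's preprocessing or hyperparameters, and Korean text would normally need a morphological tokenizer instead of `str.split`.

```python
import math
from collections import Counter

# Hypothetical toy corpus (label 1 = AI-generated, 0 = human-written).
TRAIN = [
    ("mayor opens bridge after long delay", 0),
    ("police report storm damage downtown", 0),
    ("council votes on school budget", 0),
    ("overall this highlights the importance of innovation", 1),
    ("in conclusion this underscores the importance of collaboration", 1),
    ("overall it is important to note the impact", 1),
]

def fit_tfidf(docs):
    """Learn the vocabulary and smoothed IDF weights from tokenized docs."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf

def transform(doc, vocab, idf):
    """Map one tokenized doc to an L2-normalized TF-IDF vector."""
    tf = Counter(doc)
    vec = [tf[t] * idf[t] for t in vocab]  # out-of-vocabulary tokens are ignored
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit logistic regression with plain batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(x, w, b):
    """Return 1 (AI-generated) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

tokenized = [text.split() for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
vocab, idf = fit_tfidf(tokenized)
X = [transform(doc, vocab, idf) for doc in tokenized]
w, b = train_logreg(X, labels)

def classify(text):
    """Classify a raw string with the fitted TF-IDF weights and model."""
    return predict(transform(text.split(), vocab, idf), w, b)

train_acc = sum(predict(x, w, b) == y for x, y in zip(X, labels)) / len(labels)
print("training accuracy:", train_acc)
```

A comparison like the one in the abstract would keep the labeled data fixed, swap the feature step (Doc2Vec or RoBERTa embeddings in place of TF-IDF) and the classifier, and report accuracy on held-out articles for each combination.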
