A Study on Learning Method for Korean Speech Data Using Limited Computing Resource

Views: 25

Publisher
한국인공지능학회
Authors
JeHyung TAK, Kyuhyun CHOI, Hyunsik NA, Minsung KIM
Publication Info
『인공지능연구』 Vol. 13, No. 2, pp. 17-21 (5 pages total)
Subject Classification
Interdisciplinary Studies > Science and Technology Studies
File Format
PDF
Publication Date
2025.06.30
Price
Free



Abstract

In light of the increasing concerns over carbon emissions and power supply issues in the field of artificial intelligence, this study aims to fine-tune a large language model (LLM) on Korean spoken language data using small-scale computing resources, and to evaluate the performance of the resulting supervised model. This research proposes an efficient method to limit computing resource usage and conducts the training on such limited infrastructure. Subsequently, Korean spoken language data was collected. The dataset was designed to enable the model to understand a wide range of questions and provide appropriate answers; it consists of general knowledge sentence generation data, book summary information, academic paper summary data, and document summarization data. Due to the phonological changes, frequent subject omission, and honorifics unique to the Korean language, it is difficult to achieve satisfactory performance using existing English-based LLM training methods alone. This study distinguishes itself from prior works by selectively leveraging a dataset that reflects the linguistic characteristics of Korean, thereby proposing a language-specialized fine-tuning data strategy. For methodology, we conducted LLM fine-tuning using LoRA (Low-Rank Adaptation of Large Language Models) via Unsloth, based on the open-source Llama-3.1-8B-Instruct model. As a result, the model fine-tuned in this study achieved an average score of 43.33 on the Open Ko-LLM Leaderboard. Notably, it scored 61.17 on Ko-Winogrande, which assesses logical reasoning, and 58.3 on Ko-GSM8k, which evaluates mathematical problem-solving skills, demonstrating competitive performance compared to other open-source models. These results suggest a practical alternative to large-scale resource-based models in terms of both resource efficiency and linguistic suitability.
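The abstract names LoRA fine-tuning via Unsloth on Llama-3.1-8B-Instruct but the paper's training script is not reproduced here. As an illustration only, the minimal sketch below shows how such a low-resource setup is commonly written; the dataset file, LoRA rank, and all training hyperparameters are placeholder assumptions rather than the authors' reported configuration, and exact argument names can vary across unsloth/trl versions.

# Minimal LoRA fine-tuning sketch with Unsloth (illustrative; not the authors' exact script).
import torch
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# Load the base model in 4-bit quantization so it fits on limited GPU memory.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; only these low-rank matrices are updated during training.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                      # assumed rank; the paper's value is not given here
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",  # trades compute for memory
)

# Hypothetical Korean spoken-language SFT dataset with a pre-formatted "text" column.
dataset = load_dataset("json", data_files="korean_spoken_sft.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # small batch for a single consumer GPU
        gradient_accumulation_steps=4,   # effective batch size of 8
        learning_rate=2e-4,
        num_train_epochs=1,
        bf16=torch.cuda.is_bf16_supported(),
        fp16=not torch.cuda.is_bf16_supported(),
        output_dir="outputs",
    ),
)
trainer.train()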


Table of Contents

1. Introduction
2. Related Work
3. Computing Resources and Data
4. Tokenizer Configuration Method
5. Conclusion
References


Cite
APA

JeHyung TAK, Kyuhyun CHOI, Hyunsik NA, Minsung KIM. (2025). A Study on Learning Method for Korean Speech Data Using Limited Computing Resource. 인공지능연구, 13(2), 17-21.

MLA

JeHyung TAK, Kyuhyun CHOI, Hyunsik NA, Minsung KIM. "A Study on Learning Method for Korean Speech Data Using Limited Computing Resource." 인공지능연구, 13.2 (2025): 17-21.
