Academic Paper
A Study on Learning Method for Korean Speech Data Using Limited Computing Resource
- Publisher: 한국인공지능학회
- Authors: JeHyung TAK, Kyuhyun CHOI, Hyunsik NA, Minsung KIM
- Publication: 인공지능연구, Vol. 13, No. 2, pp. 17-21 (5 pages)
- Subject Classification: Interdisciplinary Studies > Science and Technology Studies
- Publication Date: 2025.06.30
Abstract
In light of increasing concerns over carbon emissions and power supply issues in the field of artificial intelligence, this study conducts fine-tuning of a large language model (LLM) on Korean spoken language data using small-scale computing resources and evaluates the performance of the resulting supervised model. This research proposes an efficient method to limit computing resource usage and conducts the training on such limited infrastructure. Subsequently, Korean spoken language data was collected. The dataset was designed to enable the model to understand a wide range of questions and provide appropriate answers; it consists of general knowledge sentence generation data, book summary information, academic paper summary data, and document summarization data. Because of the phonological changes, frequent subject omission, and honorifics unique to the Korean language, it is difficult to achieve satisfactory performance using existing English-based LLM training methods alone. This study distinguishes itself from prior works by selectively leveraging a dataset that reflects the linguistic characteristics of Korean, thereby proposing a language-specialized fine-tuning data strategy. For the methodology, LLM fine-tuning was conducted using LoRA (Low-Rank Adaptation of Large Language Models) via Unsloth, based on the open-source Llama-3.1-8B-Instruct model. As a result, the model fine-tuned in this study achieved an average score of 43.33 on the Open Ko-LLM Leaderboard. Notably, it scored 61.17 on Ko-Winogrande, which assesses logical reasoning, and 58.3 on Ko-GSM8k, which evaluates mathematical problem-solving skills, demonstrating competitive performance compared to other open-source models. These results suggest a practical alternative to large-scale resource-based models in terms of both resource efficiency and linguistic suitability.
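For readers who want a concrete picture of the workflow described in the abstract, the following is a minimal sketch of LoRA fine-tuning via Unsloth on Llama-3.1-8B-Instruct under a small GPU budget. The dataset file name, hyperparameters (LoRA rank, learning rate, batch size), and target modules are illustrative assumptions, not the settings reported in the paper, and the exact SFTTrainer arguments may differ depending on the installed unsloth/trl versions.

```python
# Sketch only: LoRA fine-tuning with Unsloth on limited hardware.
# Dataset path and all hyperparameters below are illustrative assumptions,
# not the configuration used in the paper.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model in 4-bit quantization to fit limited GPU memory.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only small low-rank matrices are trained,
# keeping the 8B base weights frozen.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical Korean spoken-language instruction data with a "text" field.
dataset = load_dataset("json", data_files="korean_spoken_sft.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,   # effective batch size 16 on one GPU
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="llama31-8b-korean-lora",
    ),
)
trainer.train()
```

Combining 4-bit loading with LoRA adapters is the usual way to keep memory usage within a single consumer-grade GPU, which matches the limited-computing-resource setting the study targets.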
Table of Contents
1. Introduction
2. Related Work
3. Computing Resources and Data
4. Tokenizer Configuration Method
5. Conclusion
References
Papers in This Issue
- HEMA: A Hippocampus-Inspired Extended Memory Architecture for Long-Context AI Conversations
- Performance Comparisons of Bio-Inspired Optimization Algorithms for Grid Synchronization
- Adaptive Movement and Formation Coordinated Control for Flying Ad-Hoc Networks (FANETs) in Dynamic Environments
- A Study on Learning Method for Korean Speech Data Using Limited Computing Resource
- Keypoint-based Distortion Correction and Data Augmentation for High-angle License Plate Recognition
Related Papers
Interdisciplinary Studies > Science and Technology Studies: BEST
- Culinary Narratives on the Global Stage: Analyzing K-Food's Cultural Capital through Netflix's 'Black and White Chef'
- The Sociocultural Meaning of Zero-Calorie Beverage Consumption: A Qualitative Study on Health Perceptions and Beverage Choices Among Young Adults in South Korea
- Functional Food Potential of Cyclic Dipeptides from Lactobacillus plantarum: Inhibition of Breast Cancer via Cancer Stem Cell Regulation