- Publisher
- 한국공공가치학회
- Authors
- Aaditya Yadav, Min Seo Park, Ikshita Yadav
- Publication information
- 『Journal of Public Value』 Vol. 9, pp. 85-98 (14 pages in total)
- Subject classification
- Social Sciences > Social Welfare
- Publication date
- 2025.06.30

Abstract
Purpose: This study investigates whether text-based embedding techniques, originally designed for natural language processing, can be effectively applied to numerical data.
Method: We transform numerical datasets into space-separated strings and encode them using five embedding techniques: DistilBERT, TF-IDF, Doc2Vec, Multilingual-e5, and SFR-Mistral. To manage the resulting high-dimensional vectors, we reduce their dimensionality using both local and global configurations of UMAP. Ten clustering algorithms (K-Means, Agglomerative, BIRCH, GMM, Genie, K-Medoids, K-Modes, LDA, MiniBatch K-Means, and Spectral Co-Clustering) are applied to these embeddings and compared against two baselines: clustering on raw numerical data and clustering on UMAP-reduced numerical data. Performance is evaluated using Normalized Clustering Accuracy (NCA) across a diverse set of benchmark datasets.
Results: While text-based embeddings do not universally outperform traditional methods, several configurations, especially those using Multilingual-e5 and SFR-Mistral, demonstrate consistent improvements in clustering accuracy. In certain cases, embedding-based approaches yield dramatic gains (over 500% increase in NCA compared to raw data). Algorithms such as K-Means, K-Medoids, and Spectral Co-Clustering benefit most from the transformed representations. Visual analyses on datasets like Graves Dense, Ring, and ZigZag show enhanced cluster separability and balanced densities after embedding.
Conclusion: Textual embeddings can serve as a viable alternative preprocessing strategy for numerical clustering tasks, offering substantial improvements in specific contexts. These findings encourage further research into hybrid embedding techniques tailored for numerical data, potentially involving training specialized models or integrating with tabular-focused architectures to capitalize fully on the observed benefits.
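The first two steps of the method described above (serializing numeric rows as space-separated strings, then encoding them with a text embedding) can be illustrated with a minimal sketch. It uses TF-IDF, one of the five techniques named in the abstract, written with the standard library only. The function names `rows_to_strings` and `tfidf_vectors`, the two-decimal formatting, the smoothed IDF variant, and the toy data are all assumptions made for illustration; the paper's actual preprocessing and implementation are not given here. The UMAP reduction and clustering stages are not shown.

```python
import math
from collections import Counter


def rows_to_strings(rows, precision=2):
    """Serialize each numeric row as a space-separated string.

    The fixed decimal precision is an assumption, not taken from the paper.
    """
    return [" ".join(f"{value:.{precision}f}" for value in row) for row in rows]


def tfidf_vectors(documents):
    """Encode whitespace-tokenized documents as dense TF-IDF vectors."""
    tokenized = [doc.split() for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each token.
    doc_freq = Counter(token for tokens in tokenized for token in set(tokens))
    # Smoothed inverse document frequency (this variant is an assumption).
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocabulary}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Term frequency is the token count divided by the document length.
        vectors.append([counts[t] / len(tokens) * idf[t] for t in vocabulary])
    return vocabulary, vectors


# Hypothetical toy dataset: two nearby points and one distant point.
rows = [[0.10, 0.20], [0.10, 0.25], [5.00, 5.10]]
docs = rows_to_strings(rows)
vocab, vecs = tfidf_vectors(docs)
print(docs)
print(vocab)
```

In this sketch each distinct formatted number becomes its own token, so rows are similar only when they share exactly matching tokens. That is a property of this toy TF-IDF encoding, and it may not hold for the subword-based models (DistilBERT, Multilingual-e5, SFR-Mistral) named in the abstract.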
Table of Contents
1. Introduction
2. Related Work
3. Methodology
4. Experimental Setup
5. Results
6. Visual Analysis of Embeddings
7. Discussion
8. Conclusion
9. References
