Applying Textual Embeddings for Numerical Data Clustering

Publisher: 한국공공가치학회
Authors: Aaditya Yadav, Min Seo Park, Ikshita Yadav
Publication: Journal of Public Value, Vol. 9, pp. 85-98 (14 pages in total)
Subject: Social Sciences > Social Welfare
File format: PDF
Publication date: 2025.06.30

Abstract

Purpose: This study investigates whether text-based embedding techniques, originally designed for natural language processing, can be effectively applied to numerical data.

Method: We transform numerical datasets into space-separated strings and encode them using five embedding techniques: DistilBERT, TF-IDF, Doc2Vec, Multilingual-e5, and SFR-Mistral. To manage the resulting high-dimensional vectors, we reduce their dimensionality using both local and global configurations of UMAP. Clustering algorithms, including K-Means, Agglomerative, BIRCH, GMM, Genie, K-Medoids, K-Modes, LDA, MiniBatch K-Means, and Spectral Co-Clustering, are applied to these embeddings and compared against two baselines: clustering on raw numerical data and clustering on UMAP-reduced numerical data. Performance is evaluated using Normalized Clustering Accuracy (NCA) across a diverse set of benchmark datasets.

Results: While text-based embeddings do not universally outperform traditional methods, several configurations, especially those using Multilingual-e5 and SFR-Mistral, demonstrate consistent improvements in clustering accuracy. In certain cases, embedding-based approaches yield dramatic gains (over 500% increase in NCA compared to raw data). Algorithms such as K-Means, K-Medoids, and Spectral Co-Clustering benefit most from the transformed representations. Visual analyses on datasets like Graves Dense, Ring, and ZigZag show enhanced cluster separability and balanced densities after embedding.

Conclusion: Textual embeddings can serve as a viable alternative preprocessing strategy for numerical clustering tasks, offering substantial improvements in specific contexts. These findings encourage further research into hybrid embedding techniques tailored for numerical data, potentially involving training specialized models or integrating with tabular-focused architectures to capitalize fully on the observed benefits.
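
The first and last steps of the pipeline described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code. The function names, the fixed-precision number formatting, and the exact NCA formula are assumptions: the abstract does not say how values are formatted, and the NCA definition used here (best one-to-one matching of true and predicted clusters, mean per-cluster recall, rescaled so that chance level is 0) is the one commonly used with clustering benchmark suites and may differ from the paper's. The embedding, UMAP, and clustering stages are omitted.

```python
from itertools import permutations


def rows_to_strings(rows, precision=4):
    """Serialize each numeric row as one space-separated string.

    The fixed `precision` is an illustrative assumption, not taken
    from the paper.
    """
    return [" ".join(f"{value:.{precision}f}" for value in row) for row in rows]


def normalized_clustering_accuracy(true_labels, pred_labels):
    """Assumed NCA: chance-adjusted mean per-cluster recall under the
    best one-to-one matching of true and predicted clusters.

    Brute-force over permutations, so only practical for small k.
    """
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(pred_labels))
    k = len(true_ids)
    if k < 2 or len(pred_ids) > k:
        raise ValueError("need k >= 2 true clusters and at most k predicted ones")

    # confusion[i][j]: points of true cluster i placed in predicted cluster j
    confusion = [[0] * k for _ in range(k)]
    for t, p in zip(true_labels, pred_labels):
        confusion[true_ids.index(t)][pred_ids.index(p)] += 1
    row_sums = [sum(row) for row in confusion]

    # Best average recall over all one-to-one cluster matchings.
    best = max(
        sum(confusion[i][perm[i]] / row_sums[i] for i in range(k)) / k
        for perm in permutations(range(k))
    )
    # Rescale so that chance level (1/k) maps to 0 and a perfect match to 1.
    return (best - 1 / k) / (1 - 1 / k)


if __name__ == "__main__":
    print(rows_to_strings([[1.5, 2.0], [3.25, 4.0]], precision=2))
    print(normalized_clustering_accuracy([0, 0, 1, 1], [1, 1, 0, 0]))
```

The strings returned by `rows_to_strings` are what a text embedding model would receive in place of sentences. The permutation search grows factorially with the number of clusters, so for more than a handful of clusters an assignment solver would be the more practical choice.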

Table of Contents

1. Introduction
2. Related Work
3. Methodology
4. Experimental Setup
5. Results
6. Visual Analysis of Embeddings
7. Discussion
8. Conclusion
9. References

Citation
APA

Aaditya Yadav, Min Seo Park, Ikshita Yadav. (2025). Applying Textual Embeddings for Numerical Data Clustering. Journal of Public Value, 9, 85-98.

MLA

Aaditya Yadav, Min Seo Park, Ikshita Yadav. "Applying Textual Embeddings for Numerical Data Clustering." Journal of Public Value 9 (2025): 85-98.
