Applying Textual Embeddings for Numerical Data Clustering

Publisher: 한국공공가치학회
Authors: Aaditya Yadav, Min Seo Park, Ikshita Yadav
Publication: Journal of Public Value, Vol. 9, pp. 85-98 (14 pages)
Subject: Social Sciences > Social Welfare
File format: PDF
Publication date: 2025.06.30

Abstract

Purpose: This study investigates whether text-based embedding techniques, originally designed for natural language processing, can be effectively applied to numerical data.

Method: We transform numerical datasets into space-separated strings and encode them using five embedding techniques: DistilBERT, TF-IDF, Doc2Vec, Multilingual-e5, and SFR-Mistral. To manage the resulting high-dimensional vectors, we reduce their dimensionality using both local and global configurations of UMAP. Ten clustering algorithms (K-Means, Agglomerative, BIRCH, GMM, Genie, K-Medoids, K-Modes, LDA, MiniBatch K-Means, and Spectral Co-Clustering) are applied to these embeddings and compared against two baselines: clustering on raw numerical data and clustering on UMAP-reduced numerical data. Performance is evaluated using Normalized Clustering Accuracy (NCA) across a diverse set of benchmark datasets.

Results: While text-based embeddings do not universally outperform traditional methods, several configurations, especially those using Multilingual-e5 and SFR-Mistral, demonstrate consistent improvements in clustering accuracy. In certain cases, embedding-based approaches yield dramatic gains (an increase of over 500% in NCA compared to raw data). Algorithms such as K-Means, K-Medoids, and Spectral Co-Clustering benefit most from the transformed representations. Visual analyses on datasets such as Graves Dense, Ring, and ZigZag show enhanced cluster separability and balanced densities after embedding.

Conclusion: Textual embeddings can serve as a viable alternative preprocessing strategy for numerical clustering tasks, offering substantial improvements in specific contexts. These findings encourage further research into hybrid embedding techniques tailored for numerical data, potentially involving training specialized models or integrating with tabular-focused architectures to capitalize fully on the observed benefits.
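
The first two steps of the method (serializing each numerical row as a space-separated string, then embedding the strings) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how values are formatted, so the two-decimal rounding, the whitespace tokenization, the smoothed idf variant, and the helper names row_to_text, tfidf_vectors, and cosine are all assumptions made for illustration. Only TF-IDF is shown because it needs no pretrained model; the UMAP reduction and the clustering step are omitted.

```python
import math
from collections import Counter


def row_to_text(row, decimals=2):
    """Serialize one numerical row as a space-separated string.

    The rounding precision is an assumption of this sketch, not a
    detail taken from the paper.
    """
    return " ".join(f"{value:.{decimals}f}" for value in row)


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with L2-normalized TF-IDF weights.

    Uses raw term counts and the smoothed idf
    ln((1 + n_docs) / (1 + doc_freq)) + 1.
    """
    tokenized = [doc.split() for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each token.
    doc_freq = Counter(token for tokens in tokenized for token in set(tokens))
    idf = {
        token: math.log((1 + n_docs) / (1 + doc_freq[token])) + 1.0
        for token in vocabulary
    }

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = [counts[token] * idf[token] for token in vocabulary]
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0
        vectors.append([w / norm for w in weights])
    return vocabulary, vectors


def cosine(u, v):
    """Cosine similarity of two vectors that are already L2-normalized."""
    return sum(a * b for a, b in zip(u, v))


# Toy data (hypothetical, not from the paper's benchmark datasets).
rows = [
    [0.10, 0.20],
    [0.10, 0.25],
    [5.00, 7.30],
]
texts = [row_to_text(row) for row in rows]
vocabulary, vectors = tfidf_vectors(texts)

print(texts)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

In this toy example the first two rows are similar only because they share the exact token "0.10"; the tokens "0.20" and "0.25" are numerically close but count as unrelated under TF-IDF. That is a property of this token-matching sketch, and the learned embedding models named in the abstract may treat such values differently.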

Table of Contents

1. Introduction
2. Related Work
3. Methodology
4. Experimental Setup
5. Results
6. Visual Analysis of Embeddings
7. Discussion
8. Conclusion
9. References

Cite
APA

Aaditya Yadav, Min Seo Park, Ikshita Yadav. (2025). Applying Textual Embeddings for Numerical Data Clustering. Journal of Public Value, 9, 85-98.

MLA

Aaditya Yadav, Min Seo Park, Ikshita Yadav. "Applying Textual Embeddings for Numerical Data Clustering." Journal of Public Value, vol. 9 (2025): 85-98.