A Study on an Explainable Paper Classification System Using Topic Modeling and XAI

Alternative Title: 토픽모델링과 XAI를 활용한 설명 가능한 논문 분류 시스템에 관한 연구

Abstract: Accuracy and interpretability are two critical properties of a document classification system. Accuracy measures how well a classifier predicts unseen data, while interpretability concerns how easily humans can understand the model and its rationale for assigning labels to instances. An effective classification system should maintain high accuracy and also give users intuitive, comprehensive insights that support decision-making. This study proposes an explainable paper classification system that combines topic modeling with an explainable artificial intelligence (XAI) technique. The proposed system uses latent semantic analysis (LSA) for topic modeling and applies Shapley additive explanations (SHAP) to improve the transparency and comprehensibility of the classification outcomes. The system explains classification results at three levels: corpus, document, and word. Its effectiveness is validated on a Web of Science dataset focused on the nanomaterial field, and its performance is further evaluated on a large-scale dataset from the Semantic Scholar database.

Author(s): 신나경

Issued Date: 2025

Awarded Date: 2025-02

Type: Dissertation

URI: https://repository.sungshin.ac.kr/handle/2025.oak/1338
http://dcollection.sungshin.ac.kr/common/orgView/000000015241

Alternative Author(s): SHIN NAKYUNG

Affiliation: Sungshin Women's University Graduate School

Department: Graduate School, Department of Statistics

Advisor: 정호현

Table Of Contents: I. Introduction 1
II. Proposed method 5
2.1 Overview 5
2.2 Latent Semantic Analysis 7
2.3 Multilayer Perceptron 8
2.4 Shapley Additive Explanations 9
III. Related work 13
IV. Experiments 15
4.1 Dataset 15
4.2 Experimental procedure 15
4.3 Comparative study 18
4.3.1 Embedding methods 18
4.3.2 Classification models 20
4.3.3 Evaluation metrics 22
4.4 Results 24
4.4.1 Comparison results 24
4.4.2 Explanations of classification results 27
V. Applications to Semantic Scholar 38
5.1 Dataset 38
5.2 Comparison results 38
VI. Conclusion 44

Degree: Master

Publisher: Sungshin Women's University Graduate School

Appears in Collections: Department of Statistics > Theses

Access and License

Access type: Open
Embargo: 2025-02-20
