Foreign Word Identification Using SVM (SVM을 이용한 외래어 인식)

Alternative Title: Foreign Words Identification Using Support Vector Machines

Abstract (translated from Korean): The number of foreign words found in Korean text is steadily increasing. Foreign words are mostly proper nouns or technical terms. Because they form a productive class of vocabulary, they cause the unregistered-word (out-of-vocabulary) problem, and because their transliterations are not uniform, they cause index term mismatches in information retrieval, which affects recall.
This thesis therefore presents a method for identifying foreign words with an SVM. The foreign word identification problem is recast as classifying foreign words versus native Korean nouns. Syllable information, phoneme information, selected phoneme information, and selected syllable information are used to generate feature vectors. SVM training is performed on 9,000 training feature vectors, and SVM classification on 1,000 test feature vectors.
In the evaluation, depending on the information reflected in the vectors, the method improved on a baseline with 88.65% accuracy, 90.69% precision, 86.14% recall, and an F-measure (β=1) of 88.35 by about 2-5% in accuracy, 3-6% in precision, 0.5-3% in recall, and 1.5-4.5 in F-measure. The best-performing experiment generated feature vectors from syllable information, selected phoneme information, and selected syllable information, and achieved 93.06% accuracy, 96.55% precision, 89.30% recall, and an F-measure (β=1) of 92.78 in 10-fold cross-validation.

Abstract (English original): Foreign words are often found in Korean texts. Most foreign words are proper nouns or technical terms that are not in a dictionary. The variety of transliterations causes an index term mismatch problem in Korean information retrieval, which in turn affects recall.
This thesis proposes an SVM approach to foreign word identification in Korean texts. We treat foreign word identification as a classification problem. Syllable information, phoneme information, selected phoneme information, and selected syllable information are used to build the input vectors for the SVM. 9,000 training feature vectors are used for SVM learning and 1,000 test feature vectors for SVM classification.
Compared with the baseline, the proposed method improved accuracy by 2-5%, precision by 3-6%, recall by 0.5-3%, and F-measure by 1.5-4.5, depending on feature selection. The experiment with syllable information, phoneme information, selected phoneme information, and selected syllable information showed the best performance: 93.06% accuracy, 96.55% precision, 89.30% recall, and an F-measure (β=1) of 92.78 in 10-fold cross-validation tests.
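
As a quick consistency check on the figures above: the F-measure with β=1 is the harmonic mean of precision and recall. The sketch below recomputes it from the precision and recall reported for the best experiment. The function name `f_measure` is illustrative and is not taken from the thesis.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid dividing by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall reported for the best 10-fold cross-validation run.
best = f_measure(96.55, 89.30)
print(f"F-measure (beta=1): {best:.2f}")
```

Entered as percentages, the result comes out in the same units and agrees with the reported 92.78 to two decimal places.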

Author(s): 권미영

Issued Date: 2005

Awarded Date: 2006-02

Type: Dissertation

URI: https://repository.sungshin.ac.kr/handle/2025.oak/1893
http://210.125.93.15/jsp/common/DcLoOrgPer.jsp?sItemId=000000002152

Alternative Author(s): Kwon, Mi-Young

Affiliation: Sungshin Women's University, Graduate School

Department: Graduate School, Department of Computer Science

Advisor: 심광섭

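Table Of Contents: Ⅰ. Introduction = 1
Ⅱ. Related Work = 3
1. Research on Foreign Words in Korean Information Processing = 3
2. SVM = 6
Ⅲ. Foreign Word Identification Using SVM = 13
1. Feature Selection and Representation = 13
2. Composition of the Experimental Data = 16
3. Training and Classification Process = 17
Ⅳ. Experiments and Evaluation = 20
1. Evaluation Measures = 20
2. Evaluation Method = 21
3. Baseline Setup = 22
4. Performance Evaluation = 23
4.1 Performance Comparison by Training Data Size = 23
4.2 Performance Comparison by Feature Selection = 24
5. Analysis of Results = 28
Ⅴ. Conclusion and Future Work = 31
References = 34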
ABSTRACT = 36
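
Section Ⅲ.1 in the contents above covers feature selection and representation, and the abstract names syllable and phoneme information as the SVM inputs. As a rough illustration of those two levels, the sketch below splits a precomposed Hangul syllable into its initial, medial, and final jamo using standard Unicode arithmetic. It is not the thesis's actual encoding, which this record does not describe. Treating "phoneme information" as jamo is an assumption, and the names `decompose` and `phonemes` and the sample word are hypothetical.

```python
# Jamo tables in the order Unicode uses for precomposed Hangul syllables
# (U+AC00..U+D7A3): 19 initials, 21 medials, 28 finals (including "none").
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ",
          "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ",
          "ㅋ", "ㅌ", "ㅍ", "ㅎ"]


def decompose(syllable: str) -> tuple[str, str, str]:
    """Split one precomposed Hangul syllable into (initial, medial, final)."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 19 * 21 * 28:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    initial, rest = divmod(index, 21 * 28)
    medial, final = divmod(rest, 28)
    return INITIALS[initial], MEDIALS[medial], FINALS[final]


def phonemes(word: str) -> list[str]:
    """Flatten a word into its jamo sequence, skipping empty finals."""
    return [jamo for syllable in word for jamo in decompose(syllable) if jamo]


word = "컴퓨터"  # hypothetical sample: a transliteration of "computer"
print(list(word))      # syllable-level view
print(phonemes(word))  # jamo-level view
```

A classifier could then draw features from either view, for example syllable or jamo n-grams. Which units the thesis selects, and how it encodes them as vectors, is not stated in this record.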

Degree: Master

Publisher: Sungshin Women's University

Appears in Collections: Department of Computer Science > Theses

Access and License

Access Type: Open
Embargo: 2006-05-30
