통계 기반 한국어 형태소 분석기의 성능 개선 (Improving the Performance of a Statistics-Based Korean Morphological Analyzer)
- Alternative Title
- Improving the Performance of Statistical Korean Morphological Analyzer
- Abstract
- Statistical Korean morphological analysis is a novel approach in that it does not require a manually built, machine-readable morphological dictionary. Instead, it uses statistical information acquired from a POS-tagged corpus. Because this acquisition is fully automated, no human intervention is required, which is the main advantage of the statistical approach to Korean morphological analysis. Its drawback is low precision: the number of false positives (incorrect analyses produced by the analyzer) is relatively high. To improve precision, this paper proposes a method for filtering out false positives. The method introduces two types of dictionaries, a one-syllable-morpheme dictionary and a josa-eomi (particle and ending) dictionary, both of which are built automatically while statistical information is collected from the POS-tagged corpus. To evaluate the proposed method, 10-fold cross-validation was performed on the 10-million-eojeol Sejong POS-tagged corpus. The experimental results show that precision improved by 5%.
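The abstract does not say exactly how the two dictionaries are consulted, so the following is only a minimal sketch under stated assumptions, not the author's implementation. It assumes that (1) each training eojeol is a sequence of (form, Sejong tag) pairs, (2) josa and eomi are recognized by the Sejong tag prefixes `J` (JKS, JKB, JKO, JX, ...) and `E` (EP, EF, EC, ETN, ETM), and (3) a candidate analysis is discarded as a false positive if it contains a josa/eomi or a one-syllable morpheme that never occurred with that tag in the training corpus. The function names `build_dictionaries`, `filter_candidates`, and `precision` are hypothetical. In a 10-fold setup, the dictionaries would be rebuilt from the nine training folds in each round.

```python
from typing import Iterable

# Hypothetical representation: a morpheme is a (surface form, Sejong POS tag) pair,
# and an analysis of one eojeol is a tuple of morphemes, e.g. (("책", "NNG"), ("을", "JKO")).
Morpheme = tuple[str, str]
Analysis = tuple[Morpheme, ...]


def is_josa_or_eomi(tag: str) -> bool:
    """Sejong josa tags start with 'J' and eomi tags start with 'E'."""
    return tag.startswith(("J", "E"))


def build_dictionaries(
    tagged_eojeols: Iterable[Analysis],
) -> tuple[set[Morpheme], set[Morpheme]]:
    """Collect one-syllable morphemes and josa/eomi morphemes seen in the training corpus."""
    one_syllable: set[Morpheme] = set()
    josa_eomi: set[Morpheme] = set()
    for analysis in tagged_eojeols:
        for form, tag in analysis:
            if is_josa_or_eomi(tag):
                josa_eomi.add((form, tag))
            elif len(form) == 1:  # one precomposed Hangul syllable
                one_syllable.add((form, tag))
    return one_syllable, josa_eomi


def is_plausible(
    analysis: Analysis, one_syllable: set[Morpheme], josa_eomi: set[Morpheme]
) -> bool:
    """Assumed rule: reject if a josa/eomi or one-syllable morpheme is unattested."""
    for form, tag in analysis:
        if is_josa_or_eomi(tag):
            if (form, tag) not in josa_eomi:
                return False
        elif len(form) == 1 and (form, tag) not in one_syllable:
            return False
    return True


def filter_candidates(
    candidates: list[Analysis], one_syllable: set[Morpheme], josa_eomi: set[Morpheme]
) -> list[Analysis]:
    """Keep only the candidate analyses that pass both dictionary checks."""
    return [c for c in candidates if is_plausible(c, one_syllable, josa_eomi)]


def precision(predicted: list[Analysis], gold: set[Analysis]) -> float:
    """Fraction of produced analyses that are correct (0.0 if nothing was produced)."""
    if not predicted:
        return 0.0
    correct = sum(1 for a in predicted if a in gold)
    return correct / len(predicted)


# Toy usage with made-up data (not from the paper).
training = [
    (("학교", "NNG"), ("에", "JKB")),
    (("가", "VV"), ("ㄴ다", "EF")),
    (("책", "NNG"), ("을", "JKO")),
]
one_syl, je = build_dictionaries(training)

correct = (("책", "NNG"), ("을", "JKO"))
wrong_josa = (("책", "NNG"), ("을", "JKS"))  # 을/JKS never seen in training
wrong_noun = (("책", "MAG"), ("을", "JKO"))  # 책/MAG never seen in training
candidates = [correct, wrong_josa, wrong_noun]

print(round(precision(candidates, {correct}), 3))  # 0.333
print(precision(filter_candidates(candidates, one_syl, je), {correct}))  # 1.0
```

Because only unattested josa/eomi and one-syllable morphemes are checked, longer content morphemes pass through untouched. This is one plausible reading of why these two dictionary types target false positives without requiring a full morpheme dictionary.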
- Author(s)
- 심광섭
- Issued Date
- 2016-02-01
- Type
- Article
- DOI
- I410-ECN-0102-2016-000-000708579
- URI
- http://repository.sungshin.ac.kr/handle/2025.oak/7811
https://kiss.kstudy.com/Detail/Ar?key=3419915
- Publisher
- Institute of Humanities, Sungshin Women's University (성신여자대학교 인문과학연구소)
- ISSN
- 2005-0933
Appears in Collections:
- Institute of Humanities > Journal Articles (인문과학연구소 > 학술논문)
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.