OAK

통계 기반 한국어 형태소 분석기의 성능 개선

Alternative Title
Improving the Performance of Statistical Korean Morphological Analyzer
Abstract
Statistical Korean morphological analysis is a new approach in that it does not require a manually built, machine-readable morphological dictionary. Instead, it relies on statistical information acquired from a POS-tagged corpus. The acquisition of this information is fully automated, so no human intervention is required. This is the main advantage of the statistical approach to Korean morphological analysis. Its main drawback is low precision: the number of false positives is relatively high. To improve precision, this paper proposes a method for filtering out false positives. The proposed method introduces two types of dictionaries, a one-syllable-morpheme dictionary and a josa-eomi dictionary, both of which are constructed automatically while statistical information is collected from the POS-tagged corpus. To evaluate the proposed method, 10-fold cross-validation is performed on the 10-million-eojeol Sejong POS-tagged corpus. The experimental results show that precision improves by 5%.
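The abstract does not spell out how the two dictionaries are applied, so the following Python sketch shows only one plausible reading. It assumes that a candidate analysis is rejected when it contains a one-syllable morpheme, or a josa/eomi morpheme, that was never observed with that tag in the POS-tagged corpus. The function names (`build_dictionaries`, `keep_candidate`), the toy corpus, and the rejection rule are all hypothetical and are not taken from the paper. The only external fact used is that josa tags begin with `J` and eomi tags with `E` in the Sejong tagset.

```python
from typing import Iterable, List, Set, Tuple

Morph = Tuple[str, str]      # (morpheme form, POS tag)
Analysis = List[Morph]       # one analysis of a single eojeol


def is_josa_or_eomi(tag: str) -> bool:
    # Sejong tagset: josa tags start with "J" (JKS, JX, ...),
    # eomi tags start with "E" (EP, EF, EC, ETN, ETM).
    return tag.startswith(("J", "E"))


def build_dictionaries(tagged_corpus: Iterable[Analysis]) -> Tuple[Set[Morph], Set[Morph]]:
    """Collect both dictionaries in a single pass over the tagged corpus.

    Assumption: each dictionary is simply the set of (form, tag) pairs
    observed in the corpus. Forms are assumed to be NFC-normalized, so one
    Hangul syllable is one code point.
    """
    one_syllable: Set[Morph] = set()
    josa_eomi: Set[Morph] = set()
    for analysis in tagged_corpus:
        for form, tag in analysis:
            if len(form) == 1:
                one_syllable.add((form, tag))
            if is_josa_or_eomi(tag):
                josa_eomi.add((form, tag))
    return one_syllable, josa_eomi


def keep_candidate(analysis: Analysis,
                   one_syllable: Set[Morph],
                   josa_eomi: Set[Morph]) -> bool:
    """Hypothetical filter: drop a candidate that contains an unseen
    one-syllable morpheme or an unseen josa/eomi."""
    for form, tag in analysis:
        if len(form) == 1 and (form, tag) not in one_syllable:
            return False
        if is_josa_or_eomi(tag) and (form, tag) not in josa_eomi:
            return False
    return True


# Toy stand-in for a POS-tagged corpus (three eojeol).
corpus = [
    [("나", "NP"), ("는", "JX")],
    [("학교", "NNG"), ("에", "JKB")],
    [("가", "VV"), ("ㄴ다", "EF")],
]
one_syl, jo_eo = build_dictionaries(corpus)

# Two candidate analyses that a statistical analyzer might emit for "나는".
candidates = [
    [("나", "NP"), ("는", "JX")],    # attested in the toy corpus
    [("나", "VV"), ("는", "ETM")],   # contains pairs never seen in the toy corpus
]
filtered = [c for c in candidates if keep_candidate(c, one_syl, jo_eo)]
```

With this toy corpus, the second candidate is dropped because neither ("나", "VV") nor ("는", "ETM") was observed. Removing such candidates lowers the false-positive count and so raises precision, computed as true positives divided by all proposed analyses.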
Author(s)
심광섭
Issued Date
2016-02-01
Type
Article
DOI
I410-ECN-0102-2016-000-000708579
URI
http://repository.sungshin.ac.kr/handle/2025.oak/7811
https://kiss.kstudy.com/Detail/Ar?key=3419915
Publisher
성신여자대학교 인문과학연구소
ISSN
2005-0933
Appears in Collections:
인문과학연구소 > Academic Papers
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.