
앙상블 사전학습 모델 기반의 SNS 문맥인지 이모티콘 추천 (SNS Context-Aware Emoticon Recommendation Based on Ensemble Pre-trained Models)

Alternative Title
Context-aware emoji recommendation for SNS based on an ensemble of pre-trained models
Abstract
Emoticon recommendation is an important task that helps users easily find an appropriate emoticon among the thousands available. Existing emoticon recommendation methods target chat platforms and mainly recommend the emotion emoticons that users use most often. On SNS platforms such as Instagram, however, emoticons tend to be used to supplement or emphasize the content of a short uploaded post rather than to convey emotion.
This study proposes a new methodology that recommends emoticons by capturing the context of Korean posts on SNS platforms. We apply hierarchical KoBERT to the emoticon recommendation problem to capture the context of Korean posts and recommend a variety of suitable emoticons. Recommending 616 emoticons belonging to 314 emoticon categories is useful for conveying the meaning of short, implicit SNS posts more accurately.
We collect Instagram posts to build a dataset that reflects real-world usage, and we build a hierarchical KoBERT model to learn the hierarchical categories of the emoticons embedded in each text. Experimental results verify that the hierarchical KoBERT model achieves higher emoticon recommendation performance than DNN, LSTM, Bi-LSTM, and GRU models.
In addition, to improve performance, we introduce further Korean transfer learning models, including KoBART, KoGPT2, KoELECTRA, and KcELECTRA, compare and analyze their performance, and apply a stacking ensemble technique.
Author(s)
김지현
Issued Date
2023
Awarded Date
2023-08
Type
Dissertation
URI
https://repository.sungshin.ac.kr/handle/2025.oak/4744
http://dcollection.sungshin.ac.kr/common/orgView/000000014778
Department
Graduate School, Department of Future Convergence Technology Engineering
Advisor
변혜원
Table Of Contents
Thesis Summary
I. Introduction
1. Research Background
2. Research Objectives
3. Contributions and Organization of the Thesis
II. Related Work
1. Emoticon Recommendation Systems
2. Pre-trained Language Models
3. Ensemble Classifiers
III. System Architecture
IV. Data Construction and Preprocessing
1. Data Collection
2. Data Preprocessing
3. Hierarchical Clustering of Emoticons
4. Data Labeling and Analysis
V. Model Design and Training
1. Training the Pre-trained Language Models
2. Hierarchical Model and Emoticon Recommendation
3. Training the Stacking Ensemble Model
VI. Experimental Design and Results
1. Performance Metrics
2. Hyperparameter Experiments
3. Environment and Design of the Model Performance Experiments
4. Results and Analysis of the Model Performance Experiments
4-1. Performance Evaluation of the Hierarchical KoBERT Model
4-2. Performance Evaluation of the Hierarchical Model with Stacking Ensemble
VII. Conclusion
References
ABSTRACT
Degree
Master
Publisher
Sungshin Women's University Graduate School
Appears in Collections:
Department of Future Convergence Technology Engineering > Theses
Access and License
  • Access type: Open
  • Embargo: 2023-08-25
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.