SynC2S: An Efficient Method for Synthesizing Tabular Data With a Learnable Pre-Processing
- Abstract
- There has been a growing demand for access to large public datasets in order to extract valuable insights or enhance services. However, such access also carries risks, such as privacy breaches and unauthorized data exposure. Data synthesis has emerged as a popular technique for addressing privacy preservation and data usability simultaneously. Numerous deep-learning-based methods have recently been developed, but their effectiveness is still not clearly understood, and more efficient frameworks are still needed. In this study, we propose an efficient and theoretically principled method, based on a deep generative model, for generating high-quality synthetic tabular data. First, we introduce a novel technique called C2Smap, a learnable pre-processing method that automatically transforms continuous distributions into simpler forms that are easier to generate. We then develop a conditional generative model with a hierarchical structure, together with its learning framework, called HCIWAE, to capture imbalanced categorical distributions. Combining these two components, we name our method Synthetic data generation with C2Smap (SynC2S). Through comprehensive experimental analyses, we demonstrate the superiority and efficiency of SynC2S in generating synthetic data compared to other rec…
- Author(s)
- 김동하; 김지우; 박세리; 고준성
- Issued Date
- 2025-01-09
- Type
- Article
- Keyword
- Statistics
- DOI
- 10.1109/ACCESS.2024.3472706
- URI
- http://repository.sungshin.ac.kr/handle/2025.oak/8621
- Publisher
- IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
- ISSN
- 2169-3536
Appears in Collections:
- Division of Mathematical Statistics and Data Science > Journal Articles
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.