Abstract
In the era of Artificial Intelligence (AI), where data plays a pivotal role, researchers are increasingly leveraging synthetic data to address privacy concerns, mitigate data scarcity, and enhance model robustness. This approach is particularly promising in critical domains such as healthcare, finance, government, and autonomous systems, where diverse and representative datasets are essential for effective AI training. The integration of data from multiple sources or parties in the context of big data can significantly enrich the available information. However, the data contributed by each party often exhibits distinct characteristics, leading to highly imbalanced distributions. This challenge introduces an additional layer of complexity known as the double imbalance problem, characterized by imbalances both within individual parties and across multiple parties. To address these challenges, we propose a novel generative adversarial network (GAN) framework incorporating distributed discriminators and dual attention mechanisms. Our approach utilizes a single generator to synthesize data conditioned on multiple parties, with each party maintaining its own Critic and dataset to ensure privacy preservation. We introduce local and global attention mechanisms, along with gradient-casting techniques during training, to effectively address the dual imbalance issues prevalent in multi-party data synthesis. The local attention mechanism addresses imbalances within individual parties, while the global attention mechanism targets imbalances across parties, resulting in a more stable generative model in the presence of highly imbalanced data distributions. To validate our approach, we conducted empirical experiments using six real-world tabular datasets, deliberately setting up dual imbalance scenarios across various intra- and inter-party contexts. We evaluated the utility of the synthetic data generated by multiple parties by assessing its efficacy in machine learning tasks. The results demonstrate that our distributed GAN with dual attention mechanisms outperforms existing generative models in addressing these challenges.
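To make the double imbalance described in the abstract concrete, the following minimal Python sketch measures both kinds of imbalance on a toy two-party setup. It does not reproduce the paper's model or its evaluation metrics. The function names, the party names, and the choice of a max/min ratio are all illustrative assumptions, and inter-party imbalance is read here simply as a difference in how many records each party holds, which is only one possible interpretation.

```python
from collections import Counter


def imbalance_ratio(counts):
    """Largest count divided by smallest count; 1.0 means perfectly balanced."""
    values = list(counts)
    return max(values) / min(values)


def double_imbalance(party_labels):
    """Hypothetical helper: party_labels maps a party name to its class labels."""
    # Intra-party: class imbalance inside each party's own dataset.
    intra = {
        party: imbalance_ratio(Counter(labels).values())
        for party, labels in party_labels.items()
    }
    # Inter-party (assumed reading): imbalance in dataset size across parties.
    inter = imbalance_ratio(len(labels) for labels in party_labels.values())
    return intra, inter


# Toy data, invented for illustration only.
parties = {
    "hospital_a": [0] * 900 + [1] * 100,
    "hospital_b": [0] * 45 + [1] * 5,
}
intra, inter = double_imbalance(parties)
print(intra)  # per-party class imbalance ratios
print(inter)  # size imbalance ratio between parties
```

In this toy case each party has a 9:1 class skew while one party holds 20 times as many records as the other, so both forms of imbalance are present at once. The sketch assumes every party holds at least one record.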
| Original language | English |
|---|---|
| Article number | 108166 |
| Journal | Future Generation Computer Systems |
| Volume | 176 |
| Publication status | Published - Mar 2026 |
Keywords
- Attention mechanism
- Imbalanced data
- Multi-party data synthesis
- Synthetic data
- Tabular data
Fingerprint
Dive into the research topics of 'A distributive and attentive generative model for multi-party data synthesis in highly imbalanced data'. Together they form a unique fingerprint.