An audio generation model based on empirical mode decomposition and generative adversarial networks for enhancing voice quality and diversity

ID: 2363

Abstract: This paper presents a novel audio generation framework called EMDGAN, which integrates Improved Complete Ensemble Empirical Mode Decomposition (ICEEMD) with Generative Adversarial Networks (GANs) to enhance speech quality and diversity. The proposed system decomposes speech signals into Intrinsic Mode Functions (IMFs) before adversarial training, allowing the model to better capture the non-stationary and nonlinear characteristics of speech. Unlike conventional WaveGAN, the proposed architecture employs multiple generators, one for each decomposed signal component, and a discriminator optimized with the WGAN-GP loss. Objective evaluation using Inception Score (IS) and Fréchet Inception Distance (FID), along with subjective Mean Opinion Score (MOS) testing, confirms improved clarity and diversity. Furthermore, a two-stage filtering process is introduced to automatically select high-quality generated samples. Experimental results demonstrate that EMDGAN outperforms WaveGAN in both perceptual quality and diversity.

Keywords: GAN, ICEEMD, Audio Generation, Speech Enhancement, Data Augmentation, WGAN-GP.
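
The abstract names the WGAN-GP loss without spelling it out. As a rough illustration only, the sketch below computes the standard WGAN-GP critic objective, mean D(fake) minus mean D(real) plus a gradient penalty, for a toy linear critic D(x) = w · x. The linear critic, the function names, and the penalty weight `lam=10.0` are assumptions made for this example; the abstract does not state the actual discriminator architecture or hyperparameters used in EMDGAN.

```python
import math
import random


def critic(w, x):
    """Toy linear critic D(x) = w . x (an assumption, not the paper's network)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def critic_grad(w, x):
    """Gradient of the linear critic with respect to x; for D(x) = w . x it is w."""
    return list(w)


def wgan_gp_critic_loss(w, real_batch, fake_batch, lam=10.0, rng=None):
    """Standard WGAN-GP critic loss for paired real/fake batches of equal size.

    loss = mean D(fake) - mean D(real) + lam * mean (||grad D(x_hat)|| - 1)^2,
    where x_hat is a random interpolation between a real and a fake sample.
    lam=10.0 is a commonly used default, assumed here.
    """
    rng = rng or random.Random(0)
    n = len(real_batch)
    wasserstein = (
        sum(critic(w, f) for f in fake_batch)
        - sum(critic(w, r) for r in real_batch)
    ) / n

    penalty = 0.0
    for r, f in zip(real_batch, fake_batch):
        eps = rng.random()  # interpolation coefficient in [0, 1)
        x_hat = [eps * ri + (1.0 - eps) * fi for ri, fi in zip(r, f)]
        grad = critic_grad(w, x_hat)
        grad_norm = math.sqrt(sum(g * g for g in grad))
        penalty += (grad_norm - 1.0) ** 2

    return wasserstein + lam * penalty / n


real = [[1.0, 0.0], [0.0, 1.0]]
fake = [[0.0, 0.0], [0.0, 0.0]]
loss_unit_norm = wgan_gp_critic_loss([0.6, 0.8], real, fake)   # gradient norm 1
loss_large_norm = wgan_gp_critic_loss([3.0, 4.0], real, fake)  # gradient norm 5
```

With a critic whose gradient norm is exactly 1 the penalty term vanishes, while the second critic is penalized heavily. In a real model the gradient would come from automatic differentiation through the discriminator network rather than from a closed form.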
Published: 02-4-2026
Issue: Vol. 26 No. 4 (2026)
Page Nos: 226-231
Section: Articles
License: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
How to Cite: Dr. S. Lakshmikantha Reddy, Boya Mahalakshmi, Gajjala Venkat Varshitha, A Narendra, Dudkula Vali, An audio generation model based on empirical mode decomposition and generative adversarial networks for enhancing voice quality and diversity, 2026, International Journal of Engineering Sciences and Advanced Technology, 26(4), Page 226-231, ISSN No: 2250-3676.