Building multimodal generative audio systems for more immersive listening experiences.
Focus: Multi-Modal Generative AI · Speech Synthesis · Spatial Audio
I am an undergraduate student in the Turing Class at Chu Kochen Honors College, Zhejiang University, majoring in Artificial Intelligence. I currently work in the Audio Research Team at Zhejiang University under the supervision of Prof. Zhou Zhao.
My research interest lies in Multi-Modal Generative AI, with a particular focus on Speech Synthesis and Spatial Audio Generation. My work aims to build immersive auditory experiences through advanced generative modeling (e.g., Flow Matching). I have papers published or under review at top-tier venues including NeurIPS, ACM MM, and ACL.
I am always open to potential collaborations and opportunities. Feel free to reach out.
🔥 News
📝 Publications

CSAVocoder: A Causal Spatial Audio Vocoder Towards Real-Time Spatial Audio Generation
#Zhiyuan Zhu, #Han Wang, et al.
- We introduce CSAVocoder, a strictly causal streaming neural vocoder. It features a Spatial Adaptor to fuse pose information and a Spatial Consistency Discriminator to explicitly supervise inter-channel phase and level differences.
- The model achieves high-fidelity waveform reconstruction while preserving precise spatial rendering, all within a constant memory overhead suitable for real-time streaming.

Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios
#Changhao Pan, #Rui Yang, #Han Wang, et al.
- We propose LFS-Bench, a standardized benchmark decomposing “long-form quality” into acoustics, semantics, and expressiveness. It includes 1,101 samples spanning 17 diverse scenarios (e.g., dialogues, audiobooks).
- Our extensive experiments reveal that current SOTA models still struggle significantly with consistency and hierarchy in highly expressive scenarios compared to real recordings.
A Multimodal Evaluation Framework for Spatial Audio Playback Systems: From Localization to Listener Preference
Changhao Pan, Wenxiang Guo, Yu Zhang, Zhiyuan Zhu, Zhetao Chen, Han Wang, Zhou Zhao. ACM MM 2025

MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations
Wenxiang Guo, Changhao Pan, Zhiyuan Zhu, Xintong Hu, Yu Zhang, Li Tang, Rui Yang, Han Wang, et al. NeurIPS 2025
🎖 Honors and Awards
- 2024 First-class Scholarship in Zhejiang University
- 2025 Second-class Scholarship in Zhejiang University
- 2025 National Student Research Training Program
📖 Education
- 2023.08 – Present Undergraduate, Chu Kochen Honors College, Zhejiang University
💻 Internships
- 2024.04 – 2025.12 Research Assistant, Audio Research Team, Zhejiang University.
  Under the supervision of Prof. Zhou Zhao.
- 2026.01 – Present MLE Intern (TTS), VUI Lab, Hangzhou.
  Mentored by Mengxiao Bi; under the supervision of Prof. Yanmin Qian.