I am an undergraduate student in the Turing Class, Chu Kochen Honors College, Zhejiang University, majoring in Artificial Intelligence. Currently, I work in the Audio Research Team at Zhejiang University, under the supervision of Prof. Zhou Zhao.
My research interests lie in Multi-Modal Generative AI, with a particular focus on Speech Synthesis and Spatial Audio Generation. My work aims to build immersive auditory experiences through advanced generative modeling (e.g., Flow Matching). I have several papers published or under review at top-tier venues including NeurIPS, ACM MM, and ACL.
I am always open to collaborations and to opportunities that push the boundaries of AI. Contact me about any exciting projects or discussions!
🔥 News
- 2026.1: 🎉🎉 Started my internship at Luna Lab (宇生月伴) as a text-to-speech model researcher!
- 2025.12: 🎉🎉 Submitted two papers to ACL 2026
📝 Publications

CSAVocoder: A Causal Spatial Audio Vocoder Towards Real-Time Spatial Audio Generation
#Zhiyuan Zhu, #Han Wang, et al.
- We introduce CSAVocoder, a strictly causal streaming neural vocoder. It features a Spatial Adaptor to fuse pose information and a Spatial Consistency Discriminator to explicitly supervise inter-channel phase and level differences.
- The model achieves high-fidelity waveform reconstruction while preserving precise spatial rendering, all within a constant memory overhead suitable for real-time streaming.

Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios
#Changhao Pan, #Rui Yang, #Han Wang, et al.
- We propose LFS-Bench, a standardized benchmark decomposing “long-form quality” into acoustics, semantics, and expressiveness. It includes 1,101 samples spanning 17 diverse scenarios (e.g., dialogues, audiobooks).
- Our extensive experiments reveal that current SOTA models still struggle significantly with consistency and hierarchy in highly expressive scenarios compared to real recordings.
- [A Multimodal Evaluation Framework for Spatial Audio Playback Systems: From Localization to Listener Preference] Changhao Pan, Wenxiang Guo, Yu Zhang, Zhiyuan Zhu, Zhetao Chen, Han Wang, Zhou Zhao ACM MM 2025
- [MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations] Wenxiang Guo, Changhao Pan, Zhiyuan Zhu, Xintong Hu, Yu Zhang, Li Tang, Rui Yang, Han Wang, et al. NeurIPS 2025
🎖 Honors and Awards
- 2024 First-class Scholarship at Zhejiang University
- 2025 Second-class Scholarship at Zhejiang University
- 2025 National Student Research Training Program
📖 Education
- 2023.8 - now Undergraduate, Chu Kochen Honors College, Zhejiang University
💻 Internships
- 2024.4 - 2025.12 Research Assistant in the Audio Research Team at Zhejiang University. Advisor: Prof. Zhou Zhao 赵洲.
- 2026.1 - now MLE Intern (Algorithm), Luna Lab, Hangzhou (https://vuilabs.cn/)