I am an undergraduate student in the Turing Class, Chu Kochen Honors College, Zhejiang University, majoring in Artificial Intelligence. Currently, I work in the Audio Research Team at Zhejiang University, under the supervision of Prof. Zhou Zhao.

My research interests lie in Multi-Modal Generative AI, with a particular focus on Speech Synthesis and Spatial Audio Generation. My work aims to build immersive auditory experiences through advanced generative modeling (e.g., Flow Matching). I have several papers published or under review at top-tier venues including NeurIPS, ACM MM, and ACL.

I am always open to collaborations and to opportunities that push the boundaries of AI. Contact me about any exciting projects or discussions!

🔥 News

  • 2026.1:  🎉🎉 Started my internship at Luna Lab (宇生月伴) as a text-to-speech model researcher!
  • 2025.12:  🎉🎉 Submitted two papers to ACL 2026

📝 Publications

ACL 2026 (Under Review)

CSAVocoder: A Causal Spatial Audio Vocoder Towards Real-Time Spatial Audio Generation

#Zhiyuan Zhu, #Han Wang, et al.

  • We introduce CSAVocoder, a strictly causal streaming neural vocoder. It features a Spatial Adaptor to fuse pose information and a Spatial Consistency Discriminator to explicitly supervise inter-channel phase and level differences.
  • The model achieves high-fidelity waveform reconstruction while preserving precise spatial rendering, all with a constant memory overhead suitable for real-time streaming.

ACL 2026 (Under Review)

Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios

#Changhao Pan, #Rui Yang, #Han Wang, et al.

  • We propose LFS-Bench, a standardized benchmark decomposing “long-form quality” into acoustics, semantics, and expressiveness. It includes 1,101 samples spanning 17 diverse scenarios (e.g., dialogues, audiobooks).
  • Our extensive experiments reveal that current SOTA models still struggle significantly with consistency and hierarchy in highly expressive scenarios compared to real recordings.

  • [A Multimodal Evaluation Framework for Spatial Audio Playback Systems: From Localization to Listener Preference] Changhao Pan, Wenxiang Guo, Yu Zhang, Zhiyuan Zhu, Zhetao Chen, Han Wang, Zhou Zhao. ACM MM 2025
  • [MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations] Wenxiang Guo, Changhao Pan, Zhiyuan Zhu, Xintong Hu, Yu Zhang, Li Tang, Rui Yang, Han Wang, et al. NeurIPS 2025

🎖 Honors and Awards

  • 2024 First-class Scholarship, Zhejiang University
  • 2025 Second-class Scholarship, Zhejiang University
  • 2025 National Student Research Training Program

📖 Education

  • 2023.8 - now Undergraduate, Chu Kochen Honors College, Zhejiang University

💻 Internships

  • 2024.4 - 2025.12 Research Assistant in the Audio Research Team at Zhejiang University. Advisor: Prof. Zhou Zhao 赵洲.
  • 2026.1 - now MLE Intern (Algorithm) at Luna Lab, Hangzhou (https://vuilabs.cn/)