My name is Zhang Yuanhang (Chinese: “张远航”). I received my bachelor’s degree in Computer Science from the University of Chinese Academy of Sciences in 2019. Currently, I am a fourth-year Ph.D. student at the VIPL Group, Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), supervised by Prof. Shiguang Shan and Prof. Shuang Yang. My main research interests are deep learning applications that combine vision and speech, as well as multi-modal and self-supervised learning. I have also worked on interdisciplinary projects with Prof. Qun Zheng from the Department of Foreign Languages, University of Chinese Academy of Sciences, where we are attempting to build an automated spoken English rating system based on interpretable discourse coherence indicators.

This website features a collection of lecture notes and reference material I gathered or compiled while studying at UCAS, which I hope future students will find useful. I also occasionally write short surveys of recent research advances, or post Chinese translations of academic content that interests me. I honed much of my English working as a translator and proofreader at the ACI Chinese Fansub Group (ACICFG) over more than a decade.

Updates

  • Feb. 27, 2024: One paper on audio-visual speech representation learning accepted at CVPR 2024.
  • Jun. 19, 2022: Our team (ICTCAS-UCAS) won this year’s AVA Active Speaker Task at ActivityNet 2022 for the second year running, with a score of 94.47% mAP.
  • Jul. 2021: One paper on active speaker detection accepted for oral presentation at ACM MM 2021. It forms the basis of our ActivityNet submission.
  • Jun. 19, 2021: Our team (ICTCAS-UCAS-TAL) won this year’s AVA Active Speaker Task at ActivityNet 2021 with a score of 93.44% mAP – a 50% relative error reduction over the previous best score! You can watch our talk here. Cool work with my new teammate Susan :-)
  • Sep. 29, 2020: Third prize at the “Gulian Cup” Classical Chinese Literature Named Entity Recognition Challenge, hosted at CCL 2020.
  • Jan. 30, 2020: Two papers accepted at FG 2020. Do we really need pre-cropped lip regions for visual speech recognition? See: [SyncedReview] [Paper]
  • Aug. 9, 2019: Our lip reading system was awarded “Innovation Star” at the First China Artificial Intelligence Summit in Xiamen, China. [China Science Daily]
  • Jun. 17, 2019: Jingyun and I won runner-up in the active speaker detection task at the AVA Challenge 2019!
  • Apr. 4, 2019: We have an oral paper accepted at FG 2019.
  • Oct. 20, 2018: Our lip reading system was demonstrated in Season 2 of the “AI Mission” program, aired in prime time on China Central Television. [Video] [Henan Daily]