Zheng-yuan Yang

I am currently a Researcher at Microsoft. I received my Ph.D. degree in Computer Science from the University of Rochester, Rochester, NY, advised by Prof. Jiebo Luo. I received my bachelor's degree from the University of Science and Technology of China. I have received the Twitch Research Fellowship and the ICPR 2018 Best Industry Related Paper Award. My research interests lie at the intersection of computer vision and natural language processing.


Email  /  CV  /  Github  /  Google Scholar  /  LinkedIn

  • [2021/09]   Can GPT-3 benefit multimodal tasks? We provide an empirical study of GPT-3 for knowledge-based VQA, named PICa. We show that prompting GPT-3 with image captions and only 16 examples surpasses the supervised state of the art by an absolute +8.6 points on the OK-VQA dataset (from 39.4 to 48.0).
  • [2021/07]   Two papers accepted by ICCV 2021 (the SAT paper was selected as an oral).
  • [2021/06]   We won the TextCaps Challenge 2021 and the ReferIt3D Challenge 2021. See the related TAP and SAT papers.
  • [2021/06]   I defended my Ph.D. dissertation "Visual Grounding: Building Cross-Modal Visual-Text Alignment" and will join Microsoft as a Researcher.
  • [2021/05]   I was selected as an Outstanding Reviewer for CVPR 2021.
  • [2021/02]   Two papers accepted by CVPR 2021 (the TAP paper was selected as an oral).

    Research

    My current research mainly focuses on vision+language. I've also worked on human-centered visual understanding, such as human action recognition and parsing. Representative works are highlighted.

  • An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
    Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, Lijuan Wang
    Technical report
    [PDF] [Code] [Bibtex]

    Can GPT-3 benefit multimodal tasks? We provide an empirical study of GPT-3 for knowledge-based VQA, named PICa.

  • SAT: 2D Semantics Assisted Training for 3D Visual Grounding
    Zhengyuan Yang, Songyang Zhang, Liwei Wang, Jiebo Luo
    ICCV 2021. (oral presentation)
    [PDF] [Code] [Bibtex] [Benchmarks]

    Boosting 3D visual grounding by using training-time 2D semantics.

    #1 in the ReferIt3D CVPR 2021 challenge.

  • TransVG: End-to-End Visual Grounding with Transformers
    Jiajun Deng, Zhengyuan Yang, Tianlang Chen, Wengang Zhou, Houqiang Li
    ICCV 2021.
    [PDF] [Code] [Bibtex]

    A transformer-based framework for visual grounding.

  • TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
    Zhengyuan Yang, Yijuan Lu, Jianfeng Wang, Xi Yin, Dinei Florencio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo
    CVPR 2021. (oral presentation)
    [PDF] [Code] [Poster] [Video] [Bibtex]

    We propose Text-Aware Pre-training (TAP) for Text-VQA and Text-Caption tasks.

    #1 in the TextCaps CVPR 2021 challenge.

  • Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation
    Liwei Wang, Jing Huang, Yin Li, Kun Xu, Zhengyuan Yang, Dong Yu
    CVPR 2021.
    [PDF] [Code] [Bibtex]

    A weakly supervised visual grounding method that removes the need for object detection at test time.

  • Grounding-Tracking-Integration
    Zhengyuan Yang, Tushar Kumar, Tianlang Chen, Jinsong Su, Jiebo Luo
    IEEE T-CSVT.
    [PDF] [Annotations] [Demo1] [Demo2] [Bibtex]

    A simple yet effective modular framework for tracking by natural language specification.

  • Improving One-stage Visual Grounding by Recursive Sub-query Construction
    Zhengyuan Yang, Tianlang Chen, Liwei Wang, Jiebo Luo
    ECCV 2020.
    [PDF] [Code] [Slides] [Video] [Bibtex]

    Improving one-stage visual grounding by addressing previous weaknesses in modeling long and complex queries.

  • Dynamic Context-guided Capsule Network for Multimodal Machine Translation
    Huan Lin, Fandong Meng, Jinsong Su, Yongjing Yin, Zhengyuan Yang, Yubin Ge, Jie Zhou, Jiebo Luo
    ACMMM 2020. (oral presentation)
    [PDF] [Code] [Bibtex]

    We propose a novel Dynamic Context-guided Capsule Network (DCCN) for multimodal machine translation.

  • A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation
    Yongjing Yin, Fandong Meng, Jinsong Su, Chulun Zhou, Zhengyuan Yang, Jie Zhou, Jiebo Luo
    ACL 2020.
    [PDF] [Bibtex]

    Multi-modal neural machine translation (NMT) with fine-grained cross-modality semantic correspondence.

  • A Fast and Accurate One-Stage Approach to Visual Grounding
    Zhengyuan Yang, Boqing Gong, Liwei Wang, Wenbing Huang, Dong Yu, Jiebo Luo
    ICCV 2019. (oral presentation) (187/4303=4.3%)
    [PDF] [Code] [Slides] [Poster] [Bibtex]

    A simple, fast, and accurate one-stage approach to visual grounding. 10 times faster and 7~20% higher in accuracy.

  • Weakly Supervised Body Part Parsing with Pose based Part Priors
    Zhengyuan Yang, Yuncheng Li, Linjie Yang, Ning Zhang, Jiebo Luo
    ICPR 2020.
    [PDF] [Demo] [Poster] [Slides] [Video] [Bibtex]

    Weakly supervised body part parsing that achieves results comparable to the fully supervised method with the same backbone.

  • Pose-based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation
    Zhengyuan Yang, Amanda Kay, Yuncheng Li, Wendi Cross, Jiebo Luo
    ICPR 2020.
    [PDF] [Poster] [Slides] [Video] [Bibtex]

    A pose-based framework for body language recognition and emotion interpretation.

  • Attentive Relational Networks for Mapping Images to Scene Graphs
    Mengshi Qi, Weijian Li, Zhengyuan Yang, Yunhong Wang, Jiebo Luo
    CVPR 2019.
    [PDF] [Bibtex]

    A novel Attentive Relational Network for scene graph generation.

  • Action Recognition with Spatio-Temporal Visual Attention on Skeleton Image Sequences
    Zhengyuan Yang, Yuncheng Li, Jianchao Yang, Jiebo Luo
    ICPR 2018; IEEE T-CSVT
    [PDF] [UCF-Motion-Joints] [Bibtex]

    A CNN-based approach for skeleton-based action recognition. SOTA on both clean 3D joints and noisy 2D estimated keypoints.

  • Human-Centered Emotion Recognition in Animated GIFs with Facial Landmarks
    Zhengyuan Yang, Yixuan Zhang, Jiebo Luo
    ICME 2019.
    [PDF] [Data] [Bibtex]

    Focusing on human faces to improve emotion recognition.

  • End-to-end Multi-Modal Multi-Task Vehicle Control for Self-Driving Cars with Visual Perceptions
    Zhengyuan Yang, Yixuan Zhang, Jerry Yu, Junjie Cai, Jiebo Luo
    ICPR 2018. Best Industry Related Paper Award (BIRPA) (1/1258=0.08%)
    [PDF] [Demo] [Bibtex]

    Building a prototype that controls the self-driving car's steering angle and speed. Check out the demo that we recorded in the vehicle!

    Internship

    Microsoft, Redmond, WA
    May - Aug 2020. Advisors: Yijuan Lu, Jianfeng Wang, Xi Yin.
    Project: Text-aware pre-training for Text-VQA and Text-Caption.

    Tencent AI Lab, Bellevue, WA
    Jan - Apr 2019. Advisors: Boqing Gong, Liwei Wang.
    Project: Visual Grounding with Natural Language Queries.

    Snapchat, Venice, CA
    May - Aug 2018. Advisors: Yuncheng Li, Linjie Yang, Ning Zhang.
    Project: Weakly Supervised Human Part Parsing.

    SAIC Innovation Center, San Jose, CA
    Jun - Aug 2017. Advisor: Jerry Yu.
    Project: Steering Angle Control with End-to-end Neural Networks.

    Awards

  • Winner of CVPR 2021 TextCaps Challenge
  • Winner of CVPR 2021 ReferIt3D Challenge
  • Twitch Research Fellowship
  • Best Industry Related Paper Award at ICPR 2018

    Publications

  • Zhengyuan Yang, Songyang Zhang, Liwei Wang, Jiebo Luo, "SAT: 2D Semantics Assisted Training for 3D Visual Grounding," International Conference on Computer Vision (ICCV), Oct 2021. (oral presentation) [PDF]
  • Jiajun Deng, Zhengyuan Yang, Tianlang Chen, Wengang Zhou, Houqiang Li, "TransVG: End-to-End Visual Grounding with Transformers," International Conference on Computer Vision (ICCV), Oct 2021. [PDF]
  • Zhengyuan Yang, Yijuan Lu, Jianfeng Wang, Xi Yin, Dinei Florencio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo, "TAP: Text-Aware Pre-training for Text-VQA and Text-Caption," Conference on Computer Vision and Pattern Recognition (CVPR), June 2021. (oral presentation) [PDF]
  • Liwei Wang, Jing Huang, Yin Li, Kun Xu, Zhengyuan Yang, Dong Yu, "Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation," Conference on Computer Vision and Pattern Recognition (CVPR), June 2021. [PDF]
  • Zhengyuan Yang, Tianlang Chen, Liwei Wang, Jiebo Luo, "Improving One-stage Visual Grounding by Recursive Sub-query Construction," European Conference on Computer Vision (ECCV), Glasgow, UK, August 2020. [PDF][Code]
  • Huan Lin, Fandong Meng, Jinsong Su, Yongjing Yin, Zhengyuan Yang, Yubin Ge, Jie Zhou, Jiebo Luo, "Dynamic Context-guided Capsule Network for Multimodal Machine Translation," ACM Multimedia Conference (ACMMM), Seattle, WA, October 2020. (oral presentation) [PDF][Code]
  • Yongjing Yin, Fandong Meng, Jinsong Su, Chulun Zhou, Zhengyuan Yang, Jie Zhou, Jiebo Luo, "A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation," Annual Meeting of the Association for Computational Linguistics (ACL), Seattle, WA, July 2020. [PDF]
  • Zhengyuan Yang, Tushar Kumar, Tianlang Chen, Jinsong Su, Jiebo Luo, "Grounding-Tracking-Integration," IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT). [PDF]
  • Zhengyuan Yang, Boqing Gong, Liwei Wang, Wenbing Huang, Dong Yu, Jiebo Luo, "A Fast and Accurate One-Stage Approach to Visual Grounding," International Conference on Computer Vision (ICCV), Seoul, South Korea, October 2019. (oral presentation) [PDF][Code]
  • Zhengyuan Yang, Yuncheng Li, Linjie Yang, Ning Zhang, Jiebo Luo, "Weakly Supervised Body Part Parsing with Pose based Part Priors," International Conference on Pattern Recognition (ICPR), Milan, Italy, January 2020. [PDF] [Demo]
  • Zhengyuan Yang, Amanda Kay, Yuncheng Li, Wendi Cross, Jiebo Luo, "Pose-based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation," International Conference on Pattern Recognition (ICPR), Milan, Italy, January 2020. [PDF]
  • Mengshi Qi, Weijian Li, Zhengyuan Yang, Yunhong Wang, Jiebo Luo, "Attentive Relational Networks for Mapping Images to Scene Graphs," Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, June 2019. [PDF]
  • Zhengyuan Yang, Yixuan Zhang, Jiebo Luo, "Human-Centered Emotion Recognition in Animated GIFs with Facial Landmarks," International Conference on Multimedia and Expo (ICME), Shanghai, China, July 2019. [PDF] [Data]
  • Zhengyuan Yang, Yuncheng Li, Jianchao Yang, Jiebo Luo, "Action Recognition with Spatio-Temporal Visual Attention on Skeleton Image Sequences," IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT). [PDF] [Data]
  • Zhengyuan Yang, Yuncheng Li, Jianchao Yang, Jiebo Luo, "Action Recognition with Visual Attention on Skeleton Images," International Conference on Pattern Recognition (ICPR), Beijing, China, August 2018. (oral presentation). [PDF]
  • Zhengyuan Yang, Yixuan Zhang, Jerry Yu, Junjie Cai, Jiebo Luo, "End-to-end Multi-Modal Multi-Task Vehicle Control for Self-Driving Cars with Visual Perceptions," International Conference on Pattern Recognition (ICPR), Beijing, China, August 2018. (oral presentation) Best Industry Related Paper Award (BIRPA). [PDF] [Demo]
  • Zhengyuan Yang, Wendi Cross, Jiebo Luo, "Personalized pose estimation for body language understanding," International Conference on Image Processing (ICIP), Beijing, China, September 2017. (oral presentation)

    Service

  • Outstanding Reviewer, CVPR 2021
  • Journal Reviewer: TIP, TMM, TCybernetics, TCSVT, Pattern Recognition, Neurocomputing, TBioCAS, IEEE Access.
  • Conference Reviewer: CVPR, ICCV, NeurIPS, ICLR, ICML, ACL, EMNLP, AAAI, ACCV, WACV, ICME, ICIP.

    Teaching

    TA CS246/446 - Spring 2018
    Machine Learning

    TA CS172 - Fall 2017
    Data Structures and Algorithms

    TA CS242 - Spring 2017
    Intro to Artificial Intelligence


    © 2021 Zhengyuan Yang. All rights reserved.
    Template borrowed from Jon Barron. Thanks!