Publications

2026

ICML

Twins: Learn to Predict Unified Representations with Focal Loss

Kaixiong Gong*, Xin Cai*, Bin Lin, Hao Wang, Yunlong Lin, Mingzhe Zheng, Bohao Li, Jian-Wei Zhang, Miles Yang, Zhao Zhong, Liefeng Bo, Xiangyu Yue

International Conference on Machine Learning (ICML), 2026

ICML

SpaceVista: All-Scale Visual Spatial Reasoning from mm to km

Peiwen Sun, Shiqiang Lang, Dongming Wu, Yi Ding, Kaituo Feng, Huadai Liu, Zhen Ye, Rui Liu, Yun-Hui Liu, Jianan Wang, Xiangyu Yue

International Conference on Machine Learning (ICML), 2026

ICML

Elastic Diffusion Transformer

Jiangshan Wang, Zeqiang Lai, Jiarui Chen, Jiayi Guo, Hang Guo, Xiu Li, Xiangyu Yue, Chunchao Guo

International Conference on Machine Learning (ICML), 2026

ICML

MVISTA-4D: View-Consistent 4D World Model with Test-Time Action Inference for Robotic Manipulation

Jiaxu Wang, Yicheng Jiang, Tianlun He, Jingkai Sun, Qiang Zhang, Junhao He, Jiahang Cao, Zesen Gan, Mingyuan Sun, Qiming Shao, Xiangyu Yue

International Conference on Machine Learning (ICML), 2026

ICML

VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning

Qunzhong Wang, Jie Liu, Jiajun Liang, Yilei Jiang, Yuanxing Zhang, Yaozhi Zheng, Xintao Wang, Pengfei Wan, Xiangyu Yue, Jiaheng Liu

International Conference on Machine Learning (ICML), 2026

ICML

MIND: Multi-rationale INtegrated Discriminative Reasoning Framework for Multi-modal Large Models

Chuang Yu, Jinmiao Zhao, Mingxuan Zhao, Yunpeng Liu, Xiujun Shu, Yuanhao Feng, Bo Wang, Xiangyu Yue

International Conference on Machine Learning (ICML), 2026

RSS

RISE: Self-Improving Robot Policy with Compositional World Model

Jiazhi Yang*, Kunyang Lin*, Jinwei Li, Wencong Zhang, Tianwei Lin, Longyan Wu, Zhizhong Su, Hao Zhao, Ya-Qin Zhang, Li Chen, Ping Luo, Xiangyu Yue, Hongyang Li

Robotics: Science and Systems (RSS), 2026

ICRA

Learning Structural Latent Points for Efficient Visual Representations in Robotic Manipulation

Yicheng Jiang*, Jiaxu Wang*, Junhao He, Zesen Gan, Junhao Li, Qiang Zhang, Jingkai Sun, Jiahang Cao, Mingyuan Sun, Xiangyu Yue, Qiming Shao

IEEE International Conference on Robotics and Automation (ICRA), 2026

AAAI

SpatialLogic-Bench: A Diagnostic Benchmark for Task-Oriented Spatiotemporal Reasoning

Xiaoda Yang, Shenzhou Gao, Can Wang, Jiahe Zhang, Menglan Tang, Jingyang Xue, Sheng Liu, Peijian Zhang, Yao Mu, Xiangyu Yue

AAAI Conference on Artificial Intelligence (AAAI), 2026

ACL Oral

Probing Audio-Visual Reasoning in Multimodal Language Models through the Lens of Audio

Kaixiong Gong*, Kaituo Feng*, Bohao Li*, Yibing Wang, Mofan Cheng, Shijia Yang, Jiaming Han, Benyou Wang, Yutong Bai, Zhuoran Yang, Xiangyu Yue

Annual Meeting of the Association for Computational Linguistics (ACL), 2026 (Oral)

ACL

AdaTooler-V: Adaptive Tool-Use for Images and Videos

Chaoyang Wang, Kaituo Feng, Dongyang Chen, Zhongyu Wang, Zhixun Li, Sicheng Gao, Meng Meng, Xu Zhou, Manyuan Zhang, Yuzhang Shang, Xiangyu Yue

Findings of the Association for Computational Linguistics (ACL Findings), 2026

ACL

Exploring Reasoning Reward Model for Agents

Kaixuan Fan, Kaituo Feng, Manyuan Zhang, Tianshuo Peng, Zhixun Li, Yilei Jiang, Shawn Chen, Peng Pei, Xunliang Cai, Xiangyu Yue

Findings of the Association for Computational Linguistics (ACL Findings), 2026

ACL

Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models

Hao Wang*, Hao Gu*, Hongming Piao, Kaixiong Gong, Yuxiao Ye, Xiangyu Yue, Sirui Han, Yike Guo, Dapeng Wu

Annual Meeting of the Association for Computational Linguistics (ACL), 2026

CVPR

StyleDoctor: Towards Specialist Reward Model for Style-centric Generation Tasks

Xilin He, Xiaole Xian, Xiangyu Yue, Muhammad Haris Khan

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2026

CVPR

LATTICE: Democratize High-Fidelity 3D Generation at Scale

Zeqiang Lai, Yunfei Zhao, Zibo Zhao, Haolin Liu, Qingxiang Lin, Jingwei Huang, Chunchao Guo, Xiangyu Yue

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2026

CVPR Highlight

NaTex: Seamless Texture Generation as Latent Color Diffusion

Zeqiang Lai, Yunfei Zhao, Zibo Zhao, Xin Yang, Xin Huang, Jingwei Huang, Xiangyu Yue, Chunchao Guo

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2026 (Highlight)

CVPR

3D-Aware Multi-Task Learning with Cross-View Correlations for Dense Scene Understanding

Xiaoye Wang, Chen Tang, Xiangyu Yue, Wei-Hong Li

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2026

CVPR

OS-Oracle: A Comprehensive Framework for Cross-Platform GUI Critic Models

Zhenyu Wu, Jingjing Xie, Zehao Li, Bowen Yang, Qiushi Sun, Zhaoyang Liu, Zhoumianze Liu, Yu Qiao, Xiangyu Yue, Zun Wang, Zichen Ding

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2026

CVPR

Transition Models: Rethinking the Generative Learning Objective

Zidong Wang, Yiyuan Zhang, Xiaoyu Yue, Xiangyu Yue, Yangguang Li, Wanli Ouyang, Lei Bai

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2026

CVPR

Language Does Matter for Cross-Domain Few-Shot Visual Feature Enhancement

Fei Zhou, Xiwen Zhang, Qingqing Qiu, Lei Zhang, Wei Wei, Chen Ding, Yi Zhang, Liang Li, Xiangyu Yue, Yanning Zhang

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2026

CVPR

MMBench-GUI: A Unified Hierarchical Evaluation Framework for Multi-Platform GUI Agents

Xuehui Wang, Zhenyu Wu, Jingjing Xie, Zichen Ding, Bowen Yang, Zehao Li, Zhaoyang Liu, Qingyun Li, Xuan Dong, Zhe Chen, Weiyun Wang, Xiangyu Zhao, Jixuan Chen, Haodong Duan, Tianbao Xie, Chenyu Yang, Shiqian Su, Yue Yu, Yanting Zhang, Xiangyu Yue, Weijie Su, Xizhou Zhu, Wei Shen, Jifeng Dai, Wenhai Wang

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2026

CVPR

OneThinker: All-in-one Reasoning Model for Image and Video

Kaituo Feng, Manyuan Zhang, Hongyu Li, Kaixuan Fan, Shuang Chen, Yilei Jiang, Dian Zheng, Peiwen Sun, Yiyuan Zhang, Haoze Sun, Yan Feng, Peng Pei, Xunliang Cai, Xiangyu Yue

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2026

CVPR

VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning

Minghong Cai, Qiulin Wang, Zongli Ye, Wenze Liu, Quande Liu, Weicai Ye, Xintao Wang, Pengfei Wan, Kun Gai, Xiangyu Yue

IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Findings, 2026

CVPR

Evolve Vision-Language-Action Model into an Agent with On-the-fly Tool-use

Ding Yi, Yanzhao Yu, Xili Dai, Xianbiao Qi, Peiwen Sun, Xueqian Wang, Xiangyu Yue, Jianan Wang

IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Findings, 2026

ICLR

PreciseCache: Precise Feature Caching for Efficient and High-fidelity Video Generation

Jiangshan Wang, Kang Zhao, Jiayi Guo, Jiayu Wang, Hang Guo, Chenyang Zhu, Xiu Li, Xiangyu Yue

International Conference on Learning Representations (ICLR), 2026

ICLR

SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward

Kaixuan Fan*, Kaituo Feng*, Haoming Lyu, Dongzhan Zhou, Xiangyu Yue

International Conference on Learning Representations (ICLR), 2026

ICLR

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

Zhaoyang Liu, Jingjing Xie, Zichen Ding, Zehao Li, Bowen Yang, Zhenyu Wu, Xuehui Wang, Qiushi Sun, Shi Liu, Weiyun Wang, Shenglong Ye, Qingyun Li, Zeyue Tian, Gen Luo, Xiangyu Yue, Biqing Qi, Kai Chen, Bowen Zhou, Yu Qiao, Qifeng Chen, Wenhai Wang

International Conference on Learning Representations (ICLR), 2026

ICLR

MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence

Sihan Yang*, Runsen Xu*, Yiman Xie, Sizhe Yang, Mo Li, Jingli Lin, Chenming Zhu, Xiaochen Chen, Haodong Duan, Xiangyu Yue, Dahua Lin, Tai Wang, Jiangmiao Pang

International Conference on Learning Representations (ICLR), 2026

ICLR

Consistent Noisy Latent Rewards for Trajectory Preference Optimization in Diffusion Models

Xiaole Xian, Xilin He, Wenting Chen, Wenshuang Liu, Wenqi Mu, Yancheng He, Liang Li, Yi Zhang, Xiangyu Yue

International Conference on Learning Representations (ICLR), 2026

2025

NeurIPS

Video-R1: Reinforcing Video Reasoning in MLLMs

Kaituo Feng, Kaixiong Gong, Bohao Li, Zonghao Guo, Yibing Wang, Tianshuo Peng, Junfei Wu, Xiaoying Zhang, Benyou Wang, Xiangyu Yue

NeurIPS 2025 Most Influential Paper Top 10

Advances in Neural Information Processing Systems (NeurIPS), 2025

NeurIPS

Native-Resolution Image Synthesis

Zidong Wang, Lei Bai, Xiangyu Yue, Wanli Ouyang, Yiyuan Zhang

Advances in Neural Information Processing Systems (NeurIPS), 2025

NeurIPS Highlight

ReSim: Reliable World Simulation for Autonomous Driving

Jiazhi Yang, Kashyap Chitta, Shenyuan Gao, Long Chen, Yuqian Shao, Xiaosong Jia, Hongyang Li, Andreas Geiger, Xiangyu Yue, Li Chen

Advances in Neural Information Processing Systems (NeurIPS), 2025 (Highlight)

NeurIPS

Learning to Integrate Diffusion ODEs by Averaging the Derivatives

Wenze Liu, Xiangyu Yue

Advances in Neural Information Processing Systems (NeurIPS), 2025

NeurIPS

Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

Jiaming Han, Hao Chen, Yang Zhao, Hanyu Wang, Qi Zhao, Ziyan Yang, Hao He, Xiangyu Yue, Lu Jiang

Advances in Neural Information Processing Systems (NeurIPS), 2025

NeurIPS

Fira: Can We Achieve Full-rank Training of LLMs under Low-rank Constraint?

Xi Chen, Kaituo Feng, Changsheng Li, Xunhao Lai, Xiangyu Yue, Ye Yuan, Guoren Wang

Advances in Neural Information Processing Systems (NeurIPS), 2025

ICCV

From Easy to Hard: Progressive Active Learning Framework for Infrared Small Target Detection with Single Point Supervision

Chuang Yu, Jinmiao Zhao, Yunpeng Liu, Sicheng Zhao, Yimian Dai, Xiangyu Yue

IEEE/CVF International Conference on Computer Vision (ICCV), 2025

ICCV

Chimera: Improving Generalist Model with Domain-Specific Experts

Tianshuo Peng, Mingsheng Li, Jiakang Yuan, Hongbin Zhou, Renqiu Xia, Renrui Zhang, Lei Bai, Song Mao, Bin Wang, Aojun Zhou, Botian Shi, Tao Chen, Bo Zhang, Xiangyu Yue

IEEE/CVF International Conference on Computer Vision (ICCV), 2025

ICCV

CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation

Jianyu Wu, Yizhou Wang, Xiangyu Yue, Xinzhu Ma, Jinyang Guo, Dongzhan Zhou, Wanli Ouyang, Shixiang Tang

IEEE/CVF International Conference on Computer Vision (ICCV), 2025

ICCV

FairGen: Enhancing Fairness in Text-to-Image Diffusion Models via Self-Discovering Latent Directions

Yilei Jiang, Wei-Hong Li, Yiyuan Zhang, Minghong Cai, Xiangyu Yue

IEEE/CVF International Conference on Computer Vision (ICCV), 2025

ICCV Highlight

Unleashing Vecset Diffusion Model for Fast Shape Generation

Zeqiang Lai, Yunfei Zhao, Zibo Zhao, Haolin Liu, Fuyun Wang, Huiwen Shi, Xianghui Yang, Qingxiang Lin, Jingwei Huang, Yuhong Liu, Jie Jiang, Chunchao Guo, Xiangyu Yue

IEEE/CVF International Conference on Computer Vision (ICCV), 2025

ICCV

HypDAE: Hyperbolic Diffusion Autoencoders for Hierarchical Few-shot Image Generation

Lingxiao Li, Kaixuan Fan, Boqing Gong, Xiangyu Yue

IEEE/CVF International Conference on Computer Vision (ICCV), 2025

ICCV

SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data

Xilin He, Cheng Luo, Xiaole Xian, Bing Li, Muhammad Haris Khan, Zongyuan Ge, Weicheng Xie, Siyang Song, Linlin Shen, Bernard Ghanem, Xiangyu Yue

IEEE/CVF International Conference on Computer Vision (ICCV), 2025

ICCV

Scaling Omni-modal Pretraining with Multimodal Context: Advancing Universal Representation Learning Across Modalities

Yiyuan Zhang, Handong Li, Jing Liu, Xiangyu Yue

IEEE/CVF International Conference on Computer Vision (ICCV), 2025

ICCV

Breaking the Encoder Barrier for Seamless Video-Language Understanding

Handong Li, Yiyuan Zhang, Longteng Guo, Xiangyu Yue, Jing Liu

IEEE/CVF International Conference on Computer Vision (ICCV), 2025

ICCV

Learning Beyond Still Frames: Scaling Vision-Language Models with Video

Yiyuan Zhang, Handong Li, Jing Liu, Xiangyu Yue

IEEE/CVF International Conference on Computer Vision (ICCV), 2025

CVPR

RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models

Haoran Hao*, Jiaming Han*, Changsheng Li, Yu-Feng Li, Xiangyu Yue

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025

CVPR

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation

Minghong Cai, Xiaodong Cun, Xiaoyu Li, Wenze Liu, Zhaoyang Zhang, Yong Zhang, Ying Shan, Xiangyu Yue

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025

CVPR

SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance

Peishan Cong*, Ziyi Wang*, Yuexin Ma, Xiangyu Yue

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025

CVPR

UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines

Chen Tang, Xinzhu Ma, Encheng Su, Xiufeng Song, Xiaohong Liu, Wei-Hong Li, Lei Bai, Wanli Ouyang, Xiangyu Yue

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025

2024

Preprint

Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations

Yiyuan Zhang, Xiaohan Ding, Xiangyu Yue

Preprint, 2024

CVPR

OneLLM: One Framework to Align All Modalities with Language

Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024

CVPR

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong, Yixiao Ge, Ying Shan, Xiangyu Yue

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024

CVPR

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

Xiaohan Ding, Yiyuan Zhang, Yixiao Ge, Sijie Zhao, Lin Song, Xiangyu Yue, Ying Shan

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024

2023

Preprint

Meta-Transformer: A Unified Framework for Multimodal Learning

Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue

Preprint, 2023

Preprint

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei Zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, Yu Qiao

Preprint, 2023