XPENG Showcases Physical AI & World Model at CVPR 2026

XPENG INVITED TO CVPR FOR THE THIRD TIME, SHOWCASING CHINA'S ADVANCES IN PHYSICAL AI TO THE WORLD

2026-06-03

- XPENG at CVPR 2026: XPENG participated in CVPR 2026, where Dr. Xianming Liu, Head of General Intelligence Center at XPENG, delivered a keynote at the inaugural Workshop on Deployment of Foundation Models for Embodied AI (WDFM-EAI), sharing insights alongside global leaders including Tesla, NVIDIA, and Waymo.
- World Model Debut: XPENG unveiled its complete technical blueprint for the Physical-World Foundation Model, highlighting breakthroughs in Deliberative Reasoning, Controllable Generation, and Long-Horizon Forecasting.
- VLA 2.0 Mass Production: The VLA 2.0 has entered formal mass production, achieving an industry milestone with over 50% assisted driving mileage share within its first month of rollout.
- Physical AI Scaling: XPENG confirmed the successful validation of the Scaling Law, delivering a 4,360% surge in single-job training efficiency.
- Global Expansion: XPENG is accelerating the large-scale deployment of its three core Physical AI applications: VLA 2.0, Robotaxi, and Humanoid Robots.

Denver, June 3, 2026 - XPENG (NYSE: XPEV, HKEX: 9868), a leading China-based high-tech company, has begun its participation in CVPR 2026 (the IEEE/CVF Conference on Computer Vision and Pattern Recognition), which is holding its annual session in Denver, Colorado. Dr. Xianming Liu, Head of General Intelligence Center at XPENG, spoke at the inaugural Workshop on Deployment of Foundation Models for Embodied AI (WDFM-EAI), sharing insights alongside industry counterparts from Tesla, NVIDIA and Waymo, as well as academic experts from institutions such as the University of California and the University of Toronto. This marks the third time XPENG has been invited to present at CVPR.

Among the most influential international academic gatherings, CVPR has long charted the developmental trajectory of cutting-edge technologies spanning artificial intelligence, autonomous driving and robotics. Centered on leading-edge industry themes, the 2026 conference debuts the dedicated embodied AI foundational model deployment workshop, where Dr. Xianming Liu unveiled XPENG's full technical roadmap for its world model for the first time. In addition, XPENG's research paper titled DrivePTS: A Progressive Learning Framework with Textual and Structural Enhancement for Driving Scene Generation has been accepted for publication at this year's CVPR.

Dr. Xianming Liu, Head of General Intelligence Center at XPENG, presented XPENG's Physical AI technology system on-site at CVPR 2026

From Technical Concept to Mass Production: VLA2.0 Achieves Key Technical Breakthrough

During the conference, Dr. Xianming Liu delivered a keynote titled Building the World Model for Autonomous Driving, in which he systematically deconstructed the iterative evolution path of XPENG's physical AI technology system: from concept validation and technical refinement to full-scale mass production and deployment.

XPENG first unveiled its in-house foundational model development framework at CVPR 2025. Just one year on, XPENG has made a pivotal technical leap: its VLA2.0 ADAS, built on the self-developed foundational model, has gone into formal mass production, closing the loop from cutting-edge pre-research to large-scale commercial deployment. According to XPENG, VLA2.0 delivers safer, smoother and more efficient driving than conventional model architectures and has reshaped the user experience of advanced driver assistance. It set an industry milestone of over 50% assisted driving mileage share within its first month of OTA rollout, establishing a new benchmark for China's advanced assisted driving sector.

The VLA 2.0 was designed from the ground up for Level 4 autonomous driving, enabling a unified software architecture that spans both L2 and L4. Built on the XPENG GX platform, China's first fully self-developed Robotaxi has rolled off the production line with an effective onboard computing power of 3,000 TOPS.

XPENG's Robotaxi has rolled out in mass production, featuring VLA 2.0 foundation model and delivering L4 autonomous driving capabilities

First Disclosure of the Complete Technical Blueprint for World Models, Advancing Foundation Models for the Physical World

Centered on world model development, XPENG has gradually established a sophisticated technical system. At CVPR 2026, Dr. Xianming Liu unveiled XPENG's complete technical roadmap for its world model for the first time.

At CVPR, Dr. Xianming Liu further elaborated on XPENG's Physical AI layout and introduced the world model as another core pillar of the brand's foundational model system. He noted that XPENG is developing a world model capable of Deliberative Reasoning, Controllable Generation, and Long-Horizon Forecasting. Rather than replacing or competing with each other, the world model and VLA2.0 complement each other, drawing on diverse training signals to improve the model's comprehension of the physical environment and its ability to act in real-world scenarios. XPENG's Physical-World Foundation Model serves as both VLA 2.0 and the World Model. In essence, the two pursue the same goal: iteratively scaling up model parameters, training datasets and task complexity to cultivate a robust foundational model tailored for the physical world.

Learning from Humans and Learning from the Real World

Dr. Xianming Liu explained that within XPENG's foundational model architecture, VLA2.0 learns primarily from human driving behaviors. It unifies modeling across video streams, human commands and vehicle motion outputs to master rational decision-making in complicated traffic conditions. The world model, in turn, learns the underlying laws of the physical world by predicting future states and scene evolution, enabling controllable generation, long-horizon forecasting and causal reasoning. VLA2.0 teaches the model how to act, while the world model enables it to understand how the surrounding world evolves after each action.

Combined, the two technologies aim to build a Physical AI foundational model that deeply perceives the real world and executes safe action.

Deliberative Reasoning, Controllable Generation and Long-Horizon Forecasting

XPENG identifies three indispensable capabilities for a high-performance world model: deliberative reasoning, controllable generation and long-horizon forecasting. In the company's view, these capabilities together embody intelligence and are the prerequisite for deploying World Models in autonomous driving. XPENG's R&D team has recently published a suite of world-model-focused research papers, including X-World, X-Foresight and X-Cache, detailing the company's R&D methodologies built around these core competencies.

X-World generates physically plausible future video sequences under specified motion inputs while sustaining robust controllability and stability throughout continuous generation. It has already been deployed across closed-loop simulation testing, online reinforcement learning and synthetic data generation workflows.

X-Foresight, fully integrated into VLA2.0's architecture, jointly predicts multi-view future imagery and ego-vehicle actions within a unified token space, delivering core decision support for VLA2.0's vehicle control logic.

X-Cache cuts roughly 70% of redundant computation with negligible image quality degradation and accelerates inference for the world model's denoising backbone by up to around 2.7x.
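As a rough sanity check on how the two X-Cache figures relate, the sketch below compares the reported speedup with the upper bound implied by the compute saving. It rests on assumptions that are not stated in this release: that "roughly 70%" refers to the share of the denoising backbone's computation that is skipped, and that runtime scales with computation. The function names are illustrative only.

```python
def ideal_speedup(skipped_fraction: float) -> float:
    """Upper bound on speedup if runtime is proportional to remaining compute.

    Assumption (not from the release): skipping a fraction s of the work
    leaves (1 - s) of the original runtime, with no caching overhead.
    """
    if not 0.0 <= skipped_fraction < 1.0:
        raise ValueError("skipped_fraction must be in [0, 1)")
    return 1.0 / (1.0 - skipped_fraction)


def remaining_time_fraction(speedup: float) -> float:
    """Share of the original runtime still spent after a given speedup."""
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


bound = ideal_speedup(0.70)                # figure quoted in the release
observed = remaining_time_fraction(2.7)    # "up to around 2.7x"

print(f"ideal bound: {bound:.2f}x")                     # ideal bound: 3.33x
print(f"runtime left at 2.7x: {observed:.2f}")          # runtime left at 2.7x: 0.37
```

Under these assumptions, the reported 2.7x sits below the 3.33x bound, which is what one would expect if cache management and non-skippable layers add some overhead.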

XPENG is also set to publish a technical report on X-Mind, which elaborates on the model's deliberative reasoning mechanism and visually illustrates intermediate inference behind driving decisions. Interpretability is critical for autonomous driving software performance tuning, building end-user trust and accelerating iterative model upgrades.

XPENG Technical Blueprint for World Models of the Physical World

Continuing Validation of Scaling Law: XPENG Accelerates Large-Scale Rollout of Physical AI

Pushing the limits of the Scaling Law has long been a core pursuit of XPENG's R&D team. Over the past one to two years, the team has consistently boosted foundational model performance by scaling up model size, computing power and training data. At present, VLA2.0 boasts billions of parameters and is trained on hundreds of millions of video clips, with over four trillion tokens consumed per model iteration.

In the 12 months ending March this year, XPENG's cluster delivered a 1,010% uplift in per-GPU training efficiency and a 4,360% gain in single-job training efficiency, while GPU hardware utilization climbed from 40% to 90%, matching benchmarks set by top-tier global AI firms.
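Percentage gains of this size are easier to read as multiples of the baseline. The short sketch below converts the quoted figures, assuming each percentage is a gain measured relative to the level 12 months earlier; the helper names are illustrative and not part of any XPENG tooling.

```python
def gain_to_multiplier(percent_gain: float) -> float:
    """Convert a percentage gain over a baseline into a multiple of it.

    A 100% gain means 2x the baseline, so multiplier = 1 + gain / 100.
    """
    return 1.0 + percent_gain / 100.0


def relative_gain_percent(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100.0


per_gpu = gain_to_multiplier(1010)            # per-GPU training efficiency
single_job = gain_to_multiplier(4360)         # single-job training efficiency
utilization = relative_gain_percent(40, 90)   # GPU utilization, 40% -> 90%

print(f"per-GPU efficiency: {per_gpu:.1f}x baseline")        # 11.1x baseline
print(f"single-job efficiency: {single_job:.1f}x baseline")  # 44.6x baseline
print(f"utilization: +{utilization:.0f}% relative")          # +125% relative
```

Read this way, the figures correspond to roughly 11 times the earlier per-GPU efficiency and roughly 45 times the earlier single-job efficiency, while utilization rose by 50 percentage points, or 125% in relative terms.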

He Xiaopeng, Chairman and CEO of XPENG, previously commented: “The successful launch of the initial VLA2.0 build has validated the capability gains brought by scaling up dataset volume and model parameters, further cementing our belief in the Scaling Law for physical-world AI.” This has strengthened XPENG's confidence in continuing to invest in Physical AI and its large-scale deployment.

XPENG is pushing ahead with the mass-production rollout of its major Physical AI applications: VLA2.0, Robotaxi and Humanoid Robots.

As VLA2.0 continues to evolve and upgrade its core capabilities, its integrated competency framework spanning environmental perception, reasoning & decision-making, and motion execution is rapidly expanding into a broader spectrum of embodied intelligence scenarios. XPENG's IRON humanoid robot has made steady progress on hardware and software development for its mass-production-ready version and is poised to enter the joint hardware-software integration phase. The company targets formal mass production by the end of this year, with the robots set to begin working as in-store shopping guides at XPENG's offline retail outlets in Q1 2027.

Applications of Physical AI are at a critical stage of transitioning from mass deployment to scaled growth. XPENG is fully committed to advancing the mass deployment and global expansion of its major Physical AI applications, the VLA 2.0, Robotaxi, and Humanoid Robots, to continuously create greater value for users worldwide.

Appendix: XPENG World Model Related Academic Papers
X-World Paper: https://arxiv.org/pdf/2603.19979
X-World Official Site: https://x-world-1.github.io/
X-Cache Paper: https://arxiv.org/abs/2604.20289
X-Cache Official Site: https://x-cache-1.github.io/en/
X-Foresight Paper: https://arxiv.org/abs/2605.24892
X-Foresight Official Site: https://x-foresight-1.github.io/en/

---
About XPENG
Founded in 2014, XPENG is a leading Chinese AI-driven mobility company that designs, develops, manufactures, and markets Smart EVs, catering to a growing base of tech-savvy consumers. With the rapid advancement of AI, XPENG aspires to become a global leader in AI mobility, with a mission to drive the Smart EV revolution through cutting-edge technology, shaping the future of mobility.
To enhance the customer experience, XPENG develops its full-stack advanced driver-assistance system (ADAS) technology and intelligent in-car operating system in-house, along with core vehicle systems such as the powertrain and electrical/electronic architecture (EEA). Headquartered in Guangzhou, China, XPENG also operates key offices in Beijing, Shanghai, Silicon Valley, and Amsterdam. Its Smart EVs are primarily manufactured at its facilities in Zhaoqing and Guangzhou, Guangdong province.
XPENG is listed on the New York Stock Exchange (NYSE: XPEV) and Hong Kong Exchange (HKEX: 9868).
For more information, please visit https://www.xpeng.com/.

Contacts:
For Media Enquiries:
XPENG PR Department
Email: pr@xiaopeng.com
Source: XPENG