From Reality to Robustness: Inverted AI Leverages NVIDIA Cosmos and AV 2.0 Innovations
Wed May 28 2025

At Inverted AI, our mission is to push autonomous driving beyond today's limitations and into a future where training data isn't just real, but realistic, controllable, and complete. As AV 2.0 demands better generalization and robustness than ever before, we're proud to announce our strategic use of NVIDIA Cosmos to generate the next generation of AV training data, controlled by our world-class behavior models.
Real Data Meets Synthetic Control
While many AV companies still chase diminishing returns from billions of human-driven miles, we understand that both coverage and realistic solutions to critical scenarios are essential. Inverted AI's generative behavior models are trained on a proprietary in-house data moat: over a million miles of high-precision, nearly noise-free driving logs collected via drones flown in more than 30 countries around the world. The key is generalization. Because they operate on object-list representations, our models generalize in human-like ways, producing safe, dynamically achievable solutions to complex edge cases, rare interactions, and difficult scenarios.
Enter Cosmos: NVIDIA's composable simulation stack and platform of openly available world foundation models. The Cosmos Transfer model enables controlled generation of complex, photorealistic scenes with semantic fidelity and physics-aware structure. Integrated with Inverted AI's models, it allows us to condition scenarios not only on camera-level inputs but also on AV 1.0-style controllable object-list representations, precisely the domain of our greatest strengths.
HD Maps Without Borders
The Cosmos-based workflow uses Inverted AI's Imagining The Road Ahead™ (ITRA) model to produce simulated scenes in CARLA, which, thanks to the efforts of German Ros and his team, is natively integrated via the Inverted AI APIs. This pathway allows rich, multi-modal control over scene generation, combining and benefiting from segmentation, depth, edges, and 3D structure. Here we use our INITIALIZE and DRIVE APIs, along with CARLA and our scenario builder tool, to generate an NCAP++ scenario solution. The depth video and a text prompt are then provided to Cosmos Transfer to generate the two AV 2.0 training data instances.
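To make the closed-loop pattern concrete, here is a minimal sketch of the INITIALIZE → DRIVE control flow. Note that `initialize` and `drive` below are illustrative pure-Python stand-ins, not the actual Inverted AI web API; the location string, agent counts, and state fields are assumptions made for the example.

```python
# Hypothetical sketch of the INITIALIZE -> DRIVE closed loop: one call seeds
# the scene with plausible agent states, then DRIVE advances all agents one
# step at a time. The real Inverted AI API is served over HTTP; these stubs
# only mirror the control flow and data shape (object-list representation).
from dataclasses import dataclass
import math
import random

@dataclass
class AgentState:
    x: float
    y: float
    heading: float  # radians
    speed: float    # m/s

def initialize(location: str, agent_count: int, seed: int = 0) -> list[AgentState]:
    """Stand-in for INITIALIZE: place agents in plausible starting states."""
    rng = random.Random(seed)
    return [
        AgentState(x=rng.uniform(-50, 50), y=rng.uniform(-3.5, 3.5),
                   heading=0.0, speed=rng.uniform(5, 15))
        for _ in range(agent_count)
    ]

def drive(states: list[AgentState], dt: float = 0.1) -> list[AgentState]:
    """Stand-in for DRIVE: one reactive simulation step (constant speed here)."""
    return [
        AgentState(s.x + s.speed * math.cos(s.heading) * dt,
                   s.y + s.speed * math.sin(s.heading) * dt,
                   s.heading, s.speed)
        for s in states
    ]

states = initialize("carla:Town03", agent_count=8)  # location name is illustrative
trajectory_log = [states]
for _ in range(100):  # 10 s of simulation at 10 Hz
    states = drive(states)
    trajectory_log.append(states)
# trajectory_log now holds object-list rollouts that a renderer such as
# Cosmos Transfer could consume as conditioning structure
```

The point of the sketch is the division of labor: the behavior model owns the object-list dynamics in closed loop, while photorealistic rendering is delegated downstream.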
A key next-generation enabler of Cosmos workflows is the recent HD map transfer work from NVIDIA Research. This framework allows HD-map-level control over scene generation, requiring only the rough 3D structure of objects and lane markers. It allows synthetic data reflecting real-world driving patterns to be generated in AV 1.0 representations, where control is more straightforward and, more importantly, rewards are easy to define and compute. At Inverted AI we have more than a million miles of this kind of log data, which we can render in a manner compatible with HD map transfer.
We can also generate and solve an effectively unlimited number of safety-critical scenarios and represent them in this way. Doing so directly leverages the thousands of labeled HD maps with complete traffic-light annotations in our internal dataset, and the ability of our deep generative behavior models to solve complex safety-critical scenarios in all of those locations and more.
While this particular variant of video generative style transfer is not yet officially supported, there is plenty of reason to be optimistic about where this is all headed. Cosmos is prompted with an HD map and a text prompt like "The weather is rainy with a dense, overcast sky that casts a uniform, cool light over the scene. Modern cars move steadily along the smooth road, which is punctuated by visible cracks in the pavement. In the low areas, puddles form distinct mirror-like reflections of the gray sky. There is a green traffic light." Here is Inverted AI driving all the agents in a Cosmos rendering of a safety-critical set of NCAP++ scenarios.
Simulating What Really Matters
Behavior is the bottleneck. That's why Inverted AI has incorporated both ITRA (above, via the Inverted AI DRIVE endpoint) and DJINN, a diffusion-based joint trajectory generator that allows conditional rollout of multi-agent behaviors under diverse constraints (arXiv:2309.12508). Want to evaluate NCAP++ scenarios with truck cut-ins, unprotected left turns, or vulnerable road users? These models enable realistic, diverse, and actionable behavior generation: ITRA reactively for closed-loop V&V, DJINN for offline AV 2.0 synthetic data generation.
We believe strongly that the evidence points one way: rejection-sampling fine-tuning (TITRATED, OpenReview 2024, applied in the AV context), combined with hyper-perceptually-realistic synthetic AV 2.0 data controlled by human-like, safe, data-driven planners, is the solution path for AV 2.0 to progress.
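To make the idea concrete, here is a minimal, self-contained sketch of rejection-sampling fine-tuning: sample many rollouts from a base behavior model, keep only those passing a safety criterion, and fine-tune on the survivors. The model, scoring function, and threshold below are illustrative stand-ins, not the actual TITRATED implementation.

```python
# Minimal sketch of rejection-sampling fine-tuning: generate candidates from
# a base generative model, filter by a safety criterion, and keep the
# survivors as the fine-tuning set for the next model iteration.
import random

def sample_rollout(rng: random.Random) -> list[float]:
    """Stand-in for a generative behavior model. Here a rollout is just a
    sequence of safety margins (e.g. time-to-collision) along a trajectory."""
    return [rng.gauss(mu=3.0, sigma=1.5) for _ in range(20)]

def is_safe(rollout: list[float], ttc_floor: float = 1.0) -> bool:
    """Reject any rollout whose minimum safety margin dips below the floor."""
    return min(rollout) >= ttc_floor

rng = random.Random(0)
candidates = [sample_rollout(rng) for _ in range(500)]
accepted = [r for r in candidates if is_safe(r)]
# `accepted` would become the fine-tuning data for the next model iteration
print(f"kept {len(accepted)}/{len(candidates)} rollouts")
```

The design choice that matters here is that the filter is applied to whole rollouts, not individual steps, so the retained data reflects behaviors that stay safe over an entire scenario.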
If Billions of Miles Were Enough, AVs Would Be Safe by Now
Tesla has famously logged billions of real-world miles. So why do we still see disengagements and seriously problematic behavior? The answer is simple: even at that scale, one-in-a-million scenarios remain rare. Real-world logs cannot economically guarantee coverage of edge cases, generalization to new cities, or robustness to novel interactions. The capital expenditure required to reach Tesla-level data collection is phenomenal, and scaling it to cover everything is next to impossible. NVIDIA Cosmos plus Inverted AI behavior models solve the last-mile challenge of end-to-end AV 2.0, providing rich, safe, detailed solutions to the problematic one-in-a-million scenarios and filling in the missing edge-case training data.
In short, our ITRA-powered, Cosmos-enhanced, DJINN-instructed simulators generate precisely the scenarios needed: rare, unsafe, high-stakes, or legally critical. We're not chasing miles; we provide solutions: the right behaviors at the right times.
AV 2.0 Is Not About More Data. It's About Simulating Good and Safe Behaviors in Uncommon and Safety-Critical Situations.
Inverted AI’s deep generative behavior models are not mere predictors or even planners. They are the toolchain for producing diverse, human-plausible, safety-critical behavior patterns, grounded in physical realism and social context.
We believe that ITRA + Cosmos + DJINN sets the new gold standard for AV 2.0's need for intent-aware, controllable, high-quality datasets.
And we’re just getting started.