Winning the Waymo Sim Agents Challenge

Fri Aug 09 2024

We love open science and leaderboards. Congratulations to Wei Wu, Xiaoxin Feng, Ziyan Gao, Yuheng Kan of SenseTime Research and Tsinghua University, the "winners" of the 2024 Waymo Sim Agents Challenge.

We at inverted.ai would love to have participated, but Waymo's license terms continue to be a problem. Here's the critical bit of the Sim Agents Challenge license terms:

License to Submissions and Technical Reports: Upon participating in the WOD Challenges, submitting anything to the leaderboard, or submitting (or being a member of a Team whose Team Leader (defined below) submits) anything to the leaderboard, You grant to Sponsor a perpetual, irrevocable, royalty-free, worldwide, nonexclusive license under Your intellectual property rights to make, use, import, publish, reproduce, display, perform, distribute, adapt, edit, modify, translate, and create derivative works based upon Your Submission (including any docker image and supporting code, e.g., to enable automated evaluation) and Technical Report, or any portion thereof (including Your name and likeness as shown and conveyed in the Submission or Technical Report), and any works, products and services that incorporate the foregoing or combine the foregoing with other Submissions and Technical Reports, or portions thereof, in any manner, in connection with the WOD Challenges and for other advertising, marketing, promotional, commercial, and business, and educational purposes. For the avoidance of doubt, the license above includes the right for Sponsor to sell, offer for sale, or sublicense Sponsor's works, products or services, even though such works, products, or services may combine, incorporate, or otherwise process Your Submission, Technical Report, and the intellectual property rights therein, in connection with the activities above.

These terms make it suicidal for companies like ours to participate (and not ideal for academic teams hoping to commercialize their work). Because such teams stay off the leaderboard, people can be pretty strongly misled about the current state of AV sim agent progress and misallocate resources as a result.

In various discussions we have been asked to show how the models behind our DRIVE API stack up according to the Sim Agents Challenge metrics. So, to avoid license issues, we reimplemented the metrics, and even the then-winning SMART model, as faithfully as possible ourselves.

Here is what the leaderboard would have looked like if we had participated:

Method	Realism Meta Metric	Kinematic Metrics	Interactive Metrics	Map-based Metrics	minADE
inverted.ai	0.7687 (0.80)	0.6029	0.8461	0.7638	1.4678
Fdriver-tint	0.7584	0.4614	0.8069	0.8658	1.4475
SMART-large	0.7564	0.4769	0.7986	0.8618	1.5501
SMART	0.7511	0.4445	0.8050	0.8571	1.5447

To be transparent, this isn't an apples-to-apples comparison. Our numbers in this table come from our own implementation of the Waymo metrics, evaluated on our own data; the other entries are evaluated on Waymo data. Our implementation of the metrics is as faithful as we could make it but could have bugs. It also differs intentionally in one way: it computes distances to the centerline rather than to road edges. License issues all over the place.
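
To make the centerline variant concrete, here is a minimal sketch of a point-to-centerline distance. It is an illustration only, not our production code and not Waymo's reference implementation. It assumes the centerline is available as a 2D polyline of (x, y) vertices; the function names `point_to_segment_distance` and `distance_to_centerline` are hypothetical.

```python
import math


def point_to_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (2D (x, y) tuples)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the infinite line, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    cx, cy = ax + t * dx, ay + t * dy
    return math.hypot(px - cx, py - cy)


def distance_to_centerline(p, centerline):
    """Minimum distance from p to a polyline given as a list of (x, y) vertices."""
    if len(centerline) < 2:
        raise ValueError("centerline needs at least two vertices")
    return min(
        point_to_segment_distance(p, a, b)
        for a, b in zip(centerline, centerline[1:])
    )


# Hypothetical L-shaped centerline and two agent positions.
centerline = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
print(distance_to_centerline((5.0, 2.0), centerline))    # 2.0
print(distance_to_centerline((12.0, 12.0), centerline))  # sqrt(8), about 2.83
```

A distance to the centerline is unsigned and says nothing about where the drivable area ends, so scores built on it are not directly interchangeable with ones built on distance to road edges.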

Note that our data is intentionally sourced from all around the world and is substantially more diverse than the Waymo data. If we restrict the Realism Meta Metric evaluation to North American data, our score actually goes up to 0.80 (the value shown in parentheses in the table above)!

So, did we win or not?

After implementing SMART, we did run an apples-to-apples comparison on our data. We see pretty much the same trend, with raw values fairly well calibrated to the evaluations on Waymo data.

Model	Realism Meta Metric	Kinematic Metrics	Interactive Metrics	Map-based Metrics	Collision Rate
inverted.ai	0.80	0.64	0.87	0.81	0.003
SMART	0.76	0.58	0.79	0.83	0.098

So it looks like we would have won handily and, notably, with a collision rate more than an order of magnitude lower (0.003 vs. 0.098). Of course our implementation of SMART might not be perfect, but we tried hard to get it right. SMART is filled with ideas that we've tried, considered, and gone beyond in our own work. Still, if it had worked crazily well, we would have happily switched over. It didn't. Maybe someday.
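
As a quick sanity check on the collision-rate gap, the arithmetic can be sketched as follows. The two rates are copied from the table above; the variable names are just for illustration.

```python
import math

# Collision rates copied from the apples-to-apples table.
collision_rate = {"inverted.ai": 0.003, "SMART": 0.098}

# How many times higher SMART's collision rate is than ours.
ratio = collision_rate["SMART"] / collision_rate["inverted.ai"]

# The same gap expressed in orders of magnitude (powers of ten).
orders_of_magnitude = math.log10(ratio)

print(f"SMART / inverted.ai collision ratio: {ratio:.1f}x")
print(f"Gap in orders of magnitude: {orders_of_magnitude:.2f}")
```

With these rounded table values the ratio comes out at roughly 33x, or about 1.5 orders of magnitude.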

Here are a couple of example GIFs on CARLA maps to compare:

inverted.ai ITRA / DRIVE vs. SMART

Academic users get free access to our models through our API and can participate in a grant program for access to large numbers of calls. For commercial users, sign-up and a small amount of usage are free; beyond that, terms are simple and cost efficient. Get in touch if you want your simulation environments to actually work for you.