Updated: Jun 22
Inverted AI researchers have released a new paper entitled "Critic Sequential Monte Carlo" on arXiv (also under review at NeurIPS 2022). It describes a new way of using a critic (a function that estimates future expected reward) within sequential Monte Carlo (SMC) to efficiently solve very hard planning-as-inference problems, such as avoiding infractions in self-driving car applications.
On the left, ITRA controls all agents in the scene. Focus on the "ego" agent marked with the yellow dot: for whatever reason, ITRA does a poor job in this particular instance and rear-ends the leading vehicle. On the right are samples from ITRA's stochastic policy over time, drawn as squiggly black lines. They show the distribution of realistic, human-like acceleration and steering actions for the "ego" vehicle at each instant. The critic, a kind of back-seat driver here, is evaluated over the whole action space given the state (part of which is the overhead map view shown). Bright yellow regions of action space lead to high-reward trajectories; darker colors lead to low-reward ones. In this scenario the critic first shouts "brake!" and then, at exactly the right moment, says "turn any way you like."
The paper describes a general way to combine a behavioral prior with a critic that is amenable to model-free reinforcement learning and generates high-reward trajectories efficiently. With ITRA as the prior and a suitably trained critic, CriticSMC enables rapid test-time generation of infraction-free trajectories such as those shown in the center plots. The two trajectories differ because the implicit planning-as-inference policy computed by SMC retains behavioral diversity: both are infraction-free, yet still human-like.
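To make the idea of combining a behavioral prior with a critic more concrete, here is a minimal sketch of one resampling step in the spirit of CriticSMC, written in Python with NumPy. Everything in it is an assumption for illustration: the one-dimensional "gap to the lead vehicle" state, the standard normal prior standing in for ITRA, the hand-written `critic_q` (in the real system the critic is learned), and the number of putative actions per particle. It only shows the core mechanic of proposing many actions from the prior and preferentially keeping those the critic scores highly; it is not the paper's algorithm and omits the weight bookkeeping a full SMC implementation would carry across steps.

```python
import numpy as np

# Hypothetical toy setting: the state is the gap (in metres) to a lead
# vehicle, and an action is how much that gap shrinks in one step.

def critic_q(gap, action):
    """Hypothetical hand-written critic Q(s, a): zero if the next gap stays
    at least 1 m, increasingly negative the closer we get to a collision."""
    next_gap = gap - action
    return -10.0 * np.maximum(0.0, 1.0 - next_gap)

def critic_smc_step(rng, gaps, n_putative=64):
    """One simplified step: propose actions from the prior, weight them by
    exp(Q), and resample back down to the original number of particles."""
    n = gaps.shape[0]
    # Assumed behavioral prior: K standard normal putative actions per particle.
    actions = rng.normal(0.0, 1.0, size=(n, n_putative))
    log_w = critic_q(gaps[:, None], actions).ravel()
    # Normalize the weights in log space for numerical stability.
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # Multinomial resampling over all n * K putative particles.
    idx = rng.choice(log_w.size, size=n, p=w)
    next_gaps = (gaps[:, None] - actions).ravel()
    return next_gaps[idx]

rng = np.random.default_rng(0)
gaps = np.full(1000, 1.5)
new_gaps = critic_smc_step(rng, gaps)
print(new_gaps.shape)            # (1000,)
print(np.mean(new_gaps >= 1.0))  # fraction of particles that kept a safe gap
```

Under the prior alone, a single step from a 1.5 m gap keeps at least 1 m of clearance only when the sampled action is at most 0.5, which happens about 69% of the time; after weighting by the critic, the surviving particles are concentrated on the safe actions.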
More examples follow. In each one, ITRA (the prior) causes an infraction. Acting under the CriticSMC planning-as-inference policy from the same initial configuration avoids these infractions.
CriticSMC is an integral part of Inverted AI's soon-to-be-released "Drive" cloud API product and helps ensure low NPC infraction rates, even in novel environments.
As part of this work, the research team also built a synthetic environment in which an agent tries to reach a goal location without hitting a wall or being hit by evil NPCs. The results of employing CriticSMC in this environment are pretty cool.
The "ego" agent is in green, the goal is in blue and the evil NPCs are in red. The default (prior) policy here is a random walk with drift. Some of the behavior that emerges in this adversarial environment is surprisingly sophisticated. No replanning was done in these examples; instead, each is a single sample from the planning-as-inference target distribution, inferred once at the outset using CriticSMC.
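For readers unfamiliar with the term, a random walk with drift is a policy that takes a small systematic step in some direction plus random noise at every step. The sketch below is a hypothetical 2D version: it assumes the drift points toward the goal and uses made-up `drift` and `noise` parameters, since the post does not specify the environment's actual prior. A prior like this reaches the goal on average but ignores walls and NPCs entirely, which is what the critic in CriticSMC has to correct for.

```python
import numpy as np

def random_walk_with_drift(rng, start, goal, n_steps, drift=0.1, noise=0.3):
    """Hypothetical prior policy: each step moves a fixed distance `drift`
    toward the goal plus isotropic Gaussian noise with std `noise`."""
    pos = np.asarray(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    path = [pos.copy()]
    for _ in range(n_steps):
        direction = goal - pos
        dist = np.linalg.norm(direction)
        # Unit vector toward the goal (zero if we are exactly on it).
        unit = direction / dist if dist > 0 else np.zeros_like(direction)
        pos = pos + drift * unit + rng.normal(0.0, noise, size=pos.shape)
        path.append(pos.copy())
    return np.stack(path)

rng = np.random.default_rng(1)
path = random_walk_with_drift(rng, start=[0.0, 0.0], goal=[5.0, 5.0], n_steps=50)
print(path.shape)  # (51, 2)
```

Setting `noise=0.0` reduces the walk to a straight line toward the goal, which is a quick way to check the drift term in isolation.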