A deep learning framework to estimate the pose of robotic arms and predict their movements
Top of the Shops: Smart Warehouses & Evolving E-Commerce
Researchers release open-source photorealistic simulator for autonomous driving

VISTA 2.0 is an open-source simulation engine that can make realistic environments for training and testing self-driving cars. Credits: Image courtesy of MIT CSAIL.
By Rachel Gordon | MIT CSAIL
Hyper-realistic virtual worlds have been heralded as the best driving schools for autonomous vehicles (AVs), since they’ve proven fruitful test beds for safely trying out dangerous driving scenarios. Tesla, Waymo, and other self-driving companies all rely heavily on data to enable expensive and proprietary photorealistic simulators, since nuanced, I-almost-crashed data usually isn’t easy or desirable to gather and recreate.
To that end, scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) created “VISTA 2.0,” a data-driven simulation engine where vehicles can learn to drive in the real world and recover from near-crash scenarios. What’s more, all of the code is being open-sourced to the public.
“Today, only companies have software like the type of simulation environments and capabilities of VISTA 2.0, and this software is proprietary. With this release, the research community will have access to a powerful new tool for accelerating the research and development of adaptive robust control for autonomous driving,” says MIT Professor and CSAIL Director Daniela Rus, senior author on a paper about the research.
VISTA is a data-driven, photorealistic simulator for autonomous driving. It can simulate not just live video but LiDAR data and event cameras, and it can also incorporate other simulated vehicles to model complex driving situations. VISTA is open source and the code is publicly available.
VISTA 2.0 builds on the team’s previous model, VISTA, and it’s fundamentally different from existing AV simulators since it’s data-driven — meaning it was built and photorealistically rendered from real-world data — thereby enabling direct transfer to reality. While the initial iteration supported only single-car lane-following with one camera sensor, achieving high-fidelity data-driven simulation required rethinking the foundations of how different sensors and behavioral interactions can be synthesized.
Enter VISTA 2.0: a data-driven system that can simulate complex sensor types and massively interactive scenarios and intersections at scale. With much less data than previous models, the team was able to train autonomous vehicles that could be substantially more robust than those trained on large amounts of real-world data.
“This is a massive jump in the capabilities of data-driven simulation for autonomous vehicles, as well as in scale and in the ability to handle greater driving complexity,” says Alexander Amini, CSAIL PhD student and co-lead author on two new papers, together with fellow PhD student Tsun-Hsuan Wang. “VISTA 2.0 demonstrates the ability to simulate sensor data far beyond 2D RGB cameras: extremely high-dimensional 3D lidars with millions of points, irregularly timed event-based cameras, and even interactive and dynamic scenarios with other vehicles as well.”
The team was able to scale the complexity of the interactive driving tasks for things like overtaking, following, and negotiating, including multiagent scenarios in highly photorealistic environments.
Training AI models for autonomous vehicles requires hard-to-come-by fodder: different varieties of edge cases and strange, dangerous scenarios, because most of our data (thankfully) is just run-of-the-mill, day-to-day driving. Logically, we can’t crash into other cars just to teach a neural network how not to crash into other cars.
Recently, there’s been a shift away from more classic, human-designed simulation environments to those built up from real-world data. The latter have immense photorealism, but the former can easily model virtual cameras and lidars. With this paradigm shift, a key question has emerged: Can the richness and complexity of all of the sensors that autonomous vehicles need, such as lidar and event-based cameras that are more sparse, accurately be synthesized?
Lidar sensor data is much harder to interpret in a data-driven world — you’re effectively trying to generate brand-new 3D point clouds with millions of points, only from sparse views of the world. To synthesize 3D lidar point clouds, the team used the data that the car collected, projected it into a 3D space coming from the lidar data, and then let a new virtual vehicle drive around locally from where that original vehicle was. Finally, they projected all of that sensory information back into the frame of view of this new virtual vehicle, with the help of neural networks.
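The geometric core of that step, re-projecting previously recorded lidar points into the viewpoint of a new virtual vehicle, can be sketched in a few lines. This is a minimal illustration under simplifying assumptions (a pure rigid-body transform, no occlusion handling, and invented function names); VISTA 2.0’s actual pipeline additionally relies on neural networks to densify the cloud and fill in regions the original car never observed.

```python
import numpy as np

def make_pose(yaw, tx, ty, tz=0.0):
    """4x4 world-from-vehicle transform built from a yaw angle and a translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = [tx, ty, tz]
    return T

def reproject_lidar(points_world, virtual_pose):
    """Express world-frame lidar points (N, 3) in the frame of a new virtual vehicle.

    points_world: lidar returns from the data-collection car, already lifted
                  into a shared world frame.
    virtual_pose: 4x4 world-from-vehicle transform of the new virtual viewpoint.
    """
    vehicle_from_world = np.linalg.inv(virtual_pose)
    homog = np.hstack([points_world, np.ones((points_world.shape[0], 1))])
    return (vehicle_from_world @ homog.T).T[:, :3]

# Toy usage: re-express a recorded cloud for a virtual car shifted 2 m laterally
# and rotated 5 degrees, mimicking a locally perturbed trajectory.
recorded = np.random.uniform(-50.0, 50.0, size=(10_000, 3))
virtual_pose = make_pose(yaw=np.deg2rad(5.0), tx=0.0, ty=2.0)
local_cloud = reproject_lidar(recorded, virtual_pose)
```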
Together with the simulation of event-based cameras, which operate at speeds greater than thousands of events per second, the simulator was capable of not only simulating this multimodal information, but also doing so all in real time — making it possible to train neural nets offline, but also test online on the car in augmented reality setups for safe evaluations. “The question of if multisensor simulation at this scale of complexity and photorealism was possible in the realm of data-driven simulation was very much an open question,” says Amini.
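Event cameras report asynchronous per-pixel brightness changes rather than full frames. A common way to synthesize such a stream from ordinary video, shown below as a rough sketch, is to emit an event whenever the log-intensity of a pixel changes by more than a contrast threshold. The threshold value and function names here are illustrative assumptions, not VISTA 2.0’s actual implementation, which among other things interpolates event timestamps between frames.

```python
import numpy as np

def frames_to_events(prev_frame, next_frame, t, threshold=0.2):
    """Emit (x, y, t, polarity) events where log-intensity changed by at least `threshold`.

    prev_frame, next_frame: grayscale images in [0, 1] with the same shape.
    t: timestamp assigned to all events from this frame pair (a real simulator
       would spread timestamps between the two frames).
    """
    eps = 1e-6
    diff = np.log(next_frame + eps) - np.log(prev_frame + eps)
    ys, xs = np.nonzero(np.abs(diff) >= threshold)
    polarity = np.sign(diff[ys, xs])          # +1 for brighter, -1 for darker
    times = np.full(len(xs), t, dtype=np.float64)
    return np.stack([xs, ys, times, polarity], axis=1)

# Toy usage on two synthetic "frames"
f0 = np.random.rand(120, 160)
f1 = np.clip(f0 + np.random.randn(120, 160) * 0.1, 0.0, 1.0)
events = frames_to_events(f0, f1, t=0.01)
```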
With that, the driving school becomes a party. In the simulation, you can move around, use different types of controllers, simulate different types of events, create interactive scenarios, and just drop in brand-new vehicles that weren’t even in the original data. They tested for lane following, lane turning, car following, and more dicey scenarios like static and dynamic overtaking (seeing obstacles and moving around so you don’t collide). With multi-agent support, both real and simulated agents interact, and new agents can be dropped into the scene and controlled any which way.
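As a rough picture of what composing such a scenario might look like, the sketch below mixes a replayed log with injected virtual agents and steps them forward with simple kinematics. The class names, methods, and log path are hypothetical and invented for illustration; this is not the actual VISTA 2.0 API.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Agent:
    """A vehicle in the scene: replayed from the recorded log or purely virtual."""
    name: str
    controller: Callable[[Dict[str, float]], Tuple[float, float]]  # state -> (steering_rate, accel)
    state: Dict[str, float] = field(
        default_factory=lambda: {"x": 0.0, "y": 0.0, "yaw": 0.0, "v": 0.0})

class Scenario:
    """Hypothetical scenario container: a real-world log plus injected virtual agents."""

    def __init__(self, log_path: str, dt: float = 0.1):
        self.log_path = log_path   # recorded drive used to rebuild the surrounding world (invented path below)
        self.dt = dt
        self.agents: List[Agent] = []

    def add_agent(self, agent: Agent) -> None:
        self.agents.append(agent)

    def step(self) -> None:
        # Advance every agent with simple kinematics; a real engine would also
        # re-render camera/lidar/event-camera observations for each agent here.
        for a in self.agents:
            steering_rate, accel = a.controller(a.state)
            s = a.state
            s["yaw"] += steering_rate * self.dt
            s["v"] += accel * self.dt
            s["x"] += s["v"] * math.cos(s["yaw"]) * self.dt
            s["y"] += s["v"] * math.sin(s["yaw"]) * self.dt

# Toy usage: a lead car holding speed and an ego car accelerating up to 10 m/s behind it.
lead = Agent("lead", controller=lambda s: (0.0, 0.0),
             state={"x": 20.0, "y": 0.0, "yaw": 0.0, "v": 8.0})
ego = Agent("ego", controller=lambda s: (0.0, 0.5 if s["v"] < 10.0 else 0.0))
scenario = Scenario("logs/devens_drive_01", dt=0.1)
scenario.add_agent(lead)
scenario.add_agent(ego)
for _ in range(100):
    scenario.step()
```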
Taking their full-scale car out into the “wild” — a.k.a. Devens, Massachusetts — the team saw immediate transferability of results, with both failures and successes. They were also able to demonstrate the bodacious, magic word of self-driving car models: “robust.” They showed that AVs, trained entirely in VISTA 2.0, were so robust in the real world that they could handle that elusive tail of challenging failures.
Now, one guardrail humans rely on that can’t yet be simulated is human emotion: the friendly wave, nod, or blinker flick of acknowledgement, and these are the types of nuances the team wants to implement in future work.
“The central algorithm of this research is how we can take a dataset and build a completely synthetic world for learning and autonomy,” says Amini. “It’s a platform that I believe one day could extend in many different axes across robotics. Not just autonomous driving, but many areas that rely on vision and complex behaviors. We’re excited to release VISTA 2.0 to help enable the community to collect their own datasets and convert them into virtual worlds where they can directly simulate their own virtual autonomous vehicles, drive around these virtual terrains, train autonomous vehicles in these worlds, and then can directly transfer them to full-sized, real self-driving cars.”
Amini and Wang wrote the paper alongside Zhijian Liu, MIT CSAIL PhD student; Igor Gilitschenski, assistant professor in computer science at the University of Toronto; Wilko Schwarting, AI research scientist and MIT CSAIL PhD ’20; Song Han, associate professor at MIT’s Department of Electrical Engineering and Computer Science; Sertac Karaman, associate professor of aeronautics and astronautics at MIT; and Daniela Rus, MIT professor and CSAIL director. The researchers presented the work at the IEEE International Conference on Robotics and Automation (ICRA) in Philadelphia.
This work was supported by the National Science Foundation and Toyota Research Institute. The team acknowledges the support of NVIDIA with the donation of the Drive AGX Pegasus.

In this episode, Audrow Nash speaks to Maria Telleria, who is a co-founder and the CTO of Canvas. Canvas makes a drywall finishing robot and is based in the Bay Area. In this interview, Maria talks about Canvas’s drywall finishing robot, how Canvas works with unions, Canvas’s business model, and about her career path.
Engineers devise a recipe for improving any autonomous robotic system
Robots found to turn racist and sexist with flawed AI
Observing Arctic marine life, from the seabed to space
Emergency-response drones to save lives in the digital skies
Robotic lightning bugs take flight
Automate 2022 Exceeds Expectations with Largest Attendance and Exhibitor Count Ever
Coffee with a Researcher (#ICRA2022)

As part of her role as one of the IEEE ICRA 2022 Science Communication Awardees, Avie Ravendran sat down virtually with a few researchers from academia and industry attending the conference. Curious about what they had to say? Read their quotes below!
“I really believe that learned methods, especially imitation and transfer learning, will enable scalable robot applications in human and unstructured environments. We’re on the cusp of seeing robot agents dynamically adapt and solve real-world problems”
– Nicholas Nadeau, CTO, Halodi Robotics
“On one hand I think that the interplay of perception and control is quite exciting, in terms of the common underlying principles, while on the other, it’s both cool and inspiring to see more robots getting out of the lab”
– Matías Mattamala, PhD Student, Oxford Dynamic Robot Systems, Oxford Robotics Institute
“I believe that incorporating priors regarding the existing scene geometry and the temporal consistency that’s present in the context of mobile robotics can be used to guide the learning of more robust representations”
– Kavisha Vidanapathirana, QUT & CSIRO Robotics
“At the moment, I am aiming to find out what researchers need in order to take care of their motivation and wellbeing”
– Daniel Carrillo-Zapata, Founder, Scientific Agitation
“We have an immense amount of unsupervised knowledge and we’re always updating our priors. Taking advantage of large-scale unsupervised pretraining and having a lifelong learning system seems like a significant step in the right direction”
– Nitish Dashora, Researcher, Berkeley AI Research & Redwood Center for Theoretical Neuroscience
“When objects are in clutter, with various objects lying on top of one another, the robot needs to interactively and autonomously rearrange the scene in order to retrieve the pose of the target object with a minimal number of actions to achieve overall efficiency. I work on pose estimation algorithms to process dense visual data as well as sparse tactile data”
– Prajval Kumar, BMW & University of Glasgow
“Thinking of why the robots or even the structures behave the way they do, and framing and answering questions in that line satisfies my curiosity as a researcher”
– Tung Ta, Postdoctoral Researcher, The University of Tokyo
“I sometimes hear that legged locomotion is a solved problem, but I disagree. I think that the standards of performance have just been raised and collectively we can now tackle more dynamic, efficient and reliable gaits”
– Kevin Green, PhD Candidate, Oregon State University
“My goal in robotics research is to bring down the cost and improve the capabilities of marine research platforms by introducing modularity and underactuation into the field. We’re working on understanding how to bring our collective swimming technology into flowing environments now”
– Gedaliah Knizhnik, PhD Candidate, GRASP Laboratory & The modular robotics laboratory, University of Pennsylvania
“I am interested in how we can develop the algorithms and representations needed to enable long-term autonomous robot navigation without human intervention, such as in the case of an autonomous underwater robot persistently mapping a marine ecosystem for an extended period of time. There are lots of challenges: how can we build a compact representation of the world, ideally grounded in human-understandable semantics? How can we deal gracefully with outliers in perception that inevitably occur in the lifelong setting? And how can we scale robot state estimation methods in time and space while bounding memory and computation requirements?”
– Kevin Doherty, Computer Science and AI Lab, MIT & Woods Hole Oceanographic Institution
“How can robots learn to interact with and reason about themselves and the world without an intuitive feel for either? Communication is at the heart of biological and robotic systems. Inspired by control theory, information theory, and neuroscience, early work in artificial intelligence (AI) and robotics focused on a class of dynamical system known as feedback systems. These systems are characterized by recurrent mechanisms or feedback loops that govern, regulate, or ‘steer’ the behaviour of the system toward desirable stable states in the presence of disturbance in diverse environments. Feedback between sensation, prediction, decision, action, and back is a critical component of sensorimotor learning needed to realize robust intelligent robotic systems in the wild, a grand challenge of the field. Existing robots are fundamentally numb to the world, limiting their ability to sense themselves and their environment. This problem will only increase as robots grow in complexity, dexterity, and maneuverability, guided by biomimicry. Feedback control systems such as the proportional integral derivative (PID), reinforcement learning (RL), and model predictive control (MPC) are now common in robotics, as is (optimal, Bayesian) Kálmán filtering of point-based IMU-GPS signals. Lacking are the distributed multi-modal, high-dimensional sensations needed to realize general intelligent behaviour, executing complex action sequences through high-level abstractions built up from an intuitive feel or understanding of physics. While the central nervous system and biological neural networks are quantum parallel distributed processing (PDP) engines, most digital artificial neural networks are fully decoupled from sensors and provide only a passive image of the world. We are working to change that by coupling parallel distributed sensing and data processing through a neural paradigm. This involves innovations in hardware, software, and datasets. At Nervosys, we aim to make this dream a reality by building the first nervous system and platform for general robotic intelligence.”
– Adam Erickson, Founder, Nervosys
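As a concrete, textbook illustration of the feedback loops described above, the minimal sketch below runs a discrete-time PID controller that steers a noisy first-order system toward a setpoint despite disturbance. It is a generic example of the classical technique named in the quote, with made-up gains and plant dynamics, and is not specific to Nervosys or any particular robot.

```python
import random

def pid_step(error, prev_error, integral, dt, kp=1.2, ki=0.3, kd=0.05):
    """One update of a discrete PID controller: returns (control, new_integral)."""
    integral += error * dt
    derivative = (error - prev_error) / dt
    return kp * error + ki * integral + kd * derivative, integral

# Drive a noisy first-order system toward a setpoint of 1.0.
setpoint, state, dt = 1.0, 0.0, 0.05
integral, prev_error = 0.0, setpoint - state
for _ in range(200):
    error = setpoint - state
    control, integral = pid_step(error, prev_error, integral, dt)
    prev_error = error
    disturbance = random.gauss(0.0, 0.02)                # unmodeled perturbation
    state += (control - 0.5 * state) * dt + disturbance  # simple plant dynamics
print(f"final state ≈ {state:.3f} (target {setpoint})")
```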