
Tiny motor can “walk” to carry out tasks

This walking microrobot was built by the MIT team from a set of just five basic parts, including a coil, a magnet, and stiff and flexible structural pieces.
Photo by Will Langford

By David L. Chandler

Years ago, MIT Professor Neil Gershenfeld had an audacious thought. Struck by the fact that all the world’s living things are built out of combinations of just 20 amino acids, he wondered: Might it be possible to create a kit of just 20 fundamental parts that could be used to assemble all of the different technological products in the world?

Gershenfeld and his students have been making steady progress in that direction ever since. Their latest achievement, presented this week at an international robotics conference, consists of a set of five tiny fundamental parts that can be assembled into a wide variety of functional devices, including a tiny “walking” motor that can move back and forth across a surface or turn the gears of a machine.

Previously, Gershenfeld and his students showed that structures assembled from many small, identical subunits can have numerous mechanical properties. Next, they demonstrated that a combination of rigid and flexible part types can be used to create morphing airplane wings, a longstanding goal in aerospace engineering. Their latest work adds components for movement and logic, and will be presented at the International Conference on Manipulation, Automation and Robotics at Small Scales (MARSS) in Helsinki, Finland, in a paper by Gershenfeld and MIT graduate student Will Langford.

Their work offers an alternative to today’s approaches to constructing robots, which largely fall into one of two types: custom machines that work well but are relatively expensive and inflexible, and reconfigurable ones that sacrifice performance for versatility. In the new approach, Langford came up with a set of five millimeter-scale components, all of which can be attached to each other by a standard connector. These parts include the previous rigid and flexible types, along with electromagnetic parts, a coil, and a magnet. In the future, the team plans to make these out of still smaller basic part types.

Using this simple kit of tiny parts, Langford assembled a novel kind of motor that moves an appendage in discrete mechanical steps, which can be used to turn a gear wheel, and a mobile form of the motor that turns those steps into locomotion, allowing it to “walk” across a surface in a way that is reminiscent of the molecular motors that move muscles. These parts could also be assembled into hands for gripping, or legs for walking, as needed for a particular task, and then later reassembled as those needs change. Gershenfeld refers to them as “digital materials,” discrete parts that can be reversibly joined, forming a kind of functional micro-LEGO.

The new system is a significant step toward creating a standardized kit of parts that could be used to assemble robots with specific capabilities adapted to a particular task or set of tasks. Such purpose-built robots could then be disassembled and reassembled as needed in a variety of forms, without the need to design and manufacture new robots from scratch for each application.

Langford’s initial motor has an ant-like ability to lift seven times its own weight. But if greater forces are required, many of these parts can be added to provide more oomph. Or if the robot needs to move in more complex ways, these parts could be distributed throughout the structure. The size of the building blocks can be chosen to match their application; the team has made nanometer-sized parts to make nanorobots, and meter-sized parts to make megarobots. Previously, specialized techniques were needed at each of these length scale extremes.

“One emerging application is to make tiny robots that can work in confined spaces,” Gershenfeld says. Some of the devices assembled in this project, for example, are smaller than a penny yet can carry out useful tasks.

To build in the “brains,” Langford has added part types that contain millimeter-sized integrated circuits, along with a few other part types to take care of connecting electrical signals in three dimensions.

The simplicity and regularity of these structures make it relatively easy for their assembly to be automated. To do that, Langford has developed a novel machine that’s like a cross between a 3-D printer and the pick-and-place machines that manufacture electronic circuits, but unlike either of those, this one can produce complete robotic systems directly from digital designs. Gershenfeld says this machine is a first step toward the project’s ultimate goal of “making an assembler that can assemble itself out of the parts that it’s assembling.”

Study: Social robots can benefit hospitalized children

A new study by researchers from MIT, Boston Children’s Hospital, and elsewhere shows that a “social robot,” named Huggable (pictured), can be used in support sessions to boost positive emotions in hospitalized children.
Image: Courtesy of the Personal Robots Group, MIT Media Lab

A new study demonstrates, for the first time, that “social robots” used in support sessions held in pediatric units at hospitals can lead to more positive emotions in sick children.

Many hospitals host interventions in pediatric units, where child life specialists provide clinical interventions to hospitalized children for developmental and coping support. This involves play, preparation, education, and behavioral distraction for routine medical care, as well as before, during, and after difficult procedures. Traditional interventions include therapeutic medical play and normalizing the environment through activities such as arts and crafts, games, and celebrations.

For the study, published today in the journal Pediatrics, researchers from the MIT Media Lab, Boston Children’s Hospital, and Northeastern University deployed a robotic teddy bear, “Huggable,” across several pediatric units at Boston Children’s Hospital. More than 50 hospitalized children were randomly split into three groups of interventions that involved Huggable, a tablet-based virtual Huggable, or a traditional plush teddy bear. In general, Huggable improved various patient outcomes over those other two options.  

The study primarily demonstrated the feasibility of integrating Huggable into the interventions. But results also indicated that children playing with Huggable experienced more positive emotions overall. They also got out of bed and moved around more, and emotionally connected with the robot, asking it personal questions and inviting it to come back later to meet their families. “Such improved emotional, physical, and verbal outcomes are all positive factors that could contribute to better and faster recovery in hospitalized children,” the researchers write in their study.

Although it is a small study, it is the first to explore social robotics in a real-world inpatient pediatric setting with ill children, the researchers say. Other studies have been conducted in labs, have studied very few children, or were conducted in public settings without any patient identification.

But Huggable is designed only to assist health care specialists — not replace them, the researchers stress. “It’s a companion,” says co-author Cynthia Breazeal, an associate professor of media arts and sciences and founding director of the Personal Robots group. “Our group designs technologies with the mindset that they’re teammates. We don’t just look at the child-robot interaction. It’s about [helping] specialists and parents, because we want technology to support everyone who’s invested in the quality care of a child.”

“Child life staff provide a lot of human interaction to help normalize the hospital experience, but they can’t be with every kid, all the time. Social robots create a more consistent presence throughout the day,” adds first author Deirdre Logan, a pediatric psychologist at Boston Children’s Hospital. “There may also be kids who don’t always want to talk to people, and respond better to having a robotic stuffed animal with them. It’s exciting knowing what types of support we can provide kids who may feel isolated or scared about what they’re going through.”

Joining Breazeal and Logan on the paper are: Sooyeon Jeong, a PhD student in the Personal Robots group; Brianna O’Connell, Duncan Smith-Freedman, and Peter Weinstock, all of Boston Children’s Hospital; and Matthew Goodwin and James Heathers, both of Northeastern University.

Boosting mood

First prototyped in 2006, Huggable is a plush teddy bear with a screen depicting animated eyes. While the eventual goal is to make the robot fully autonomous, it is currently operated remotely by a specialist in the hall outside a child’s room. Through custom software, a specialist can control the robot’s facial expressions and body actions, and direct its gaze. The specialists could also talk through a speaker — with their voice automatically shifted to a higher pitch to sound more childlike — and monitor the participants via camera feed. The tablet-based avatar of the bear had identical gestures and was also remotely operated.

During the Huggable interventions, which involved kids ages 3 to 10 years, a specialist would sing nursery rhymes to younger children through the robot and move its arms during the song. Older kids would play the I Spy game, where they have to guess an object in the room described by the specialist through Huggable.

Through self-reports and questionnaires, the researchers recorded how much the patients and families liked interacting with Huggable. Additional questionnaires assessed patients’ positive moods, as well as anxiety and perceived pain levels. The researchers also used cameras mounted in the child’s room to capture speech patterns and used software to characterize them as joyful or sad.

A greater percentage of children and their parents reported that the children enjoyed playing with Huggable more than with the avatar or traditional teddy bear. Speech analysis backed up that result, detecting significantly more joyful expressions among the children during robotic interventions. Additionally, parents noted lower levels of perceived pain among their children.

The researchers noted that 93 percent of patients completed the Huggable-based interventions, and found few barriers to practical implementation, as determined by comments from the specialists.

A previous paper based on the same study found that the robot also seemed to facilitate greater family involvement in the interventions, compared to the other two methods, which improved the intervention overall. “Those are findings we didn’t necessarily expect in the beginning,” says Jeong, also a co-author on the previous paper. “We didn’t tell family to join any of the play sessions — it just happened naturally. When the robot came in, the child and robot and parents all interacted more, playing games or in introducing the robot.”

An automated, take-home bot

The study also generated valuable insights for developing a fully autonomous Huggable robot, which is the researchers’ ultimate goal. They were able to determine which physical gestures are used most and least often, and which features specialists may want for future iterations. Huggable, for instance, could introduce doctors before they enter a child’s room or learn a child’s interests and share that information with specialists. The researchers may also equip the robot with computer vision, so it can detect certain objects in a room to talk about those with children.

“In these early studies, we capture data … to wrap our heads around an authentic use-case scenario where, if the bear was automated, what does it need to do to provide high-quality standard of care,” Breazeal says.

In the future, that automated robot could be used to improve continuity of care. A child would take home a robot after a hospital visit to further support engagement, adherence to care regimens, and monitoring well-being.

“We want to continue thinking about how robots can become part of the whole clinical team and help everyone,” Jeong says. “When the robot goes home, we want to see the robot monitor a child’s progress. … If there’s something clinicians need to know earlier, the robot can let the clinicians know, so [they’re not] surprised at the next appointment that the child hasn’t been doing well.”

Next, the researchers are hoping to zero in on which specific patient populations may benefit the most from the Huggable interventions. “We want to find the sweet spot for the children who need this type of extra support,” Logan says.

Spotting objects amid clutter

Robots currently attempt to identify objects in a point cloud by comparing a template object — a 3-D dot representation of an object, such as a rabbit — with a point cloud representation of the real world that may contain that object.
Image: Christine Daniloff, MIT

A new MIT-developed technique enables robots to quickly identify objects hidden in a three-dimensional cloud of data, reminiscent of how some people can make sense of a densely patterned “Magic Eye” image if they observe it in just the right way.

Robots typically “see” their environment through sensors that collect and translate a visual scene into a matrix of dots. Think of the world of, well, “The Matrix,” except that the 1s and 0s seen by the fictional character Neo are replaced by dots — lots of dots — whose patterns and densities outline the objects in a particular scene.

Conventional techniques that try to pick out objects from such clouds of dots, or point clouds, can do so with either speed or accuracy, but not both.

With their new technique, the researchers say a robot can accurately pick out an object, such as a small animal, that is otherwise obscured within a dense cloud of dots, within seconds of receiving the visual data. The team says the technique can be used to improve a host of situations in which machine perception must be both speedy and accurate, including driverless cars and robotic assistants in the factory and the home.

“The surprising thing about this work is, if I ask you to find a bunny in this cloud of thousands of points, there’s no way you could do that,” says Luca Carlone, assistant professor of aeronautics and astronautics and a member of MIT’s Laboratory for Information and Decision Systems (LIDS). “But our algorithm is able to see the object through all this clutter. So we’re getting to a level of superhuman performance in localizing objects.”

Carlone and graduate student Heng Yang will present details of the technique later this month at the Robotics: Science and Systems conference in Germany.

“Failing without knowing”

Robots currently attempt to identify objects in a point cloud by comparing a template object (a 3-D dot representation of an object, such as a rabbit) with a point cloud representation of the real world that may contain that object. The template image includes “features,” or collections of dots that indicate characteristic curvatures or angles of that object, such as the bunny’s ear or tail. Existing algorithms first extract similar features from the real-life point cloud, then attempt to match those features and the template’s features, and ultimately rotate and align the features to the template to determine if the point cloud contains the object in question.

But the point cloud data that streams into a robot’s sensor invariably includes errors, in the form of dots that are in the wrong position or incorrectly spaced, which can significantly confuse the process of feature extraction and matching. As a consequence, robots can make a huge number of wrong associations, or what researchers call “outliers” between point clouds, and ultimately misidentify objects or miss them entirely.

Carlone says state-of-the-art algorithms are able to sift the bad associations from the good once features have been matched, but they do so in “exponential time,” meaning that even a cluster of processing-heavy computers, sifting through dense point cloud data with existing algorithms, would not be able to solve the problem in a reasonable time. Such techniques, while accurate, are impractical for analyzing larger, real-life datasets containing dense point clouds.

Other algorithms that can quickly identify features and associations do so hastily, creating a huge number of outliers or misdetections in the process, without being aware of these errors.

“That’s terrible if this is running on a self-driving car, or any safety-critical application,” Carlone says. “Failing without knowing you’re failing is the worst thing an algorithm can do.”

A relaxed view

Yang and Carlone instead devised a technique that prunes away outliers in “polynomial time,” meaning that it can do so quickly, even for increasingly dense clouds of dots. The technique can thus quickly and accurately identify objects hidden in cluttered scenes.

The MIT-developed technique quickly and smoothly matches objects to those hidden in dense point clouds (left), versus existing techniques (right) that produce incorrect, disjointed matches. Gif: Courtesy of the researchers

The researchers first used conventional techniques to extract features of a template object from a point cloud. They then developed a three-step process to match the size, position, and orientation of the object in a point cloud with the template object, while simultaneously distinguishing good feature associations from bad ones.

The team developed an “adaptive voting scheme” algorithm to prune outliers and match an object’s size and position. For size, the algorithm makes associations between template and point cloud features, then compares the relative distance between features in a template and corresponding features in the point cloud. If, say, the distance between two features in the point cloud is five times that of the corresponding points in the template, the algorithm assigns a “vote” to the hypothesis that the object is five times larger than the template object.

The algorithm does this for every feature association. Then, the algorithm selects those associations that fall under the size hypothesis with the most votes, and identifies those as the correct associations, while pruning away the others.  In this way, the technique simultaneously reveals the correct associations and the relative size of the object represented by those associations. The same process is used to determine the object’s position.  
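The size-voting step described above can be illustrated with a toy sketch. This is not the researchers’ implementation: the function name `estimate_scale_by_voting`, the fixed relative tolerance `tol`, the rule for keeping associations, and the sample points are all assumptions made for illustration, and the fixed tolerance stands in for the adaptive scheme the team describes.

```python
import itertools
import math


def estimate_scale_by_voting(template_pts, cloud_pts, tol=0.05):
    """Vote on the object's scale from pairs of putative feature associations.

    Association k claims that template_pts[k] corresponds to cloud_pts[k].
    `tol` is an assumed, fixed relative tolerance; the researchers describe
    an adaptive scheme, which this toy version does not reproduce.
    """
    votes = []  # one (scale, i, j) vote per pair of associations
    for i, j in itertools.combinations(range(len(template_pts)), 2):
        d_template = math.dist(template_pts[i], template_pts[j])
        d_cloud = math.dist(cloud_pts[i], cloud_pts[j])
        if d_template > 0:
            votes.append((d_cloud / d_template, i, j))
    if not votes:
        raise ValueError("need at least two distinct associations")

    # The winning hypothesis is the scale that the most votes agree with.
    best = []
    for scale, _, _ in votes:
        supporters = [v for v in votes if abs(v[0] - scale) <= tol * scale]
        if len(supporters) > len(best):
            best = supporters
    scale = sum(v[0] for v in best) / len(best)

    # Keep associations that appear often among the winning votes (assumed rule).
    counts = [0] * len(template_pts)
    for _, i, j in best:
        counts[i] += 1
        counts[j] += 1
    inliers = [k for k, c in enumerate(counts) if c >= max(counts) / 2]
    return scale, inliers


# Toy data: the first four cloud points are the template scaled by five;
# the fifth association is deliberately wrong (an "outlier").
template = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
cloud = [(0, 0, 0), (5, 0, 0), (0, 5, 0), (0, 0, 5), (40, -3, 7)]
scale, inliers = estimate_scale_by_voting(template, cloud)
print(scale, inliers)
```

On this toy data the votes from the four consistent associations agree on a scale of about five, and the fifth association is pruned because none of its pairwise votes support that hypothesis.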

The researchers developed a separate algorithm for rotation, which finds the orientation of the template object in three-dimensional space.

Doing this is an incredibly tricky computational task. Imagine holding a mug and trying to tilt it just so, to match a blurry image of something that might be that same mug. There are any number of angles you could tilt that mug, and each of those angles has a certain likelihood of matching the blurry image.

Existing techniques handle this problem by considering each possible tilt or rotation of the object as a “cost” — the lower the cost, the more likely that that rotation creates an accurate match between features. Each rotation and associated cost is represented in a topographic map of sorts, made up of multiple hills and valleys, with lower elevations associated with lower cost.

But Carlone says this can easily confuse an algorithm, especially if there are multiple valleys and no discernible lowest point representing the true, exact match between a particular rotation of an object and the object in a point cloud. Instead, the team developed a “convex relaxation” algorithm that simplifies the topographic map, with one single valley representing the optimal rotation. In this way, the algorithm is able to quickly identify the rotation that defines the orientation of the object in the point cloud.

With their approach, the team was able to quickly and accurately identify three different objects (a bunny, a dragon, and a Buddha) hidden in point clouds of increasing density. They were also able to identify objects in real-life scenes, including a living room, in which the algorithm was quickly able to spot a cereal box and a baseball hat.

Carlone says that because the approach is able to work in “polynomial time,” it can be easily scaled up to analyze even denser point clouds, resembling the complexity of sensor data for driverless cars, for example.

“Navigation, collaborative manufacturing, domestic robots, search and rescue, and self-driving cars is where we hope to make an impact,” Carlone says.

This research was supported in part by the Army Research Laboratory, the Office of Naval Research, and the Google Daydream Research Program.

Chip design drastically reduces energy needed to compute with light


A new photonic chip design drastically reduces energy needed to compute with light, with simulations suggesting it could run optical neural networks 10 million times more efficiently than its electrical counterparts.
Image: courtesy of the researchers, edited by MIT News

By Rob Matheson

MIT researchers have developed a novel “photonic” chip that uses light instead of electricity — and consumes relatively little power in the process. The chip could be used to process massive neural networks millions of times more efficiently than today’s classical computers do.

Neural networks are machine-learning models that are widely used for such tasks as robotic object identification, natural language processing, drug development, medical imaging, and powering driverless cars. Novel optical neural networks, which use optical phenomena to accelerate computation, can run much faster and more efficiently than their electrical counterparts.  

But as traditional and optical neural networks grow more complex, they eat up tons of power. To tackle that issue, researchers and major tech companies — including Google, IBM, and Tesla — have developed “AI accelerators,” specialized chips that improve the speed and efficiency of training and testing neural networks.

For electrical chips, including most AI accelerators, there is a theoretical minimum limit for energy consumption. Recently, MIT researchers have started developing photonic accelerators for optical neural networks. These chips perform orders of magnitude more efficiently, but they rely on some bulky optical components that limit their use to relatively small neural networks.

In a paper published in Physical Review X, MIT researchers describe a new photonic accelerator that uses more compact optical components and optical signal-processing techniques, to drastically reduce both power consumption and chip area. That allows the chip to scale to neural networks several orders of magnitude larger than its counterparts.

Simulated training of neural networks on the MNIST image-classification dataset suggests the accelerator can theoretically process neural networks more than 10 million times below the energy-consumption limit of traditional electrical-based accelerators and about 1,000 times below the limit of photonic accelerators. The researchers are now working on a prototype chip to experimentally prove the results.

“People are looking for technology that can compute beyond the fundamental limits of energy consumption,” says Ryan Hamerly, a postdoc in the Research Laboratory of Electronics. “Photonic accelerators are promising … but our motivation is to build a [photonic accelerator] that can scale up to large neural networks.”

Practical applications for such technologies include reducing energy consumption in data centers. “There’s a growing demand for data centers for running large neural networks, and it’s becoming increasingly computationally intractable as the demand grows,” says co-author Alexander Sludds, a graduate student in the Research Laboratory of Electronics. The aim is “to meet computational demand with neural network hardware … to address the bottleneck of energy consumption and latency.”

Joining Sludds and Hamerly on the paper are: co-author Liane Bernstein, an RLE graduate student; Marin Soljacic, an MIT professor of physics; and Dirk Englund, an MIT associate professor of electrical engineering and computer science, a researcher in RLE, and head of the Quantum Photonics Laboratory.  

Compact design

Neural networks process data through many computational layers containing interconnected nodes, called “neurons,” to find patterns in the data. Neurons receive input from their upstream neighbors and compute an output signal that is sent to neurons further downstream. Each input is also assigned a “weight,” a value based on its relative importance to all other inputs. As the data propagate “deeper” through layers, the network learns progressively more complex information. In the end, an output layer generates a prediction based on the calculations throughout the layers.

All AI accelerators aim to reduce the energy needed to process and move around data during a specific linear algebra step in neural networks, called “matrix multiplication.” There, neurons and weights are encoded into separate tables of rows and columns and then combined to calculate the outputs.
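For readers unfamiliar with the step, here is a minimal sketch of that matrix multiplication for one layer. The function name `layer_outputs` and the numbers are made up for illustration; each row of `weights` holds the weights feeding one output neuron.

```python
def layer_outputs(weights, inputs):
    """Multiply a weight table by a list of input neuron values."""
    # Each output is the weighted sum of all inputs for one row of weights.
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


# Illustrative values only: three input neurons feeding two output neurons.
weights = [
    [0.5, -1.0, 2.0],
    [1.5, 0.0, -0.5],
]
inputs = [2.0, 1.0, 0.5]
outputs = layer_outputs(weights, inputs)
print(outputs)
```

Every output requires one multiplication per weight, which is why the energy spent per multiplication dominates the cost of running a large network.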

In traditional photonic accelerators, pulsed lasers encoded with information about each neuron in a layer flow into waveguides and through beam splitters. The resulting optical signals are fed into a grid of square optical components, called “Mach-Zehnder interferometers,” which are programmed to perform matrix multiplication. The interferometers, which are encoded with information about each weight, use signal-interference techniques that process the optical signals and weight values to compute an output for each neuron. But there’s a scaling issue: For each neuron there must be one waveguide and, for each weight, there must be one interferometer. Because the number of weights grows as the square of the number of neurons, those interferometers take up a lot of real estate.

“You quickly realize the number of input neurons can never be larger than 100 or so, because you can’t fit that many components on the chip,” Hamerly says. “If your photonic accelerator can’t process more than 100 neurons per layer, then it makes it difficult to implement large neural networks into that architecture.”

The researchers’ chip relies on a more compact, energy efficient “optoelectronic” scheme that encodes data with optical signals, but uses “balanced homodyne detection” for matrix multiplication. That’s a technique that produces a measurable electrical signal after calculating the product of the amplitudes (wave heights) of two optical signals.

Pulses of light encoded with information about the input and output neurons for each neural network layer, which are needed to train the network, flow through a single channel. Separate pulses encoded with information of entire rows of weights in the matrix multiplication table flow through separate channels. Optical signals carrying the neuron and weight data fan out to a grid of homodyne photodetectors. The photodetectors use the amplitude of the signals to compute an output value for each neuron. Each detector feeds an electrical output signal for each neuron into a modulator, which converts the signal back into a light pulse. That optical signal becomes the input for the next layer, and so on.
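How a detector can multiply two amplitudes is easiest to see in the idealized textbook model of balanced homodyne detection, sketched below. It assumes real-valued amplitudes, a lossless 50/50 beam splitter, and no noise, none of which is claimed for the actual chip; the function names `balanced_homodyne` and `optical_dot` are hypothetical.

```python
import math


def balanced_homodyne(a, b):
    """Difference of the two detector intensities after a 50/50 beam splitter.

    For real amplitudes a and b the two output ports carry (a + b) / sqrt(2)
    and (a - b) / sqrt(2); subtracting their intensities leaves 2 * a * b.
    """
    plus = (a + b) / math.sqrt(2)
    minus = (a - b) / math.sqrt(2)
    return plus ** 2 - minus ** 2


def optical_dot(inputs, weight_row):
    """Accumulate the detector signal over a train of pulse pairs."""
    signal = sum(balanced_homodyne(x, w) for x, w in zip(inputs, weight_row))
    return signal / 2  # undo the factor of two from the detection step


# Illustrative values only: three input pulses against one row of weights.
result = optical_dot([2.0, 1.0, 0.5], [0.5, -1.0, 2.0])
print(result)
```

Summing the detector signal over successive pulse pairs yields the weighted sum for one neuron, so in this model a single detector handles a whole row of weights.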

The design requires only one channel per input and output neuron, and only as many homodyne photodetectors as there are neurons, not weights. Because there are always far fewer neurons than weights, this saves significant space, so the chip is able to scale to neural networks with more than a million neurons per layer.
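The space saving can be made concrete with a little arithmetic. The sketch below assumes a fully connected layer with the same number of input and output neurons, so the number of weights is the square of the neuron count; the function names are hypothetical.

```python
def interferometer_count(neurons_per_layer):
    """Traditional photonic design: one interferometer per weight."""
    return neurons_per_layer * neurons_per_layer


def homodyne_detector_count(neurons_per_layer):
    """Design described here: one homodyne photodetector per neuron."""
    return neurons_per_layer


for n in (100, 1_000, 1_000_000):
    print(n, interferometer_count(n), homodyne_detector_count(n))
```

Under that assumption, a 100-neuron layer needs 10,000 interferometers in the traditional scheme but only 100 detectors in the homodyne scheme, and the gap widens with every increase in layer size.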

Finding the sweet spot

With photonic accelerators, there’s an unavoidable noise in the signal. The more light that’s fed into the chip, the less noise and greater the accuracy — but that gets to be pretty inefficient. Less input light increases efficiency but negatively impacts the neural network’s performance. But there’s a “sweet spot,” Bernstein says, that uses minimum optical power while maintaining accuracy.

That sweet spot for AI accelerators is measured in how many joules it takes to perform a single operation of multiplying two numbers — such as during matrix multiplication. Right now, traditional accelerators are measured in picojoules, or one-trillionth of a joule. Photonic accelerators measure in attojoules, which is a million times more efficient.

In their simulations, the researchers found their photonic accelerator could operate with sub-attojoule efficiency. “There’s some minimum optical power you can send in, before losing accuracy. The fundamental limit of our chip is a lot lower than traditional accelerators … and lower than other photonic accelerators,” Bernstein says.

Autonomous boats can target and latch onto each other


MIT researchers have given their fleet of autonomous “roboats” the ability to automatically target and clasp onto each other — and keep trying if they fail. The roboats are being designed to transport people, collect trash, and self-assemble into floating structures in the canals of Amsterdam.
Courtesy of the researchers

By Rob Matheson

The city of Amsterdam envisions a future where fleets of autonomous boats cruise its many canals to transport goods and people, collect trash, or self-assemble into floating stages and bridges. To further that vision, MIT researchers have given their fleet of robotic boats, which are being developed as part of an ongoing project, new capabilities that let them target and clasp onto each other, and keep trying if they fail.

About a quarter of Amsterdam’s surface area is water, with 165 canals winding alongside busy city streets. Several years ago, MIT and the Amsterdam Institute for Advanced Metropolitan Solutions (AMS Institute) teamed up on the “Roboat” project. The idea is to build a fleet of autonomous robotic boats — rectangular hulls equipped with sensors, thrusters, microcontrollers, GPS modules, cameras, and other hardware — that provides intelligent mobility on water to relieve congestion in the city’s busy streets.

One of the project’s objectives is to create roboat units that provide on-demand transportation on waterways. Another objective is using the roboat units to automatically form “pop-up” structures, such as foot bridges, performance stages, or even food markets. The structures could then automatically disassemble at set times and reform into target structures for different activities. Additionally, the roboat units could be used as agile sensors to gather data on the city’s infrastructure, and air and water quality, among other things.

In 2016, MIT researchers tested a roboat prototype that cruised around Amsterdam’s canals, moving forward, backward, and laterally along a preprogrammed path. Last year, researchers designed low-cost, 3-D-printed, one-quarter scale versions of the boats, which were more efficient and agile, and came equipped with advanced trajectory-tracking algorithms. 

In a paper presented at the International Conference on Robotics and Automation, the researchers describe roboat units that can now identify and connect to docking stations. Control algorithms guide the roboats to the target, where they automatically connect to a customized latching mechanism with millimeter precision. Moreover, the roboat notices if it has missed the connection, backs up, and tries again.
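The retry behavior amounts to a simple loop, sketched here. This is not the researchers’ control code: `dock_with_retries`, its two callbacks, and the `max_attempts` limit are all hypothetical stand-ins for the real guidance and latching hardware.

```python
def dock_with_retries(attempt_latch, back_up, max_attempts=5):
    """Try to latch; after each miss, back up and try again.

    Returns the number of attempts used, or None if every attempt failed.
    `attempt_latch` and `back_up` are hypothetical callbacks.
    """
    for attempt in range(1, max_attempts + 1):
        if attempt_latch():
            return attempt
        back_up()
    return None


# Scripted outcomes for illustration: two misses, then a successful latch.
outcomes = iter([False, False, True])
back_ups = []
attempts_needed = dock_with_retries(
    lambda: next(outcomes),
    lambda: back_ups.append("back up"),
)
print(attempts_needed, len(back_ups))
```

In this scripted run the boat latches on its third attempt after backing up twice.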

The researchers tested the latching technique in a swimming pool at MIT and in the Charles River, where waters are rougher. In both instances, the roboat units were usually able to successfully connect in about 10 seconds, starting from around 1 meter away, or they succeeded after a few failed attempts. In Amsterdam, the system could be especially useful for overnight garbage collection. Roboat units could sail around a canal, locate and latch onto platforms holding trash containers, and haul them back to collection facilities.

“In Amsterdam, canals were once used for transportation and other things the roads are now used for. Roads near canals are now very congested — and have noise and pollution — so the city wants to add more functionality back to the canals,” says first author Luis Mateos, a graduate student in the Department of Urban Studies and Planning (DUSP) and a researcher in the MIT Senseable City Lab. “Self-driving technologies can save time, costs and energy, and improve the city moving forward.”

“The aim is to use roboat units to bring new capabilities to life on the water,” adds co-author Daniela Rus, director of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Andrew and Erna Viterbi Professor of Electrical Engineering and Computer Science. “The new latching mechanism is very important for creating pop-up structures. Roboat does not need latching for autonomous transportation on water, but you need the latching to create any structure, whether it’s mobile or fixed.”

Joining Mateos on the paper are: Wei Wang, a joint postdoc in CSAIL and the Senseable City Lab; Banti Gheneti, a graduate student in the Department of Electrical Engineering and Computer Science; Fabio Duarte, a DUSP and Senseable City Lab research scientist; and Carlo Ratti, director of the Senseable City Lab and a principal investigator and professor of the practice in DUSP.

Making the connection

Each roboat is equipped with latching mechanisms, including ball and socket components, on its front, back, and sides. The ball component resembles a badminton shuttlecock — a cone-shaped, rubber body with a metal ball at the end. The socket component is a wide funnel that guides the ball component into a receptor. Inside the funnel, a laser beam acts like a security system that detects when the ball crosses into the receptor. That activates a mechanism with three arms that closes around and captures the ball, while also sending a feedback signal to both roboats that the connection is complete.

On the software side, the roboats run on custom computer vision and control techniques. Each roboat has a LIDAR system and camera, so they can autonomously move from point to point around the canals. Each docking station — typically an unmoving roboat — has a sheet of paper imprinted with an augmented reality tag, called an AprilTag, which resembles a simplified QR code. Commonly used for robotic applications, AprilTags enable robots to detect and compute their precise 3-D position and orientation relative to the tag.

Both the AprilTags and cameras are mounted at the same position in the center of the roboats. When a traveling roboat is roughly one or two meters away from the stationary AprilTag, the roboat calculates its position and orientation relative to the tag. Typically, this would generate a 3-D map for boat motion, including roll, pitch, and yaw (left and right). But an algorithm strips away everything except yaw. This produces an easy-to-compute 2-D plane that measures the roboat camera’s distance away and distance left and right of the tag. Using that information, the roboat steers itself toward the tag. By keeping the camera and tag perfectly aligned, the roboat is able to precisely connect.
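The reduction from a full 3-D pose to a 2-D plane can be sketched in a few lines of Python. This is an illustration only, not the researchers’ code: the function names, the camera-frame convention (x right, y down, z forward), and the nested-list rotation matrix are all assumptions made for the example.

```python
import math

def planar_pose(translation, rotation):
    """Reduce a 3-D tag pose to (forward, lateral, yaw).

    translation: (x, y, z) of the tag in the camera frame, in meters,
                 assuming x points right, y down, and z forward.
    rotation:    3x3 rotation matrix (nested lists) of the tag in that frame.
    """
    x, _y, z = translation  # the vertical offset is dropped
    # For a rotation about the vertical axis, R[0][2] = sin(yaw) and
    # R[2][2] = cos(yaw). This is exact for a pure yaw rotation.
    yaw = math.atan2(rotation[0][2], rotation[2][2])
    return z, x, yaw

def yaw_rotation(yaw):
    """Helper for the example: rotation matrix for a pure yaw angle (radians)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

# Tag 1.5 m ahead, 0.2 m to the right, rotated 0.3 rad about the vertical axis.
forward, lateral, yaw = planar_pose((0.2, 0.05, 1.5), yaw_rotation(0.3))
```

A controller would then steer to drive the lateral offset and yaw toward zero while closing the forward distance.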

The funnel compensates for any misalignment in the roboat’s pitch (rocking up and down) and heave (vertical up and down), as canal waves are relatively small. If, however, the roboat goes beyond its calculated distance, and doesn’t receive a feedback signal from the laser beam, it knows it has missed. “In challenging waters, sometimes roboat units, at the current one-quarter scale, are not strong enough to overcome wind gusts or heavy water currents,” Mateos says. “A logic component on the roboat says, ‘You missed, so back up, recalculate your position, and try again.’”
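The miss-and-retry logic Mateos describes might look something like the following Python sketch. The function names, the 5-centimeter tolerance, and the cap on attempts are hypothetical; the article states only that a roboat that overshoots its calculated distance without a laser signal backs up and tries again.

```python
def missed(distance_travelled, planned_distance, laser_triggered, tolerance=0.05):
    """True when the boat overshot its planned distance without a laser signal."""
    return (not laser_triggered) and distance_travelled > planned_distance + tolerance

def dock(approaches, planned_distance=1.0, max_attempts=5):
    """Return the 1-based attempt number that latched, or None.

    approaches: iterable of (distance_travelled, laser_triggered) pairs,
                one per simulated approach toward the docking station.
    """
    for attempt, (travelled, laser) in enumerate(approaches, start=1):
        if attempt > max_attempts:
            break
        if laser:
            return attempt  # ball crossed the beam: connection complete
        if missed(travelled, planned_distance, laser):
            continue  # back up, recalculate position, and try again
    return None

# Two overshoots followed by a successful latch on the third approach.
attempts_needed = dock([(1.2, False), (1.1, False), (1.0, True)])
```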

Future iterations

The researchers are now designing roboat units roughly four times the size of the current iterations, so they’ll be more stable on water. Mateos is also working on an update to the funnel that includes tentacle-like rubber grippers that tighten around the pin — like a squid grasping its prey. That could help give the roboat units more control when, say, they’re towing platforms or other roboats through narrow canals.

In the works is also a system that displays the AprilTags on an LCD monitor that changes codes to signal multiple roboat units to assemble in a given order. At first, all roboat units will be given a code to stay exactly a meter apart. Then, the code changes to direct the first roboat to latch. After, the screen switches codes to order the next roboat to latch, and so on. “It’s like the telephone game. The changing code passes a message to one roboat at a time, and that message tells them what to do,” Mateos says.

Darwin Caldwell, the research director of Advanced Robotics at the Italian Institute of Technology, envisions even more possible applications for the autonomous latching capability. “I can certainly see this type of autonomous docking being of use in many areas of robotic ‘refuelling’ and docking … beyond aquatic/naval systems,” he says, “including inflight refuelling, space docking, cargo container handling, [and] robot in-house recharging.”

The research was funded by the AMS Institute and the City of Amsterdam.

Sensor-packed glove learns signatures of the human grasp

MIT researchers have developed a low-cost, sensor-packed glove that captures pressure signals as humans interact with objects. The glove can be used to create high-resolution tactile datasets that robots can leverage to better identify, weigh, and manipulate objects.
Image: Courtesy of the researchers
By Rob Matheson

Wearing a sensor-packed glove while handling a variety of objects, MIT researchers have compiled a massive dataset that enables an AI system to recognize objects through touch alone. The information could be leveraged to help robots identify and manipulate objects, and may aid in prosthetics design.

The researchers developed a low-cost knitted glove, called “scalable tactile glove” (STAG), equipped with about 550 tiny sensors across nearly the entire hand. Each sensor captures pressure signals as humans interact with objects in various ways. A neural network processes the signals to “learn” a dataset of pressure-signal patterns related to specific objects. Then, the system uses that dataset to classify the objects and predict their weights by feel alone, with no visual input needed.

In a paper published today in Nature, the researchers describe a dataset they compiled using STAG for 26 common objects — including a soda can, scissors, tennis ball, spoon, pen, and mug. Using the dataset, the system predicted the objects’ identities with up to 76 percent accuracy. The system can also predict the correct weights of most objects within about 60 grams.

Similar sensor-based gloves used today run thousands of dollars and often contain only around 50 sensors that capture less information. Even though STAG produces very high-resolution data, it’s made from commercially available materials totaling around $10.

The tactile sensing system could be used in combination with traditional computer vision and image-based datasets to give robots a more human-like understanding of interacting with objects.

“Humans can identify and handle objects well because we have tactile feedback. As we touch objects, we feel around and realize what they are. Robots don’t have that rich feedback,” says Subramanian Sundaram PhD ’18, a former graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL). “We’ve always wanted robots to do what humans can do, like doing the dishes or other chores. If you want robots to do these things, they must be able to manipulate objects really well.”

The researchers also used the dataset to measure the cooperation between regions of the hand during object interactions. For example, when someone uses the middle joint of their index finger, they rarely use their thumb. But the tips of the index and middle fingers always correspond to thumb usage. “We quantifiably show, for the first time, that, if I’m using one part of my hand, how likely I am to use another part of my hand,” Sundaram says.

Prosthetics manufacturers can potentially use this information to, say, choose optimal spots for placing pressure sensors and help customize prosthetics to the tasks and objects people regularly interact with.

Joining Sundaram on the paper are: CSAIL postdocs Petr Kellnhofer and Jun-Yan Zhu; CSAIL graduate student Yunzhu Li; Antonio Torralba, a professor in EECS and director of the MIT-IBM Watson AI Lab; and Wojciech Matusik, an associate professor in electrical engineering and computer science and head of the Computational Fabrication group.  

STAG is laminated with an electrically conductive polymer that changes resistance in response to applied pressure. The researchers sewed conductive threads through holes in the conductive polymer film, from fingertips to the base of the palm. The threads overlap in a way that turns them into pressure sensors. When someone wearing the glove feels, lifts, holds, and drops an object, the sensors record the pressure at each point.

The threads connect from the glove to an external circuit that translates the pressure data into “tactile maps,” which are essentially brief videos of dots growing and shrinking across a graphic of a hand. The dots represent the location of pressure points, and their size represents the force — the bigger the dot, the greater the pressure.

From those maps, the researchers compiled a dataset of about 135,000 video frames from interactions with 26 objects. Those frames can be used by a neural network to predict the identity and weight of objects, and provide insights about the human grasp.

To identify objects, the researchers designed a convolutional neural network (CNN), which is usually used to classify images, to associate specific pressure patterns with specific objects. But the trick was choosing frames from different types of grasps to get a full picture of the object.

The idea was to mimic the way humans can hold an object in a few different ways in order to recognize it, without using their eyesight. Similarly, the researchers’ CNN chooses up to eight semirandom frames from the video that represent the most dissimilar grasps — say, holding a mug from the bottom, top, and handle.

But the CNN can’t just choose random frames from the thousands in each video, or it probably won’t choose distinct grips. Instead, it groups similar frames together, resulting in distinct clusters corresponding to unique grasps. Then, it pulls one frame from each of those clusters, ensuring it has a representative sample. Then the CNN uses the contact patterns it learned in training to predict an object classification from the chosen frames.
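The group-then-sample idea can be sketched with a small k-means routine in plain Python. This is a stand-in, not the method from the paper: the article does not say which clustering algorithm was used, and the function names, the farthest-point initialization, and the representation of each frame as a flat list of pressure values are assumptions for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two frames (flat lists of pressures)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(frame, centers):
    """Index of the cluster center closest to the frame."""
    return min(range(len(centers)), key=lambda c: sq_dist(frame, centers[c]))

def cluster_frames(frames, k, iterations=10):
    """Assign each frame a cluster label using basic k-means."""
    # Farthest-point initialization: deterministic, spreads the centers out.
    centers = [list(frames[0])]
    while len(centers) < k:
        far = max(frames, key=lambda f: min(sq_dist(f, c) for c in centers))
        centers.append(list(far))
    for _ in range(iterations):
        labels = [nearest(f, centers) for f in frames]
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(f, centers) for f in frames]

def pick_diverse_frames(frames, k=8, seed=0):
    """Pick one randomly chosen frame index from each cluster of similar frames."""
    rng = random.Random(seed)
    labels = cluster_frames(frames, min(k, len(frames)))
    chosen = []
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        chosen.append(rng.choice(members))
    return sorted(chosen)

# Twelve toy "frames" forming three obviously different grasps.
frames = [[0, 0], [0, 1], [1, 0], [1, 1],
          [10, 10], [10, 11], [11, 10], [11, 11],
          [20, 0], [20, 1], [21, 0], [21, 1]]
selected = pick_diverse_frames(frames, k=3)
```

Each selected index comes from a different cluster, so the classifier would see one example of each distinct grasp rather than several near-identical frames.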

“We want to maximize the variation between the frames to give the best possible input to our network,” Kellnhofer says. “All frames inside a single cluster should have a similar signature that represents similar ways of grasping the object. Sampling from multiple clusters simulates a human interactively trying to find different grasps while exploring an object.”

For weight estimation, the researchers built a separate dataset of around 11,600 frames from tactile maps of objects being picked up by finger and thumb, held, and dropped. Notably, the CNN wasn’t trained on any frames it was tested on, meaning it couldn’t learn to just associate weight with an object. In testing, a single frame was inputted into the CNN. Essentially, the CNN picks out the pressure around the hand caused by the object’s weight, and ignores pressure caused by other factors, such as hand positioning to prevent the object from slipping. Then it calculates the weight based on the appropriate pressures.

The system could be combined with the sensors already on robot joints that measure torque and force to help them better predict object weight. “Joints are important for predicting weight, but there are also important components of weight from fingertips and the palm that we capture,” Sundaram says.

Bringing human-like reasoning to driverless car navigation

To bring more human-like reasoning to autonomous vehicle navigation, MIT researchers have created a system that enables driverless cars to check a simple map and use visual data to follow routes in new, complex environments.
Image: Chelsea Turner

By Rob Matheson

With aims of bringing more human-like reasoning to autonomous vehicles, MIT researchers have created a system that uses only simple maps and visual data to enable driverless cars to navigate routes in new, complex environments.

Human drivers are exceptionally good at navigating roads they haven’t driven on before, using observation and simple tools. We simply match what we see around us to what we see on our GPS devices to determine where we are and where we need to go. Driverless cars, however, struggle with this basic reasoning. In every new area, the cars must first map and analyze all the new roads, which is very time consuming. The systems also rely on complex maps — usually generated by 3-D scans — which are computationally intensive to generate and process on the fly.

In a paper being presented at this week’s International Conference on Robotics and Automation, MIT researchers describe an autonomous control system that “learns” the steering patterns of human drivers as they navigate roads in a small area, using only data from video camera feeds and a simple GPS-like map. Then, the trained system can control a driverless car along a planned route in a brand-new area, by imitating the human driver.

Like human drivers, the system also detects any mismatches between its map and features of the road. This helps the system determine if its position, sensors, or mapping are incorrect, in order to correct the car’s course.

To train the system initially, a human operator controlled an automated Toyota Prius — equipped with several cameras and a basic GPS navigation system — to collect data from local suburban streets including various road structures and obstacles. When deployed autonomously, the system successfully navigated the car along a preplanned path in a different forested area, designated for autonomous vehicle tests.

“With our system, you don’t need to train on every road beforehand,” says first author Alexander Amini, an MIT graduate student. “You can download a new map for the car to navigate through roads it has never seen before.”

“Our objective is to achieve autonomous navigation that is robust for driving in new environments,” adds co-author Daniela Rus, director of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Andrew and Erna Viterbi Professor of Electrical Engineering and Computer Science. “For example, if we train an autonomous vehicle to drive in an urban setting such as the streets of Cambridge, the system should also be able to drive smoothly in the woods, even if that is an environment it has never seen before.”

Joining Rus and Amini on the paper are Guy Rosman, a researcher at the Toyota Research Institute, and Sertac Karaman, an associate professor of aeronautics and astronautics at MIT.

Point-to-point navigation

Traditional navigation systems process data from sensors through multiple modules customized for tasks such as localization, mapping, object detection, motion planning, and steering control. For years, Rus’s group has been developing “end-to-end” navigation systems, which process inputted sensory data and output steering commands, without a need for any specialized modules.

Until now, however, these models were strictly designed to safely follow the road, without any real destination in mind. In the new paper, the researchers advanced their end-to-end system to drive from a starting point to a destination, in a previously unseen environment. To do so, the researchers trained their system to predict a full probability distribution over all possible steering commands at any given instant while driving.

The system uses a machine learning model called a convolutional neural network (CNN), commonly used for image recognition. During training, the system watches and learns how to steer from a human driver. The CNN correlates steering wheel rotations to road curvatures it observes through cameras and an inputted map. Eventually, it learns the most likely steering command for various driving situations, such as straight roads, four-way or T-shaped intersections, forks, and rotaries.

“Initially, at a T-shaped intersection, there are many different directions the car could turn,” Rus says. “The model starts by thinking about all those directions, but as it sees more and more data about what people do, it will see that some people turn left and some turn right, but nobody goes straight. Straight ahead is ruled out as a possible direction, and the model learns that, at T-shaped intersections, it can only move left or right.”
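Rus’s T-intersection example can be mimicked with a toy frequency count in Python. This is only an analogy: the actual system uses a CNN to predict a distribution over steering commands from camera images and a map, whereas this sketch simply tallies hypothetical observed turns. The function name and the three discrete options are assumptions.

```python
from collections import Counter

def steering_distribution(observed_turns, options=("left", "straight", "right")):
    """Estimate how likely each option is from observed human choices.

    Assumes at least one observed turn matches one of the options.
    """
    counts = Counter(observed_turns)
    total = sum(counts[o] for o in options)
    return {o: counts[o] / total for o in options}

# Hypothetical demonstrations at a T-shaped intersection: nobody goes straight.
demos = ["left", "right", "left", "right", "right"]
probs = steering_distribution(demos)
```

Because no driver in the demonstrations went straight, that option ends up with zero probability, which mirrors how the model comes to rule it out.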

What does the map say?

In testing, the researchers fed the system a map with a randomly chosen route. When driving, the system extracts visual features from the camera, which enables it to predict road structures. For instance, it identifies a distant stop sign or line breaks on the side of the road as signs of an upcoming intersection. At each moment, it uses its predicted probability distribution of steering commands to choose the most likely one to follow its route.

Importantly, the researchers say, the system uses maps that are easy to store and process. Autonomous control systems typically use LIDAR scans to create massive, complex maps that take roughly 4,000 gigabytes (4 terabytes) of data to store just the city of San Francisco. For every new destination, the car must create new maps, which amounts to tons of data processing. Maps used by the researchers’ system, however, capture the entire world using just 40 gigabytes of data.

During autonomous driving, the system also continuously matches its visual data to the map data and notes any mismatches. Doing so helps the autonomous vehicle better determine where it is located on the road. And it ensures the car stays on the safest path if it’s being fed contradictory input information: If, say, the car is cruising on a straight road with no turns, and the GPS indicates the car must turn right, the car will know to keep driving straight or to stop.
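One way to picture the contradictory-input case is a check that compares the map’s instruction with what the camera suggests is possible. The Python sketch below is a guess at the general shape of such logic, not the researchers’ implementation; the function name, the discrete command labels, and the 0.05 probability threshold are all hypothetical.

```python
def resolve_command(visual_probs, map_command, min_prob=0.05):
    """Return (command, mismatch_flag).

    visual_probs: probabilities of each steering option judged from the
                  camera alone (hypothetical discrete options).
    map_command:  the option the map or GPS route asks for.
    """
    if visual_probs.get(map_command, 0.0) >= min_prob:
        return map_command, False  # map and camera agree well enough
    # The map asks for something the road does not seem to allow:
    # flag the mismatch and fall back to the most plausible option.
    fallback = max(visual_probs, key=visual_probs.get)
    return fallback, True

# Straight road with no turns, but the GPS says "turn right".
command, mismatch = resolve_command(
    {"left": 0.01, "straight": 0.98, "right": 0.01}, "right")
```

Here the sketch keeps the car driving straight and raises a mismatch flag, which a fuller system could use to re-estimate its position or stop.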

“In the real world, sensors do fail,” Amini says. “We want to make sure that the system is robust to different failures of different sensors by building a system that can accept these noisy inputs and still navigate and localize itself correctly on the road.”

How to tell whether machine-learning systems are robust enough for the real world

Adversarial examples are slightly altered inputs that cause neural networks to make classification mistakes they normally wouldn’t, such as classifying an image of a cat as a dog.
Image: MIT News Office

By Rob Matheson

MIT researchers have devised a method for assessing how robust machine-learning models known as neural networks are for various tasks, by detecting when the models make mistakes they shouldn’t.

Convolutional neural networks (CNNs) are designed to process and classify images for computer vision and many other tasks. But slight modifications that are imperceptible to the human eye — say, a few darker pixels within an image — may cause a CNN to produce a drastically different classification. Such modifications are known as “adversarial examples.” Studying the effects of adversarial examples on neural networks can help researchers determine how their models could be vulnerable to unexpected inputs in the real world.

For example, driverless cars can use CNNs to process visual input and produce an appropriate response. If the car approaches a stop sign, it would recognize the sign and stop. But a 2018 paper found that placing a certain black-and-white sticker on the stop sign could, in fact, fool a driverless car’s CNN into misclassifying the sign, which could potentially cause it to not stop at all.

However, there has been no way to fully evaluate a large neural network’s resilience to adversarial examples for all test inputs. In a paper they are presenting this week at the International Conference on Learning Representations, the researchers describe a technique that, for any input, either finds an adversarial example or guarantees that all perturbed inputs — that still appear similar to the original — are correctly classified. In doing so, it gives a measurement of the network’s robustness for a particular task.

Similar evaluation techniques do exist but have not been able to scale up to more complex neural networks. Compared to those methods, the researchers’ technique runs three orders of magnitude faster and can scale to more complex CNNs.

The researchers evaluated the robustness of a CNN designed to classify images in the MNIST dataset of handwritten digits, which comprises 60,000 training images and 10,000 test images. The researchers found around 4 percent of test inputs can be perturbed slightly to generate adversarial examples that would lead the model to make an incorrect classification.

“Adversarial examples fool a neural network into making mistakes that a human wouldn’t,” says first author Vincent Tjeng, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL). “For a given input, we want to determine whether it is possible to introduce small perturbations that would cause a neural network to produce a drastically different output than it usually would. In that way, we can evaluate how robust different neural networks are, finding at least one adversarial example similar to the input or guaranteeing that none exist for that input.”

Joining Tjeng on the paper are CSAIL graduate student Kai Xiao and Russ Tedrake, a CSAIL researcher and a professor in the Department of Electrical Engineering and Computer Science (EECS).

CNNs process images through many computational layers containing units called neurons. For CNNs that classify images, the final layer consists of one neuron for each category. The CNN classifies an image based on the neuron with the highest output value. Consider a CNN designed to classify images into two categories: “cat” or “dog.” If it processes an image of a cat, the value for the “cat” classification neuron should be higher. An adversarial example occurs when a tiny modification to that image causes the “dog” classification neuron’s value to be higher.
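The highest-output rule and the definition of an adversarial example can be written down directly. In this Python sketch the final-layer outputs are represented as a dictionary of hypothetical values per category; the function names are made up for illustration.

```python
def classify(outputs):
    """Return the category whose output neuron has the highest value."""
    return max(outputs, key=outputs.get)

def is_adversarial(original_outputs, perturbed_outputs, true_label):
    """True if the original image is classified correctly but the slightly
    modified image is not."""
    return (classify(original_outputs) == true_label
            and classify(perturbed_outputs) != true_label)

# Hypothetical final-layer values before and after a tiny pixel change.
original = {"cat": 2.1, "dog": 1.4}
perturbed = {"cat": 1.6, "dog": 1.7}
fooled = is_adversarial(original, perturbed, "cat")
```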

The researchers’ technique checks all possible modifications to each pixel of the image. Basically, if the CNN assigns the correct classification (“cat”) to each modified image, no adversarial examples exist for that image.

Behind the technique is a modified version of “mixed-integer programming,” an optimization method where some of the variables are restricted to be integers. Essentially, mixed-integer programming is used to find a maximum of some objective function, given certain constraints on the variables, and can be designed to scale efficiently to evaluating the robustness of complex neural networks.

The researchers set the limits allowing every pixel in each input image to be brightened or darkened by up to some set value. Given the limits, the modified image will still look remarkably similar to the original input image, meaning the CNN shouldn’t be fooled. Mixed-integer programming is used to find the smallest possible modification to the pixels that could potentially cause a misclassification.

The idea is that tweaking the pixels could cause the value of an incorrect classification to rise. If a cat image were fed into the pet-classifying CNN, for instance, the algorithm would keep perturbing the pixels to see if it can raise the value for the neuron corresponding to “dog” to be higher than that for “cat.”

If the algorithm succeeds, it has found at least one adversarial example for the input image. The algorithm can continue tweaking pixels to find the minimum modification that was needed to cause that misclassification. The larger the minimum modification — called the “minimum adversarial distortion” — the more resistant the network is to adversarial examples. If, however, the correct classifying neuron fires for all different combinations of modified pixels, then the algorithm can guarantee that the image has no adversarial example.
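For intuition, the minimum adversarial distortion has a closed form in one very simple case. The Python sketch below swaps in a toy two-class linear classifier in place of a CNN and does not use mixed-integer programming, so it is not the researchers’ technique. It also assumes every pixel may be changed by the same maximum amount and ignores the valid pixel range. Under those assumptions, shifting each pixel by at most eps can shrink the lead of “cat” over “dog” by at most eps times the sum of absolute weight differences, so the lead disappears once eps reaches the margin divided by that sum.

```python
def score(weights, bias, pixels):
    """Output of one class neuron in a linear classifier."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias

def minimum_adversarial_distortion(pixels, true_class, other_class):
    """Smallest per-pixel change that erases the true class's lead.

    Each class is a (weights, bias) pair for a hypothetical linear model.
    """
    (w_true, b_true), (w_other, b_other) = true_class, other_class
    margin = score(w_true, b_true, pixels) - score(w_other, b_other, pixels)
    if margin <= 0:
        return 0.0  # already not classified as the true class
    sensitivity = sum(abs(a - b) for a, b in zip(w_true, w_other))
    if sensitivity == 0:
        return float("inf")  # no pixel change can alter the margin
    return margin / sensitivity

def is_provably_robust(pixels, true_class, other_class, epsilon):
    """True if no change of at most epsilon per pixel can flip the class."""
    return minimum_adversarial_distortion(pixels, true_class, other_class) > epsilon

cat = ([1.0, 0.0], 0.0)  # hypothetical weights and bias for "cat"
dog = ([0.0, 1.0], 0.0)  # hypothetical weights and bias for "dog"
image = [0.8, 0.2]
distortion = minimum_adversarial_distortion(image, cat, dog)  # about 0.3
```

Changing each pixel by 0.3 turns the image into [0.5, 0.5], where the two neurons tie; any smaller limit leaves “cat” ahead, which is the kind of guarantee the article describes.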

“Given one input image, we want to know if we can modify it in a way that it triggers an incorrect classification,” Tjeng says. “If we can’t, then we have a guarantee that we searched across the whole space of allowable modifications, and found that there is no perturbed version of the original image that is misclassified.”

In the end, this generates a percentage for how many input images have at least one adversarial example, and guarantees the remainder don’t have any adversarial examples. In the real world, CNNs have many neurons and will train on massive datasets with dozens of different classifications, so the technique’s scalability is critical, Tjeng says.

“Across different networks designed for different tasks, it’s important for CNNs to be robust against adversarial examples,” he says. “The larger the fraction of test samples where we can prove that no adversarial example exists, the better the network should perform when exposed to perturbed inputs.”

“Provable bounds on robustness are important as almost all [traditional] defense mechanisms could be broken again,” says Matthias Hein, a professor of mathematics and computer science at Saarland University, who was not involved in the study but has tried the technique. “We used the exact verification framework to show that our networks are indeed robust … [and] made it also possible to verify them compared to normal training.”

Nanoparticles take a fantastic, magnetic voyage

MIT engineers have designed a magnetic microrobot that can help push drug-delivery particles into tumor tissue (left). They also employed swarms of naturally magnetic bacteria to achieve the same effect (right).
Image courtesy of the researchers.

By Anne Trafton

MIT engineers have designed tiny robots that can help drug-delivery nanoparticles push their way out of the bloodstream and into a tumor or another disease site. Like crafts in “Fantastic Voyage” — a 1960s science fiction film in which a submarine crew shrinks in size and roams a body to repair damaged cells — the robots swim through the bloodstream, creating a current that drags nanoparticles along with them.

The magnetic microrobots, inspired by bacterial propulsion, could help to overcome one of the biggest obstacles to delivering drugs with nanoparticles: getting the particles to exit blood vessels and accumulate in the right place.

“When you put nanomaterials in the bloodstream and target them to diseased tissue, the biggest barrier to that kind of payload getting into the tissue is the lining of the blood vessel,” says Sangeeta Bhatia, the John and Dorothy Wilson Professor of Health Sciences and Technology and Electrical Engineering and Computer Science, a member of MIT’s Koch Institute for Integrative Cancer Research and its Institute for Medical Engineering and Science, and the senior author of the study.

“Our idea was to see if you can use magnetism to create fluid forces that push nanoparticles into the tissue,” adds Simone Schuerle, a former MIT postdoc and lead author of the paper, which appears in the April 26 issue of Science Advances.

In the same study, the researchers also showed that they could achieve a similar effect using swarms of living bacteria that are naturally magnetic. Each of these approaches could be suited for different types of drug delivery, the researchers say.

Tiny robots

Schuerle, who is now an assistant professor at the Swiss Federal Institute of Technology (ETH Zurich), first began working on tiny magnetic robots as a graduate student in Brad Nelson’s Multiscale Robotics Lab at ETH Zurich. When she came to Bhatia’s lab as a postdoc in 2014, she began investigating whether this kind of bot could help to make nanoparticle drug delivery more efficient.

In most cases, researchers target their nanoparticles to disease sites that are surrounded by “leaky” blood vessels, such as tumors. This makes it easier for the particles to get into the tissue, but the delivery process is still not as effective as it needs to be.

The MIT team decided to explore whether the forces generated by magnetic robots might offer a better way to push the particles out of the bloodstream and into the target site.

The robots that Schuerle used in this study are 35 hundredths of a millimeter long, similar in size to a single cell, and can be controlled by applying an external magnetic field. This bioinspired robot, which the researchers call an “artificial bacterial flagellum,” consists of a tiny helix that resembles the flagella that many bacteria use to propel themselves. These robots are 3-D-printed with a high-resolution 3-D printer and then coated with nickel, which makes them magnetic.

To test a single robot’s ability to control nearby nanoparticles, the researchers created a microfluidic system that mimics the blood vessels that surround tumors. The channel in their system, between 50 and 200 microns wide, is lined with a gel that has holes to simulate the broken blood vessels seen near tumors.

Using external magnets, the researchers applied magnetic fields to the robot, causing the helix to rotate and swim through the channel. Because fluid flows through the channel in the opposite direction, the robot remains stationary and creates a convection current, which pushes 200-nanometer polystyrene particles into the model tissue. These particles penetrated twice as far into the tissue as nanoparticles delivered without the aid of the magnetic robot.

This type of system could potentially be incorporated into stents, which are stationary and would be easy to target with an externally applied magnetic field. Such an approach could be useful for delivering drugs to help reduce inflammation at the site of the stent, Bhatia says.

Bacterial swarms

The researchers also developed a variant of this approach that relies on swarms of naturally magnetotactic bacteria instead of microrobots. Bhatia has previously developed bacteria that can be used to deliver cancer-fighting drugs and to diagnose cancer, exploiting bacteria’s natural tendency to accumulate at disease sites.

For this study, the researchers used a type of bacteria called Magnetospirillum magneticum, which naturally produces chains of iron oxide. These magnetic particles, known as magnetosomes, help bacteria orient themselves and find their preferred environments.

The researchers discovered that when they put these bacteria into the microfluidic system and applied rotating magnetic fields in certain orientations, the bacteria began to rotate in synchrony and move in the same direction, pulling along any nanoparticles that were nearby. In this case, the researchers found that nanoparticles were pushed into the model tissue three times faster than when the nanoparticles were delivered without any magnetic assistance.

This bacterial approach could be better suited for drug delivery in situations such as a tumor, where the swarm, controlled externally without the need for visual feedback, could generate fluidic forces in vessels throughout the tumor.  

The particles that the researchers used in this study are big enough to carry large payloads, including the components required for the CRISPR genome-editing system, Bhatia says. She now plans to collaborate with Schuerle to further develop both of these magnetic approaches for testing in animal models.

The research was funded by the Swiss National Science Foundation, the Branco Weiss Fellowship, the National Institutes of Health, the National Science Foundation, and the Howard Hughes Medical Institute.

Giving robots a better feel for object manipulation


A new “particle simulator” developed by MIT researchers improves robots’ abilities to mold materials into simulated target shapes and interact with solid objects and liquids. This could give robots a refined touch for industrial applications or for personal robotics, such as shaping clay or rolling sticky sushi rice.
Courtesy of the researchers

By Rob Matheson

A new learning system developed by MIT researchers improves robots’ abilities to mold materials into target shapes and make predictions about interacting with solid objects and liquids. The system, known as a learning-based particle simulator, could give industrial robots a more refined touch — and it may have fun applications in personal robotics, such as modelling clay shapes or rolling sticky rice for sushi.

In robotic planning, physical simulators are models that capture how different materials respond to force. Robots are “trained” using the models, to predict the outcomes of their interactions with objects, such as pushing a solid box or poking deformable clay. But traditional learning-based simulators mainly focus on rigid objects and are unable to handle fluids or softer objects. Some more accurate physics-based simulators can handle diverse materials, but rely heavily on approximation techniques that introduce errors when robots interact with objects in the real world.

In a paper being presented at the International Conference on Learning Representations in May, the researchers describe a new model that learns to capture how small portions of different materials, called “particles,” interact when they’re poked and prodded. The model learns directly from data in cases where the underlying physics of the movements is uncertain or unknown. Robots can then use the model as a guide to predict how liquids, as well as rigid and deformable materials, will react to the force of their touch. As a robot handles the objects, the model also helps to further refine the robot’s control.

In experiments, a robotic hand with two fingers, called “RiceGrip,” accurately shaped a deformable foam to a desired configuration — such as a “T” shape — that serves as a proxy for sushi rice. In short, the researchers’ model serves as a type of “intuitive physics” brain that robots can leverage to reconstruct three-dimensional objects somewhat similarly to how humans do.

“Humans have an intuitive physics model in our heads, where we can imagine how an object will behave if we push or squeeze it. Based on this intuitive model, humans can accomplish amazing manipulation tasks that are far beyond the reach of current robots,” says first author Yunzhu Li, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL). “We want to build this type of intuitive model for robots to enable them to do what humans can do.”

“When children are 5 months old, they already have different expectations for solids and liquids,” adds co-author Jiajun Wu, a CSAIL graduate student. “That’s something we know at an early age, so maybe that’s something we should try to model for robots.”

Joining Li and Wu on the paper are: Russ Tedrake, a CSAIL researcher and a professor in the Department of Electrical Engineering and Computer Science (EECS); Joshua Tenenbaum, a professor in the Department of Brain and Cognitive Sciences; and Antonio Torralba, a professor in EECS and director of the MIT-IBM Watson AI Lab.

Dynamic graphs

A key innovation behind the model, called “dynamic particle interaction networks” (DPI-Nets), was creating dynamic interaction graphs, which consist of thousands of nodes and edges that can capture complex behaviors of so-called particles. In the graphs, each node represents a particle. Neighboring nodes are connected with each other using directed edges, which represent the interaction passing from one particle to the other. In the simulator, particles are hundreds of small spheres combined to make up some liquid or a deformable object.
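As a rough illustration of the graph structure described above, the following sketch connects every pair of nearby particles with directed edges. The distance threshold `radius`, the function name, and the sample positions are assumptions made for this example and are not taken from the researchers’ code.

```python
import math

def build_interaction_graph(positions, radius):
    """Return directed edges (sender, receiver) between particles within `radius`.

    `radius` is a hypothetical neighborhood threshold chosen for illustration.
    """
    edges = []
    for i, p in enumerate(positions):
        for j, q in enumerate(positions):
            if i != j and math.dist(p, q) <= radius:
                edges.append((i, j))
    return edges

# Three particles in a row plus one far away (made-up coordinates).
particles = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0), (1.0, 1.0, 1.0)]
edges = build_interaction_graph(particles, radius=0.15)
print(edges)
```

Because the edges are recomputed from current positions, calling the function again after particles move yields a new graph, which is one simple way to read the word “dynamic” here.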

The graphs are constructed as the basis for a machine-learning system called a graph neural network. In training, the model over time learns how particles in different materials react and reshape. It does so by implicitly calculating various properties for each particle — such as its mass and elasticity — to predict if and where the particle will move in the graph when perturbed.

The model then leverages a “propagation” technique, which instantaneously spreads a signal throughout the graph. The researchers customized the technique for each type of material (rigid, deformable, and liquid) to shoot a signal that predicts particles’ positions at certain incremental time steps. At each step, it moves and reconnects particles, if needed.

For example, if a solid box is pushed, perturbed particles will be moved forward. Because all particles inside the box are rigidly connected with each other, every other particle in the object undergoes the same calculated translation and rotation. Particle connections remain intact and the box moves as a single unit. But if an area of deformable foam is indented, the effect will be different. Perturbed particles move forward a lot, surrounding particles move forward only slightly, and particles farther away won’t move at all. With liquids being sloshed around in a cup, particles may completely jump from one end of the graph to the other. The model must learn to predict where and how much all affected particles move, which is computationally complex.
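The contrast between the rigid box and the deformable foam can be sketched with a hand-written propagation rule. This is only a toy: the real model learns these effects from data, whereas the `decay` and `steps` parameters below are hypothetical knobs invented for the example.

```python
from collections import deque

def propagate_push(num_particles, edges, pushed, push, decay, steps):
    """Spread a displacement from the pushed particle along directed edges.

    decay=1.0 mimics a rigid body (every connected particle moves equally);
    decay<1.0 mimics a deformable one (motion fades with each hop and stops
    after `steps` hops). Both parameters are illustrative, not learned.
    """
    neighbors = {i: [] for i in range(num_particles)}
    for sender, receiver in edges:
        neighbors[sender].append(receiver)

    displacement = [0.0] * num_particles
    displacement[pushed] = push
    frontier = deque([(pushed, 0)])
    visited = {pushed}
    while frontier:
        node, hops = frontier.popleft()
        if hops == steps:
            continue  # the signal does not travel any farther
        for nxt in neighbors[node]:
            if nxt not in visited:
                visited.add(nxt)
                displacement[nxt] = displacement[node] * decay
                frontier.append((nxt, hops + 1))
    return displacement

# Four particles connected in a chain, pushed at particle 0.
chain = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
rigid = propagate_push(4, chain, pushed=0, push=1.0, decay=1.0, steps=3)
foam = propagate_push(4, chain, pushed=0, push=1.0, decay=0.5, steps=2)
print(rigid)  # every particle moves the same amount
print(foam)   # motion fades with distance; the farthest particle stays put
```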

Shaping and adapting

In their paper, the researchers demonstrate the model by tasking the two-fingered RiceGrip robot with clamping target shapes out of deformable foam. The robot first uses a depth-sensing camera and object-recognition techniques to identify the foam. The researchers randomly select particles inside the perceived shape to initialize the position of the particles. Then, the model adds edges between particles and reconstructs the foam into a dynamic graph customized for deformable materials.

Because of the learned simulations, the robot already has a good idea of how each touch, given a certain amount of force, will affect each of the particles in the graph. As the robot starts indenting the foam, it iteratively matches the real-world position of the particles to the targeted position of the particles. Whenever the particles don’t align, it sends an error signal to the model. That signal tweaks the model to better match the real-world physics of the material.

Next, the researchers aim to improve the model to help robots better predict interactions with partially observable scenarios, such as knowing how a pile of boxes will move when pushed, even if only the boxes at the surface are visible and most of the other boxes are hidden.

The researchers are also exploring ways to combine the model with an end-to-end perception module by operating directly on images. This will be a joint project with Dan Yamins’s group; Yamins recently completed his postdoc at MIT and is now an assistant professor at Stanford University. “You’re dealing with these cases all the time where there’s only partial information,” Wu says. “We’re extending our model to learn the dynamics of all particles, while only seeing a small portion.”

Robots that can sort recycling

RoCycle can detect if an object is paper, metal, or plastic. CSAIL researchers say that such a system could potentially help enable the convenience of single-stream recycling with lower contamination rates that conform to China’s new recycling standards.
Photo: Jason Dorfman

By Adam Conner-Simons

Every year trash companies sift through an estimated 68 million tons of recycling, which is the weight equivalent of more than 30 million cars.

A key step in the process happens on fast-moving conveyor belts, where workers have to sort items into categories like paper, plastic and glass. Such jobs are dull, dirty, and often unsafe, especially in facilities where workers also have to remove normal trash from the mix.

With that in mind, a team led by researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has developed a robotic system that can detect if an object is paper, metal, or plastic.

The team’s “RoCycle” system includes a soft Teflon hand that uses tactile sensors on its fingertips to detect an object’s size and stiffness. Compatible with any robotic arm, RoCycle was found to be 85 percent accurate at detecting materials when stationary, and 63 percent accurate on a simulated conveyor belt. (Its most common error was identifying paper-covered metal tins as paper, which the team says would be improved by adding more sensors along the contact surface.)

“Our robot’s sensorized skin provides haptic feedback that allows it to differentiate between a wide range of objects, from the rigid to the squishy,” says MIT Professor Daniela Rus, senior author on a related paper that will be presented in April at the IEEE International Conference on Soft Robotics (RoboSoft) in Seoul, South Korea. “Computer vision alone will not be able to solve the problem of giving machines human-like perception, so being able to use tactile input is of vital importance.”

A collaboration with Yale University, RoCycle directly demonstrates the limits of sight-based sorting: It can reliably distinguish between two visually similar Starbucks cups, one made of paper and one made of plastic, that would give vision systems trouble.

Incentivizing recycling

Rus says that the project is part of her larger goal to reduce the back-end cost of recycling, in order to incentivize more cities and countries to create their own programs. Today recycling centers aren’t particularly automated; their main kinds of machinery include optical sorters that use different wavelengths of light to distinguish between plastics, magnetic sorters that separate out iron and steel products, and aluminum sorters that use eddy currents to remove non-magnetic metals.

This is a problem for one very big reason: just last month China raised its standards for the cleanliness of recycled goods it accepts from the United States, meaning that some of the country’s single-stream recycling is now sent to landfills.

“If a system like RoCycle could be deployed on a wide scale, we’d potentially be able to have the convenience of single-stream recycling with the lower contamination rates of multi-stream recycling,” says PhD student Lillian Chin, lead author on the new paper.

It’s surprisingly hard to develop machines that can distinguish between paper, plastic, and metal, which shows how impressive a feat it is for humans. When we pick up an object, we can immediately recognize many of its qualities even with our eyes closed, like whether it’s large and stiff or small and soft. By feeling the object and understanding how that relates to the softness of our own fingertips, we are able to learn how to handle a wide range of objects without dropping or breaking them.

This kind of intuition is tough to program into robots. Traditional hard (“rigid”) robot hands have to know an object’s exact location and size to be able to calculate a precise motion path. Soft hands made of materials like rubber are much more flexible, but have a different problem: Because they’re powered by fluidic forces, they have a balloon-like structure that can puncture quite easily.

How RoCycle works

Rus’ team used a motor-driven hand made of a relatively new material called “auxetics.” Most materials get narrower when pulled on, like a rubber band when you stretch it; auxetics, meanwhile, actually get wider. The MIT team took this concept and put a twist on it, quite literally: They created auxetics that, when cut, twist to either the left or right. Combining a “left-handed” and “right-handed” auxetic for each of the hand’s two large fingers makes them interlock and oppose each other’s rotation, enabling more dynamic movement. (The team calls this “handed-shearing auxetics”, or HSA.)

“In contrast to soft robots, whose fluid-driven approach requires air pumps and compressors, HSA combines twisting with extension, meaning that you’re able to use regular motors,” says Chin.

The team’s gripper first uses its “strain sensor” to estimate an object’s size, and then uses its two pressure sensors to measure the force needed to grasp an object. These metrics, along with calibration data on the sizes and stiffnesses of objects of different material types, are what give the gripper a sense of what material the object is made of. (Since the tactile sensors are also conductive, they can detect metal by how much it changes the electrical signal.)

“In other words, we estimate the size and measure the pressure difference between the current closed hand and what a normal open hand should look like,” says Chin. “We use this pressure difference and size to classify the specific object based on information about different objects that we’ve already measured.”
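One simple way to picture this kind of classification is a nearest-match lookup against previously measured objects, with conductivity used as a shortcut for metal. The sketch below is an assumption-laden illustration: the calibration numbers, units, and the `classify` function are invented for the example and do not come from the RoCycle paper.

```python
import math

# Hypothetical calibration table: material -> (size in cm, stiffness in
# arbitrary units). Real calibration data would come from measured objects.
CALIBRATION = {
    "paper": (6.0, 0.2),
    "plastic": (7.0, 0.5),
    "metal": (6.5, 0.9),
}

def classify(size_cm, stiffness, conductive):
    """Guess a material from estimated size, measured stiffness, and conductivity."""
    if conductive:
        # A change in the sensors' electrical signal is treated as a metal cue.
        return "metal"
    measurement = (size_cm, stiffness)
    # Otherwise pick the calibrated material closest to the measurement.
    return min(CALIBRATION, key=lambda m: math.dist(measurement, CALIBRATION[m]))

print(classify(6.2, 0.25, conductive=False))
print(classify(7.1, 0.45, conductive=False))
print(classify(6.0, 0.20, conductive=True))
```

A paper-covered metal tin would not trigger the conductivity shortcut in this sketch, which mirrors the misclassification the team reports as its most common error.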

RoCycle builds on a set of sensors that detect the radius of an object to within 30 percent accuracy, and tell the difference between “hard” and “soft” objects with 78 percent accuracy. The team’s hand is also almost completely puncture resistant: It was able to be scraped by a sharp lid and punctured by a needle more than 20 times, with minimal structural damage.

As a next step, the researchers plan to build out the system so that it can combine tactile data with actual video data from a robot’s cameras. This would allow the team to further improve its accuracy and potentially allow for even more nuanced differentiation between different kinds of materials.

Chin and Rus co-wrote the RoCycle paper alongside MIT postdoc Jeffrey Lipton, as well as PhD student Michelle Yuen and Professor Rebecca Kramer-Bottiglio of Yale University.

This project was supported in part by Amazon, JD.com, the Toyota Research Institute, and the National Science Foundation.

Teaching machines to reason about what they see

Researchers trained a hybrid AI model to answer questions like “Does the red object left of the green cube have the same shape as the purple matte thing?” by feeding it examples of object colors and shapes followed by more complex scenarios involving multi-object comparisons. The model could transfer this knowledge to new scenarios as well as or better than state-of-the-art models using a fraction of the training data.
Image: Justin Johnson

A child who has never seen a pink elephant can still describe one — unlike a computer. “The computer learns from data,” says Jiajun Wu, a PhD student at MIT. “The ability to generalize and recognize something you’ve never seen before — a pink elephant — is very hard for machines.”

Deep learning systems interpret the world by picking out statistical patterns in data. This form of machine learning is now everywhere, automatically tagging friends on Facebook, narrating Alexa’s latest weather forecast, and delivering fun facts via Google search. But statistical learning has its limits. It requires tons of data, has trouble explaining its decisions, and is terrible at applying past knowledge to new situations; it can’t comprehend an elephant that’s pink instead of gray.

To give computers the ability to reason more like us, artificial intelligence (AI) researchers are returning to abstract, or symbolic, programming. Popular in the 1950s and 1960s, symbolic AI wires in the rules and logic that allow machines to make comparisons and interpret how objects and entities relate. Symbolic AI uses less data, records the chain of steps it takes to reach a decision, and when combined with the brute processing power of statistical neural networks, it can even beat humans in a complicated image comprehension test. 

A new study by a team of researchers at MIT, the MIT-IBM Watson AI Lab, and DeepMind shows the promise of merging statistical and symbolic AI. Led by Wu and Joshua Tenenbaum, a professor in MIT’s Department of Brain and Cognitive Sciences and the Computer Science and Artificial Intelligence Laboratory, the team shows that its hybrid model can learn object-related concepts like color and shape, and leverage that knowledge to interpret complex object relationships in a scene. With minimal training data and no explicit programming, their model could transfer concepts to larger scenes and answer increasingly tricky questions as well as or better than its state-of-the-art peers. The team presents its results at the International Conference on Learning Representations in May.

“One way children learn concepts is by connecting words with images,” says the study’s lead author Jiayuan Mao, an undergraduate at Tsinghua University who worked on the project as a visiting fellow at MIT. “A machine that can learn the same way needs much less data, and is better able to transfer its knowledge to new scenarios.”

The study is a strong argument for moving back toward abstract-program approaches, says Jacob Andreas, a recent graduate of the University of California at Berkeley, who starts at MIT as an assistant professor this fall and was not involved in the work. “The trick, it turns out, is to add more symbolic structure, and to feed the neural networks a representation of the world that’s divided into objects and properties rather than feeding it raw images,” he says. “This work gives us insight into what machines need to understand before language learning is possible.”

The team trained their model on images paired with related questions and answers, part of the CLEVR image comprehension test developed at Stanford University. As the model learns, the questions grow progressively harder, from, “What’s the color of the object?” to “How many objects are both right of the green cylinder and have the same material as the small blue ball?” Once object-level concepts are mastered, the model advances to learning how to relate objects and their properties to each other.

Like other hybrid AI models, MIT’s works by splitting up the task. A perception module of neural networks crunches the pixels in each image and maps the objects. A language module, also made of neural nets, extracts a meaning from the words in each sentence and creates symbolic programs, or instructions, that tell the machine how to answer the question. A third reasoning module runs the symbolic programs on the scene and gives an answer, updating the model when it makes mistakes.
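To make the division of labor concrete, here is a minimal sketch of the reasoning step alone: a symbolic program run against an object-based scene. The scene dictionaries, the operation names (`filter`, `count`, `query`), and the `run_program` function are assumptions invented for illustration; in the actual system the scene and the program would be produced by neural networks rather than written by hand.

```python
# Hand-written stand-in for the perception module's output.
scene = [
    {"shape": "cube", "color": "green", "size": "small"},
    {"shape": "sphere", "color": "yellow", "size": "big"},
    {"shape": "cylinder", "color": "red", "size": "big"},
]

def run_program(program, scene):
    """Execute a list of (operation, argument) steps on a list of object dicts."""
    result = scene
    for op, arg in program:
        if op == "filter":
            attribute, value = arg
            result = [obj for obj in result if obj[attribute] == value]
        elif op == "count":
            result = len(result)
        elif op == "query":
            if len(result) != 1:
                raise ValueError("query needs exactly one matching object")
            result = result[0][arg]
        else:
            raise ValueError(f"unknown operation: {op}")
    return result

# Hand-written stand-in for the language module's output for the question
# "What's the shape of the big yellow thing?"
program = [
    ("filter", ("size", "big")),
    ("filter", ("color", "yellow")),
    ("query", "shape"),
]
answer = run_program(program, scene)
print(answer)
```

Because each step is an explicit operation, the chain of steps that led to an answer can be inspected, which is the interpretability benefit the article attributes to symbolic approaches.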

Key to the team’s approach is a perception module that translates the image into an object-based representation, making the programs easier to execute. Also unique is what they call curriculum learning, or selectively training the model on concepts and scenes that grow progressively more difficult. It turns out that feeding the machine data in a logical way, rather than haphazardly, helps the model learn faster while improving accuracy.
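The idea of feeding data in order of difficulty can be sketched as follows. The difficulty measure used here (the number of words in a question) and the `curriculum` helper are assumptions chosen for illustration; the study’s actual curriculum is not specified in this article.

```python
def curriculum(examples, difficulty, num_stages):
    """Split examples into cumulative training stages, easiest first."""
    ordered = sorted(examples, key=difficulty)
    stage_size = -(-len(ordered) // num_stages)  # ceiling division
    # Each stage keeps the earlier, easier examples and adds harder ones.
    return [ordered[: stage_size * (k + 1)] for k in range(num_stages)]

questions = [
    "What's the color of the object?",
    "How many objects are both right of the green cylinder and have the "
    "same material as the small blue ball?",
    "What's the shape of the big yellow thing?",
]

# Hypothetical difficulty proxy: longer questions are treated as harder.
stages = curriculum(questions, difficulty=lambda q: len(q.split()), num_stages=3)
for number, stage in enumerate(stages, start=1):
    print(number, len(stage))
```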

Once the model has a solid foundation, it can interpret new scenes and concepts, and increasingly difficult questions, almost perfectly. Asked to answer an unfamiliar question like, “What’s the shape of the big yellow thing?” it outperformed its peers at Stanford and nearby MIT Lincoln Laboratory with a fraction of the data. 

While other models trained on the full CLEVR dataset of 70,000 images and 700,000 questions, the MIT-IBM model used 5,000 images and 100,000 questions. As the model built on previously learned concepts, it absorbed the programs underlying each question, speeding up the training process. 

Though statistical, deep learning models are now embedded in daily life, much of their decision process remains hidden from view. This lack of transparency makes it difficult to anticipate where the system is susceptible to manipulation, error, or bias. Adding a symbolic layer can open the black box, explaining the growing interest in hybrid AI systems.

“Splitting the task up and letting programs do some of the work is the key to building interpretability into deep learning models,” says Lincoln Laboratory researcher David Mascharka, whose hybrid model, Transparency by Design Network, is benchmarked in the MIT-IBM study.      

The MIT-IBM team is now working to improve the model’s performance on real-world photos and to extend it to video understanding and robotic manipulation. Other authors of the study are Chuang Gan and Pushmeet Kohli, researchers at the MIT-IBM Watson AI Lab and DeepMind, respectively.

“Particle robot” works as a cluster of simple units


Researchers have developed computationally simple robots, called particles, that cluster and form a single “particle robot” that moves around, transports objects, and completes other tasks. The work hails from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), Columbia University, and elsewhere.
Image: Felice Frankel

By Rob Matheson

Taking a cue from biological cells, researchers from MIT, Columbia University, and elsewhere have developed computationally simple robots that connect in large groups to move around, transport objects, and complete other tasks.

This so-called “particle robotics” system, based on a project by MIT, Columbia Engineering, Cornell University, and Harvard University researchers, comprises many individual disc-shaped units, which the researchers call “particles.” The particles are loosely connected by magnets around their perimeters, and each unit can only do two things: expand and contract. (Each particle is about 6 inches in diameter in its contracted state and about 9 inches when expanded.) That motion, when carefully timed, allows the individual particles to push and pull one another in coordinated movement. On-board sensors enable the cluster to gravitate toward light sources.

In a Nature paper published today, the researchers demonstrate a cluster of two dozen real robotic particles and a virtual simulation of up to 100,000 particles moving through obstacles toward a light bulb. They also show that a particle robot can transport objects placed in its midst.

Particle robots can form into many configurations and fluidly navigate around obstacles and squeeze through tight gaps. Notably, none of the particles directly communicate with or rely on one another to function, so particles can be added or subtracted without any impact on the group. In their paper, the researchers show particle robotic systems can complete tasks even when many units malfunction.

The paper represents a new way to think about robots, which are traditionally designed for one purpose, comprise many complex parts, and stop working when any part malfunctions. Robots made up of these simplistic components, the researchers say, could enable more scalable, flexible, and robust systems.

“We have small robot cells that are not so capable as individuals but can accomplish a lot as a group,” says Daniela Rus, director of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Andrew and Erna Viterbi Professor of Electrical Engineering and Computer Science. “The robot by itself is static, but when it connects with other robot particles, all of a sudden the robot collective can explore the world and control more complex actions. With these ‘universal cells,’ the robot particles can achieve different shapes, global transformation, global motion, global behavior, and, as we have shown in our experiments, follow gradients of light. This is very powerful.”

Joining Rus on the paper are: first author Shuguang Li, a CSAIL postdoc; co-first author Richa Batra and corresponding author Hod Lipson, both of Columbia Engineering; David Brown, Hyun-Dong Chang, and Nikhil Ranganathan of Cornell; and Chuck Hoberman of Harvard.

At MIT, Rus has been working on modular, connected robots for nearly 20 years, including an expanding and contracting cube robot that could connect to others to move around. But the square shape limited the robots’ group movement and configurations.

In collaboration with Lipson’s lab, where Li was a graduate student until coming to MIT in 2014, the researchers went for disc-shaped mechanisms that can rotate around one another. They can also connect and disconnect from each other, and form into many configurations.

Each unit of a particle robot has a cylindrical base, which houses a battery, a small motor, sensors that detect light intensity, a microcontroller, and a communication component that sends out and receives signals. Mounted on top is a children’s toy called a Hoberman Flight Ring — its inventor is one of the paper’s co-authors — which consists of small panels connected in a circular formation that can be pulled to expand and pushed back to contract. Two small magnets are installed in each panel.

The trick was programming the robotic particles to expand and contract in an exact sequence to push and pull the whole group toward a destination light source. To do so, the researchers equipped each particle with an algorithm that analyzes broadcast information about light intensity from every other particle, without the need for direct particle-to-particle communication.

The sensors of a particle detect the intensity of light from a light source; the closer the particle is to the light source, the greater the intensity. Each particle constantly broadcasts a signal that shares its perceived intensity level with all other particles. Say a particle robotic system measures light intensity on a scale of levels 1 to 10: Particles closest to the light register a level 10 and those furthest will register level 1. The intensity level, in turn, corresponds to a specific time that the particle must expand. Particles experiencing the highest intensity — level 10 — expand first. As those particles contract, the next particles in order, level 9, then expand. That timed expanding and contracting motion happens at each subsequent level.
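The timing rule in this example can be written as a small schedule: the brighter a particle’s reading, the sooner it expands after the shared clock starts a cycle. The one-second spacing between levels (`step_seconds`) and the function name are assumptions for illustration; the article does not give the actual timing values.

```python
def expansion_offsets(intensity_levels, max_level=10, step_seconds=1.0):
    """Map each particle's light-intensity level to its delay before expanding.

    Level `max_level` expands immediately; each lower level waits one more
    step. `step_seconds` is a hypothetical spacing between levels.
    """
    return [(max_level - level) * step_seconds for level in intensity_levels]

# Four particles: one nearest the light, two slightly farther, one farthest.
levels = [10, 9, 9, 1]
offsets = expansion_offsets(levels)
print(offsets)
```

All particles must measure their delays from the same moment, which is why the shared synchronized clock described below matters.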

“This creates a mechanical expansion-contraction wave, a coordinated pushing and dragging motion, that moves a big cluster toward or away from environmental stimuli,” Li says. The key component, Li adds, is the precise timing from a shared synchronized clock among the particles that enables movement as efficiently as possible: “If you mess up the synchronized clock, the system will work less efficiently.”

In videos, the researchers demonstrate a particle robotic system comprising real particles moving and changing directions toward different light bulbs as they’re flicked on, and working its way through a gap between obstacles. In their paper, the researchers also show that simulated clusters of up to 10,000 particles maintain locomotion, at half their speed, even with up to 20 percent of units failed.

“It’s a bit like the proverbial ‘gray goo,’” says Lipson, a professor of mechanical engineering at Columbia Engineering, referencing the science-fiction concept of a self-replicating robot that comprises billions of nanobots. “The key novelty here is that you have a new kind of robot that has no centralized control, no single point of failure, no fixed shape, and its components have no unique identity.”

The next step, Lipson adds, is miniaturizing the components to make a robot composed of millions of microscopic particles.

Robots track moving objects with unprecedented precision

MIT Media Lab researchers are using RFID tags to help robots home in on moving objects with unprecedented speed and accuracy, potentially enabling greater collaboration in robotic packaging and assembly and among swarms of drones.
Photo courtesy of the researchers

A novel system developed at MIT uses RFID tags to help robots home in on moving objects with unprecedented speed and accuracy. The system could enable greater collaboration and precision by robots working on packaging and assembly, and by swarms of drones carrying out search-and-rescue missions.

In a paper being presented next week at the USENIX Symposium on Networked Systems Design and Implementation, the researchers show that robots using the system can locate tagged objects within 7.5 milliseconds, on average, and with an error of less than a centimeter.

In the system, called TurboTrack, an RFID (radio-frequency identification) tag can be applied to any object. A reader sends a wireless signal that reflects off the RFID tag and other nearby objects, and rebounds to the reader. An algorithm sifts through all the reflected signals to find the RFID tag’s response. Final computations then leverage the RFID tag’s movement — even though this usually decreases precision — to improve its localization accuracy.

The researchers say the system could replace computer vision for some robotic tasks. As with its human counterpart, computer vision is limited by what it can see, and it can fail to notice objects in cluttered environments. Radio frequency signals have no such restrictions: They can identify targets without visualization, within clutter and through walls.

To validate the system, the researchers attached one RFID tag to a cap and another to a bottle. A robotic arm located the cap and placed it onto the bottle, held by another robotic arm. In another demonstration, the researchers tracked RFID-equipped nanodrones during docking, maneuvering, and flying. In both tasks, the system was as accurate and fast as traditional computer-vision systems, while working in scenarios where computer vision fails, the researchers report.

“If you use RF signals for tasks typically done using computer vision, not only do you enable robots to do human things, but you can also enable them to do superhuman things,” says Fadel Adib, an assistant professor and principal investigator in the MIT Media Lab, and founding director of the Signal Kinetics Research Group. “And you can do it in a scalable way, because these RFID tags are only 3 cents each.”

In manufacturing, the system could enable robot arms to be more precise and versatile in, say, picking up, assembling, and packaging items along an assembly line. Another promising application is using handheld “nanodrones” for search and rescue missions. Nanodrones currently use computer vision and methods to stitch together captured images for localization purposes. These drones often get confused in chaotic areas, lose each other behind walls, and can’t uniquely identify each other. This all limits their ability to, say, spread out over an area and collaborate to search for a missing person. Using the researchers’ system, nanodrones in swarms could better locate each other, for greater control and collaboration.

“You could enable a swarm of nanodrones to form in certain ways, fly into cluttered environments, and even environments hidden from sight, with great precision,” says first author Zhihong Luo, a graduate student in the Signal Kinetics Research Group.

The other Media Lab co-authors on the paper are visiting student Qiping Zhang, postdoc Yunfei Ma, and Research Assistant Manish Singh.

Super resolution

Adib’s group has been working for years on using radio signals for tracking and identification purposes, such as detecting contamination in bottled foods, communicating with devices inside the body, and managing warehouse inventory.

Similar systems have attempted to use RFID tags for localization tasks. But these come with trade-offs in either accuracy or speed. To be accurate, it may take them several seconds to find a moving object; to increase speed, they lose accuracy.

The challenge was achieving both speed and accuracy simultaneously. To do so, the researchers drew inspiration from an imaging technique called “super-resolution imaging.” These systems stitch together images from multiple angles to achieve a finer-resolution image.

“The idea was to apply these super-resolution systems to radio signals,” Adib says. “As something moves, you get more perspectives in tracking it, so you can exploit the movement for accuracy.”

The system combines a standard RFID reader with a “helper” component that’s used to localize radio frequency signals. The helper shoots out a wideband signal comprising multiple frequencies, building on a modulation scheme used in wireless communication, called orthogonal frequency-division multiplexing.

The system captures all the signals rebounding off objects in the environment, including the RFID tag. One of those signals carries a pattern that’s specific to that RFID tag, because RFID tags reflect and absorb an incoming signal in a certain pattern, corresponding to bits of 0s and 1s, that the system can recognize.

Because these signals travel at the speed of light, the system can compute a “time of flight,” which measures distance by calculating the time it takes a signal to travel between a transmitter and receiver, to gauge the location of the tag as well as the other objects in the environment. But this provides only a ballpark localization figure, not subcentimeter precision.
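As a rough illustration of the time-of-flight conversion described above, the following Python sketch turns a measured travel time into a distance. The function names are hypothetical and are not taken from the researchers' system; the only physical input is the speed of light. The second function shows how a small timing error becomes a large distance error, which may help explain why time of flight on its own yields only a coarse estimate.

```python
# Speed of light in vacuum, in metres per second (exact by SI definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def one_way_distance_m(time_of_flight_s: float) -> float:
    """Distance covered by a signal travelling from transmitter to receiver."""
    return SPEED_OF_LIGHT_M_PER_S * time_of_flight_s


def distance_error_m(timing_error_s: float) -> float:
    """Distance error caused by a given error in the measured travel time."""
    return SPEED_OF_LIGHT_M_PER_S * abs(timing_error_s)


if __name__ == "__main__":
    # A signal that takes 10 nanoseconds to arrive has travelled about 3 m.
    print(one_way_distance_m(10e-9))
    # A timing error of just 1 nanosecond shifts the estimate by about 0.3 m.
    print(distance_error_m(1e-9))
```

Under these assumptions, reaching subcentimeter precision from timing alone would require resolving travel times to within a few tens of picoseconds.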

Leveraging movement

To zoom in on the tag’s location, the researchers developed what they call a “space-time super-resolution” algorithm.

The algorithm combines the location estimations for all rebounding signals, including the RFID signal, which it determined using time of flight. Using some probability calculations, it narrows down that group to a handful of potential locations for the RFID tag.

As the tag moves, its signal angle slightly alters — a change that also corresponds to a certain location. The algorithm then can use that angle change to track the tag’s distance as it moves. By constantly comparing that changing distance measurement to all other distance measurements from other signals, it can find the tag in a three-dimensional space. This all happens in a fraction of a second.

“The high-level idea is that, by combining these measurements over time and over space, you get a better reconstruction of the tag’s position,” Adib says.

The work was sponsored, in part, by the National Science Foundation.

Identifying artificial intelligence “blind spots”

By Rob Matheson

A novel model developed by MIT and Microsoft researchers identifies instances in which autonomous systems have “learned” from training examples that don’t match what’s actually happening in the real world. Engineers could use this model to improve the safety of artificial intelligence systems, such as driverless vehicles and autonomous robots.

The AI systems powering driverless cars, for example, are trained extensively in virtual simulations to prepare the vehicle for nearly every event on the road. But sometimes the car makes an unexpected error in the real world because an event occurs that should, but doesn’t, alter the car’s behavior.

Consider a driverless car that wasn’t trained, and more importantly doesn’t have the sensors necessary, to differentiate between distinctly different scenarios, such as large, white cars and ambulances with red, flashing lights on the road. If the car is cruising down the highway and an ambulance flicks on its sirens, the car may not know to slow down and pull over, because it does not perceive the ambulance as different from a big white car.

In a pair of papers — presented at last year’s Autonomous Agents and Multiagent Systems conference and the upcoming Association for the Advancement of Artificial Intelligence conference — the researchers describe a model that uses human input to uncover these training “blind spots.”

As with traditional approaches, the researchers put an AI system through simulation training. But then, a human closely monitors the system’s actions as it acts in the real world, providing feedback when the system made, or was about to make, any mistakes. The researchers then combine the training data with the human feedback data, and use machine-learning techniques to produce a model that pinpoints situations where the system most likely needs more information about how to act correctly.

The researchers validated their method using video games, with a simulated human correcting the learned path of an on-screen character. But the next step is to incorporate the model with traditional training and testing approaches for autonomous cars and robots with human feedback.

“The model helps autonomous systems better know what they don’t know,” says first author Ramya Ramakrishnan, a graduate student in the Computer Science and Artificial Intelligence Laboratory. “Many times, when these systems are deployed, their trained simulations don’t match the real-world setting [and] they could make mistakes, such as getting into accidents. The idea is to use humans to bridge that gap between simulation and the real world, in a safe way, so we can reduce some of those errors.”

Co-authors on both papers are Julie Shah, an associate professor in the Department of Aeronautics and Astronautics and head of CSAIL’s Interactive Robotics Group; and Ece Kamar, Debadeepta Dey, and Eric Horvitz, all from Microsoft Research. Besmira Nushi is an additional co-author on the upcoming paper.

Taking feedback

Some traditional training methods do provide human feedback during real-world test runs, but only to update the system’s actions. These approaches don’t identify blind spots, which could be useful for safer execution in the real world.

The researchers’ approach first puts an AI system through simulation training, where it will produce a “policy” that essentially maps every situation to the best action it can take in the simulations. Then, the system will be deployed in the real world, where humans provide error signals in regions where the system’s actions are unacceptable.

Humans can provide data in multiple ways, such as through “demonstrations” and “corrections.” In demonstrations, the human acts in the real world, while the system observes and compares the human’s actions to what it would have done in that situation. For driverless cars, for instance, a human would manually control the car while the system produces a signal if its planned behavior deviates from the human’s behavior. Matches and mismatches with the human’s actions provide noisy indications of where the system might be acting acceptably or unacceptably.

Alternatively, the human can provide corrections, with the human monitoring the system as it acts in the real world. A human could sit in the driver’s seat while the autonomous car drives itself along its planned route. If the car’s actions are correct, the human does nothing. If the car’s actions are incorrect, however, the human may take the wheel, which sends a signal that the system was acting unacceptably in that specific situation.

Once the feedback data from the human is compiled, the system essentially has a list of situations and, for each situation, multiple labels saying its actions were acceptable or unacceptable. A single situation can receive many different signals, because the system perceives many situations as identical. For example, an autonomous car may have cruised alongside a large car many times without slowing down and pulling over. But, in only one instance, an ambulance, which appears exactly the same to the system, cruises by. The autonomous car doesn’t pull over and receives a feedback signal that the system took an unacceptable action.

“At that point, the system has been given multiple contradictory signals from a human: some with a large car beside it, and it was doing fine, and one where there was an ambulance in the same exact location, but that wasn’t fine. The system makes a little note that it did something wrong, but it doesn’t know why,” Ramakrishnan says. “Because the agent is getting all these contradictory signals, the next step is compiling the information to ask, ‘How likely am I to make a mistake in this situation where I received these mixed signals?’”
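The feedback collection described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' code: the function names, the label strings, and the situation name are all hypothetical. It shows how demonstrations and corrections both reduce to "acceptable" or "unacceptable" labels, and how situations the system perceives as identical end up sharing one list of mixed labels.

```python
from collections import defaultdict

ACCEPTABLE = "acceptable"
UNACCEPTABLE = "unacceptable"


def label_from_demonstration(planned_action, human_action):
    """Demonstration: compare the system's planned action with the human's."""
    return ACCEPTABLE if planned_action == human_action else UNACCEPTABLE


def label_from_correction(human_intervened):
    """Correction: the human taking over marks the action as unacceptable."""
    return UNACCEPTABLE if human_intervened else ACCEPTABLE


def collect_feedback(events):
    """Group labels by the situation as the system perceives it.

    events is an iterable of (perceived_situation, label) pairs.
    """
    labels = defaultdict(list)
    for situation, label in events:
        labels[situation].append(label)
    return dict(labels)


# Nine uneventful passes next to a large car, then one ambulance that the
# system perceives as the same situation (hypothetical situation name).
situation = "large_white_vehicle_alongside"
events = [(situation, label_from_correction(False)) for _ in range(9)]
events.append((situation, label_from_correction(True)))
feedback = collect_feedback(events)
print(feedback[situation].count(UNACCEPTABLE), "of", len(feedback[situation]))
```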

Intelligent aggregation

The end goal is to have these ambiguous situations labeled as blind spots. But that goes beyond simply tallying the acceptable and unacceptable actions for each situation. If the system performed correct actions nine times out of 10 in the ambulance situation, for instance, a simple majority vote would label that situation as safe.

“But because unacceptable actions are far rarer than acceptable actions, the system will eventually learn to predict all situations as safe, which can be extremely dangerous,” Ramakrishnan says.
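The weakness of simple tallying is easy to reproduce. The short Python sketch below (the function name and label strings are hypothetical) applies a majority vote to the nine-out-of-10 case mentioned above.

```python
from collections import Counter


def majority_vote(labels):
    """Label a situation "blind spot" only if most labels are unacceptable."""
    counts = Counter(labels)
    if counts["unacceptable"] > counts["acceptable"]:
        return "blind spot"
    return "safe"


# Nine acceptable actions and one unacceptable one, as in the ambulance case.
ambulance_labels = ["acceptable"] * 9 + ["unacceptable"]
print(majority_vote(ambulance_labels))  # prints "safe"
```

The single unacceptable label is outvoted, so the one signal that actually matters is discarded.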

To that end, the researchers used the Dawid-Skene algorithm, a machine-learning method commonly used in crowdsourcing to handle label noise. The algorithm takes as input a list of situations, each having a set of noisy “acceptable” and “unacceptable” labels. Then it aggregates all the data and uses some probability calculations to identify patterns in the labels of predicted blind spots and patterns for predicted safe situations. Using that information, it outputs a single aggregated “safe” or “blind spot” label for each situation, along with its confidence level in that label. Notably, the algorithm can learn that, even in a situation where the system performed acceptably 90 percent of the time, the situation is still ambiguous enough to merit a “blind spot” label.
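To give a feel for this kind of aggregation, here is a heavily simplified Python sketch. It is not the researchers' implementation and not the full Dawid-Skene algorithm, which also estimates a separate error profile for each labeler. Instead it assumes a single source of labels and two hidden classes ("safe" and "blind spot"), and uses expectation-maximization to estimate how often each class produces an "unacceptable" label. The function names, starting guesses, and toy data are all assumptions made for illustration.

```python
import math


def _clamp(x, eps=1e-6):
    """Keep a probability strictly inside (0, 1) so math.log never sees 0."""
    return min(max(x, eps), 1.0 - eps)


def estimate_blind_spots(counts, iterations=50):
    """counts maps situation -> (n_unacceptable, n_total); must be non-empty.

    Returns situation -> estimated probability that it is a blind spot.
    """
    prior_blind = 0.5   # assumed starting guess for P(blind spot)
    p_bad_safe = 0.01   # assumed starting P("unacceptable" label | safe)
    p_bad_blind = 0.3   # assumed starting P("unacceptable" label | blind spot)
    posterior = {}
    for _ in range(iterations):
        # E-step: posterior probability of "blind spot" for each situation.
        # The binomial coefficient is identical for both classes and cancels.
        for s, (bad, total) in counts.items():
            good = total - bad
            log_blind = (math.log(prior_blind) + bad * math.log(p_bad_blind)
                         + good * math.log(1.0 - p_bad_blind))
            log_safe = (math.log(1.0 - prior_blind) + bad * math.log(p_bad_safe)
                        + good * math.log(1.0 - p_bad_safe))
            top = max(log_blind, log_safe)  # subtract max for stability
            w_blind = math.exp(log_blind - top)
            w_safe = math.exp(log_safe - top)
            posterior[s] = w_blind / (w_blind + w_safe)

        # M-step: re-estimate the parameters from the soft assignments.
        prior_blind = _clamp(sum(posterior.values()) / len(counts))
        blind_bad = sum(posterior[s] * b for s, (b, n) in counts.items())
        blind_all = sum(posterior[s] * n for s, (b, n) in counts.items())
        safe_bad = sum((1 - posterior[s]) * b for s, (b, n) in counts.items())
        safe_all = sum((1 - posterior[s]) * n for s, (b, n) in counts.items())
        p_bad_blind = _clamp(blind_bad / max(blind_all, 1e-12))
        p_bad_safe = _clamp(safe_bad / max(safe_all, 1e-12))
    return posterior


# Toy data (hypothetical): three situations never drew a complaint, while one
# was acceptable nine times out of 10.
toy_counts = {
    "empty_road": (0, 50),
    "car_ahead": (0, 50),
    "merging_traffic": (0, 50),
    "large_white_vehicle_alongside": (1, 10),
}
for name, p in estimate_blind_spots(toy_counts).items():
    print(f"{name}: {p:.3f}")
```

On this toy data, the situation that was acceptable 90 percent of the time ends up with a much higher estimated blind-spot probability than the situations that never drew an unacceptable label, which a majority vote would not capture.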

In the end, the algorithm produces a type of “heat map,” where each situation from the system’s original training is assigned a low-to-high probability of being a blind spot for the system.

“When the system is deployed into the real world, it can use this learned model to act more cautiously and intelligently. If the learned model predicts a state to be a blind spot with high probability, the system can query a human for the acceptable action, allowing for safer execution,” Ramakrishnan says.
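The deployment behavior Ramakrishnan describes can be sketched as a simple check before each action. This Python example is an illustration under stated assumptions: the function name, the 0.5 threshold, the situation names, and the choice to treat unknown situations as blind spots are all hypothetical and are not taken from the papers.

```python
def choose_action(state, policy, blind_spot_probability, ask_human,
                  threshold=0.5):
    """Follow the learned policy, but defer to a human in likely blind spots."""
    # Assumption: a state missing from the map is treated as a blind spot.
    if blind_spot_probability.get(state, 1.0) >= threshold:
        return ask_human(state)
    return policy[state]


# Hypothetical policy and blind-spot estimates.
policy = {
    "empty_road": "cruise",
    "large_white_vehicle_alongside": "cruise",
}
blind_spot_probability = {
    "empty_road": 0.02,
    "large_white_vehicle_alongside": 0.9,
}


def ask_human(state):
    """Stand-in for a real query to a human operator."""
    return "slow_down_and_pull_over"


print(choose_action("empty_road", policy, blind_spot_probability, ask_human))
print(choose_action("large_white_vehicle_alongside", policy,
                    blind_spot_probability, ask_human))
```

Where to set the threshold is a design choice: a lower value asks the human more often and acts more cautiously, while a higher value relies more heavily on the simulation-trained policy.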
