
Cheetah III robot preps for a role as a first responder

Associate professor of mechanical engineering Sangbae Kim and his team at the Biomimetic Robotics Lab developed the quadruped robot, the MIT Cheetah.
Photo: David Sella

By Eric Brown

If you were to ask someone to name a new technology that emerged from MIT in the 21st century, there’s a good chance they would name the robotic cheetah. Developed by the MIT Department of Mechanical Engineering’s Biomimetic Robotics Lab under the direction of Associate Professor Sangbae Kim, the quadruped MIT Cheetah has made headlines for its dynamic legged gait, speed, jumping ability, and biomimetic design.

The dog-sized Cheetah II can run on four articulated legs at up to 6.4 meters per second, make mild running turns, and leap to a height of 60 centimeters. The robot can also autonomously determine how to avoid or jump over obstacles.

Kim is now developing a third-generation robot, the Cheetah III. Instead of improving the Cheetah’s speed and jumping capabilities, Kim is converting it into a commercially viable robot with enhancements such as greater payload capability, a wider range of motion, and a dexterous gripping function. The Cheetah III will initially act as an inspection robot in hazardous environments such as a compromised nuclear plant or chemical factory, then evolve to serve other emergency response needs.

“The Cheetah II was focused on high speed locomotion and agile jumping, but was not designed to perform other tasks,” says Kim. “With the Cheetah III, we put a lot of practical requirements on the design so it can be an all-around player. It can do high-speed motion and powerful actions, but it can also be very precise.”

The Biomimetic Robotics Lab is also finishing a smaller, stripped-down version of the Cheetah, called the Mini Cheetah, designed for robotics research and education. Other projects include Hermes, a teleoperated humanoid robot that provides haptic feedback to human operators, and an early-stage investigation into applying Cheetah-like actuator technology to mobility challenges among the disabled and elderly.

Conquering mobility on land

“With the Cheetah project, I was initially motivated by copying land animals, but I also realized there was a gap in ground mobility,” says Kim. “We have conquered air and water transportation, but we haven’t conquered ground mobility because our technologies still rely on artificially paved roads or rails. None of our transportation technologies can reliably travel over natural ground or even man-made environments with stairs and curbs. Dynamic legged robots can help us conquer mobility on the ground.”

One challenge with legged systems is that they “need high torque actuators,” says Kim. “A human hip joint can generate more torque than a sports car, but achieving such condensed high torque actuation in robots is a big challenge.”

Robots tend to achieve high torque at the expense of speed and flexibility, says Kim. Factory robots use high torque actuators, but they are rigid and cannot absorb the impact energy of actions such as climbing steps. Hydraulically powered dynamic legged robots, such as Boston Dynamics’ larger, higher-payload quadruped Big Dog, can achieve very high force and power, but at the expense of efficiency. “Efficiency is a serious issue with hydraulics, especially when you move fast,” he adds.

A chief goal of the Cheetah project has been to create actuators that can generate high torque in designs that imitate animal muscles while also achieving efficiency. To accomplish this, Kim opted for electric rather than hydraulic actuators. “Our high torque electric motors have exceeded the efficiency of animals with biological muscles, and are much more efficient, cheaper, and faster than hydraulic robots,” he says.

Cheetah III: More than a speedster

Unlike the earlier versions, the Cheetah III design was motivated more by potential applications than pure research. Kim and his team studied the requirements for an emergency response robot and worked backward.

“We believe the Cheetah III will be able to navigate in a power plant with radiation in two or three years,” says Kim. “In five to 10 years it should be able to do more physical work like disassembling a power plant by cutting pieces and bringing them out. In 15 to 20 years, it should be able to enter a building fire and possibly save a life.”

In situations such as the Fukushima nuclear disaster, robots or drones are the only safe choice for reconnaissance. Drones have some advantages over robots, but they cannot apply the large forces necessary for tasks such as opening doors, and in many disaster situations fallen debris prohibits drone flight.

By comparison, the Cheetah III can apply human-level forces to the environment for hours at a time. It can often climb or jump over debris, or even move it out of the way. Compared to a drone, it’s also easier for a robot to closely inspect instrumentation, flip switches, and push buttons, says Kim. “The Cheetah III can measure temperatures or chemical compounds, or close and open valves.”

Advantages over tracked robots include the ability to maneuver over debris and climb stairs. “Stairs are some of the biggest obstacles for robots,” says Kim. “We think legged robots are better in man-made environments, especially in disaster situations where there are even more obstacles.”

The Cheetah III was slowed down a bit compared to the Cheetah II, but also given greater strength and flexibility. “We increased the torque so it can open the heavy doors found in power plants,” says Kim. “We increased the range of motion to 12 degrees of freedom by using 12 electric motors that can articulate the body and the limbs.”

This is still far short of the flexibility of animals, which have over 600 muscles. Yet, the Cheetah III can compensate somewhat with other techniques. “We maximize each joint’s work space to achieve a reasonable amount of reachability,” says Kim.

The design can even use the legs for manipulation. “By utilizing the flexibility of the limbs, the Cheetah III can open the door with one leg,” says Kim. “It can stand on three legs and equip the fourth limb with a customized swappable hand to open the door or close a valve.”

The Cheetah III has an improved payload capability to carry heavier sensors and cameras, and possibly even to drop off supplies to disabled victims. However, it’s a long way from being able to rescue them. The Cheetah III is still limited to a 20-kilogram payload, and can travel untethered for four to five hours with a minimal payload.

“Eventually, we hope to develop a machine that can rescue a person,” says Kim. “We’re not sure if the robot would carry the victim or bring a carrying device,” he says. “Our current design can at least see if there are any victims or if there are any more potential dangerous events.”

Experimenting with human-robot interaction

The semiautonomous Cheetah III can make ambulatory and navigation decisions on its own. However, for disaster work, it will primarily operate by remote control.

“Fully autonomous inspection, especially in disaster response, would be very hard,” says Kim. Among other issues, autonomous decision making often takes time, and can involve trial and error, which could delay the response.

“People will control the Cheetah III at a high level, offering assistance but not handling every detail,” says Kim. “People could tell it to go to a specific location on the map, find this place, and open that door. When it comes to hand action or manipulation, the human will take over more control and tell the robot what tool to use.”

Humans may also be able to assist with more instinctive controls. For example, if the Cheetah uses one of its legs as an arm and then applies force, it’s hard to maintain balance. Kim is now investigating whether human operators can use “balanced feedback” to keep the Cheetah from falling over while applying full force.

“Even standing on two or three legs, it would still be able to perform high force actions that require complex balancing,” says Kim. “The human operator can feel the balance, and help the robot shift its momentum to generate more force to open or hammer a door.”

The Biomimetic Robotics Lab is exploring balanced feedback with another robot project called Hermes (Highly Efficient Robotic Mechanisms and Electromechanical System). Like the Cheetah III, it’s a fully articulated, dynamic legged robot designed for disaster response. Yet the Hermes is bipedal and completely teleoperated by a human who wears a telepresence helmet and a full body suit, both rigged with sensors and haptic feedback devices.

“The operator can sense the balance situation and react by using body weight or directly implementing more forces,” says Kim.

The latency required for such intimate real-time feedback is difficult to achieve with Wi-Fi, even when it’s not blocked by walls, distance, or wireless interference. “In most disaster situations, you would need some sort of wired communication,” says Kim. “Eventually, I believe we’ll use reinforced optical fibers.”

Improving mobility for the elderly

Looking beyond disaster response, Kim envisions an important role for agile, dynamic legged robots in health care: improving mobility for the fast-growing elderly population. Numerous robotics projects are targeting the elderly market with chatty social robots. Kim is imagining something more fundamental.

“We still don’t have a technology that can help impaired or elderly people seamlessly move from the bed to the wheelchair to the car and back again,” says Kim. “A lot of elderly people have problems getting out of bed and climbing stairs. Some elderly with knee joint problems, for example, are still pretty mobile on flat ground, but can’t climb down the stairs unassisted. That’s a very small fraction of the day when they need help. So we’re looking for something that’s lightweight and easy to use for short-time help.”

Kim is currently working on “creating a technology that could make the actuator safe,” he says. “The electric actuators we use in the Cheetah are already safer than other machines because they can easily absorb energy. Most robots are stiff, which would cause a lot of impact forces. Our machines give a little.”

By combining such safe actuator technology with some of the Hermes technology, Kim hopes to develop a robot that can help elderly people in the future. “Robots can not only address the expected labor shortages for elder care, but also the need to maintain privacy and dignity,” he says.

The autonomous “selfie drone”

Skydio, a San Francisco-based startup founded by three MIT alumni, is commercializing an autonomous video-capturing drone — dubbed by some as the “selfie drone” — that tracks and films a subject, while freely navigating any environment.
Courtesy of Skydio

By Rob Matheson

If you’re a rock climber, hiker, runner, dancer, or anyone who likes recording themselves while in motion, a personal drone companion can now do all the filming for you — completely autonomously.

Skydio, a San Francisco-based startup founded by three MIT alumni, is commercializing an autonomous video-capturing drone — dubbed by some as the “selfie drone” — that tracks and films a subject, while freely navigating any environment.

Called R1, the drone is equipped with 13 cameras that capture omnidirectional video. It launches and lands through an app — or by itself. On the app, the R1 can also be preset to certain filming and flying conditions or be controlled manually.

The concept for the R1 started taking shape almost a decade ago at MIT, where the co-founders — Adam Bry SM ’12, Abraham Bacharach PhD ’12, and Matt Donahoe SM ’11 — first met and worked on advanced, prize-winning autonomous drones. Skydio launched in 2014 and is releasing the R1 to consumers this week.

“Our goal with our first product is to deliver on the promise of an autonomous flying camera that understands where you are, understands the scene around it, and can move itself to capture amazing video you wouldn’t otherwise be able to get,” says Bry, co-founder and CEO of Skydio.

Deep understanding

Existing drones, Bry says, generally require a human pilot. Some offer pilot-assist features that aid the human controller. But that’s the equivalent of having a car with adaptive cruise control — which automatically adjusts vehicle speed to maintain a safe distance from the cars ahead, Bry says. Skydio, on the other hand, “is like a driverless car with level-four autonomy,” he says, referring to the second-highest level of vehicle automation.

R1’s system integrates advanced algorithm components spanning perception, planning, and control, which give it unique intelligence “that’s analogous to how a person would navigate an environment,” Bry says.

On the perception side, the system uses computer vision to determine the location of objects. Using a deep neural network, it compiles information on each object and identifies each individual by, say, clothing and size. “For each person it sees, it builds up a unique visual identification to tell people apart and stays focused on the right person,” Bry says.

That data feeds into a motion-planning system, which pinpoints a subject’s location and predicts their next move. It also recognizes maneuvering limits in one area to optimize filming. “All information is constantly traded off and balanced … to capture a smooth video,” Bry says.

Finally, the control system takes all information to execute the drone’s plan in real time. “No other system has this depth of understanding,” Bry says. Others may have one or two components, “but none has a full, end-to-end, autonomous [software] stack designed and integrated together.”
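The perception, planning, and control pipeline Bry describes can be pictured as a simple follow loop. The sketch below is purely illustrative and is not Skydio's software: the function names, the follow distance, and the controller gain are all invented for the example.

```python
# Toy perception -> planning -> control loop for a subject-following camera.
# All names and numbers are hypothetical, not Skydio's actual system.

def perceive(detections, target_id):
    """Pick the tracked subject's position out of raw detections."""
    return detections[target_id]

def plan(subject_pos, subject_vel, follow_distance=3.0):
    """Predict the subject's next position and choose a camera waypoint."""
    predicted = (subject_pos[0] + subject_vel[0], subject_pos[1] + subject_vel[1])
    # Stay follow_distance behind the subject along x.
    return (predicted[0] - follow_distance, predicted[1])

def control(drone_pos, waypoint, gain=0.5):
    """Proportional controller: a velocity command toward the waypoint."""
    return (gain * (waypoint[0] - drone_pos[0]),
            gain * (waypoint[1] - drone_pos[1]))

detections = {"runner": (10.0, 2.0)}       # perception output (stubbed)
subject = perceive(detections, "runner")
waypoint = plan(subject, subject_vel=(1.0, 0.0))
cmd = control((5.0, 2.0), waypoint)        # velocity command for the drone
```

The point of the sketch is the data flow: perception output feeds the planner, and the planner's waypoint feeds the controller, exactly the "integrated together" structure Bry contrasts with systems that have only one or two of the components.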

For users, the end result, Bry says, is a drone that’s as simple to use as a camera app: “If you’re comfortable taking pictures with your iPhone, you should be comfortable using R1 to capture video.”

A user places the drone on the ground or in their hand and swipes up on the Skydio app. (A manual control option is also available.) The R1 lifts off, identifies the user, and begins recording and tracking. From there, it operates completely autonomously, staying 10 to 30 feet from a subject in autonomous mode, or up to 300 feet away in manual mode, depending on Wi-Fi availability.

When batteries run low, the app alerts the user. Should the user not respond, the drone will find a flat place to land itself. After the flight — which can last about 16 minutes, depending on speed and use — users can store captured video or upload it to social media.

Through the app, users can also switch between several cinematic modes. For instance, with “stadium mode,” for field sports, the drone stays above and moves around the action, following selected subjects. Users can also direct the drone where to fly (in front, to the side, or constantly orbiting). “These are areas we’re now working on to add more capabilities,” Bry says.

The lightweight drone can fit into an average backpack and runs about $2,500.

Skydio takes wing

Bry came to MIT in 2009, “when it was first possible to take a [hobby] airplane and put super powerful computers and sensors on it,” he says.

He joined the Robust Robotics Group, led by Nick Roy, an expert in drone autonomy. There, he met Bacharach, now Skydio’s chief technology officer, who that year was on a team that won the Association for Unmanned Vehicles International contest with an autonomous minihelicopter that navigated the aftermath of a mock nuclear meltdown. Donahoe was a friend and graduate student at the MIT Media Lab at the time.

In 2012, Bry and Bacharach helped develop autonomous-control algorithms that could calculate a plane’s trajectory and determine its “state” — its location, physical orientation, velocity, and acceleration. In a series of test flights, a drone running their algorithms maneuvered around pillars in the parking garage under MIT’s Stata Center and through the Johnson Athletic Center.
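The "state" mentioned here (location, orientation, velocity, acceleration) is typically propagated forward with basic kinematics; a real estimator fuses sensor measurements into that prediction, but the core update can be sketched in one dimension. The numbers below are illustrative only.

```python
# Constant-acceleration state propagation in 1-D: the simplest building
# block of the trajectory/state estimation described above.

def step(pos, vel, acc, dt):
    """Advance position and velocity by one time step dt."""
    new_pos = pos + vel * dt + 0.5 * acc * dt ** 2
    new_vel = vel + acc * dt
    return new_pos, new_vel

pos, vel = 0.0, 2.0                     # start at origin, moving 2 m/s
pos, vel = step(pos, vel, acc=1.0, dt=0.5)
# after 0.5 s: pos == 1.125 m, vel == 2.5 m/s
```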

These experiences were the seeds of Skydio, Bry says: “The foundation of the [Skydio] technology, and how all the technology works and the recipe for how all of it comes together, all started at MIT.”

After graduation, in 2012, Bry and Bacharach took jobs in industry, landing at Google’s Project Wing delivery-drone initiative — a couple years before Roy was tapped by Google to helm the project. Seeing a need for autonomy in drones, in 2014, Bry, Bacharach, and Donahoe founded Skydio to fulfill a vision that “drones [can have] enormous potential across industries and applications,” Bry says.

For the first year, the three co-founders worked out of Bacharach’s dad’s basement, getting “free rent in exchange for helping out with yard work,” Bry says. Working with off-the-shelf hardware, the team built a “pretty ugly” prototype. “We started with a [quadcopter] frame and put a media center computer on it and a USB camera. Duct tape was holding everything together,” Bry says.

But that prototype landed the startup a seed round of $3 million in 2015. Additional funding rounds over the next few years — more than $70 million in total — helped the startup hire engineers from MIT, Google, Apple, Tesla, and other top tech firms.

Over the years, the startup refined the drone and tested it in countries around the world — experimenting with high and low altitudes, heavy snow, fast winds, and extreme high and low temperatures. “We’ve really tried to bang on the system pretty hard to validate it,” Bry says.

Athletes, artists, inspections

Early buyers of Skydio’s first product are primarily athletes and outdoor enthusiasts who record races, training, or performances. For instance, Skydio has worked with Mikel Thomas, Olympic hurdler from Trinidad and Tobago, who used the R1 to analyze his form.

Artists, however, are also interested, Bry adds: “There’s a creative element to it. We’ve had people make music videos. It was themselves in a driveway or forest. They dance and move around and the camera will respond to them and create cool content that would otherwise be impossible to get.”

In the future, Skydio hopes to find other applications, such as inspecting commercial real estate, power lines, and energy infrastructure for damage. “People have talked about using drones for these things, but they have to be manually flown and it’s not scalable or reliable,” Bry says. “We’re going in the direction of sleek, birdlike devices that are quiet, reliable, and intelligent, and that people are comfortable using on a daily basis.”

ML 2.0: Machine learning for many

“As the momentum builds, developers will be able to set up a ML [machine learning] apparatus just as they set up a database,” says Max Kanter, CEO at Feature Labs. “It will be that simple.”
Courtesy of the Laboratory for Information and Decision Systems

Today, when an enterprise wants to use machine learning to solve a problem, it has to call in the cavalry. Even a simple problem requires multiple data scientists, machine learning experts, and domain experts to come together to agree on priorities and exchange data and information.

This process is often inefficient, and it takes months to get results. It also solves only the problem immediately at hand; the next time something comes up, the enterprise has to do the same thing all over again.

One group of MIT researchers wondered, “What if we tried another strategy? What if we created automation tools that enable the subject matter experts to use ML, in order to solve these problems themselves?”

For the past five years, Kalyan Veeramachaneni, a principal research scientist at MIT’s Laboratory for Information and Decision Systems, along with Max Kanter and Ben Schreck, who began working with Veeramachaneni as MIT students and later co-founded the machine learning startup Feature Labs, has been designing a rigorous paradigm for applied machine learning.

The team first divided the process into a discrete set of steps. One step, known as “feature engineering,” involves searching for buried patterns with predictive power. Another, “model selection,” chooses the best modeling technique from the many available options. The team then automated these steps, releasing open-source tools to help domain experts complete them efficiently.
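As a toy illustration of the feature-engineering step, aggregate features can be derived mechanically from raw records. The entities, data, and feature names below are invented for the example; the team's actual open-source tooling goes far beyond this, but the idea of systematically generating candidate features is the same.

```python
# Mechanically derive aggregate features per entity from raw records,
# a toy stand-in for automated feature engineering. Data is invented.
from statistics import mean

transactions = {
    "cust_1": [20.0, 35.0, 5.0],
    "cust_2": [100.0],
}

AGGREGATES = {"count": len, "mean": mean, "max": max}

def engineer_features(records):
    """Apply every aggregate to every entity, yielding a feature table."""
    return {
        entity: {name: fn(values) for name, fn in AGGREGATES.items()}
        for entity, values in records.items()
    }

features = engineer_features(transactions)
# features["cust_1"] == {"count": 3, "mean": 20.0, "max": 35.0}
```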

In their new paper, “Machine Learning 2.0: Engineering Data Driven AI Products,” the team brings together these automation tools, turning raw data into a trustworthy, deployable model over the course of seven steps. This chain of automation makes it possible for subject matter experts — even those without data science experience — to use machine learning to solve business problems.

“Through automation, ML 2.0 frees up subject matter experts to spend more time on the steps that truly require their domain expertise, like deciding which problems to solve in the first place and evaluating how predictions impact business outcomes,” says Schreck.

Last year, Accenture joined the MIT and Feature Labs team to undertake an ambitious project — build an AI project manager by developing and deploying a machine learning model that could predict critical problems ahead of time and augment seasoned human project managers in the software industry.

This was an opportunity to test ML 2.0’s automation tool, Featuretools, an open-source library funded by DARPA’s Data-Driven Discovery of Models (D3M) program, on a real-world problem.

Veeramachaneni and his colleagues closely collaborated with domain experts from Accenture along every step, from figuring out the best problem to solve, to running through a robust gauntlet of testing. The first model the team built was to predict the performance of software projects against a host of delivery metrics. When testing was completed, the model was found to correctly predict more than 80 percent of project performance outcomes.

Using Featuretools involved a series of human-machine interactions. In this case, Featuretools first recommended 40,000 features to the domain experts. Next, the humans used their expertise to narrow this list down to the 100 most promising features, which they then put to work training the machine-learning algorithm.

Next, the domain experts used the software to simulate using the model and test how well it would work as new, real-time data came in. This method also extends the “train-test-validate” protocol typical of contemporary machine-learning research, making it more applicable to real-world use. The model was then deployed, making predictions for hundreds of projects on a weekly basis.
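The deployment simulation described here is essentially walk-forward validation: train only on data that would have been available before each week, then predict that week. A minimal sketch, with a placeholder "model" and invented data rather than the team's actual pipeline:

```python
# Walk-forward validation: at each week, fit on the past only, then predict.
# The "model" here just predicts the historical mean; data is invented.

history = [3, 4, 5, 6, 7, 8]  # weekly metric, in time order

def fit(train):
    """Placeholder model: predict the mean of everything seen so far."""
    return sum(train) / len(train)

predictions = []
for week in range(3, len(history)):   # start once some history exists
    model = fit(history[:week])       # no peeking at future weeks
    predictions.append(model)

# predictions[0] is the forecast for week 3, trained on weeks 0-2
```

Unlike a single random train/test split, this mimics how the model would actually behave in production, where each prediction can use only the past.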

“We wanted to apply machine learning (ML) to critical problems that we face in the technology services business,” says Sanjeev Vohra, global technology officer, Accenture Technology. “More specifically, we wanted to see for ourselves if MIT’s ML 2.0 could help anticipate potential risks in software delivery. We are very happy with the outcomes, and will be sharing them broadly so others can also benefit.”

In a separate joint paper, “The AI Project Manager,” the teams walk through how they used the ML 2.0 paradigm to achieve fast and accurate predictions.

“For 20 years, the task of applying machine learning to problems has been approached as a research or feasibility project, or an opportunity to make a discovery,” says Veeramachaneni. “With these new automation tools, it is now possible to create a machine learning model from raw data and put it to use — within weeks.”

The team intends to keep honing ML 2.0 in order to make it relevant to as many industry problems as possible. “This is the true idea behind democratizing machine learning. We want to make ML useful to a broad swath of people,” he adds.

In the next five years, we are likely to see an increase in the adoption of ML 2.0. “As the momentum builds, developers will be able to set up a ML apparatus just as they set up a database,” says Max Kanter, CEO at Feature Labs. “It will be that simple.”

Custom carpentry with help from robots

PhD student Adriana Schulz was co-lead on AutoSaw, which lets nonexperts customize different items that can then be constructed with the help of robots.
Photo: Jason Dorfman, MIT CSAIL

By Adam Conner-Simons and Rachel Gordon

Every year thousands of carpenters injure their hands and fingers doing dangerous tasks such as sawing.

In an effort to minimize injury and let carpenters focus on design and other bigger-picture tasks, a team from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has created AutoSaw, a system that lets nonexperts customize different items that can then be constructed with the help of robots.

Users can choose from a range of carpenter-designed templates for chairs, desks, and other furniture. The team says that AutoSaw could eventually be used for projects as large as a deck or a porch.

“If you’re building a deck, you have to cut large sections of lumber to length, and that’s often done on site,” says CSAIL postdoc Jeffrey Lipton, who was a lead author on a related paper about the system. “Every time you put a hand near a blade, you’re at risk. To avoid that, we’ve largely automated the process using a chop-saw and jigsaw.”

The system also offers flexibility for designing furniture to fit space-constrained houses and apartments. For example, it could allow a user to modify a desk to squeeze into an L-shaped living room, or customize a table to fit in a microkitchen.  

“Robots have already enabled mass production, but with artificial intelligence (AI) they have the potential to enable mass customization and personalization in almost everything we produce,” says CSAIL director and co-author Daniela Rus. “AutoSaw shows this potential for easy access and customization in carpentry.”

The paper, which will be presented in May at the International Conference on Robotics and Automation (ICRA) in Brisbane, Australia, was co-written by Lipton, Rus, and PhD student Adriana Schulz. Other co-authors include MIT Professor Wojciech Matusik, PhD student Andrew Spielberg, and undergraduate Luis Trueba.

How it works

Software isn’t a foreign concept for some carpenters. Computer numerical control (CNC) converts designs into numeric instructions that specially programmed tools execute. However, the machines used for CNC fabrication are usually large and cumbersome, and users are limited by the size of the existing CNC tools.

As a result, many carpenters continue to use chop-saws, jigsaws, and other hand tools that are low cost, easy to move, and simple to use. These tools, while useful for customization, still put people at a high risk of injury.

AutoSaw draws on expert knowledge for design, and on robotics for the riskier cutting tasks. Using the existing CAD system OnShape with an interface of design templates, users can customize their furniture for size, sturdiness, and aesthetics. Once the design is finalized, it’s sent to the robots, which assist in the cutting process using the jigsaw and chop-saw.
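A design template of the kind described above can be pictured as a parametric function that expands user-chosen dimensions into a cut list for the robots. The part names, dimensions, and joinery below are invented for illustration and are not AutoSaw's actual templates.

```python
# Hypothetical parametric furniture template: user dimensions in,
# cut list (part, length in cm, quantity) out. All details are invented.

def table_template(width, depth, height, thickness=4):
    """Expand a simple table design into cut instructions."""
    return [
        ("leg", height - thickness, 4),            # legs sit under the top
        ("apron_long", width - 2 * thickness, 2),  # rails between the legs
        ("apron_short", depth - 2 * thickness, 2),
    ]

cuts = table_template(width=120, depth=60, height=75)
# e.g. four legs cut to 71 cm each
```

Because the template is parametric, the same design can be resized to fit an L-shaped living room or a microkitchen, which is exactly the customization the system is meant to enable.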

To cut lumber the team used motion-tracking software and small mobile robots — an approach that takes up less space and is more cost-effective than large robotic arms.

Specifically, the team used a modified Roomba with a jigsaw attached to cut lumber of any shape on a plank. For the chopping, the team used two Kuka youBots to lift the beam, place it on the chop saw, and cut.

“We added soft grippers to the robots to give them more flexibility, like that of a human carpenter,” says Lipton. “This meant we could rely on the accuracy of the power tools instead of the rigid-bodied robots.”

After the robots finish with cutting, the user then assembles the new piece of furniture using step-by-step directions from the system.

Democratizing custom furniture

When testing the system, the team’s simulations showed that it could build a chair, a shed, and a deck. Using the robots, the team also made a table with accuracy comparable to that of a human, without a real hand ever getting near a blade.

“There have been many recent AI achievements in virtual environments, like playing Go and composing music,” says Hod Lipson, a professor of mechanical engineering and data science at Columbia University. “Systems that can work in unstructured physical environments, such as this carpentry system, are notoriously difficult to make. This is truly a fascinating step forward.”

While AutoSaw is still a research platform, in the future the team plans to expand it to additional materials and to integrate complex tasks such as drilling and gluing.

“Our aim is to democratize furniture customization,” says Schulz. “We’re trying to open up a realm of opportunities so users aren’t bound to what they’ve bought at Ikea. Instead, they can make what best fits their needs.”

The project was supported in part by the National Science Foundation.

Robo-picker grasps and packs

The “pick-and-place” system consists of a standard industrial robotic arm that the researchers outfitted with a custom gripper and suction cup. They developed an “object-agnostic” grasping algorithm that enables the robot to assess a bin of random objects and determine the best way to grip or suction onto an item amid the clutter, without having to know anything about the object before picking it up.
Image: Melanie Gonick/MIT

By Jennifer Chu

Unpacking groceries is a straightforward albeit tedious task: You reach into a bag, feel around for an item, and pull it out. A quick glance will tell you what the item is and where it should be stored.

Now engineers from MIT and Princeton University have developed a robotic system that may one day lend a hand with this household chore, as well as assist in other picking and sorting tasks, from organizing products in a warehouse to clearing debris from a disaster zone.

The team’s “pick-and-place” system consists of a standard industrial robotic arm that the researchers outfitted with a custom gripper and suction cup. They developed an “object-agnostic” grasping algorithm that enables the robot to assess a bin of random objects and determine the best way to grip or suction onto an item amid the clutter, without having to know anything about the object before picking it up.

Once it has successfully grasped an item, the robot lifts it out from the bin. A set of cameras then takes images of the object from various angles, and with the help of a new image-matching algorithm the robot can compare the images of the picked object with a library of other images to find the closest match. In this way, the robot identifies the object, then stows it away in a separate bin.

In general, the robot follows a “grasp-first-then-recognize” workflow, which turns out to be an effective sequence compared to other pick-and-place technologies.

“This can be applied to warehouse sorting, but also may be used to pick things from your kitchen cabinet or clear debris after an accident. There are many situations where picking technologies could have an impact,” says Alberto Rodriguez, the Walter Henry Gale Career Development Professor in Mechanical Engineering at MIT.

Rodriguez and his colleagues at MIT and Princeton will present a paper detailing their system at the IEEE International Conference on Robotics and Automation, in May. 

Building a library of successes and failures

While pick-and-place technologies may have many uses, existing systems are typically designed to function only in tightly controlled environments.

Today, most industrial picking robots are designed for one specific, repetitive task, such as gripping a car part off an assembly line, always in the same, carefully calibrated orientation. Rodriguez, however, is working to make robots more flexible, adaptable, and intelligent pickers for unstructured settings such as retail warehouses, where a picker may encounter and have to sort hundreds, if not thousands, of novel objects each day, often amid dense clutter.

The team’s design is based on two general operations: picking, the act of successfully grasping an object; and perceiving, the ability to recognize and classify an object once grasped.

The researchers trained the robotic arm to pick novel objects out of a cluttered bin, using any one of four main grasping behaviors: suctioning onto an object, either vertically or from the side; gripping the object vertically like the claw in an arcade game; or, for objects that lie flush against a wall, gripping vertically and then using a flexible spatula to slide between the object and the wall.

Rodriguez and his team showed the robot images of bins cluttered with objects, captured from the robot’s vantage point. They then showed the robot which objects were graspable, with which of the four main grasping behaviors, and which were not, marking each example as a success or failure. They did this for hundreds of examples, and over time, the researchers built up a library of picking successes and failures. They then incorporated this library into a “deep neural network” — a class of learning algorithms that enables the robot to match the current problem it faces with a successful outcome from the past, based on its library of successes and failures.
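The library-of-successes-and-failures idea can be illustrated in miniature. Here a nearest-neighbor lookup stands in for the team's deep neural network, and the two-number "scene features" are invented for illustration; the underlying move is the same: match the current scene to the most similar past example and reuse its outcome.

```python
import math

# Each past attempt in the library: (scene features, behavior tried, outcome)
library = [
    ((0.9, 0.1), "suction-down", True),
    ((0.2, 0.8), "grasp-down",   True),
    ((0.9, 0.9), "suction-down", False),  # dense clutter: suction failed
]

def predict(scene, behavior):
    """Reuse the outcome of the most similar past attempt with this behavior."""
    candidates = [(feats, ok) for feats, b, ok in library if b == behavior]
    _, ok = min(candidates, key=lambda c: math.dist(scene, c[0]))
    return ok

print(predict((0.85, 0.15), "suction-down"))  # True: resembles a past success
print(predict((0.88, 0.85), "suction-down"))  # False: resembles a past failure
```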

“We developed a system where, just by looking at a tote filled with objects, the robot knew how to predict which ones were graspable or suctionable, and which configuration of these picking behaviors was likely to be successful,” Rodriguez says. “Once it was in the gripper, the object was much easier to recognize, without all the clutter.”

From pixels to labels

The researchers developed a perception system in a similar manner, enabling the robot to recognize and classify an object once it’s been successfully grasped.

To do so, they first assembled a library of product images taken from online sources such as retailer websites. They labeled each image with the correct identification — for instance, duct tape versus masking tape — and then developed another learning algorithm to relate the pixels in a given image to the correct label for a given object.

“We’re comparing things that, for humans, may be very easy to identify as the same, but in reality, as pixels, they could look significantly different,” Rodriguez says. “We make sure that this algorithm gets it right for these training examples. Then the hope is that we’ve given it enough training examples that, when we give it a new object, it will also predict the correct label.”
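The pixels-to-labels step can be sketched as nearest-neighbor search in an embedding space: each image becomes a vector, and the object takes the label of the closest library image. The two-number "embeddings" below are invented for illustration; the real system learns its representation from product photos.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

library = {
    "duct tape":    (0.9, 0.1),
    "masking tape": (0.6, 0.7),
}

def label_for(embedding):
    """Assign the label of the most similar library embedding."""
    return max(library, key=lambda name: cosine(embedding, library[name]))

print(label_for((0.8, 0.2)))  # duct tape
print(label_for((0.5, 0.8)))  # masking tape
```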

Last July, the team packed up the 2-ton robot and shipped it to Japan, where, a month later, they reassembled it to participate in the Amazon Robotics Challenge, a yearly competition sponsored by the online megaretailer to encourage innovations in warehouse technology. Rodriguez’s team was one of 16 taking part in a competition to pick and stow objects from a cluttered bin.

In the end, the team’s robot had a 54 percent success rate in picking objects up using suction and a 75 percent success rate using grasping, and was able to recognize novel objects with 100 percent accuracy. The robot also stowed all 20 objects within the allotted time.

For his work, Rodriguez was recently granted an Amazon Research Award and will be working with the company to further improve pick-and-place technology — foremost, its speed and reactivity.

“Picking in unstructured environments is not reliable unless you add some level of reactiveness,” Rodriguez says. “When humans pick, we sort of do small adjustments as we are picking. Figuring out how to do this more responsive picking, I think, is one of the key technologies we’re interested in.”

The team has already taken some steps toward this goal by adding tactile sensors to the robot’s gripper and running the system through a new training regime.

“The gripper now has tactile sensors, and we’ve enabled a system where the robot spends all day continuously picking things from one place to another. It’s capturing information about when it succeeds and fails, and how it feels to pick up, or fail to pick up, objects,” Rodriguez says. “Hopefully it will use that information to start bringing that reactiveness to grasping.”

This research was sponsored in part by ABB Inc., Mathworks, and Amazon.

Programming drones to fly in the face of uncertainty

Researchers trail a drone on a test flight outdoors.
Photo: Jonathan How/MIT

Companies like Amazon have big ideas for drones that can deliver packages right to your door. But even putting aside the policy issues, programming drones to fly through cluttered spaces like cities is difficult. Being able to avoid obstacles while traveling at high speeds is computationally complex, especially for small drones that are limited in how much they can carry onboard for real-time processing.

Many existing approaches rely on intricate maps that aim to tell drones exactly where they are relative to obstacles, which isn’t particularly practical in real-world settings with unpredictable objects. If their estimated location is off by even just a small margin, they can easily crash.

With that in mind, a team from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has developed NanoMap, a system that allows drones to consistently fly at 20 miles per hour through dense environments such as forests and warehouses.

One of NanoMap’s key insights is a surprisingly simple one: The system considers the drone’s position in the world over time to be uncertain, and actually models and accounts for that uncertainty.

“Overly confident maps won’t help you if you want drones that can operate at higher speeds in human environments,” says graduate student Pete Florence, lead author on a new related paper. “An approach that is better aware of uncertainty gets us a much higher level of reliability in terms of being able to fly in close quarters and avoid obstacles.”

Specifically, NanoMap uses a depth-sensing system to stitch together a series of measurements about the drone’s immediate surroundings. This allows it to not only make motion plans for its current field of view, but also anticipate how it should move around in the hidden fields of view that it has already seen.
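The measurement-stitching described above can be sketched roughly as follows (a toy 2-D version with translations only, my simplification rather than the actual NanoMap code, which handles full 3-D poses): each depth measurement stays in the frame where it was taken, and a query is transformed back through the chain of relative motions, with the position uncertainty growing at every hop.

```python
# Relative motion between consecutive frames: (dx, dy, added uncertainty)
chain = [(1.0, 0.0, 0.05), (1.0, 0.2, 0.05), (0.5, -0.1, 0.05)]

def express_in_old_frame(point, hops):
    """Transform a point in the current frame back `hops` frames, and
    report the accumulated position uncertainty of that transform."""
    x, y = point
    sigma = 0.0
    for dx, dy, ds in reversed(chain[-hops:]):
        x, y = x + dx, y + dy   # undo the motion since that frame
        sigma += ds             # uncertainty grows with every hop
    return (x, y), sigma

pt, unc = express_in_old_frame((0.0, 0.0), 2)
print(pt, unc)  # (1.5, 0.1) 0.1
```

The key design choice is that no global map is ever built: older measurements are reused through the chain, and the growing `sigma` tells the planner how much to distrust them.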

“It’s kind of like saving all of the images you’ve seen of the world as a big tape in your head,” says Florence. “For the drone to plan motions, it essentially goes back in time to think individually of all the different places that it was in.”

The team’s tests demonstrate the impact of modeling uncertainty. For example, if NanoMap wasn’t modeling uncertainty and the drone drifted just 5 percent away from where it was expected to be, the drone would crash more than once every four flights. Meanwhile, when it accounted for uncertainty, the crash rate dropped to 2 percent.

The paper was co-written by Florence and MIT Professor Russ Tedrake alongside research software engineers John Carter and Jake Ware. It was recently accepted to the IEEE International Conference on Robotics and Automation, which takes place in May in Brisbane, Australia.

For years computer scientists have worked on algorithms that allow drones to know where they are, what’s around them, and how to get from one point to another. Common approaches such as simultaneous localization and mapping (SLAM) take raw data of the world and convert them into mapped representations.

But the outputs of SLAM methods aren’t typically used to plan motions. That’s where researchers often use methods like “occupancy grids,” in which many measurements are incorporated into one specific representation of the 3-D world.

The problem is that such data can be both unreliable and hard to gather quickly. At high speeds, computer-vision algorithms can’t make much of their surroundings, forcing drones to rely on inexact data from the inertial measurement unit (IMU) sensor, which measures things like the drone’s acceleration and rate of rotation.

The way NanoMap handles this is that it essentially doesn’t sweat the minor details. It operates under the assumption that, to avoid an obstacle, you don’t have to take 100 different measurements and find the average to figure out its exact location in space; instead, you can simply gather enough information to know that the object is in a general area.
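That "general area" reasoning can be sketched as conservative collision checking (a toy 1-D example of my own, not NanoMap's actual code): rather than averaging many measurements into an exact obstacle position, take one rough measurement plus its uncertainty and treat the whole uncertain region as occupied.

```python
def safe_to_fly_through(x, obstacle, uncertainty, drone_radius=0.3):
    """A point is safe only if it clears the obstacle's *uncertain* extent."""
    inflated = uncertainty + drone_radius  # inflate by how wrong we might be
    return abs(x - obstacle) > inflated

# Obstacle measured roughly at 2.0 m, known only to within +/- 0.5 m.
print(safe_to_fly_through(2.4, obstacle=2.0, uncertainty=0.5))  # False
print(safe_to_fly_through(3.2, obstacle=2.0, uncertainty=0.5))  # True
```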

“The key difference to previous work is that the researchers created a map consisting of a set of images with their position uncertainty rather than just a set of images and their positions and orientation,” says Sebastian Scherer, a systems scientist at Carnegie Mellon University’s Robotics Institute. “Keeping track of the uncertainty has the advantage of allowing the use of previous images even if the robot doesn’t know exactly where it is, and allows improved planning.”

Florence describes NanoMap as the first system that enables drone flight with 3-D data that is aware of “pose uncertainty,” meaning that the drone takes into consideration that it doesn’t perfectly know its position and orientation as it moves through the world. Future iterations might also incorporate other pieces of information, such as the uncertainty in the drone’s individual depth-sensing measurements.

NanoMap is particularly effective for smaller drones moving through smaller spaces, and works well in tandem with a second system that is focused on more long-horizon planning. (The researchers tested NanoMap last year in a program tied to the Defense Advanced Research Projects Agency, or DARPA.)

The team says that the system could be used in fields ranging from search and rescue and defense to package delivery and entertainment. It can also be applied to self-driving cars and other forms of autonomous navigation.

“The researchers demonstrated impressive results avoiding obstacles and this work enables robots to quickly check for collisions,” says Scherer. “Fast flight among obstacles is a key capability that will allow better filming of action sequences, more efficient information gathering and other advances in the future.”

This work was supported in part by DARPA’s Fast Lightweight Autonomy program.

Robotic interiors

MIT Media Lab spinout Ori is developing smart robotic furniture that transforms into a bedroom, working or storage area, or large closet — or slides back against the wall — to optimize space in small apartments.
Courtesy of Ori

By Rob Matheson

Imagine living in a cramped studio apartment in a large city — but being able to summon your bed or closet through a mobile app, call forth your desk using voice command, or have everything retract at the push of a button.

MIT Media Lab spinout Ori aims to make that type of robotic living a reality. The Boston-based startup is selling smart robotic furniture that transforms into a bedroom, working or storage area, or large closet — or slides back against the wall — to optimize space in small apartments.

Based on years of Media Lab work, Ori’s system is an L-shaped unit installed on a track along a wall, so it can slide back and forth. One side features a closet, a small fold-out desk, and several drawers and large cubbies. At the bottom is a pull-out bed. The other side of the unit includes a horizontal surface that can open out to form a table. The vertical surface above that features a large nook where a television can be placed, and additional drawers and cubbies. The third side, opposite the wall, contains still more shelving, and pegs to hang coats and other items.

Users control the unit through a control hub plugged into a wall, or through Ori’s mobile app or a smart home system, such as Amazon’s Echo.

Essentially, a small studio can at any time become a bedroom, lounge, walk-in closet, or living and working area, says Ori founder and CEO Hasier Larrea SM ’15. “We use robotics to … make small spaces act like they were two or three times bigger,” he says. “Around 200 square feet seems too small [total area] to live in, but a 200-square-foot bedroom or living room doesn’t seem so small.” Larrea was named to Forbes’ 2017 30 Under 30 list for his work with Ori.

The first commercial line of the systems, which goes for about $10,000, is now being sold to real estate developers in Boston and other major cities across the U.S. and Canada, for newly built or available apartments. In Boston, partners include Skanska, which has apartments in the Seaport; Samuels and Associates, with buildings around Harvard Square; and Hines for its Marina Bay units. Someday, Larrea says, the system could be bought directly by consumers.

Once the system catches on and the technology evolves, Larrea imagines future apartments could be furnished entirely with robotic furniture from Ori and other companies.

“These technologies can evolve for kitchens, bathrooms, and general partition walls. At some point, a two-bedroom apartment could turn into a large studio, transform into three rooms for your startup, or go into ‘party mode,’ where it all opens up again,” Larrea says. “Spaces will adapt to us, instead of us adapting to spaces, which is what we’ve been doing for so many years.”

Architectural robotics

In 2011, Larrea joined the Media Lab’s City Science research group, directed by Principal Research Scientist Kent Larson, which included his three co-founders: Chad Bean ’14, Carlos Rubio ’14, and Ivan Fernandez de Casadevante, who was a visiting researcher.

The group’s primary focus was tackling challenges of mass urbanization, as cities are becoming increasingly popular living destinations. “Data tells us that, in places like China and India, 600 million people will move from towns to cities in the next 15 years,” Larrea says. “Not only is the way we move through cities and feed people going to need to evolve, but so will the way people live and work in spaces.”

A second emerging phenomenon was the Internet of Things, which saw an influx of smart gadgets, including household items and furniture, designed to connect to the Internet. “Those two megatrends were bound to converge,” Larrea says.

The group started a project called CityHome, creating what it called “architectural robotics,” which integrated robotics, architecture, computer science, and engineering to design smart, modular furniture. The group prototyped a moveable wall that could be controlled via gesture control — which looked similar to today’s Ori system — and constructed a mock 200-square-foot studio apartment on the fifth floor of the Media Lab to test it out. Within the group, the unit was called “furniture with superpowers,” Larrea says, as it made small spaces seem bigger.

After they had constructed their working prototype, in early 2015 the researchers wanted to scale up. Inspiration came from the Media Lab-LEGO MindStorms collaboration from the late 1990s, where researchers created kits that incorporated sensors and motors inside traditional LEGO bricks so kids could build robots and researchers could prototype.

Drawing from that concept, the group built standardized components that could be assembled into a larger piece of modular furniture — what Ori now calls the robotic “muscle,” “skeleton,” “brains,” and the furniture “skins.” Specifically, the muscle consists of the track, motors, and electronics that actuate the system. The skeleton is the frame and the wheels that give the unit structure and movement. The brain is the microcomputer that controls all the safety features and connects the device to the Internet. And the skin is the various pieces of furniture that can be integrated, using the same robotic architecture.
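The muscle/skeleton/brain/skin decomposition above can be sketched as plain composition: standardized robotic parts plus interchangeable furniture skins. The class and field names here are illustrative, not Ori's actual software.

```python
from dataclasses import dataclass, field

@dataclass
class Unit:
    muscle: str                  # track, motors, and actuating electronics
    skeleton: str                # frame and wheels: structure and movement
    brain: str                   # microcomputer: safety features + Internet
    skins: list = field(default_factory=list)  # interchangeable furniture

    def add_skin(self, skin):
        """Swap in furniture while the robotics stays standardized."""
        self.skins.append(skin)
        return self

studio = Unit("linear track", "steel frame", "connected microcontroller")
studio.add_skin("queen bed").add_skin("fold-out desk")
print(studio.skins)  # ['queen bed', 'fold-out desk']
```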

Today, units fit full- or queen-size mattresses and come in different colors. In the future, however, any type of furniture could be integrated, creating units of various shapes, sizes, uses, and price. “The robotics will keep evolving but stay standardized … so, by adding different skins, you can really create anything you can imagine,” Larrea says.

Kickstarting Ori

Going through the Martin Trust Center for MIT Entrepreneurship’s summer accelerator delta V (then called the Global Founders Skills Accelerator) in 2015 “kickstarted” the startup, Larrea says. One lesson that particularly stood out: the importance of conducting market research. “At MIT, sometimes we assume, because we have such a cool technology, marketing it will be easy. … But we forget to talk to people,” he says.

In the early days, the co-founders put tech development aside to speak with owners of studios, offices, and hotels, as well as tenants. In doing so, they learned studio renters in particular had three major complaints: Couples wanted separate living areas, and everyone wanted walk-in closets and space to host parties. The startup then focused on developing a furniture unit that addressed those issues.

After securing one of its first investors, the Media Lab’s E14 Fund, in fall 2015, the startup installed an early version of its system in several Boston apartments for renters to test and provide feedback. Soon after, the system hit apartments in 10 major cities across the U.S. and Canada, including San Francisco, Vancouver, Chicago, Miami, and New York. Over the past two years, the startup has used feedback from those pilots to refine the system into today’s commercial model.

Ori will ship an initial production run of 500 units for apartments over the next few months. Soon, Larrea says, the startup also aims to penetrate adjacent markets, such as hotels, dormitories, and offices. “The idea is to prove this isn’t a one-trick pony,” Larrea says. “It’s part of a more comprehensive strategy to unlock the potential of space.”

3Q: Daron Acemoglu on technology and the future of work

K. Daron Acemoglu, the Elizabeth and James Killian Professor of Economics at MIT, is a leading thinker on the labor market implications of artificial intelligence, robotics, automation, and new technologies.
Photo: Jared Charney

By Meg Murphy
K. Daron Acemoglu, the Elizabeth and James Killian Professor of Economics at MIT, is a leading thinker on the labor market implications of artificial intelligence, robotics, automation, and new technologies. His innovative work challenges the way people think about how these technologies intersect with the world of work. In 2005, he won the John Bates Clark Medal, an honor shared by a number of Nobel Prize recipients and luminaries in the field of economics.

Acemoglu holds a bachelor’s degree in economics from the University of York. His master’s degree in mathematical economics and econometrics and doctorate in economics are from the London School of Economics. With political scientist James Robinson, Acemoglu co-authored the much discussed books “Why Nations Fail” (Crown Business, 2012) and “Economic Origins of Dictatorship and Democracy” (Cambridge University Press, 2006). He also wrote “Introduction to Modern Economic Growth” (Princeton University Press, 2008). Acemoglu recently answered a few questions about technology and work.

Q: How do we begin to understand the rise of artificial intelligence and its future impact on society?

A: We need to look to the past in the face of modern innovations in machine learning, robotics, artificial intelligence, big data, and beyond. The process of machines replacing labor in the production process is not a new one. It’s been going on pretty much continuously since the Industrial Revolution. Spinning and weaving machines took jobs away from spinners and weavers. One innovation would follow another, and people would be thrown out of work by a machine performing the job in a cheaper way.

But at the end of the day, the Industrial Revolution and its aftermath created much better opportunities for people. For much of the 20th century in the U.S., workers’ wages and employment kept growing. New occupations and new tasks and new jobs were generated within the framework of new technological knowledge. A huge number of occupations in the American economy today did not exist 50 years ago — radiologists, management consultants, software developers, and so on. Go back a century and most of the white-collar jobs today did not exist.

Q: Do you think public fears about the future of work are justified?

A: The way we live continuously changes in significant ways — how we learn, how we acquire food, what we emphasize, our social organizations.

Our adjustments to technology — especially transformative technologies — are not a walk in the park. It is not going to be easy and seamless and just sort itself out. A lot of historical evidence shows the process is a painful one. The mechanization of agriculture is one of the greatest achievements of the American economy, but it was hugely disruptive for millions of people who suffered joblessness.

At the same time, we are capable technologically and socially of creating many new jobs that will take people to new horizons in terms of productivity and freedom from the hardest types of manual labor. There are great opportunities with artificial intelligence but whether or not we exploit them is a different question. I think you should never be too optimistic but neither should you be too pessimistic.

Q: How do you suggest people prepare for the future job market?

A: We are very much in the midst of understanding what sort of process we are going through. We don’t even necessarily know what skills are needed for the jobs of the future.

Imagine one scenario. Artificial intelligence removes the need for seasoned accountants to fulfill numeracy-related tasks. But we need tax professionals, for instance, to inform clients about their choices and options in some sort of empathetic, human way. They will have to become the interface between the machines and the customers. The jobs of the future, in this instance and many others, would require communications, flexibility, and social skills.

However, I don’t know if my hypothesis is true because we haven’t tested it. We haven’t lived through it. That is where I see the biggest void in our knowledge. People at institutions like MIT must learn more about what is going on so that we are better prepared to understand the future.

Engineers design artificial synapse for “brain-on-a-chip” hardware

From left: MIT researchers Scott H. Tan, Jeehwan Kim, and Shinhyun Choi
Image: Kuan Qiao

By Jennifer Chu

When it comes to processing power, the human brain just can’t be beat.

Packed within the squishy, football-sized organ are somewhere around 100 billion neurons. At any given moment, a single neuron can relay instructions to thousands of other neurons via synapses — the spaces between neurons, across which neurotransmitters are exchanged. There are more than 100 trillion synapses that mediate neuron signaling in the brain, strengthening some connections while pruning others, in a process that enables the brain to recognize patterns, remember facts, and carry out other learning tasks, at lightning speeds.

Researchers in the emerging field of “neuromorphic computing” have attempted to design computer chips that work like the human brain. Instead of carrying out computations based on binary, on/off signaling, like digital chips do today, the elements of a “brain on a chip” would work in an analog fashion, exchanging a gradient of signals, or “weights,” much like neurons that activate in various ways depending on the type and number of ions that flow across a synapse.
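The contrast between on/off digital signaling and the graded analog "weights" described above can be shown in a few lines (an illustrative toy, not a circuit model): a digital gate clamps its input to 0 or 1, while an analog synapse passes a continuously scaled signal, the way ion flow modulates a biological connection.

```python
def digital_gate(signal, threshold=0.5):
    """Binary, on/off signaling: the input is clamped to 0 or 1."""
    return 1 if signal >= threshold else 0

def analog_synapse(signal, weight):
    """Analog signaling: the output is a graded, weighted response."""
    return signal * weight

print(digital_gate(0.4), digital_gate(0.6))  # 0 1
print(analog_synapse(0.4, 0.5))              # 0.2
```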

In this way, small neuromorphic chips could, like the brain, efficiently process millions of streams of parallel computations that are currently only possible with large banks of supercomputers. But one significant hangup on the way to such portable artificial intelligence has been the neural synapse, which has been particularly tricky to reproduce in hardware.

Now engineers at MIT have designed an artificial synapse in such a way that they can precisely control the strength of an electric current flowing across it, similar to the way ions flow between neurons. The team has built a small chip with artificial synapses, made from silicon germanium. In simulations, the researchers found that the chip and its synapses could be used to recognize samples of handwriting, with 95 percent accuracy.

The design, published today in the journal Nature Materials, is a major step toward building portable, low-power neuromorphic chips for use in pattern recognition and other learning tasks.

The research was led by Jeehwan Kim, the Class of 1947 Career Development Assistant Professor in the departments of Mechanical Engineering and Materials Science and Engineering, and a principal investigator in MIT’s Research Laboratory of Electronics and Microsystems Technology Laboratories. His co-authors are Shinhyun Choi (first author), Scott Tan (co-first author), Zefan Li, Yunjo Kim, Chanyeol Choi, and Hanwool Yeon of MIT, along with Pai-Yu Chen and Shimeng Yu of Arizona State University.

Too many paths

Most neuromorphic chip designs attempt to emulate the synaptic connection between neurons using two conductive layers separated by a “switching medium,” or synapse-like space. When a voltage is applied, ions should move in the switching medium to create conductive filaments, similarly to how the “weight” of a synapse changes.

But it’s been difficult to control the flow of ions in existing designs. Kim says that’s because most switching mediums, made of amorphous materials, have unlimited possible paths through which ions can travel — a bit like Pachinko, a mechanical arcade game that funnels small steel balls down through a series of pins and levers, which act to either divert or direct the balls out of the machine.

Like Pachinko, existing switching mediums contain multiple paths that make it difficult to predict where ions will make it through. Kim says that can create unwanted nonuniformity in a synapse’s performance.

“Once you apply some voltage to represent some data with your artificial neuron, you have to erase and be able to write it again in the exact same way,” Kim says. “But in an amorphous solid, when you write again, the ions go in different directions because there are lots of defects. This stream is changing, and it’s hard to control. That’s the biggest problem — nonuniformity of the artificial synapse.”

A perfect mismatch

Instead of using amorphous materials as an artificial synapse, Kim and his colleagues looked to single-crystalline silicon, a defect-free conducting material made from atoms arranged in a continuously ordered alignment. The team sought to create a precise, one-dimensional line defect, or dislocation, through the silicon, through which ions could predictably flow.

To do so, the researchers started with a wafer of silicon, resembling, at microscopic resolution, a chicken-wire pattern. They then grew a similar pattern of silicon germanium — a material also used commonly in transistors — on top of the silicon wafer. Silicon germanium’s lattice is slightly larger than that of silicon, and Kim found that together, the two perfectly mismatched materials can form a funnel-like dislocation, creating a single path through which ions can flow. 

The researchers fabricated a neuromorphic chip consisting of artificial synapses made from silicon germanium, each synapse measuring about 25 nanometers across. They applied voltage to each synapse and found that all synapses exhibited more or less the same current, or flow of ions, with about a 4 percent variation between synapses — a much more uniform performance compared with synapses made from amorphous material.

They also tested a single synapse over multiple trials, applying the same voltage over 700 cycles, and found the synapse exhibited the same current, with just 1 percent variation from cycle to cycle.
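Uniformity figures like these are, in effect, coefficients of variation: the standard deviation of the current expressed as a fraction of its mean. A quick sketch with hypothetical current readings shows how such a percentage is computed:

```python
from statistics import mean, stdev

def percent_variation(currents):
    """Coefficient of variation: spread as a percentage of the mean."""
    return 100 * stdev(currents) / mean(currents)

readings = [1.00, 1.02, 0.99, 1.01, 0.98]  # hypothetical currents (uA)
print(round(percent_variation(readings), 1))  # about 1.6 percent
```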

“This is the most uniform device we could achieve, which is the key to demonstrating artificial neural networks,” Kim says.

Writing, recognized

As a final test, Kim’s team explored how its device would perform if it were to carry out actual learning tasks — specifically, recognizing samples of handwriting, which researchers consider to be a first practical test for neuromorphic chips. Such chips would consist of “input/hidden/output neurons,” each connected to other “neurons” via filament-based artificial synapses.

Scientists believe such stacks of neural nets can be made to “learn.” For instance, when fed an input that is a handwritten ‘1,’ with an output that labels it as ‘1,’ certain output neurons will be activated by input neurons and weights from an artificial synapse. When more examples of handwritten ‘1s’ are fed into the same chip, the same output neurons may be activated when they sense similar features between different samples of the same letter, thus “learning” in a fashion similar to what the brain does.

Kim and his colleagues ran a computer simulation of an artificial neural network consisting of three sheets of neural layers connected via two layers of artificial synapses, the properties of which they based on measurements from their actual neuromorphic chip. They fed into their simulation tens of thousands of samples from a handwriting-recognition dataset commonly used by neuromorphic designers, and found that their neural network hardware recognized handwritten samples 95 percent of the time, compared to the 97 percent accuracy of existing software algorithms.
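That simulation approach can be sketched in miniature, with made-up two-feature "images" instead of real handwriting and a simple perceptron instead of the team's network (all assumptions of mine): train in software, then jitter every weight by roughly the reported 4 percent device variation and check how much accuracy survives.

```python
import random

random.seed(0)

# Made-up "images": two features per sample, class 1 when their sum is large.
data = [(random.random(), random.random()) for _ in range(200)]
labels = [1 if x + y > 1.0 else 0 for x, y in data]

# Train a tiny perceptron in software.
w, b = [0.0, 0.0], 0.0
for _ in range(50):
    for (x, y), t in zip(data, labels):
        p = 1 if w[0] * x + w[1] * y + b > 0 else 0
        w[0] += 0.1 * (t - p) * x
        w[1] += 0.1 * (t - p) * y
        b += 0.1 * (t - p)

def accuracy(wts, bias):
    hits = sum((1 if wts[0] * x + wts[1] * y + bias > 0 else 0) == t
               for (x, y), t in zip(data, labels))
    return hits / len(labels)

# Emulate the hardware: jitter every "synaptic" weight by ~4 percent.
noisy_w = [wi * random.gauss(1.0, 0.04) for wi in w]
noisy_b = b * random.gauss(1.0, 0.04)

print(round(accuracy(w, b), 2), round(accuracy(noisy_w, noisy_b), 2))
```

The interesting quantity is the gap between the two printed accuracies: the more uniform the device, the smaller the drop when learned weights are mapped onto physical synapses.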

The team is in the process of fabricating a working neuromorphic chip that can carry out handwriting-recognition tasks, not in simulation but in reality. Looking beyond handwriting, Kim says the team’s artificial synapse design will enable much smaller, portable neural network devices that can perform complex computations that currently are only possible with large supercomputers.

“Ultimately we want a chip as big as a fingernail to replace one big supercomputer,” Kim says. “This opens a stepping stone to produce real artificial hardware.”

This research was supported in part by the National Science Foundation.

Computer systems predict objects’ responses to physical forces

As part of an investigation into the nature of humans’ physical intuitions, MIT researchers trained a neural network to predict how unstably stacked blocks would respond to the force of gravity.
Image: Christine Daniloff/MIT

Josh Tenenbaum, a professor of brain and cognitive sciences at MIT, directs research on the development of intelligence at the Center for Brains, Minds, and Machines, a multiuniversity, multidisciplinary project based at MIT that seeks to explain and replicate human intelligence.

Presenting their work at this year’s Conference on Neural Information Processing Systems, Tenenbaum and one of his students, Jiajun Wu, are co-authors on four papers that examine the fundamental cognitive abilities that an intelligent agent requires to navigate the world: discerning distinct objects and inferring how they respond to physical forces.

By building computer systems that begin to approximate these capacities, the researchers believe they can help answer questions about what information-processing resources human beings use at what stages of development. Along the way, the researchers might also generate some insights useful for robotic vision systems.

“The common theme here is really learning to perceive physics,” Tenenbaum says. “That starts with seeing the full 3-D shapes of objects, and multiple objects in a scene, along with their physical properties, like mass and friction, then reasoning about how these objects will move over time. Jiajun’s four papers address this whole space. Taken together, we’re starting to be able to build machines that capture more and more of people’s basic understanding of the physical world.”

Three of the papers deal with inferring information about the physical structure of objects, from both visual and aural data. The fourth deals with predicting how objects will behave on the basis of that data.

Two-way street

Something else that unites all four papers is their unusual approach to machine learning, a technique in which computers learn to perform computational tasks by analyzing huge sets of training data. In a typical machine-learning system, the training data are labeled: Human analysts will have, say, identified the objects in a visual scene or transcribed the words of a spoken sentence. The system attempts to learn what features of the data correlate with what labels, and it’s judged on how well it labels previously unseen data.

In Wu and Tenenbaum’s new papers, the system is trained to infer a physical model of the world — the 3-D shapes of objects that are mostly hidden from view, for instance. But then it works backward, using the model to resynthesize the input data, and its performance is judged on how well the reconstructed data matches the original data.

For instance, using visual images to build a 3-D model of an object in a scene requires stripping away any occluding objects; filtering out confounding visual textures, reflections, and shadows; and inferring the shape of unseen surfaces. Once Wu and Tenenbaum’s system has built such a model, however, it rotates it in space and adds visual textures back in until it can approximate the input data.
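The "work backward" training signal this describes can be caricatured in a few lines of Python. Here the hidden "model" is a single parameter (a rectangle's width), the "renderer" projects it into a one-dimensional silhouette, and inference searches for the model whose re-rendering best matches the observation; this is a deliberately toy stand-in for the researchers' 3-D pipeline, not their code.

```python
# Analysis-by-synthesis in miniature: propose a model, re-render the
# observation from it, and score the match. No human labels involved.

def render(width, size=10):
    """Project the model (a rectangle width) into observation space."""
    return [1 if i < width else 0 for i in range(size)]

def infer(observed):
    """Pick the model whose re-rendering best matches the observation."""
    def loss(w):
        return sum(abs(a - b) for a, b in zip(render(w), observed))
    return min(range(11), key=loss)

observed = render(4)          # the "input image"
print(infer(observed))        # recovers the hidden model parameter: 4
```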

Indeed, two of the researchers’ four papers address the complex problem of inferring 3-D models from visual data. On those papers, they’re joined by four other MIT researchers, including William Freeman, the Perkins Professor of Electrical Engineering and Computer Science, and by colleagues at DeepMind, ShanghaiTech University, and Shanghai Jiao Tong University.

Divide and conquer

The researchers’ system is based on the influential theories of the MIT neuroscientist David Marr, who died in 1980 at the tragically young age of 35. Marr hypothesized that in interpreting a visual scene, the brain first creates what he called a 2.5-D sketch of the objects it contained — a representation of just those surfaces of the objects facing the viewer. Then, on the basis of the 2.5-D sketch — not the raw visual information about the scene — the brain infers the full, three-dimensional shapes of the objects.

“Both problems are very hard, but there’s a nice way to disentangle them,” Wu says. “You can do them one at a time, so you don’t have to deal with both of them at the same time, which is even harder.”

Wu and his colleagues’ system needs to be trained on data that include both visual images and 3-D models of the objects the images depict. Constructing accurate 3-D models of the objects depicted in real photographs would be prohibitively time consuming, so initially, the researchers train their system using synthetic data, in which the visual image is generated from the 3-D model, rather than vice versa. The process of creating the data is like that of creating a computer-animated film.

Once the system has been trained on synthetic data, however, it can be fine-tuned using real data. That’s because its ultimate performance criterion is the accuracy with which it reconstructs the input data. It’s still building 3-D models, but they don’t need to be compared to human-constructed models for performance assessment.

In evaluating their system, the researchers used a measure called intersection over union, which is common in the field. On that measure, their system outperforms its predecessors. But a given intersection-over-union score leaves a lot of room for local variation in the smoothness and shape of a 3-D model. So Wu and his colleagues also conducted a qualitative study of the models’ fidelity to the source images. Of the study’s participants, 74 percent preferred the new system’s reconstructions to those of its predecessors.
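For readers unfamiliar with the metric, intersection over union is simple to state in code. This toy version treats a 3-D shape as a set of occupied voxel coordinates; it illustrates the metric itself, not the researchers' evaluation code.

```python
# Intersection over union (IoU) for two voxelized 3-D shapes,
# each represented as a set of occupied voxel coordinates.

def iou(voxels_a, voxels_b):
    """IoU = |A ∩ B| / |A ∪ B|; 1.0 means identical shapes."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

pred  = {(0, 0, 0), (1, 0, 0), (1, 1, 0)}   # reconstructed shape
truth = {(0, 0, 0), (1, 0, 0), (0, 1, 0)}   # ground-truth shape
print(iou(pred, truth))  # 2 shared voxels, 4 in the union -> 0.5
```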

All that fall

In another of Wu and Tenenbaum’s papers, on which they’re joined again by Freeman and by researchers at MIT, Cambridge University, and ShanghaiTech University, they train a system to analyze audio recordings of an object being dropped, to infer properties such as the object’s shape, its composition, and the height from which it fell. Again, the system is trained to produce an abstract representation of the object, which, in turn, it uses to synthesize the sound the object would make when dropped from a particular height. The system’s performance is judged on the similarity between the synthesized sound and the source sound.

Finally, in their fourth paper, Wu, Tenenbaum, Freeman, and colleagues at DeepMind and Oxford University describe a system that begins to model humans’ intuitive understanding of the physical forces acting on objects in the world. This paper picks up where the previous papers leave off: It assumes that the system has already deduced objects’ 3-D shapes.

Those shapes are simple: balls and cubes. The researchers trained their system to perform two tasks. The first is to estimate the velocities of balls traveling on a billiard table and, on that basis, to predict how they will behave after a collision. The second is to analyze a static image of stacked cubes and determine whether they will fall and, if so, where the cubes will land.

Wu developed a representational language he calls scene XML that can quantitatively characterize the relative positions of objects in a visual scene. The system first learns to describe input data in that language. It then feeds that description to something called a physics engine, which models the physical forces acting on the represented objects. Physics engines are a staple of both computer animation, where they generate the movement of clothing, falling objects, and the like, and of scientific computing, where they’re used for large-scale physical simulations.
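The pipeline the paragraph describes, structured scene description in, predicted motion out, can be sketched in a few lines. The dict below merely stands in for Wu's scene-XML representation, and the physics "engine" handles only the head-on, equal-mass billiard case; everything here is illustrative rather than drawn from the actual system.

```python
# A structured scene description handed to a minimal physics step.
# Equal-mass elastic collisions in 1-D simply swap the two velocities.

def step(scene, dt=1.0):
    """Advance both balls; swap velocities on head-on contact."""
    a, b = scene["balls"]
    for ball in (a, b):
        ball["x"] += ball["v"] * dt
    # Contact test plus "approaching" check, then the elastic swap.
    if abs(a["x"] - b["x"]) <= 2 * scene["radius"] and a["v"] > b["v"]:
        a["v"], b["v"] = b["v"], a["v"]
    return scene

scene = {"radius": 0.5,
         "balls": [{"x": 0.0, "v": 1.0}, {"x": 2.0, "v": 0.0}]}
step(scene)
print([ball["v"] for ball in scene["balls"]])  # velocities swap: [0.0, 1.0]
```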

After the physics engine has predicted the motions of the balls and boxes, that information is fed to a graphics engine, whose output is, again, compared with the source images. As with the work on visual discrimination, the researchers train their system on synthetic data before refining it with real data.

In tests, the researchers’ system again outperformed its predecessors. In fact, in some of the tests involving billiard balls, it frequently outperformed human observers as well.

Reading a neural network’s mind

Image: Chelsea Turner/MIT

By Larry Hardesty

Neural networks, which learn to perform computational tasks by analyzing huge sets of training data, have been responsible for the most impressive recent advances in artificial intelligence, including speech-recognition and automatic-translation systems.

During training, however, a neural net continually adjusts its internal settings in ways that even its creators can’t interpret. Much recent work in computer science has focused on clever techniques for determining just how neural nets do what they do.

In several recent papers, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Qatar Computing Research Institute have used a recently developed interpretive technique, which had been applied in other areas, to analyze neural networks trained to do machine translation and speech recognition.

They find empirical support for some common intuitions about how the networks probably work. For example, the systems seem to concentrate on lower-level tasks, such as sound recognition or part-of-speech recognition, before moving on to higher-level tasks, such as transcription or semantic interpretation.

But the researchers also find a surprising omission in the type of data the translation network considers, and they show that correcting that omission improves the network’s performance. The improvement is modest, but it points toward the possibility that analysis of neural networks could help improve the accuracy of artificial intelligence systems.

“In machine translation, historically, there was sort of a pyramid with different layers,” says Jim Glass, a CSAIL senior research scientist who worked on the project with Yonatan Belinkov, an MIT graduate student in electrical engineering and computer science. “At the lowest level there was the word, the surface forms, and the top of the pyramid was some kind of interlingual representation, and you’d have different layers where you were doing syntax, semantics. This was a very abstract notion, but the idea was the higher up you went in the pyramid, the easier it would be to translate to a new language, and then you’d go down again. So part of what Yonatan is doing is trying to figure out what aspects of this notion are being encoded in the network.”

The work on machine translation was presented recently in two papers at the International Joint Conference on Natural Language Processing. On one, Belinkov is first author and Glass is senior author; on the other, Belinkov is a co-author. On both, they’re joined by researchers from the Qatar Computing Research Institute (QCRI), including Lluís Màrquez, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, and Stephan Vogel. Belinkov and Glass are sole authors on the paper analyzing speech recognition systems, which Belinkov presented last week at the Conference on Neural Information Processing Systems.

Leveling down

Neural nets are so named because they roughly approximate the structure of the human brain. Typically, they’re arranged into layers, and each layer consists of many simple processing units — nodes — each of which is connected to several nodes in the layers above and below. Data are fed into the lowest layer, whose nodes process it and pass it to the next layer. The connections between layers have different “weights,” which determine how much the output of any one node figures into the calculation performed by the next.
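In code, that description of nodes and weights amounts to little more than a weighted sum per node, passed through a simple nonlinearity. A two-layer toy network with hand-picked weights, for illustration only:

```python
# Each row of weights connects all of a layer's inputs to one node;
# the node's output is the weighted sum, clipped at zero (ReLU).

def layer(inputs, weights):
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)))
            for row in weights]

x = [1.0, 2.0]                                 # data fed into the lowest layer
hidden = layer(x, [[0.5, -0.25], [1.0, 1.0]])  # first layer's weights
output = layer(hidden, [[1.0, 0.5]])           # second layer's weights
print(output)  # -> [1.5]
```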

During training, the weights between nodes are constantly readjusted. After the network is trained, its creators can determine the weights of all the connections, but with thousands or even millions of nodes, and even more connections between them, deducing what algorithm those weights encode is nigh impossible.

The MIT and QCRI researchers’ technique consists of taking a trained network and using the output of each of its layers, in response to individual training examples, to train another neural network to perform a particular task. This enables them to determine what task each layer is optimized for.
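A minimal version of that probing recipe: treat one layer's activations as fixed features, fit a simple classifier on them, and compare accuracy across layers. The data and the nearest-centroid probe below are invented for illustration; the researchers train a neural network as the probe.

```python
# Probe a layer by fitting a nearest-centroid classifier on its
# activations and measuring how well the labels can be recovered.

def probe_accuracy(activations, labels):
    """Fit a nearest-centroid probe and report its training accuracy."""
    groups = {}
    for act, lab in zip(activations, labels):
        groups.setdefault(lab, []).append(act)
    centroids = {lab: [sum(dim) / len(dim) for dim in zip(*acts)]
                 for lab, acts in groups.items()}
    correct = 0
    for act, lab in zip(activations, labels):
        pred = min(centroids, key=lambda l: sum(
            (a - c) ** 2 for a, c in zip(act, centroids[l])))
        correct += pred == lab
    return correct / len(labels)

# Pretend these are one layer's activations for examples of two phones.
layer1 = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]]
labels = ["t", "d", "t", "d"][:0] or ["t", "t", "d", "d"]
print(probe_accuracy(layer1, labels))  # separable at this layer -> 1.0
```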

In the case of the speech recognition network, Belinkov and Glass used individual layers’ outputs to train a system to identify “phones,” distinct phonetic units particular to a spoken language. The “t” sounds in the words “tea,” “tree,” and “but,” for instance, might be classified as separate phones, but a speech recognition system has to transcribe all of them using the letter “t.” And indeed, Belinkov and Glass found that lower levels of the network were better at recognizing phones than higher levels, where, presumably, the distinction is less important.

Similarly, in an earlier paper, presented last summer at the Annual Meeting of the Association for Computational Linguistics, Glass, Belinkov, and their QCRI colleagues showed that the lower levels of a machine-translation network were particularly good at recognizing parts of speech and morphology — features such as tense, number, and conjugation.

Making meaning

But in the new paper, they show that higher levels of the network are better at something called semantic tagging. As Belinkov explains, a part-of-speech tagger will recognize that “herself” is a pronoun, but the meaning of that pronoun — its semantic sense — is very different in the sentences “she bought the book herself” and “she herself bought the book.” A semantic tagger would assign different tags to those two instances of “herself,” just as a machine translation system might find different translations for them in a given target language.

The best-performing machine-translation networks use so-called encoder-decoder models, so the MIT and QCRI researchers’ network uses one as well. In such systems, the input, in the source language, passes through several layers of the network — known as the encoder — to produce a vector, a string of numbers that somehow represents the semantic content of the input. That vector passes through several more layers of the network — the decoder — to yield a translation in the target language.

Although the encoder and decoder are trained together, they can be thought of as separate networks. The researchers discovered that, curiously, the lower layers of the encoder are good at distinguishing morphology, but the higher layers of the decoder are not. So Belinkov and the QCRI researchers retrained the network, scoring its performance according to not only accuracy of translation but also analysis of morphology in the target language. In essence, they forced the decoder to get better at distinguishing morphology.

Using this technique, they retrained the network to translate English into German and found that its accuracy increased by 3 percent. That’s not an overwhelming improvement, but it’s an indication that looking under the hood of neural networks could be more than an academic exercise.

Can artificial intelligence learn to scare us?

Just in time for Halloween, a research team from the MIT Media Lab’s Scalable Cooperation group has introduced Shelley: the world’s first artificial intelligence-human horror story collaboration.

Shelley, named for English writer Mary Shelley — best known as the author of “Frankenstein: or, the Modern Prometheus” — is a deep-learning powered artificial intelligence (AI) system that was trained on over 140,000 horror stories on Reddit’s infamous r/nosleep subreddit. She lives on Twitter, where every hour, @shelley_ai tweets out the beginning of a new horror story and the hashtag #yourturn to invite a human collaborator. Anyone is welcome to reply to the tweet with the next part of the story, then Shelley will reply again with the next part, and so on. The results are weird, fun, and unpredictable horror stories that represent both creativity and collaboration — traits that explore the limits of artificial intelligence and machine learning.

“Shelley is a combination of a multi-layer recurrent neural network and an online learning algorithm that learns from the crowd’s feedback over time,” explains Pinar Yanardag, the project’s lead researcher. “The more collaboration Shelley gets from people, the more and scarier stories she will write.”

Shelley starts stories based on the AI’s own learning dataset, but she responds directly to additions to the story from human contributors — which, in turn, adds to her knowledge base. Each completed story is then collected on the Shelley project website.

“Shelley’s creative mind has no boundaries,” the research team says. “She writes stories about a pregnant man who woke up in a hospital, a mouth on the floor with a calm smile, an entire haunted town, a faceless man on the mirror ... anything is possible!”

One final note on Shelley: The AI was trained on a subreddit filled with adult content, and the researchers have limited control over her — so parents beware.

Teleoperating robots with virtual reality


By Rachel Gordon
CSAIL’s new VR system, consisting of an Oculus Rift headset and hand controllers, lets users teleoperate a robot.
Photo: Jason Dorfman/MIT CSAIL

Certain industries have traditionally not had the luxury of telecommuting. Many manufacturing jobs, for example, require a physical presence to operate machinery.

But what if such jobs could be done remotely? Last week researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) presented a virtual reality (VR) system that lets you teleoperate a robot using an Oculus Rift headset.

The system embeds the user in a VR control room with multiple sensor displays, making it feel like they’re inside the robot’s head. By using hand controllers, users can match their movements to the robot’s movements to complete various tasks.

“A system like this could eventually help humans supervise robots from a distance,” says CSAIL postdoc Jeffrey Lipton, who was the lead author on a related paper about the system. “By teleoperating robots from home, blue-collar workers would be able to tele-commute and benefit from the IT revolution just as white-collar workers do now.”

The researchers even imagine that such a system could help employ increasing numbers of jobless video-gamers by “gameifying” manufacturing positions.

The team used the Baxter humanoid robot from Rethink Robotics, but said that it can work on other robot platforms and is also compatible with the HTC Vive headset.

Lipton co-wrote the paper with CSAIL Director Daniela Rus and researcher Aidan Fay. They presented the paper at the recent IEEE/RSJ International Conference on Intelligent Robots and Systems in Vancouver.

There have traditionally been two main approaches to using VR for teleoperation.

In a direct model, the user’s vision is directly coupled to the robot’s state. With these systems, a delayed signal could lead to nausea and headaches, and the user’s viewpoint is limited to one perspective.

In a cyber-physical model, the user is separate from the robot. The user interacts with a virtual copy of the robot and the environment. This requires much more data, and specialized spaces.

The CSAIL team’s system is halfway between these two methods. It solves the delay problem, since the user is constantly receiving visual feedback from the virtual world. It also solves the cyber-physical issue of being distinct from the robot: Once a user puts on the headset and logs into the system, they’ll feel as if they’re inside Baxter’s head.

The system mimics the homunculus model of the mind — the idea that there’s a small human inside our brains controlling our actions, viewing the images we see, and understanding them for us. While it’s a peculiar idea for humans, for robots it fits: Inside the robot is a human in a virtual control room, seeing through its eyes and controlling its actions.

Using Oculus’ controllers, users can interact with controls that appear in the virtual space to open and close the hand grippers to pick up, move, and retrieve items. A user can plan movements based on the distance between the arm’s location marker and their hand while looking at the live display of the arm.

To make these movements possible, the human’s space is mapped into the virtual space, and the virtual space is then mapped into the robot space to provide a sense of co-location.
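That chain of mappings can be sketched with simple offset-and-scale transforms standing in for the rigid-body transforms a real system would use; the frames and numbers below are made up for illustration.

```python
# Map a point through human frame -> virtual frame -> robot frame.
# Real teleoperation systems use full rigid-body transforms (rotation
# plus translation); offset-and-scale keeps the idea visible.

def map_frame(point, scale, offset):
    """Apply a uniform scale and per-axis offset to a 3-D point."""
    return tuple(scale * p + o for p, o in zip(point, offset))

hand = (0.2, 0.1, 0.3)                             # metres, human's frame
virtual = map_frame(hand, 1.0, (0.0, 0.0, 1.0))    # into the VR control room
robot = map_frame(virtual, 0.5, (0.1, 0.0, 0.0))   # into Baxter's workspace
print(robot)
```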

The system is also more flexible compared to previous systems that require many resources. Other systems might extract 2-D information from each camera, build out a full 3-D model of the environment, and then process and redisplay the data. In contrast, the CSAIL team’s approach bypasses all of that by simply taking the 2-D images that are displayed to each eye. (The human brain does the rest by automatically inferring the 3-D information.) 

To test the system, the team first teleoperated Baxter to do simple tasks like picking up screws or stapling wires. They then had the test users teleoperate the robot to pick up and stack blocks.

Users successfully completed the tasks at a much higher rate than with the direct model. Unsurprisingly, users with gaming experience found the system much easier to use.

Tested against current state-of-the-art systems, CSAIL’s system was better at grasping objects 95 percent of the time and 57 percent faster at doing tasks. The team also showed that the system could pilot the robot from hundreds of miles away; testing included controlling Baxter at MIT from a hotel’s wireless network in Washington.

“This contribution represents a major milestone in the effort to connect the user with the robot’s space in an intuitive, natural, and effective manner,” says Oussama Khatib, a computer science professor at Stanford University who was not involved in the paper.

The team eventually wants to focus on making the system more scalable, with many users and different types of robots that can be compatible with current automation technologies.

The project was funded, in part, by the Boeing Company and the National Science Foundation.

“Superhero” robot wears different outfits for different tasks

A new cube-shaped robot, dubbed “Primer,” wears and sheds different exoskeletons to take on different tasks. Credit: the researchers.

From butterflies that sprout wings to hermit crabs that switch their shells, many animals must adapt their exterior features in order to survive. While humans don’t undergo that kind of metamorphosis, we often try to create functional objects that are similarly adaptive — including our robots.

Despite what you might have seen in “Transformers” movies, though, today’s robots are still pretty inflexible. Each of their parts usually has a fixed structure and a single defined purpose, making it difficult for them to perform a wide variety of actions.

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) are aiming to change that with a new shape-shifting robot that’s something of a superhero: It can transform itself with different “outfits” that allow it to perform different tasks.

Dubbed “Primer,” the cube-shaped robot can be controlled via magnets to make it walk, roll, sail, and glide. It carries out these actions by wearing different exoskeletons, which start out as sheets of plastic that fold into specific shapes when heated. After Primer finishes its task, it can shed its “skin” by immersing itself in water, which dissolves the exoskeleton.

“If we want robots to help us do things, it’s not very efficient to have a different one for each task,” says Daniela Rus, CSAIL director and principal investigator on the project. “With this metamorphosis-inspired approach, we can extend the capabilities of a single robot by giving it different ‘accessories’ to use in different situations.”

Primer’s various forms have a range of advantages. For example, “Wheel-bot” has wheels that allow it to move twice as fast as “Walk-bot.” “Boat-bot” can float on water and carry nearly twice its weight. “Glider-bot” can soar across longer distances, which could be useful for deploying robots or switching environments.

Primer can even wear multiple outfits at once, like a Russian nesting doll. It can add one exoskeleton to become “Walk-bot,” and then interface with another, larger exoskeleton that allows it to carry objects and move two body lengths per second. To deploy the second exoskeleton, “Walk-bot” steps onto the sheet, which then blankets the bot with its four self-folding arms.

“Imagine future applications for space exploration, where you could send a single robot with a stack of exoskeletons to Mars,” says postdoc Shuguang Li, one of the co-authors of the study. “The robot could then perform different tasks by wearing different ‘outfits.’”

The project was led by Rus and Shuhei Miyashita, a former CSAIL postdoc who is now director of the Microrobotics Group at the University of York. Their co-authors include Li and graduate student Steven Guitron. An article about the work appears in the journal Science Robotics on Sept. 27.

Robot metamorphosis

Primer builds on several previous projects from Rus’ team, including magnetic blocks that can assemble themselves into different shapes and centimeter-long microrobots that can be precisely customized from sheets of plastic.

While robots that can change their form or function have been developed at larger sizes, it’s generally been difficult to build such structures at much smaller scales.

“This work represents an advance over the authors’ previous work in that they have now demonstrated a scheme that allows for the creation of five different functionalities,” says Eric Diller, a microrobotics expert and assistant professor of mechanical engineering at the University of Toronto, who was not involved in the paper. “Previous work at most shifted between only two functionalities, such as ‘open’ or ‘closed’ shapes.”

The team outlines many potential applications for robots that can perform multiple actions with just a quick costume change. For example, say some equipment needs to be moved across a stream. A single robot with multiple exoskeletons could potentially sail across the stream and then carry objects on the other side.

“Our approach shows that origami-inspired manufacturing allows us to have robotic components that are versatile, accessible, and reusable,” says Rus, the Andrew and Erna Viterbi Professor of Electrical Engineering and Computer Science at MIT.

Designed in a matter of hours, the exoskeletons fold into shape after being heated for just a few seconds, suggesting a new approach to rapid fabrication of robots.

“I could envision devices like these being used in ‘microfactories’ where prefabricated parts and tools would enable a single microrobot to do many complex tasks on demand,” Diller says.

As a next step, the team plans to explore giving the robots an even wider range of capabilities, from driving through water and burrowing in sand to camouflaging their color. Guitron pictures a future robotics community that shares open-source designs for parts much the way 3-D-printing enthusiasts trade ideas on sites such as Thingiverse.

“I can imagine one day being able to customize robots with different arms and appendages,” says Rus. “Why update a whole robot when you can just update one part of it?”

This project was supported, in part, by the National Science Foundation.

Automatic code reuse

“CodeCarbonCopy enables one of the holy grails of software engineering: automatic code reuse,” says Stelios Sidiroglou-Douskos, a research scientist at CSAIL. Credit: MIT News

By Larry Hardesty

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed a new system that allows programmers to transplant code from one program into another. The programmer can select the code from one program and an insertion point in a second program, and the system will automatically make the modifications necessary — such as changing variable names — to integrate the code into its new context.

Crucially, the system is able to translate between “data representations” used by the donor and recipient programs. An image-processing program, for instance, needs to be able to handle files in a range of formats, such as jpeg, tiff, or png. But internally, it will represent all such images using a single standardized scheme. Different programs, however, may use different internal schemes. The CSAIL researchers’ system automatically maps the donor program’s scheme onto that of the recipient, to import code seamlessly.

The researchers presented the new system, dubbed CodeCarbonCopy, at the Association for Computing Machinery’s Symposium on the Foundations of Software Engineering.

“CodeCarbonCopy enables one of the holy grails of software engineering: automatic code reuse,” says Stelios Sidiroglou-Douskos, a research scientist at CSAIL and first author on the paper. “It’s another step toward automating the human away from the development cycle. Our view is that perhaps we have written most of the software that we’ll ever need — we now just need to reuse it.”

The researchers conducted eight experiments in which they used CodeCarbonCopy to transplant code between six popular open-source image-processing programs. Seven of the eight transplants were successful, with the recipient program properly executing the new functionality.

Joining Sidiroglou-Douskos on the paper are Martin Rinard, a professor of electrical engineering and computer science; Fan Long, an MIT graduate student in electrical engineering and computer science; and Eric Lahtinen and Anthony Eden, who were contract programmers at MIT when the work was done.

Mutatis mutandis

With CodeCarbonCopy, the first step in transplanting code from one program to another is to feed both of them the same input file. The system then compares how the two programs process the file.

If, for instance, the donor program performs a series of operations on a particular piece of data and loads the result into a variable named “mem_clip->width,” and the recipient performs the same operations on the same piece of data and loads the result into a variable named “picture.width,” the system will infer that the variables are playing the same roles in their respective programs.

Once it has identified correspondences between variables, CodeCarbonCopy presents them to the user. It also presents all the variables in the donor for which it could not find matches in the recipient, together with those variables’ initial definitions. Frequently, those variables are playing some role in the donor that’s irrelevant to the recipient. The user can flag those variables as unnecessary, and CodeCarbonCopy will automatically excise any operations that make use of them from the transplanted code.
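A toy rendition of that correspondence step: record the values each program's variables hold after processing the same input, then pair variables that held the same value. The variable names echo the example above; the matching-by-equality rule is a simplification invented here, not CodeCarbonCopy's actual analysis.

```python
# Pair donor and recipient variables by the values they held when both
# programs processed the same input file; report donor variables with
# no counterpart (candidates for the user to flag as unnecessary).

def match_variables(donor_vals, recipient_vals):
    matches, unmatched = {}, []
    for d_name, d_val in donor_vals.items():
        hit = next((r for r, v in recipient_vals.items() if v == d_val), None)
        if hit is not None:
            matches[d_name] = hit
        else:
            unmatched.append(d_name)
    return matches, unmatched

donor     = {"mem_clip->width": 640, "mem_clip->height": 480, "dither": 3}
recipient = {"picture.width": 640, "picture.height": 480}
print(match_variables(donor, recipient))
```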

New order

To map the data representations from one program onto those of the other, CodeCarbonCopy looks at the precise values that both programs store in memory. Every pixel in a digital image, for instance, is governed by three color values: red, green, and blue. Some programs, however, store those triplets of values in the order red, green, blue, and others store them in the order blue, green, red.

If CodeCarbonCopy finds a systematic relationship between the values stored by one program and those stored by the other, it generates a set of operations for translating between representations.
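For the red-green-blue example, the generated translation is just a permutation of each pixel triplet. A hand-written version of such a mapping, assuming the channel order has already been inferred:

```python
# Translate pixel data between two internal representations:
# (R, G, B) triplets reordered to (B, G, R) triplets.

def rgb_to_bgr(pixels):
    """Reorder every (r, g, b) triplet to (b, g, r)."""
    return [(b, g, r) for (r, g, b) in pixels]

donor = [(255, 0, 0), (0, 128, 64)]   # red-first layout
print(rgb_to_bgr(donor))              # [(0, 0, 255), (64, 128, 0)]
```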

CodeCarbonCopy works well with file formats, such as images, whose data is rigidly organized, and with programs, such as image processors, that store data representations in arrays, which are essentially rows of identically sized memory units. In ongoing work, the researchers are looking to generalize their approach to file formats that permit more flexible data organization and programs that use data structures other than arrays, such as trees or linked lists.

“In general, code quoting is where a lot of problems in software come from,” says Vitaly Shmatikov, a professor of computer science at Cornell Tech, a joint academic venture between Cornell University and Israel’s Technion. “Both bugs and security vulnerabilities — a lot of them occur when there is functionality in one place, and someone tries to either cut and paste or reimplement this functionality in another place. They make a small mistake, and that’s how things break. So having an automated way of moving code from one place to another would be a huge, huge deal, and this is a very solid step toward having it.”

“Recognizing irrelevant code that’s not important for the functionality that they’re quoting, that’s another technical innovation that’s important,” Shmatikov adds. “That’s the kind of thing that was an obstacle for a lot of previous approaches — that you know the right code is there, but it’s mixed up with a lot of code that is not relevant to what you’re trying to do. So being able to separate that out is a fairly significant technical contribution.”
