Page 11 of 12
1 9 10 11 12

Using machine learning to improve patient care

Credit: Shutterstock / MIT

Doctors are often deluged by signals from charts, test results, and other metrics to keep track of. It can be difficult to integrate and monitor all of these data for multiple patients while making real-time treatment decisions, especially when data is documented inconsistently across hospitals.

In a new pair of papers, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) explore ways for computers to help doctors make better medical decisions.

One team created a machine-learning approach called “ICU Intervene” that takes large amounts of intensive-care-unit (ICU) data, from vitals and labs to notes and demographics, to determine what kinds of treatments are needed for different symptoms. The system uses “deep learning” to make real-time predictions, learning from past ICU cases to make suggestions for critical care, while also explaining the reasoning behind these decisions.

“The system could potentially be an aid for doctors in the ICU, which is a high-stress, high-demand environment,” says PhD student Harini Suresh, lead author on the paper about ICU Intervene. “The goal is to leverage data from medical records to improve health care and predict actionable interventions.”

Another team developed an approach called “EHR Model Transfer” that can facilitate the application of predictive models on an electronic health record (EHR) system, despite being trained on data from a different EHR system. Specifically, using this approach the team showed that predictive models for mortality and prolonged length of stay can be trained on one EHR system and used to make predictions in another.

ICU Intervene was co-developed by Suresh, undergraduate student Nathan Hunt, postdoc Alistair Johnson, researcher Leo Anthony Celi, MIT Professor Peter Szolovits, and PhD student Marzyeh Ghassemi. It was presented this month at the Machine Learning for Healthcare Conference in Boston.

EHR Model Transfer was co-developed by lead authors Jen Gong and Tristan Naumann, both PhD students at CSAIL, as well as Szolovits and John Guttag, who is the Dugald C. Jackson Professor in Electrical Engineering. It was presented at the ACM’s Special Interest Group on Knowledge Discovery and Data Mining in Halifax, Canada.

Both models were trained using data from the critical care database MIMIC, which includes de-identified data from roughly 40,000 critical care patients and was developed by the MIT Lab for Computational Physiology.

ICU Intervene

Integrated ICU data is vital to automating the process of predicting patients’ health outcomes.

“Much of the previous work in clinical decision-making has focused on outcomes such as mortality (likelihood of death), while this work predicts actionable treatments,” Suresh says. “In addition, the system is able to use a single model to predict many outcomes.”

ICU Intervene focuses on hourly prediction of five different interventions that cover a wide variety of critical care needs, such as breathing assistance, improving cardiovascular function, lowering blood pressure, and fluid therapy.

At each hour, the system extracts values from the data that represent vital signs, as well as clinical notes and other data points. All of the data are represented with values that indicate how far off a patient is from the average (to then evaluate further treatment).

Importantly, ICU Intervene can make predictions far into the future. For example, the model can predict whether a patient will need a ventilator six hours later rather than just 30 minutes or an hour later. The team also focused on providing reasoning for the model’s predictions, giving physicians more insight.

“Deep neural-network-based predictive models in medicine are often criticized for their black-box nature,” says Nigam Shah, an associate professor of medicine at Stanford University who was not involved in the paper. “However, these authors predict the start and end of medical interventions with high accuracy, and are able to demonstrate interpretability for the predictions they make.”

The team found that the system outperformed previous work in predicting interventions, and was especially good at predicting the need for vasopressors, a medication that tightens blood vessels and raises blood pressure.

In the future, the researchers will be trying to improve ICU Intervene to be able to give more individualized care and provide more advanced reasoning for decisions, such as why one patient might be able to taper off steroids, or why another might need a procedure like an endoscopy.

EHR Model Transfer

Another important consideration for leveraging ICU data is how it’s stored and what happens when that storage method gets changed. Existing machine-learning models need data to be encoded in a consistent way, so the fact that hospitals often change their EHR systems can create major problems for data analysis and prediction.

That’s where EHR Model Transfer comes in. The approach works across different versions of EHR platforms, using natural language processing to identify clinical concepts that are encoded differently across systems and then mapping them to a common set of clinical concepts (such as “blood pressure” and “heart rate”).

For example, a patient in one EHR platform could be switching hospitals and would need their data transferred to a different type of platform. EHR Model Transfer aims to ensure that the model could still predict aspects of that patient’s ICU visit, such as their likelihood of a prolonged stay or even of dying in the unit.

“Machine-learning models in health care often suffer from low external validity, and poor portability across sites,” says Shah. “The authors devise a nifty strategy for using prior knowledge in medical ontologies to derive a shared representation across two sites that allows models trained at one site to perform well at another site. I am excited to see such creative use of codified medical knowledge in improving portability of predictive models.”

With EHR Model Transfer, the team tested their model’s ability to predict two outcomes: mortality and the need for a prolonged stay. They trained it on one EHR platform and then tested its predictions on a different platform. EHR Model Transfer was found to outperform baseline approaches and demonstrated better transfer of predictive models across EHR versions compared to using EHR-specific events alone.

In the future, the EHR Model Transfer team plans to evaluate the system on data and EHR systems from other hospitals and care settings.

Both papers were supported, in part, by the Intel Science and Technology Center for Big Data and the National Library of Medicine. The paper detailing EHR Model Transfer was additionally supported by the National Science Foundation and Quanta Computer, Inc.

New AI algorithm monitors sleep with radio waves

Researchers have devised a new way to monitor sleep stages without sensors attached to the body. Their device uses an advanced artificial intelligence algorithm to analyze the radio signals around the person and translate those measurements into sleep stages: light, deep, or rapid eye movement (REM). Image: Christine Daniloff/MIT

More than 50 million Americans suffer from sleep disorders, and diseases including Parkinson’s and Alzheimer’s can also disrupt sleep. Diagnosing and monitoring these conditions usually requires attaching electrodes and a variety of other sensors to patients, which can further disrupt their sleep.

To make it easier to diagnose and study sleep problems, researchers at MIT and Massachusetts General Hospital have devised a new way to monitor sleep stages without sensors attached to the body. Their device uses an advanced artificial intelligence algorithm to analyze the radio signals around the person and translate those measurements into sleep stages: light, deep, or rapid eye movement (REM).

“Imagine if your Wi-Fi router knows when you are dreaming, and can monitor whether you are having enough deep sleep, which is necessary for memory consolidation,” says Dina Katabi, the Andrew and Erna Viterbi Professor of Electrical Engineering and Computer Science, who led the study. “Our vision is developing health sensors that will disappear into the background and capture physiological signals and important health metrics, without asking the user to change her behavior in any way.”

Katabi worked on the study with Matt Bianchi, chief of the division of sleep medicine at MGH, and Tommi Jaakkola, the Thomas Siebel Professor of Electrical Engineering and Computer Science and a member of the Institute for Data, Systems, and Society at MIT. Mingmin Zhao, an MIT graduate student, is the paper’s first author, and Shichao Yue, another MIT graduate student, is also a co-author.

The researchers will present their new sensor at the International Conference on Machine Learning on Aug. 9.

Remote sensing

Katabi and members of her group in MIT’s Computer Science and Artificial Intelligence Laboratory have previously developed radio-based sensors that enable them to remotely measure vital signs and behaviors that can be indicators of health. These sensors consist of a wireless device, about the size of a laptop computer, that emits low-power radio frequency (RF) signals. As the radio waves reflect off of the body, any slight movement of the body alters the frequency of the reflected waves. Analyzing those waves can reveal vital signs such as pulse and breathing rate.

“It’s a smart Wi-Fi-like box that sits in the home and analyzes these reflections and discovers all of these changes in the body, through a signature that the body leaves on the RF signal,” Katabi says.

Katabi and her students have also used this approach to create a sensor called WiGait that can measure walking speed using wireless signals, which could help doctors predict cognitive decline, falls, certain cardiac or pulmonary diseases, or other health problems.

After developing those sensors, Katabi thought that a similar approach could also be useful for monitoring sleep, which is currently done while patients spend the night in a sleep lab hooked up to monitors such as electroencephalography (EEG) machines.

“The opportunity is very big because we don’t understand sleep well, and a high fraction of the population has sleep problems,” says Zhao. “We have this technology that, if we can make it work, can move us from a world where we do sleep studies once every few months in the sleep lab to continuous sleep studies in the home.”

To achieve that, the researchers had to come up with a way to translate their measurements of pulse, breathing rate, and movement into sleep stages. Recent advances in artificial intelligence have made it possible to train computer algorithms known as deep neural networks to extract and analyze information from complex datasets, such as the radio signals obtained from the researchers’ sensor. However, these signals have a great deal of information that is irrelevant to sleep and can be confusing to existing algorithms. The MIT researchers had to come up with a new AI algorithm based on deep neural networks, which eliminates the irrelevant information.

“The surrounding conditions introduce a lot of unwanted variation in what you measure. The novelty lies in preserving the sleep signal while removing the rest,” says Jaakkola. Their algorithm can be used in different locations and with different people, without any calibration.

Using this approach in tests of 25 healthy volunteers, the researchers found that their technique was about 80 percent accurate, which is comparable to the accuracy of ratings determined by sleep specialists based on EEG measurements.

“Our device allows you not only to remove all of these sensors that you put on the person, and make it a much better experience that can be done at home, it also makes the job of the doctor and the sleep technologist much easier,” Katabi says. “They don’t have to go through the data and manually label it.”

Sleep deficiencies

Other researchers have tried to use radio signals to monitor sleep, but these systems are accurate only 65 percent of the time and mainly determine whether a person is awake or asleep, not what sleep stage they are in. Katabi and her colleagues were able to improve on that by training their algorithm to ignore wireless signals that bounce off of other objects in the room and include only data reflected from the sleeping person.

The researchers now plan to use this technology to study how Parkinson’s disease affects sleep.

“When you think about Parkinson’s, you think about it as a movement disorder, but the disease is also associated with very complex sleep deficiencies, which are not very well understood,” Katabi says.

The sensor could also be used to learn more about sleep changes produced by Alzheimer’s disease, as well as sleep disorders such as insomnia and sleep apnea. It may also be useful for studying epileptic seizures that happen during sleep, which are usually difficult to detect.

Automatic image retouching system

A new system can automatically retouch images in the style of a professional photographer. It can run on a cellphone and display retouched images in real-time. Courtesy of the researchers (edited by MIT News)

The data captured by today’s digital cameras is often treated as the raw material of a final image. Before uploading pictures to social networking sites, even casual cellphone photographers might spend a minute or two balancing color and tuning contrast, with one of the many popular image-processing programs now available.

This week at Siggraph, the premier digital graphics conference, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory and Google are presenting a new system that can automatically retouch images in the style of a professional photographer. It’s so energy-efficient, however, that it can run on a cellphone, and it’s so fast that it can display retouched images in real-time, so that the photographer can see the final version of the image while still framing the shot.

The same system can also speed up existing image-processing algorithms. In tests involving a new Google algorithm for producing high-dynamic-range images, which capture subtleties of color lost in standard digital images, the new system produced results that were visually indistinguishable from those of the algorithm in about one-tenth the time — again, fast enough for real-time display.

The system is a machine-learning system, meaning that it learns to perform tasks by analyzing training data; in this case, for each new task it learned, it was trained on thousands of pairs of images, raw and retouched.

The work builds on an earlier project from the MIT researchers, in which a cellphone would send a low-resolution version of an image to a web server. The server would send back a “transform recipe” that could be used to retouch the high-resolution version of the image on the phone, reducing bandwidth consumption.

“Google heard about the work I’d done on the transform recipe,” says Michaël Gharbi, an MIT graduate student in electrical engineering and computer science and first author on both papers. “They themselves did a follow-up on that, so we met and merged the two approaches. The idea was to do everything we were doing before but, instead of having to process everything on the cloud, to learn it. And the first goal of learning it was to speed it up.”

Short cuts

In the new work, the bulk of the image processing is performed on a low-resolution image, which drastically reduces time and energy consumption. But this introduces a new difficulty, because the color values of the individual pixels in the high-res image have to be inferred from the much coarser output of the machine-learning system.

In the past, researchers have attempted to use machine learning to learn how to “upsample” a low-res image, or increase its resolution by guessing the values of the omitted pixels. During training, the input to the system is a low-res image, and the output is a high-res image. But this doesn’t work well in practice; the low-res image just leaves out too much data.

Gharbi and his colleagues — MIT professor of electrical engineering and computer science Frédo Durand and Jiawen Chen, Jon Barron, and Sam Hasinoff of Google — address this problem with two clever tricks. The first is that the output of their machine-learning system is not an image; rather, it’s a set of simple formulae for modifying the colors of image pixels. During training, the performance of the system is judged according to how well the output formulae, when applied to the original image, approximate the retouched version.

Taking bearings

The second trick is a technique for determining how to apply those formulae to individual pixels in the high-res image. The output of the researchers’ system is a three-dimensional grid, 16 by 16 by 8. The 16-by-16 faces of the grid correspond to pixel locations in the source image; the eight layers stacked on top of them correspond to different pixel intensities. Each cell of the grid contains formulae that determine modifications of the color values of the source images.

That means that each cell of one of the grid’s 16-by-16 faces has to stand in for thousands of pixels in the high-res image. But suppose that each set of formulae corresponds to a single location at the center of its cell. Then any given high-res pixel falls within a square defined by four sets of formulae.

Roughly speaking, the modification of that pixel’s color value is a combination of the formulae at the square’s corners, weighted according to distance. A similar weighting occurs in the third dimension of the grid, the one corresponding to pixel intensity.

The researchers trained their system on a data set created by Durand’s group and Adobe Systems, the creators of Photoshop. The data set includes 5,000 images, each retouched by five different photographers. They also trained their system on thousands of pairs of images produced by the application of particular image-processing algorithms, such as the one for creating high-dynamic-range (HDR) images. The software for performing each modification takes up about as much space in memory as a single digital photo, so in principle, a cellphone could be equipped to process images in a range of styles.

Finally, the researchers compared their system’s performance to that of a machine-learning system that processed images at full resolution rather than low resolution. During processing, the full-res version needed about 12 gigabytes of memory to execute its operations; the researchers’ version needed about 100 megabytes, or one-hundredth as much. The full-resolution version of the HDR system took about 10 times as long to produce an image as the original algorithm, or 100 times as long as the researchers’ system.

“This technology has the potential to be very useful for real-time image enhancement on mobile platforms,” says Barron. “Using machine learning for computational photography is an exciting prospect but is limited by the severe computational and power constraints of mobile phones. This paper may provide us with a way to sidestep these issues and produce new, compelling, real-time photographic experiences without draining your battery or giving you a laggy viewfinder experience.”

SMART trials self-driving wheelchair at hospital

Image: MIT CSAIL

Singapore and MIT have been at the forefront of autonomous vehicle development. First, there were self-driving golf buggies. Then, an autonomous electric car. Now, leveraging similar technology, MIT and Singaporean researchers have developed and deployed a self-driving wheelchair at a hospital.

Spearheaded by Daniela Rus, the Andrew (1956) and Erna Viterbi Professor of Electrical Engineering and Computer Science and director of MIT’s Computer Science and Artificial Intelligence Laboratory, this autonomous wheelchair is an extension of the self-driving scooter that launched at MIT last year — and it is a testament to the success of the Singapore-MIT Alliance for Research and Technology, or SMART, a collaboration between researchers at MIT and in Singapore.

Rus, who is also the principal investigator of the SMART Future Urban Mobility research group, says this newest innovation can help nurses focus more on patient care as they can get relief from logistics work which includes searching for wheelchairs and wheeling patients in the complex hospital network.

“When we visited several retirement communities, we realized that the quality of life is dependent on mobility. We want to make it really easy for people to move around,” Rus says.

Reshaping computer-aided design

Adriana Schulz, an MIT PhD student in the Computer Science and Artificial Intelligence Laboratory, demonstrates the InstantCAD computer-aided-design-optimizing interface. Photo: Rachel Gordon/MIT CSAIL

Almost every object we use is developed with computer-aided design (CAD). Ironically, while CAD programs are good for creating designs, using them is actually very difficult and time-consuming if you’re trying to improve an existing design to make the most optimal product. Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Columbia University are trying to make the process faster and easier: In a new paper, they’ve developed InstantCAD, a tool that lets designers interactively edit, improve, and optimize CAD models using a more streamlined and intuitive workflow.

 InstantCAD integrates seamlessly with existing CAD programs as a plug-in, meaning that designers don’t have to learn new tools to use it.

“From more ergonomic desks to higher-performance cars, this is really about creating better products in less time,” says Department of Electrical Engineering and Computer Science PhD student and lead author Adriana Schulz, who will be presenting the paper at this month’s SIGGRAPH computer-graphics conference in Los Angeles. “We think this could be a real game changer for automakers and other companies that want to be able to test and improve complex designs in a matter of seconds to minutes, instead of hours to days.”

The paper was co-written by Associate Professor Wojciech Matusik, PhD student Jie Xu, and postdoc Bo Zhu of CSAIL, as well as Associate Professor Eitan Grinspun and Assistant Professor Changxi Zheng of Columbia University.

Traditional CAD systems are “parametric,” which means that when engineers design models, they can change properties like shape and size (“parameters”) based on different priorities. For example, when designing a wind turbine you might have to make trade-offs between how much airflow you can get versus how much energy it will generate.

InstantCAD enables designers to interactively edit, improve, and optimize CAD models using a more streamlined and intuitive workflow. Photo: Rachel Gordon/MIT CSAIL

However, it can be difficult to determine the absolute best design for what you want your object to do, because there are many different options for modifying the design. On top of that, the process is time-consuming because changing a single property means having to wait to regenerate the new design, run a simulation, see the result, and then figure out what to do next.

With InstantCAD, the process of improving and optimizing the design can be done in real-time, saving engineers days or weeks. After an object is designed in a commercial CAD program, it is sent to a cloud platform where multiple geometric evaluations and simulations are run at the same time.

With this precomputed data, you can instantly improve and optimize the design in two ways. With “interactive exploration,” a user interface provides real-time feedback on how design changes will affect performance, like how the shape of a plane wing impacts air pressure distribution. With “automatic optimization,” you simply tell the system to give you a design with specific characteristics, like a drone that’s as lightweight as possible while still being able to carry the maximum amount of weight.

The reason it’s hard to optimize an object’s design is because of the massive size of the design space (the number of possible design options).

“It’s too data-intensive to compute every single point, so we have to come up with a way to predict any point in this space from just a small number of sampled data points,” says Schulz. “This is called ‘interpolation,’ and our key technical contribution is a new algorithm we developed to take these samples and estimate points in the space.”

Matusik says InstantCAD could be particularly helpful for more intricate designs for objects like cars, planes, and robots, particularly for industries like car manufacturing that care a lot about squeezing every little bit of performance out of a product.

“Our system doesn’t just save you time for changing designs, but has the potential to dramatically improve the quality of the products themselves,” says Matusik. “The more complex your design gets, the more important this kind of a tool can be.”

Because of the system’s productivity boosts and CAD integration, Schulz is confident that it will have immediate applications for industry. Down the line, she hopes that InstantCAD can also help lower the barrier for entry for casual users.

“In a world where 3-D printing and industrial robotics are making manufacturing more accessible, we need systems that make the actual design process more accessible, too,” Schulz says. “With systems like this that make it easier to customize objects to meet your specific needs, we hope to be paving the way to a new age of personal manufacturing and DIY design.”

The project was supported by the National Science Foundation.

Artificial intelligence suggests recipes based on food photos

Pic2Recipe, an artificial intelligence system developed at MIT, can take a photo of an entree and suggest a similar recipe to it. Photo: Jason Dorfman/MIT CSAIL

There are few things social media users love more than flooding their feeds with photos of food. Yet we seldom use these images for much more than a quick scroll on our cellphones. Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) believe that analyzing photos like these could help us learn recipes and better understand people’s eating habits.

In a new paper with the Qatar Computing Research Institute (QCRI), the team trained an artificial intelligence system called Pic2Recipe to look at a photo of food and be able to predict the ingredients and suggest similar recipes.

“In computer vision, food is mostly neglected because we don’t have the large-scale datasets needed to make predictions,” says Yusuf Aytar, an MIT postdoc who co-wrote a paper about the system with MIT Professor Antonio Torralba. “But seemingly useless photos on social media can actually provide valuable insight into health habits and dietary preferences.”

 The paper will be presented later this month at the Computer Vision and Pattern Recognition conference in Honolulu. CSAIL graduate student Nick Hynes was lead author alongside Amaia Salvador of the Polytechnic University of Catalonia in Spain. Co-authors include CSAIL postdoc Javier Marin, as well as scientist Ferda Ofli and research director Ingmar Weber of QCRI.

How it works

The web has spurred a huge growth of research in the area of classifying food data, but the majority of it has used much smaller datasets, which often leads to major gaps in labeling foods.

In 2014 Swiss researchers created the “Food-101” dataset and used it to develop an algorithm that could recognize images of food with 50 percent accuracy. Future iterations only improved accuracy to about 80 percent, suggesting that the size of the dataset may be a limiting factor.

Even the larger datasets have often been somewhat limited in how well they generalize across populations. A database from the City University in Hong Kong has over 110,000 images and 65,000 recipes, each with ingredient lists and instructions, but only contains Chinese cuisine.

The CSAIL team’s project aims to build off of this work but dramatically expand in scope. Researchers combed websites like All Recipes and Food.com to develop “Recipe1M,” a database of over 1 million recipes that were annotated with information about the ingredients in a wide range of dishes. They then used that data to train a neural network to find patterns and make connections between the food images and the corresponding ingredients and recipes.

Given a photo of a food item, Pic2Recipe could identify ingredients like flour, eggs, and butter, and then suggest several recipes that it determined to be similar to images from the database. (The team has an online demo where people can upload their own food photos to test it out.)

“You can imagine people using this to track their daily nutrition, or to photograph their meal at a restaurant and know what’s needed to cook it at home later,” says Christoph Trattner, an assistant professor at MODUL University Vienna in the New Media Technology Department who was not involved in the paper. “The team’s approach works at a similar level to human judgement, which is remarkable.”

The system did particularly well with desserts like cookies or muffins, since that was a main theme in the database. However, it had difficulty determining ingredients for more ambiguous foods, like sushi rolls and smoothies.

It was also often stumped when there were similar recipes for the same dishes. For example, there are dozens of ways to make lasagna, so the team needed to make sure that system wouldn’t “penalize” recipes that are similar when trying to separate those that are different. (One way to solve this was by seeing if the ingredients in each are generally similar before comparing the recipes themselves).

In the future, the team hopes to be able to improve the system so that it can understand food in even more detail. This could mean being able to infer how a food is prepared (i.e. stewed versus diced) or distinguish different variations of foods, like mushrooms or onions.

The researchers are also interested in potentially developing the system into a “dinner aide” that could figure out what to cook given a dietary preference and a list of items in the fridge.

“This could potentially help people figure out what’s in their food when they don’t have explicit nutritional information,” says Hynes. “For example, if you know what ingredients went into a dish but not the amount, you can take a photo, enter the ingredients, and run the model to find a similar recipe with known quantities, and then use that information to approximate your own meal.”

The project was funded, in part, by QCRI, as well as the European Regional Development Fund (ERDF) and the Spanish Ministry of Economy, Industry, and Competitiveness.

Bringing neural networks to cellphones

Image: Jose-Luis Olivares/MIT

In recent years, the best-performing artificial-intelligence systems — in areas such as autonomous driving, speech recognition, computer vision, and automatic translation — have come courtesy of software systems known as neural networks.

But neural networks take up a lot of memory and consume a lot of power, so they usually run on servers in the cloud, which receive data from desktop or mobile devices and then send back their analyses.

Last year, MIT associate professor of electrical engineering and computer science Vivienne Sze and colleagues unveiled a new, energy-efficient computer chip optimized for neural networks, which could enable powerful artificial-intelligence systems to run locally on mobile devices.

Now, Sze and her colleagues have approached the same problem from the opposite direction, with a battery of techniques for designing more energy-efficient neural networks. First, they developed an analytic method that can determine how much power a neural network will consume when run on a particular type of hardware. Then they used the method to evaluate new techniques for paring down neural networks so that they’ll run more efficiently on handheld devices.

The researchers describe the work in a paper they’re presenting next week at the Computer Vision and Pattern Recognition Conference. In the paper, they report that the methods offered as much as a 73 percent reduction in power consumption over the standard implementation of neural networks, and as much as a 43 percent reduction over the best previous method for paring the networks down.

Energy evaluator

Loosely based on the anatomy of the brain, neural networks consist of thousands or even millions of simple but densely interconnected information-processing nodes, usually organized into layers. Different types of networks vary according to their number of layers, the number of connections between the nodes, and the number of nodes in each layer.

The connections between nodes have “weights” associated with them, which determine how much a given node’s output will contribute to the next node’s computation. During training, in which the network is presented with examples of the computation it’s learning to perform, those weights are continually readjusted, until the output of the network’s last layer consistently corresponds with the result of the computation.

“The first thing we did was develop an energy-modeling tool that accounts for data movement, transactions, and data flow,” Sze says. “If you give it a network architecture and the value of its weights, it will tell you how much energy this neural network will take. One of the questions that people had is ‘Is it more energy efficient to have a shallow network and more weights or a deeper network with fewer weights?’ This tool gives us better intuition as to where the energy is going, so that an algorithm designer could have a better understanding and use this as feedback. The second thing we did is that, now that we know where the energy is actually going, we started to use this model to drive our design of energy-efficient neural networks.”

In the past, Sze explains, researchers attempting to reduce neural networks’ power consumption used a technique called “pruning.” Low-weight connections between nodes contribute very little to a neural network’s final output, so many of them can be safely eliminated, or pruned.

Principled pruning

With the aid of their energy model, Sze and her colleagues — first author Tien-Ju Yang and Yu-Hsin Chen, both graduate students in electrical engineering and computer science — varied this approach. Although cutting even a large number of low-weight connections can have little effect on a neural net’s output, cutting all of them probably would, so pruning techniques must have some mechanism for deciding when to stop.

The MIT researchers thus begin pruning those layers of the network that consume the most energy. That way, the cuts translate to the greatest possible energy savings. They call this method “energy-aware pruning.”

Weights in a neural network can be either positive or negative, so the researchers’ method also looks for cases in which connections with weights of opposite sign tend to cancel each other out. The inputs to a given node are the outputs of nodes in the layer below, multiplied by the weights of their connections. So the researchers’ method looks not only at the weights but also at the way the associated nodes handle training data. Only if groups of connections with positive and negative weights consistently offset each other can they be safely cut. This leads to more efficient networks with fewer connections than earlier pruning methods did.

“Recently, much activity in the deep-learning community has been directed toward development of efficient neural-network architectures for computationally constrained platforms,” says Hartwig Adam, the team lead for mobile vision at Google. “However, most of this research is focused on either reducing model size or computation, while for smartphones and many other devices energy consumption is of utmost importance because of battery usage and heat restrictions. This work is taking an innovative approach to CNN [convolutional neural net] architecture optimization that is directly guided by minimization of power consumption using a sophisticated new energy estimation tool, and it demonstrates large performance gains over computation-focused methods. I hope other researchers in the field will follow suit and adopt this general methodology to neural-network-model architecture design.”

Miniaturizing the brain of a drone


By Jennifer Chu

In recent years, engineers have worked to shrink drone technology, building flying prototypes that are the size of a bumblebee and loaded with even tinier sensors and cameras. Thus far, they have managed to miniaturize almost every part of a drone, except for the brains of the entire operation — the computer chip.

Standard computer chips for quadcoptors and other similarly sized drones process an enormous amount of streaming data from cameras and sensors, and interpret that data on the fly to autonomously direct a drone’s pitch, speed, and trajectory. To do so, these computers use between 10 and 30 watts of power, supplied by batteries that would weigh down a much smaller, bee-sized drone.

Now, engineers at MIT have taken a first step in designing a computer chip that uses a fraction of the power of larger drone computers and is tailored for a drone as small as a bottlecap. They will present a new methodology and design, which they call “Navion,” at the Robotics: Science and Systems conference, held this week at MIT.

The team, led by Sertac Karaman, the Class of 1948 Career Development Associate Professor of Aeronautics and Astronautics at MIT, and Vivienne Sze, an associate professor in MIT’s Department of Electrical Engineering and Computer Science, developed a low-power algorithm, in tandem with pared-down hardware, to create a specialized computer chip.

The key contribution of their work is a new approach for designing the chip hardware and the algorithms that run on the chip. “Traditionally, an algorithm is designed, and you throw it over to a hardware person to figure out how to map the algorithm to hardware,” Sze says. “But we found by designing the hardware and algorithms together, we can achieve more substantial power savings.”

“We are finding that this new approach to programming robots, which involves thinking about hardware and algorithms jointly, is key to scaling them down,” Karaman says.

The new chip processes streaming images at 20 frames per second and automatically carries out commands to adjust a drone’s orientation in space. The streamlined chip performs all these computations while using just below 2 watts of power — making it an order of magnitude more efficient than current drone-embedded chips.

Karaman, says the team’s design is the first step toward engineering “the smallest intelligent drone that can fly on its own.” He ultimately envisions disaster-response and search-and-rescue missions in which insect-sized drones flit in and out of tight spaces to examine a collapsed structure or look for trapped individuals. Karaman also foresees novel uses in consumer electronics.

“Imagine buying a bottlecap-sized drone that can integrate with your phone, and you can take it out and fit it in your palm,” he says. “If you lift your hand up a little, it would sense that, and start to fly around and film you. Then you open your hand again and it would land on your palm, and you could upload that video to your phone and share it with others.”

Karaman and Sze’s co-authors are graduate students Zhengdong Zhang and Amr Suleiman, and research scientist Luca Carlone.

From the ground up

Current minidrone prototypes are small enough to fit on a person’s fingertip and are extremely light, requiring only 1 watt of power to lift off from the ground. Their accompanying cameras and sensors use up an additional half a watt to operate.

“The missing piece is the computers — we can’t fit them in terms of size and power,” Karaman says. “We need to miniaturize the computers and make them low power.”

The group quickly realized that conventional chip design techniques would likely not produce a chip that was small enough and provided the required processing power to intelligently fly a small autonomous drone.

“As transistors have gotten smaller, there have been improvements in efficiency and speed, but that’s slowing down, and now we have to come up with specialized hardware to get improvements in efficiency,” Sze says.

The researchers decided to build a specialized chip from the ground up, developing algorithms to process data, and hardware to carry out that data-processing, in tandem.

Tweaking a formula

Specifically, the researchers made slight changes to an existing algorithm commonly used to determine a drone’s “ego-motion,” or awareness of its position in space. They then implemented various versions of the algorithm on a field-programmable gate array (FPGA), a very simple programmable chip. To formalize this process, they developed a method called iterative splitting co-design that could strike the right balance of achieving accuracy while reducing the power consumption and the number of gates.

A typical FPGA consists of hundreds of thousands of disconnected gates, which researchers can connect in desired patterns to create specialized computing elements. Reducing the number gates with co-design allowed the team to chose an FPGA chip with fewer gates, leading to substantial power savings.

“If we don’t need a certain logic or memory process, we don’t use them, and that saves a lot of power,” Karaman explains.

Each time the researchers tweaked the ego-motion algorithm, they mapped the version onto the FPGA’s gates and connected the chip to a circuit board. They then fed the chip data from a standard drone dataset — an accumulation of streaming images and accelerometer measurements from previous drone-flying experiments that had been carried out by others and made available to the robotics community.

“These experiments are also done in a motion-capture room, so you know exactly where the drone is, and we use all this information after the fact,” Karaman says.

Memory savings

For each version of the algorithm that was implemented on the FPGA chip, the researchers observed the amount of power that the chip consumed as it processed the incoming data and estimated its resulting position in space.

The team’s most efficient design processed images at 20 frames per second and accurately estimated the drone’s orientation in space, while consuming less than 2 watts of power.

The power savings came partly from modifications to the amount of memory stored in the chip. Sze and her colleagues found that they were able to shrink the amount of data that the algorithm needed to process, while still achieving the same outcome. As a result, the chip itself was able to store less data and consume less power.

“Memory is really expensive in terms of power,” Sze says. “Since we do on-the-fly computing, as soon as we receive any data on the chip, we try to do as much processing as possible so we can throw it out right away, which enables us to keep a very small amount of memory on the chip without accessing off-chip memory, which is much more expensive.”

In this way, the team was able to reduce the chip’s memory storage to 2 megabytes without using off-chip memory, compared to a typical embedded computer chip for drones, which uses off-chip memory on the order of a few gigabytes.

“Any which way you can reduce the power so you can reduce battery size or extend battery life, the better,” Sze says.

This summer, the team will mount the FPGA chip onto a drone to test its performance in flight. Ultimately, the team plans to implement the optimized algorithm on an application-specific integrated circuit, or ASIC, a more specialized hardware platform that allows engineers to design specific types of gates, directly onto the chip.

“We think we can get this down to just a few hundred milliwatts,” Karaman says. “With this platform, we can do all kinds of optimizations, which allows tremendous power savings.”

This research was supported, in part, by Air Force Office of Scientific Research and the National Science Foundation.

Peering into neural networks

Neural networks learn to perform computational tasks by analyzing large sets of training data. But once they’ve been trained, even their designers rarely have any idea what data elements they’re processing.
Image: Christine Daniloff/MIT

By Larry Hardesty

Neural networks, which learn to perform computational tasks by analyzing large sets of training data, are responsible for today’s best-performing artificial intelligence systems, from speech recognition systems, to automatic translators, to self-driving cars.

But neural nets are black boxes. Once they’ve been trained, even their designers rarely have any idea what they’re doing — what data elements they’re processing and how.

Two years ago, a team of computer-vision researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) described a method for peering into the black box of a neural net trained to identify visual scenes. The method provided some interesting insights, but it required data to be sent to human reviewers recruited through Amazon’s Mechanical Turk crowdsourcing service.

At this year’s Computer Vision and Pattern Recognition conference, CSAIL researchers will present a fully automated version of the same system. Where the previous paper reported the analysis of one type of neural network trained to perform one task, the new paper reports the analysis of four types of neural networks trained to perform more than 20 tasks, including recognizing scenes and objects, colorizing grey images, and solving puzzles. Some of the new networks are so large that analyzing any one of them would have been cost-prohibitive under the old method.

The researchers also conducted several sets of experiments on their networks that not only shed light on the nature of several computer-vision and computational-photography algorithms, but could also provide some evidence about the organization of the human brain.

Neural networks are so called because they loosely resemble the human nervous system, with large numbers of fairly simple but densely connected information-processing “nodes.” Like neurons, a neural net’s nodes receive information signals from their neighbors and then either “fire” — emitting their own signals — or don’t. And as with neurons, the strength of a node’s firing response can vary.

In both the new paper and the earlier one, the MIT researchers doctored neural networks trained to perform computer vision tasks so that they disclosed the strength with which individual nodes fired in response to different input images. Then they selected the 10 input images that provoked the strongest response from each node.

In the earlier paper, the researchers sent the images to workers recruited through Mechanical Turk, who were asked to identify what the images had in common. In the new paper, they use a computer system instead.

“We catalogued 1,100 visual concepts — things like the color green, or a swirly texture, or wood material, or a human face, or a bicycle wheel, or a snowy mountaintop,” says David Bau, an MIT graduate student in electrical engineering and computer science and one of the paper’s two first authors. “We drew on several data sets that other people had developed, and merged them into a broadly and densely labeled data set of visual concepts. It’s got many, many labels, and for each label we know which pixels in which image correspond to that label.”

The paper’s other authors are Bolei Zhou, co-first author and fellow graduate student; Antonio Torralba, MIT professor of electrical engineering and computer science; Aude Oliva, CSAIL principal research scientist; and Aditya Khosla, who earned his PhD as a member of Torralba’s group and is now the chief technology officer of the medical-computing company PathAI.

The researchers also knew which pixels of which images corresponded to a given network node’s strongest responses. Today’s neural nets are organized into layers. Data are fed into the lowest layer, which processes them and passes them to the next layer, and so on. With visual data, the input images are broken into small chunks, and each chunk is fed to a separate input node.

For every strong response from a high-level node in one of their networks, the researchers could trace back the firing patterns that led to it, and thus identify the specific image pixels it was responding to. Because their system could frequently identify labels that corresponded to the precise pixel clusters that provoked a strong response from a given node, it could characterize the node’s behavior with great specificity.

The researchers organized the visual concepts in their database into a hierarchy. Each level of the hierarchy incorporates concepts from the level below, beginning with colors and working upward through textures, materials, parts, objects, and scenes. Typically, lower layers of a neural network would fire in response to simpler visual properties — such as colors and textures — and higher layers would fire in response to more complex properties.

But the hierarchy also allowed the researchers to quantify the emphasis that networks trained to perform different tasks placed on different visual properties. For instance, a network trained to colorize black-and-white images devoted a large majority of its nodes to recognizing textures. Another network, when trained to track objects across several frames of video, devoted a higher percentage of its nodes to scene recognition than it did when trained to recognize scenes; in that case, many of its nodes were in fact dedicated to object detection.

One of the researchers’ experiments could conceivably shed light on a vexed question in neuroscience. Research involving human subjects with electrodes implanted in their brains to control severe neurological disorders has seemed to suggest that individual neurons in the brain fire in response to specific visual stimuli. This hypothesis, originally called the grandmother-neuron hypothesis, is more familiar to a recent generation of neuroscientists as the Jennifer-Aniston-neuron hypothesis, after the discovery that several neurological patients had neurons that appeared to respond only to depictions of particular Hollywood celebrities.

Many neuroscientists dispute this interpretation. They argue that shifting constellations of neurons, rather than individual neurons, anchor sensory discriminations in the brain. Thus, the so-called Jennifer Aniston neuron is merely one of many neurons that collectively fire in response to images of Jennifer Aniston. And it’s probably part of many other constellations that fire in response to stimuli that haven’t been tested yet.

Because their new analytic technique is fully automated, the MIT researchers were able to test whether something similar takes place in a neural network trained to recognize visual scenes. In addition to identifying individual network nodes that were tuned to particular visual concepts, they also considered randomly selected combinations of nodes. Combinations of nodes, however, picked out far fewer visual concepts than individual nodes did — roughly 80 percent fewer.

“To my eye, this is suggesting that neural networks are actually trying to approximate getting a grandmother neuron,” Bau says. “They’re not trying to just smear the idea of grandmother all over the place. They’re trying to assign it to a neuron. It’s this interesting hint of this structure that most people don’t believe is that simple.”

Shrinking data for surgical training

Image: MIT News

Laparoscopy is a surgical technique in which a fiber-optic camera is inserted into a patient’s abdominal cavity to provide a video feed that guides the surgeon through a minimally invasive procedure. Laparoscopic surgeries can take hours, and the video generated by the camera — the laparoscope — is often recorded. Those recordings contain a wealth of information that could be useful for training both medical providers and computer systems that would aid with surgery, but because reviewing them is so time consuming, they mostly sit idle.

Researchers at MIT and Massachusetts General Hospital hope to change that, with a new system that can efficiently search through hundreds of hours of video for events and visual features that correspond to a few training examples.

In work they presented at the International Conference on Robotics and Automation this month, the researchers trained their system to recognize different stages of an operation, such as biopsy, tissue removal, stapling, and wound cleansing.

But the system could be applied to any analytical question that doctors deem worthwhile. It could, for instance, be trained to predict when particular medical instruments — such as additional staple cartridges — should be prepared for the surgeon’s use, or it could sound an alert if a surgeon encounters rare, aberrant anatomy.

“Surgeons are thrilled by all the features that our work enables,” says Daniela Rus, an Andrew and Erna Viterbi Professor of Electrical Engineering and Computer Science and senior author on the paper. “They are thrilled to have the surgical tapes automatically segmented and indexed, because now those tapes can be used for training. If we want to learn about phase two of a surgery, we know exactly where to go to look for that segment. We don’t have to watch every minute before that. The other thing that is extraordinarily exciting to the surgeons is that in the future, we should be able to monitor the progression of the operation in real-time.”

Joining Rus on the paper are first author Mikhail Volkov, who was a postdoc in Rus’ group when the work was done and is now a quantitative analyst at SMBC Nikko Securities in Tokyo; Guy Rosman, another postdoc in Rus’ group; and Daniel Hashimoto and Ozanan Meireles of Massachusetts General Hospital (MGH).

Representative frames

The new paper builds on previous work from Rus’ group on “coresets,” or subsets of much larger data sets that preserve their salient statistical characteristics. In the past, Rus’ group has used coresets to perform tasks such as deducing the topics of Wikipedia articles or recording the routes traversed by GPS-connected cars.

In this case, the coreset consists of a couple hundred or so short segments of video — just a few frames each. Each segment is selected because it offers a good approximation of the dozens or even hundreds of frames surrounding it. The coreset thus winnows a video file down to only about one-tenth its initial size, while still preserving most of its vital information.

For this research, MGH surgeons identified seven distinct stages in a procedure for removing part of the stomach, and the researchers tagged the beginnings of each stage in eight laparoscopic videos. Those videos were used to train a machine-learning system, which was in turn applied to the coresets of four laparoscopic videos it hadn’t previously seen. For each short video snippet in the coresets, the system was able to assign it to the correct stage of surgery with 93 percent accuracy.

“We wanted to see how this system works for relatively small training sets,” Rosman explains. “If you’re in a specific hospital, and you’re interested in a specific surgery type, or even more important, a specific variant of a surgery — all the surgeries where this or that happened — you may not have a lot of examples.”

Selection criteria

The general procedure that the researchers used to extract the coresets is one they’ve previously described, but coreset selection always hinges on specific properties of the data it’s being applied to. The data included in the coreset — here, frames of video — must approximate the data being left out, and the degree of approximation is measured differently for different types of data.

Machine learning can be thought of as a problem of approximation, however. In this case, the system had to learn to identify similarities between frames of video in separate laparoscopic feeds that denoted the same phases of a surgical procedure. The metric of similarity that it arrived at also served to assess the similarity of video frames that were included in the coreset, to those that were omitted.

“Interventional medicine — surgery in particular — really comes down to human performance in many ways,” says Gregory Hager, a professor of computer science at Johns Hopkins University who investigates medical applications of computer and robotic technologies. “As in many other areas of human endeavor, like sports, the quality of the human performance determines the quality of the outcome that you achieve, but we don’t know a lot about, if you will, the analytics of what creates a good surgeon. Work like what Daniela is doing and our work really goes to the question of: Can we start to quantify what the process in surgery is, and then within that process, can we develop measures where we can relate human performance to the quality of care that a patient receives?”

“Right now, efficiency” — of the kind provided by coresets — “is probably not that important, because we’re dealing with small numbers of these things,” Hager adds. “But you could imagine that, if you started to record every surgery that’s performed — we’re talking tens of millions of procedures in the U.S. alone — now it starts to be interesting to think about efficiency.”

Engineers design drones that can stay aloft for five days

The Jungle Hawk Owl team. Photo: Sally Chapman/MIT

In the event of a natural disaster that disrupts phone and Internet systems over a wide area, autonomous aircraft could potentially hover over affected regions, carrying communications payloads that provide temporary telecommunications coverage to those in need.

However, such unpiloted aerial vehicles, or UAVs, are often expensive to operate, and can only remain in the air for a day or two, as is the case with most autonomous surveillance aircraft operated by the U.S. Air Force. Providing adequate and persistent coverage would require a relay of multiple aircraft, landing and refueling around the clock, with operational costs of thousands of dollars per hour, per vehicle.

Now a team of MIT engineers has come up with a much less expensive UAV design that can hover for longer durations to provide wide-ranging communications support. The researchers designed, built, and tested a UAV resembling a thin glider with a 24-foot wingspan. The vehicle can carry 10 to 20 pounds of communications equipment while flying at an altitude of 15,000 feet. Weighing in at just under 150 pounds, the vehicle is powered by a 5-horsepower gasoline engine and can keep itself aloft for more than five days — longer than any gasoline-powered autonomous aircraft has remained in flight, the researchers say.

The team is presenting its results this week at the American Institute of Aeronautics and Astronautics Conference in Denver, Colorado. The team was led by R. John Hansman, the T. Wilson Professor of Aeronautics and Astronautics; and Warren Hoburg, the Boeing Assistant Professor of Aeronautics and Astronautics. Hansman and Hoburg are co-instructors for MIT’s Beaver Works project, a student research collaboration between MIT and the MIT Lincoln Laboratory.

A solar no-go

Hansman and Hoburg worked with MIT students to design a long-duration UAV as part of a Beaver Works capstone project — typically a two- or three-semester course that allows MIT students to design a vehicle that meets certain mission specifications, and to build and test their design.

In the spring of 2016, the U.S. Air Force approached the Beaver Works collaboration with an idea for designing a long-duration UAV powered by solar energy. The thought at the time was that an aircraft, fueled by the sun, could potentially remain in flight indefinitely. Others, including Google, have experimented with this concept,  designing solar-powered, high-altitude aircraft to deliver continuous internet access to rural and remote parts of Africa.

But when the team looked into the idea and analyzed the problem from multiple engineering angles, they found that solar power — at least for long-duration emergency response — was not the way to go.

“[A solar vehicle] would work fine in the summer season, but in winter, particularly if you’re far from the equator, nights are longer, and there’s not as much sunlight  during the day. So you have to carry more batteries, which adds weight and makes the plane bigger,” Hansman says. “For the mission of disaster relief, this could only respond to disasters that occur in summer, at low latitude. That just doesn’t work.”

The researchers came to their conclusions after modeling the problem using GPkit, a software tool developed by Hoburg that allows engineers to determine the optimal design decisions or dimensions for a vehicle, given certain constraints or mission requirements.

This method is not unique among initial aircraft design tools, but unlike these tools, which take into account only several main constraints, Hoburg’s method allowed the team to consider around 200 constraints and physical models simultaneously, and to fit them all together to create an optimal aircraft design.

“This gives you all the information you need to draw up the airplane,” Hansman says. “It also says that for every one of these hundreds of parameters, if you changed one of them, how much would that influence the plane’s performance? If you change the engine a bit, it will make a big difference. And if you change wingspan, will it show an effect?”

Framing for takeoff

After determining, through their software estimations, that a solar-powered UAV would not be feasible, at least for long-duration use in any part of the world, the team performed the same modeling for a gasoline-powered aircraft. They came up with a design that was predicted to stay in flight for more than five days, at altitudes of 15,000 feet, in up to 94th-percentile winds, at any latitude.

In the fall of 2016, the team built a prototype UAV, following the dimensions determined by students using Hoburg’s software tool. To keep the vehicle lightweight, they used materials such as carbon fiber for its wings and fuselage, and Kevlar for the tail and nosecone, which houses the payload. The researchers designed the UAV to be easily taken apart and stored in a FedEx box, to be shipped to any disaster region and quickly reassembled.

This spring, the students refined the prototype and developed a launch system, fashioning a simple metal frame to fit on a typical car roof rack. The UAV sits atop the frame as a driver accelerates the launch vehicle (a car or truck) up to rotation speed — the UAV’s optimal takeoff speed. At that point, the remote pilot would angle the UAV toward the sky, automatically releasing a fastener and allowing the UAV to lift off.

In early May, the team put the UAV to the test, conducting flight tests at Plum Island Airport in Newburyport, Massachusetts. For initial flight testing, the students modified the vehicle to comply with FAA regulations for small unpiloted aircraft, which allow drones flying at low altitude and weighing less than 55 pounds. To reduce the UAV’s weight from 150 to under 55 pounds, the researchers simply loaded it with a smaller ballast payload and less gasoline.

In their initial tests, the UAV successfully took off, flew around, and landed safely. Hoburg says there are special considerations that have to be made to test the vehicle over multiple days, such as having enough people to monitor the aircraft over a long period of time.

“There are a few aspects to flying for five straight days,” Hoburg says. “But we’re pretty confident that we have the right fuel burn rate and right engine that we could fly it for five days.”

“These vehicles could be used not only for disaster relief but also other missions, such as environmental monitoring. You might want to keep watch on wildfires or the outflow of a river,” Hansman adds. “I think it’s pretty clear that someone within a few years will manufacture a vehicle that will be a knockoff of this.”

This research was supported, in part, by MIT Lincoln Laboratory.

Giving robots a sense of touch

A GelSight sensor attached to a robot’s gripper enables the robot to determine precisely where it has grasped a small screwdriver, removing it from and inserting it back into a slot, even when the gripper screens the screwdriver from the robot’s camera. Photo: Robot Locomotion Group at MIT

Eight years ago, Ted Adelson’s research group at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) unveiled a new sensor technology, called GelSight, that uses physical contact with an object to provide a remarkably detailed 3-D map of its surface.

Now, by mounting GelSight sensors on the grippers of robotic arms, two MIT teams have given robots greater sensitivity and dexterity. The researchers presented their work in two papers at the International Conference on Robotics and Automation last week.

In one paper, Adelson’s group uses the data from the GelSight sensor to enable a robot to judge the hardness of surfaces it touches — a crucial ability if household robots are to handle everyday objects.

In the other, Russ Tedrake’s Robot Locomotion Group at CSAIL uses GelSight sensors to enable a robot to manipulate smaller objects than was previously possible.

The GelSight sensor is, in some ways, a low-tech solution to a difficult problem. It consists of a block of transparent rubber — the “gel” of its name — one face of which is coated with metallic paint. When the paint-coated face is pressed against an object, it conforms to the object’s shape.

The metallic paint makes the object’s surface reflective, so its geometry becomes much easier for computer vision algorithms to infer. Mounted on the sensor opposite the paint-coated face of the rubber block are three colored lights and a single camera.

“[The system] has colored lights at different angles, and then it has this reflective material, and by looking at the colors, the computer … can figure out the 3-D shape of what that thing is,” explains Adelson, the John and Dorothy Wilson Professor of Vision Science in the Department of Brain and Cognitive Sciences.

In both sets of experiments, a GelSight sensor was mounted on one side of a robotic gripper, a device somewhat like the head of a pincer, but with flat gripping surfaces rather than pointed tips.

Contact points

For an autonomous robot, gauging objects’ softness or hardness is essential to deciding not only where and how hard to grasp them but how they will behave when moved, stacked, or laid on different surfaces. Tactile sensing could also aid robots in distinguishing objects that look similar.

In previous work, robots have attempted to assess objects’ hardness by laying them on a flat surface and gently poking them to see how much they give. But this is not the chief way in which humans gauge hardness. Rather, our judgments seem to be based on the degree to which the contact area between the object and our fingers changes as we press on it. Softer objects tend to flatten more, increasing the contact area.

The MIT researchers adopted the same approach. Wenzhen Yuan, a graduate student in mechanical engineering and first author on the paper from Adelson’s group, used confectionary molds to create 400 groups of silicone objects, with 16 objects per group. In each group, the objects had the same shapes but different degrees of hardness, which Yuan measured using a standard industrial scale.

Then she pressed a GelSight sensor against each object manually and recorded how the contact pattern changed over time, essentially producing a short movie for each object. To both standardize the data format and keep the size of the data manageable, she extracted five frames from each movie, evenly spaced in time, which described the deformation of the object that was pressed.

Finally, she fed the data to a , which automatically looked for correlations between changes in contact patterns and hardness measurements. The resulting system takes frames of video as inputs and produces hardness scores with very high accuracy. Yuan also conducted a series of informal experiments in which human subjects palpated fruits and vegetables and ranked them according to hardness. In every instance, the GelSight-equipped robot arrived at the same rankings.

Yuan is joined on the paper by her two thesis advisors, Adelson and Mandayam Srinivasan, a senior research scientist in the Department of Mechanical Engineering; Chenzhuo Zhu, an undergraduate from Tsinghua University who visited Adelson’s group last summer; and Andrew Owens, who did his PhD in electrical engineering and computer science at MIT and is now a postdoc at the University of California at Berkeley.

Obstructed views

The paper from the Robot Locomotion Group was born of the group’s experience with the Defense Advanced Research Projects Agency’s Robotics Challenge (DRC), in which academic and industry teams competed to develop control systems that would guide a humanoid robot through a series of tasks related to a hypothetical emergency.

Typically, an autonomous robot will use some kind of computer vision system to guide its manipulation of objects in its environment. Such systems can provide very reliable information about an object’s location — until the robot picks the object up. Especially if the object is small, much of it will be occluded by the robot’s gripper, making location estimation much harder. Thus, at exactly the point at which the robot needs to know the object’s location precisely, its estimate becomes unreliable. This was the problem the MIT team faced during the DRC, when their robot had to pick up and turn on a power drill.

“You can see in our video for the DRC that we spend two or three minutes turning on the drill,” says Greg Izatt, a graduate student in electrical engineering and computer science and first author on the new paper. “It would be so much nicer if we had a live-updating, accurate estimate of where that drill was and where our hands were relative to it.”

That’s why the Robot Locomotion Group turned to GelSight. Izatt and his co-authors — Tedrake, the Toyota Professor of Electrical Engineering and Computer Science, Aeronautics and Astronautics, and Mechanical Engineering; Adelson; and Geronimo Mirano, another graduate student in Tedrake’s group — designed control algorithms that use a computer vision system to guide the robot’s gripper toward a tool and then turn location estimation over to a GelSight sensor once the robot has the tool in hand.

In general, the challenge with such an approach is reconciling the data produced by a vision system with data produced by a tactile sensor. But GelSight is itself camera-based, so its data output is much easier to integrate with visual data than the data from other tactile sensors.

In Izatt’s experiments, a robot with a GelSight-equipped gripper had to grasp a small screwdriver, remove it from a holster, and return it. Of course, the data from the GelSight sensor don’t describe the whole screwdriver, just a small patch of it. But Izatt found that, as long as the vision system’s estimate of the screwdriver’s initial position was accurate to within a few centimeters, his algorithms could deduce which part of the screwdriver the GelSight sensor was touching and thus determine the screwdriver’s position in the robot’s hand.

“I think that the GelSight technology, as well as other high-bandwidth tactile sensors, will make a big impact in robotics,” says Sergey Levine, an assistant professor of electrical engineering and computer science at the University of California at Berkeley. “For humans, our sense of touch is one of the key enabling factors for our amazing manual dexterity. Current robots lack this type of dexterity and are limited in their ability to react to surface features when manipulating objects. If you imagine fumbling for a light switch in the dark, extracting an object from your pocket, or any of the other numerous things that you can do without even thinking — these all rely on touch sensing.”

“Software is finally catching up with the capabilities of our sensors,” Levine adds. “Machine learning algorithms inspired by innovations in deep learning and computer vision can process the rich sensory data from sensors such as the GelSight to deduce object properties. In the future, we will see these kinds of learning methods incorporated into end-to-end trained manipulation skills, which will make our robots more dexterous and capable, and maybe help us understand something about our own sense of touch and motor control.”

Faster, more nimble drones on the horizon

Engineers at MIT have come up with an algorithm to tune a Dynamic Vision Sensor (DVS) camera, simplifying a scene to its most essential visual elements and potentially enabling the development of faster drones. Image: Jose-Luis Olivares/MIT

There’s a limit to how fast autonomous vehicles can fly while safely avoiding obstacles. That’s because the cameras used on today’s drones can only process images so fast, frame by individual frame. Beyond roughly 30 miles per hour, a drone is likely to crash simply because its cameras can’t keep up.

Recently, researchers in Zurich invented a new type of camera, known as the Dynamic Vision Sensor (DVS), that continuously visualizes a scene in terms of changes in brightness, at extremely short, microsecond intervals. But this deluge of data can overwhelm a system, making it difficult for a drone to distinguish an oncoming obstacle through the noise.

Now engineers at MIT have come up with an algorithm to tune a DVS camera to detect only specific changes in brightness that matter for a particular system, vastly simplifying a scene to its most essential visual elements.

The results, which they presented at the IEEE American Control Conference in Seattle, can be applied to any linear system that directs a robot to move from point A to point B as a response to high-speed visual data. Eventually, the results could also help to increase the speeds for more complex systems such as drones and other autonomous robots.

“There is a new family of vision sensors that has the capacity to bring high-speed autonomous flight to reality, but researchers have not developed algorithms that are suitable to process the output data,” says lead author Prince Singh, a graduate student in MIT’s Department of Aeronautics and Astronautics. “We present a first approach for making sense of the DVS’ ambiguous data, by reformulating the inherently noisy system into an amenable form.”

Singh’s co-authors are MIT visiting professor Emilio Frazzoli of the Swiss Federal Institute of Technology in Zurich, and Sze Zheng Yong of Arizona State University.

Taking a visual cue from biology

The DVS camera is the first commercially available “neuromorphic” sensor — a class of sensors that is modeled after the vision systems in animals and humans. In the very early stages of processing a scene, photosensitive cells in the human retina, for example, are activated in response to changes in luminosity, in real time.

Neuromorphic sensors are designed with multiple circuits arranged in parallel, similarly to photosensitive cells, that activate and produce blue or red pixels on a computer screen in response to either a drop or spike in brightness.

Instead of a typical video feed, a drone with a DVS camera would “see” a grainy depiction of pixels that switch between two colors, depending on whether that point in space has brightened or darkened at any given moment. The sensor requires no image processing and is designed to enable, among other applications, high-speed autonomous flight.

Researchers have used DVS cameras to enable simple linear systems to see and react to high-speed events, and they have designed controllers, or algorithms, to quickly translate DVS data and carry out appropriate responses. For example, engineers have designed controllers that interpret pixel changes in order to control the movements of a robotic goalie to block an incoming soccer ball, as well as to direct a motorized platform to keep a pencil standing upright.

But for any given DVS system, researchers have had to start from scratch in designing a controller to translate DVS data in a meaningful way for that particular system.

“The pencil and goalie examples are very geometrically constrained, meaning if you give me those specific scenarios, I can design a controller,” Singh says. “But the question becomes, what if I want to do something more complicated?”

Cutting through the noise

In the team’s new paper, the researchers report developing a sort of universal controller that can translate DVS data in a meaningful way for any simple linear, robotic system. The key to the controller is that it identifies the ideal value for a parameter Singh calls “H,” or the event-threshold value, signifying the minimum change in brightness that the system can detect.

Setting the H value for a particular system can essentially determine that system’s visual sensitivity: A system with a low H value would be programmed to take in and interpret changes in luminosity that range from very small to relatively large, while a high H value would exclude small changes, and only “see” and react to large variations in brightness.

The researchers formulated an algorithm first by taking into account the possibility that a change in brightness would occur for every “event,” or pixel activated in a particular system. They also estimated the probability for “spurious events,” such as a pixel randomly misfiring, creating false noise in the data.

Once they derived a formula with these variables in mind, they were able to work it into a well-known algorithm known as an H-infinity robust controller, to determine the H value for that system.

The team’s algorithm can now be used to set a DVS camera’s sensitivity to detect the most essential changes in brightness for any given linear system, while excluding extraneous signals. The researchers performed a numerical simulation to test the algorithm, identifying an H value for a theoretical linear system, which they found was able to remain stable and carry out its function without being disrupted by extraneous pixel events.

“We found that this H threshold serves as a ‘sweet-spot,’ so that a system doesn’t become overwhelmed with too many events,” Singh says. He adds that the new results “unify control of many systems,” and represent a first step toward faster, more stable autonomous flying robots, such as the Robobee, developed by researchers at Harvard University.

“We want to break that speed limit of 20 to 30 miles per hour, and go faster without colliding,” Singh says. “The next step may be to combine DVS with a regular camera, which can tell you, based on the DVS rendering, that an object is a couch versus a car, in real time.”

This research was supported in part by the Singapore National Research Foundation through the SMART Future Urban Mobility project.

The Force was strong in this robot competition

An Imperial Snowtrooper inspects a competitor’s entry at the 2017 MIT Mechanical Engineering 2.007 Student Design Final Robot Competition. Photo: Tony Pulsone

Thursday night, dozens of robots designed and built by undergraduates in a mechanical engineering class endured hours of intense, boisterous, and often jubilant competition as they scrambled to rack up points in one-on-one clashes on special “Star Wars”-themed playing arenas.

As has often happened in these contests — which have been going on, and constantly evolving, since 1970 — the ultimate winner in the single-elimination tournament was not the one that’d most consistently racked up the highest scores all evening. Rather, it was a high-scoring bot that triumphed when its competitor missed a crucial scoring opportunity because its starting position was just slightly out of alignment.

The class, 2.007 (Design and Manufacturing I), which has 165 mostly sophomore students, begins by giving each student an identical kit of parts, from which they each have to create a robot to carry out a variety of tasks to score points. This year, in a nod to the 40th anniversary of the first “Star Wars” film, released in 1977, the robots crawled around and over a replica of a “Star Wars” X-wing Starfighter. Students could earn points by pulling up a sliding frame to rescue prisoners trapped in carbonite; by dumping Imperial stormtroopers into a trash trench; by activating a cantina band; or by spinning up one or both of two large cylindrical thrusters on the wings. Students could choose which tasks to have their robot try to accomplish, and had just one semester to design, test, and operate their bot.

The devices could be pre-programmed to carry out set tasks, but could also be manually controlled through a radio-linked controller. As in past years, the open-ended nature of the assignment — and the variety of different ways to score — led to a wide range of strategies and designs, spanning from tall towers that would extend by telescoping out or with hinged sections, to elevator-like lifting devices, to small and nimble bots that scurried around to carry out multiple tasks, to an array of arms and devices for grasping or turning the different pieces. They sported names like Dodocopter, Bonnie and Clyde, Pitfall, Torque Toilet, Spinit to Winit, and Nicki Spinaj.

Students could earn extra points by accomplishing any of the tasks during an initial period when the robot had to perform autonomously, before the start of a manually remote-controlled round. The students were allowed to create multiple robots to carry out different tasks, as long as they were all made from the basic kit of parts, and all fit within a designated starting area. Most of the students opted to build two devices, and some even made three.

Second-place finisher Richard Moyer, with his small but powerful and robust robot called Tornado, consistently scored 960.5 points in every round (the highest score achieved by any of the bots), by spinning both the lower and upper thrusters to their maximum speeds — and by using the lower thruster during the high-scoring autonomous period. But on the final matchup, Tornado was just slightly out of place in the starting box, and missed the thruster, losing out on that big initial score.

The robot used a simple but reliable design, which sported a single horizontally-mounted drive wheel that it used to spin both the lower and upper thrusters, and also to activate an elevator mechanism that carried it from one wing to the other. It was “like the Swiss army knife of robots,” thanks to this multifunction device, said Sangbae Kim, an associate professor of mechanical engineering and co-instructor of the course, who was dressed as the “Star Wars” wookie, Chewbacca.

The grand-prize winner, Tom Frejowski, also built a compact, powerful robot that concentrated on the spinning task, and scored 640 in the final round to take home the top trophy (a replica of the MIT dome). Frejowski’s robot, in order to ensure that it made a straight shot from the starting position to the thruster to line up just right to spin the heavy cylinder, used a single motor to drive both of its front wheels, which helped him earn consistent high scores. “That’s how he goes dead straight every time,” said co-instructor Amos Winter, an assistant professor of mechanical engineering, who was dressed as Darth Vader and shared the emcee duties with Kim.

During the tournament, which took place in the Johnson Ice Rink, all of the course teachers and assistants were dressed in various “Star Wars” costumes, and a packed audience of fellow students, families, and visitors of all ages cheered their encouragement with great enthusiasm. During a break, each of the teaching assistants was presented with a special memento: a beaver-cut twig from a beaver dam in Nova Scotia, symbolizing MIT’s beaver mascot, and nature’s original mechanical engineer.

Echoing the sentiments of many students in the class, sophomore James Li said of the class in a pre-taped video: “I had a bit of building experience, but I never had to design and build anything of this complexity. … It was a great experience.”

On the future of human-centered robotics

“The new frontier is learning how to design the relationships between people, robots, and infrastructure,” says David Mindell, the Dibner Professor of the History of Engineering and Manufacturing, and a professor of aeronautics and astronautics. “We need new sensors, new software, new ways of architecting systems.” Photo: Len Rubenstein

Science and technology are essential tools for innovation, and to reap their full potential, we also need to articulate and solve the many aspects of today’s global issues that are rooted in the political, cultural, and economic realities of the human world. With that mission in mind, MIT’s School of Humanities, Arts, and Social Sciences has launched The Human Factor — an ongoing series of stories and interviews that highlight research on the human dimensions of global challenges. Contributors to this series also share ideas for cultivating the multidisciplinary collaborations needed to solve the major civilizational issues of our time.

David Mindell, the Frances and David Dibner Professor of the History of Engineering and Manufacturing and Professor of Aeronautics and Astronautics at MIT, researches the intersections of human behavior, technological innovation, and automation. Mindell is the author of five acclaimed books, most recently “Our Robots, Ourselves: Robotics and the Myths of Autonomy” (Viking, 2015). He is also the co-founder of Humatics Corporation, which develops technologies for human-centered automation. SHASS Communications recently asked him to share his thoughts on the relationship of robotics to human activities, and the role of multidisciplinary research in solving complex global issues.

Q: A major theme in recent political discourse has been the perceived impact of robots and automation on the United States labor economy. In your research into the relationship between human activity and robotics, what insights have you gained that inform the future of human jobs, and the direction of technological innovation?

A: In looking at how people have designed, used, and adopted robotics in extreme environments like the deep ocean, aviation, or space, my most recent work shows how robotics and automation carry with them human assumptions about how work gets done, and how technology alters those assumptions. For example, the U.S. Air Force’s Predator drones were originally envisioned as fully autonomous — able to fly without any human assistance. In the end, these drones require hundreds of people to operate.

The new success of robots will depend on how well they situate into human environments. As in chess, the strongest players are often the combinations of human and machine. I increasingly see that the three critical elements are people, robots, and infrastructure — all interdependent.

Q: In your recent book “Our Robots, Ourselves,” you describe the success of a human-centered robotics, and explain why it is the more promising research direction — rather than research that aims for total robotic autonomy. How is your perspective being received by robotic engineers and other technologists, and do you see examples of research projects that are aiming at human-centered robotics?

A: One still hears researchers describe full autonom as the only way to go; often they overlook the multitude of human intentions built into even the most autonomous systems, and the infrastructure that surrounds them. My work describes situated autonomy, where autonomous systems can be highly functional within human environments such as factories or cities. Autonomy as a means of moving through physical environments has made enormous strides in the past ten years. As a means of moving through human environments, we are only just beginning. The new frontier is learning how to design the relationships between people, robots, and infrastructure. We need new sensors, new software, new ways of architecting systems.

Q: What can the study of the history of technology teach us about the future of robotics?

A: The history of technology does not predict the future, but it does offer rich examples of how people build and interact with technology, and how it evolves over time. Some problems just keep coming up over and over again, in new forms in each generation. When the historian notices such patterns, he can begin to ask: Is there some fundamental phenomenon here? If it is fundamental, how is it likely to appear in the next generation? Might the dynamics be altered in unexpected ways by human or technical innovations?

One such pattern is how autonomous systems have been rendered less autonomous when they make their way into real world human environments. Like the Predator drone, future military robots will likely be linked to human commanders and analysts in some ways as well. Rather than eliding those links, designing them to be as robust and effective as possible is a worthy focus for researchers’ attention.

Q: MIT President L. Rafael Reif has said that the solutions to today’s challenges depend on marrying advanced technical and scientific capabilities with a deep understanding of the world’s political, cultural, and economic realities. What barriers do you see to multidisciplinary, sociotechnical collaborations, and how can we overcome them?

A: I fear that as our technical education and research continues to excel, we are building human perspectives into technologies in ways not visible to our students. All data, for example, is socially inflected, and we are building systems that learn from those data and act in the world. As a colleague from Stanford recently observed, go to Google image search and type in “Grandma” and you’ll see the social bias that can leak into data sets — the top results all appear white and middle class.

Now think of those data sets as bases of decision making for vehicles like cars or trucks, and we become aware of the social and political dimensions that we need to build into systems to serve human needs. For example, should driverless cars adjust their expectations for pedestrian behavior according to the neighborhoods they’re in?

Meanwhile, too much of the humanities has developed islands of specialized discourse that is inaccessible to outsiders. I used to be more optimistic about multidisciplinary collaborations to address these problems. Departments and schools are great for organizing undergraduate majors and graduate education, but the old two-cultures divides remain deeply embedded in the daily practices of how we do our work. I’ve long believed MIT needs a new school to address these synthetic, far-reaching questions and train students to think in entirely new ways.

Interview prepared by MIT SHASS Communications
Editorial team: Emily Hiestand (series editor), Daniel Evans Pritchard

Page 11 of 12
1 9 10 11 12