
Gregory Falco: Protecting urban infrastructure against cyberterrorism

“The concept of my startup is, ‘Let’s use hacker tools to defeat hackers,’” PhD student Gregory Falco says. “If you don’t know how to break it, you don’t know how to fix it.”
Photo: Ian MacLellan

by Dara Farhadi

While working for the global management consulting company Accenture, Gregory Falco discovered just how vulnerable the technologies underlying smart cities and the “internet of things” — everyday devices that are connected to the internet or a network — are to cyberterrorism attacks.

“What happened was, I was telling sheiks and government officials all around the world about how amazing the internet of things is and how it’s going to solve all their problems and solve sustainability issues and social problems,” Falco says. “And then they asked me, ‘Is it secure?’ I looked at the security guys and they said, ‘There’s no problem.’ And then I looked under the hood myself, and there was nothing going on there.”

Falco is now entering the third and final year of his PhD in the Department of Urban Studies and Planning (DUSP) and is carrying out his research at the Computer Science and Artificial Intelligence Laboratory (CSAIL). His focus is cybersecurity for urban critical infrastructure, and the internet of things, or IoT, is at the center of his work. A washing machine connected to an app on its owner’s smartphone, for example, is part of the IoT. Billions of IoT devices lack traditional security software because they are built with small amounts of memory and low-power processors. This makes these devices susceptible to cyberattacks and may give hackers a gateway to other devices on the same network.

Falco’s concentration is on industrial controls and embedded systems such as automatic switches found in subway systems.

“If someone decides to figure out how to access a switch by hacking another access point that is communicating with that switch, then that subway is not going to stop, and people are going to die,” Falco says. “We rely on these systems for our life functions — critical infrastructure like electric grids, water grids, or transportation systems, but also our health care systems. Insulin pumps, for example, are now connected to your smartphone.”

Citing real-world examples, Falco notes that Russian hackers were able to take down the Ukrainian capital city’s electric grid, and that Iranian hackers interfered with the computer-guided controls of a small dam in Rye Brook, New York.

Falco aims to help combat potential cyberattacks through his research. One arm of his dissertation, developed with Professor Lawrence Susskind, a renowned expert in negotiation, examines how best to negotiate with cyberterrorists. With CSAIL Principal Research Scientist Howard Shrobe, Falco is investigating whether it is possible to predict which control-system vulnerabilities in critical urban infrastructure could be exploited. The final branch of Falco’s dissertation is a collaboration with NASA’s Jet Propulsion Laboratory, for which he has secured a contract to develop an artificial intelligence-powered automated attack generator that can identify all the possible ways someone could hack and destroy NASA’s systems.

“What I really intend to do for my PhD is something that is actionable to the communities I’m working with,” Falco says. “I don’t want to publish something in a book that will sit on a shelf where nobody would read it.”

“Not science fiction anymore”

Falco’s battle against cyberterrorism has also led him to co-found NeuroMesh, a startup dedicated to protecting IoT devices by using the same techniques hackers use.

“The concept of my startup is, ‘Let’s use hacker tools to defeat hackers,’” Falco says. “If you don’t know how to break it, you don’t know how to fix it.”

One tool hackers use is the botnet. Once a botnet gets onto a device, it often kills off any other malware present so that it can claim all of the device’s processing power for itself. A botnet also plays “king of the hill,” preventing other botnets from latching on.

NeuroMesh turns a botnet’s own features against it to create a benign botnet. By re-engineering the botnet, programmers can use it to defeat any kind of malware that lands on a device.

“The benefit is also that when you look at securing IoT devices with low memory and low processing power, it’s impossible to put any security on them, but these botnets have no problem getting on there because they are so small,” Falco says.

Much like a vaccine protects against diseases, NeuroMesh applies a cyber vaccine to protect industrial devices from cyberattacks. And, by leveraging the bitcoin blockchain to update devices, NeuroMesh further fortifies the security system to block other malware from attacking vital IoT devices.

Recently, Falco and his team pitched their botnet vaccine at MIT’s $100K Accelerate competition and placed second. Falco’s infant son was in the audience while Falco was presenting how NeuroMesh’s technology could secure a baby monitor, as an example, from being hacked. The startup advanced to MIT’s prestigious 100K Launch startup competition, where they finished among the top eight competitors. NeuroMesh is now further developing its technology with the help of a grant from the Department of Energy, working with Stuart Madnick, who is the John Norris Maguire Professor at MIT, and Michael Siegel, a principal research scientist at MIT’s Sloan School of Management.

“Enemies are here. They are on our turf and in our wires. It’s not science fiction anymore,” Falco says. “We’re protecting against this. That’s what NeuroMesh is meant to do.”  

The human tornado

Falco’s abundant energy has led his family to call him “the tornado.”

“One-fourth of my mind is on my startup, one-fourth on finishing my dissertation, and the other half is on my 11-month-old because he comes with me when my wife works,” Falco says. “He comes to all our venture capital meetings and my presentations. He’s always around and he’s generally very good.”

As a high school student, Falco’s energy and excitement for engineering drove him to discover a new physics wave theory. Applying this to the tennis racket, he invented a new, control-enhanced method of stringing, with which he won various science competitions (and tennis matches). He used this knowledge to start a small business for stringing rackets. The thrill of business took him on a path to Cornell University’s School of Hotel Administration. After graduating early, Falco transitioned into the field of sustainability technology and energy systems, and returned to his engineering roots by earning his LEED AP (Leadership in Energy and Environmental Design) accreditation and a master’s degree in sustainability management from Columbia University.

His excitement followed him to Accenture, where he founded the smart cities division and eventually learned about the vulnerability of IoT devices. For the past three years, Falco has also been sharing his newfound knowledge about sustainability and computer science as an adjunct professor at Columbia University.

“My challenge is always to find these interdisciplinary holes because my background is so messed up. You can’t say, this guy is a computer scientist or he’s a business person or an environmental scientist because I’m all over the place,” he says.

That’s part of the reason why Falco enjoys taking care of his son, Milo, so much.

“He’s the most awesome thing ever. I see him learning and it’s really amazing,” Falco says. “Spending so much time with him is very fun. He does things that my wife gets frustrated at because he’s a ball of energy and all over the place — just like me.”

Robotic system monitors specific neurons

MIT engineers have devised a way to automate the process of monitoring neurons in a living brain using a computer algorithm that analyzes microscope images and guides a robotic arm to the target cell. In this image, a pipette guided by a robotic arm approaches a neuron identified with a fluorescent stain.
Credit: Ho-Jun Suk

by Anne Trafton

Recording electrical signals from inside a neuron in the living brain can reveal a great deal of information about that neuron’s function and how it coordinates with other cells in the brain. However, performing this kind of recording is extremely difficult, so only a handful of neuroscience labs around the world do it.

To make this technique more widely available, MIT engineers have now devised a way to automate the process, using a computer algorithm that analyzes microscope images and guides a robotic arm to the target cell.

This technology could allow more scientists to study single neurons and learn how they interact with other cells to enable cognition, sensory perception, and other brain functions. Researchers could also use it to learn more about how neural circuits are affected by brain disorders.

“Knowing how neurons communicate is fundamental to basic and clinical neuroscience. Our hope is this technology will allow you to look at what’s happening inside a cell, in terms of neural computation, or in a disease state,” says Ed Boyden, an associate professor of biological engineering and brain and cognitive sciences at MIT, and a member of MIT’s Media Lab and McGovern Institute for Brain Research.

Boyden is the senior author of the paper, which appears in the Aug. 30 issue of Neuron. The paper’s lead author is MIT graduate student Ho-Jun Suk.

Precision guidance

For more than 30 years, neuroscientists have been using a technique known as patch clamping to record the electrical activity of cells. This method, which involves bringing a tiny, hollow glass pipette in contact with the cell membrane of a neuron, then opening up a small pore in the membrane, usually takes a graduate student or postdoc several months to learn. Learning to perform this on neurons in the living mammalian brain is even more difficult.

There are two types of patch clamping: a “blind” (not image-guided) method, which is limited because researchers cannot see where the cells are and can only record from whatever cell the pipette encounters first, and an image-guided version that allows a specific cell to be targeted.

Five years ago, Boyden and colleagues at MIT and Georgia Tech, including co-author Craig Forest, devised a way to automate the blind version of patch clamping. They created a computer algorithm that could guide the pipette to a cell based on measurements of a property called electrical impedance — which reflects how difficult it is for electricity to flow out of the pipette. If there are no cells around, electricity flows and impedance is low. When the tip hits a cell, electricity can’t flow as well and impedance goes up.

Once the pipette detects a cell, it can stop moving instantly, preventing it from poking through the membrane. A vacuum pump then applies suction to form a seal with the cell’s membrane. Then, the electrode can break through the membrane to record the cell’s internal electrical activity.
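
To make that stopping rule concrete, here is a minimal Python sketch of impedance-guided advancement. The measure_impedance() function, the pretend cell depth, and the jump threshold are all illustrative placeholders, not the autopatcher’s actual code.

    import random

    def measure_impedance(depth_um):
        """Stand-in for a real impedance reading (megaohms): rises sharply near a cell."""
        baseline = 5.0
        cell_depth = 120.0                      # pretend a cell sits at 120 microns
        bump = 4.0 if abs(depth_um - cell_depth) < 2.0 else 0.0
        return baseline + bump + random.gauss(0, 0.1)

    def advance_until_cell(step_um=1.0, max_depth_um=300.0, jump_mohm=2.0):
        """Step the pipette downward; stop when impedance jumps, signalling contact."""
        depth = 0.0
        previous = measure_impedance(depth)
        while depth < max_depth_um:
            depth += step_um
            current = measure_impedance(depth)
            if current - previous > jump_mohm:
                return depth                    # contact: halt before rupturing the membrane
            previous = current
        return None                             # no cell found within the search range

    print("Contact detected at depth (um):", advance_until_cell())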

The researchers achieved very high accuracy using this technique, but it still could not be used to target a specific cell. For most studies, neuroscientists have a particular cell type they would like to learn about, Boyden says.

“It might be a cell that is compromised in autism, or is altered in schizophrenia, or a cell that is active when a memory is stored. That’s the cell that you want to know about,” he says. “You don’t want to patch a thousand cells until you find the one that is interesting.”

To enable this kind of precise targeting, the researchers set out to automate image-guided patch clamping. This technique is difficult to perform manually because, although the scientist can see the target neuron and the pipette through a microscope, he or she must compensate for the fact that nearby cells will move as the pipette enters the brain.

“It’s almost like trying to hit a moving target inside the brain, which is a delicate tissue,” Suk says. “For machines it’s easier because they can keep track of where the cell is, they can automatically move the focus of the microscope, and they can automatically move the pipette.”

By combining several imaging processing techniques, the researchers came up with an algorithm that guides the pipette to within about 25 microns of the target cell. At that point, the system begins to rely on a combination of imagery and impedance, which is more accurate at detecting contact between the pipette and the target cell than either signal alone.

The researchers imaged the cells with two-photon microscopy, a commonly used technique that uses a pulsed laser to send infrared light into the brain, lighting up cells that have been engineered to express a fluorescent protein.

Using this automated approach, the researchers were able to successfully target and record from two types of cells — a class of interneurons, which relay messages between other neurons, and a set of excitatory neurons known as pyramidal cells. They achieved a success rate of about 20 percent, which is comparable to the performance of highly trained scientists performing the process manually.

Unraveling circuits

This technology paves the way for in-depth studies of the behavior of specific neurons, which could shed light on both their normal functions and how they go awry in diseases such as Alzheimer’s or schizophrenia. For example, the interneurons that the researchers studied in this paper have been previously linked with Alzheimer’s. A recent study of mice, led by Li-Huei Tsai, director of MIT’s Picower Institute for Learning and Memory, and conducted in collaboration with Boyden, reported that inducing brain-wave oscillations at a specific frequency in hippocampal interneurons could help clear amyloid plaques similar to those found in Alzheimer’s patients.

“You really would love to know what’s happening in those cells,” Boyden says. “Are they signaling to specific downstream cells, which then contribute to the therapeutic result? The brain is a circuit, and to understand how a circuit works, you have to be able to monitor the components of the circuit while they are in action.”

This technique could also enable studies of fundamental questions in neuroscience, such as how individual neurons interact with each other as the brain makes a decision or recalls a memory.

Bernardo Sabatini, a professor of neurobiology at Harvard Medical School, says he is interested in adapting this technique to use in his lab, where students spend a great deal of time recording electrical activity from neurons growing in a lab dish.

“It’s silly to have amazingly intelligent students doing tedious tasks that could be done by robots,” says Sabatini, who was not involved in this study. “I would be happy to have robots do more of the experimentation so we can focus on the design and interpretation of the experiments.”

To help other labs adopt the new technology, the researchers plan to put the details of their approach on their web site, autopatcher.org.

Other co-authors include Ingrid van Welie, Suhasa Kodandaramaiah, and Brian Allen. The research was funded by Jeremy and Joyce Wertheimer, the National Institutes of Health (including the NIH Single Cell Initiative and the NIH Director’s Pioneer Award), the HHMI-Simons Faculty Scholars Program, and the New York Stem Cell Foundation-Robertson Award.

Robot learns to follow orders like Alexa

ComText allows robots to understand contextual commands such as, “Pick up the box I put down.”
Photo: Tom Buehler/MIT CSAIL

by Adam Conner-Simons & Rachel Gordon

Despite what you might see in movies, today’s robots are still very limited in what they can do. They can be great for many repetitive tasks, but their inability to understand the nuances of human language makes them mostly useless for more complicated requests.

For example, if you put a specific tool in a toolbox and ask a robot to “pick it up,” it would be completely lost. Picking it up means being able to see and identify objects, understand commands, recognize that the “it” in question is the tool you put down, go back in time to remember the moment when you put down the tool, and distinguish the tool you put down from other ones of similar shapes and sizes.

Recently researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have gotten closer to making this type of request easier: In a new paper, they present an Alexa-like system that allows robots to understand a wide range of commands that require contextual knowledge about objects and their environments. They’ve dubbed the system “ComText,” for “commands in context.”

The toolbox situation above was among the types of tasks that ComText can handle. If you tell the system that “the tool I put down is my tool,” it adds that fact to its knowledge base. You can then update the robot with more information about other objects and have it execute a range of tasks like picking up different sets of objects based on different commands.

“Where humans understand the world as a collection of objects and people and abstract concepts, machines view it as pixels, point-clouds, and 3-D maps generated from sensors,” says CSAIL postdoc Rohan Paul, one of the lead authors of the paper. “This semantic gap means that, for robots to understand what we want them to do, they need a much richer representation of what we do and say.”

The team tested ComText on Baxter, a two-armed humanoid robot developed by Rethink Robotics, the company founded by former CSAIL director Rodney Brooks.

The project was co-led by research scientist Andrei Barbu, alongside research scientist Sue Felshin, senior research scientist Boris Katz, and Professor Nicholas Roy. They presented the paper at last week’s International Joint Conference on Artificial Intelligence (IJCAI) in Australia.

How it works

Things like dates, birthdays, and facts are forms of “declarative memory.” There are two kinds of declarative memory: semantic memory, which is based on general facts like the “sky is blue,” and episodic memory, which is based on personal facts, like remembering what happened at a party.

Most approaches to robot learning have focused only on semantic memory, which leaves a big knowledge gap about events or facts that may be relevant context for future actions. ComText, meanwhile, can observe a range of visuals and natural language to glean “episodic memory” about an object’s size, shape, position, and type, and even whether it belongs to somebody. From this knowledge base, it can then reason, infer meaning, and respond to commands.
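
As a rough illustration of the two memory types, the toy Python sketch below keeps a semantic store of general facts and an episodic log of observed events, and uses the log to resolve a contextual phrase. The class and the command grammar are invented for this example and are not ComText’s actual interface.

    class RobotMemory:
        def __init__(self):
            self.semantic = {}   # timeless facts, e.g. ownership
            self.episodic = []   # time-ordered observations of events

        def observe(self, event, obj):
            self.episodic.append((event, obj))

        def learn_fact(self, key, value):
            self.semantic[key] = value

        def resolve(self, phrase):
            """Resolve a contextual reference such as 'the tool I put down'."""
            if phrase == "the tool I put down":
                for event, obj in reversed(self.episodic):
                    if event == "put_down" and obj.startswith("tool"):
                        return obj
            return self.semantic.get(phrase)

    memory = RobotMemory()
    memory.observe("put_down", "tool_42")          # episodic: something happened
    memory.learn_fact("my tool", "tool_42")        # semantic: a general fact
    print(memory.resolve("the tool I put down"))   # -> tool_42
    print(memory.resolve("my tool"))               # -> tool_42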

“The main contribution is this idea that robots should have different kinds of memory, just like people,” says Barbu. “We have the first mathematical formulation to address this issue, and we’re exploring how these two types of memory play and work off of each other.”

With ComText, Baxter was successful in executing the right command about 90 percent of the time. In the future, the team hopes to enable robots to understand more complicated information, such as multi-step commands and the intent of actions, and to use properties of objects to interact with them more naturally.

For example, if you tell a robot that one box on a table has crackers, and one box has sugar, and then ask the robot to “pick up the snack,” the hope is that the robot could deduce that sugar is a raw material and therefore unlikely to be somebody’s “snack.”

By creating much less constrained interactions, this line of research could enable better communications for a range of robotic systems, from self-driving cars to household helpers.

“This work is a nice step towards building robots that can interact much more naturally with people,” says Luke Zettlemoyer, an associate professor of computer science at the University of Washington who was not involved in the research. “In particular, it will help robots better understand the names that are used to identify objects in the world, and interpret instructions that use those names to better do what users ask.”

The work was funded, in part, by the Toyota Research Institute, the National Science Foundation, the Robotics Collaborative Technology Alliance of the U.S. Army, and the Air Force Research Laboratory.

New robot rolls with the rules of pedestrian conduct


Engineers at MIT have designed an autonomous robot with “socially aware navigation” that can keep pace with foot traffic while observing general codes of pedestrian conduct.
Credit: MIT

by Jennifer Chu

Just as drivers observe the rules of the road, most pedestrians follow certain social codes when navigating a hallway or a crowded thoroughfare: Keep to the right, pass on the left, maintain a respectable berth, and be ready to weave or change course to avoid oncoming obstacles while keeping up a steady walking pace.

Now engineers at MIT have designed an autonomous robot with “socially aware navigation” that can keep pace with foot traffic while observing these general codes of pedestrian conduct.

In drive tests performed inside MIT’s Stata Center, the robot, which resembles a knee-high kiosk on wheels, successfully avoided collisions while keeping up with the average flow of pedestrians. The researchers have detailed their robotic design in a paper that they will present at the IEEE Conference on Intelligent Robots and Systems in September.

“Socially aware navigation is a central capability for mobile robots operating in environments that require frequent interactions with pedestrians,” says Yu Fan “Steven” Chen, who led the work as a former MIT graduate student and is the lead author of the study. “For instance, small robots could operate on sidewalks for package and food delivery. Similarly, personal mobility devices could transport people in large, crowded spaces, such as shopping malls, airports, and hospitals.”

Chen’s co-authors are graduate student Michael Everett, former postdoc Miao Liu, and Jonathan How, the Richard Cockburn Maclaurin Professor of Aeronautics and Astronautics at MIT.

Social drive

In order for a robot to make its way autonomously through a heavily trafficked environment, it must solve four main challenges: localization (knowing where it is in the world), perception (recognizing its surroundings), motion planning (identifying the optimal path to a given destination), and control (physically executing its desired path).

Chen and his colleagues used standard approaches to solve the problems of localization and perception. For the latter, they outfitted the robot with off-the-shelf sensors, such as webcams, a depth sensor, and a high-resolution lidar sensor. For the problem of localization, they used open-source algorithms to map the robot’s environment and determine its position. To control the robot, they employed standard methods used to drive autonomous ground vehicles.

“The part of the field that we thought we needed to innovate on was motion planning,” Everett says. “Once you figure out where you are in the world, and know how to follow trajectories, which trajectories should you be following?”

That’s a tricky problem, particularly in pedestrian-heavy environments, where individual paths are often difficult to predict. As a solution, roboticists sometimes take a trajectory-based approach, in which they program a robot to compute an optimal path that accounts for everyone’s desired trajectories. These trajectories must be inferred from sensor data, because people don’t explicitly tell the robot where they are trying to go. 

“But this takes forever to compute. Your robot is just going to be parked, figuring out what to do next, and meanwhile the person’s already moved way past it before it decides ‘I should probably go to the right,’” Everett says. “So that approach is not very realistic, especially if you want to drive faster.”

Others have used faster, “reactive-based” approaches, in which a robot is programmed with a simple model, using geometry or physics, to quickly compute a path that avoids collisions.

The problem with reactive-based approaches, Everett says, is the unpredictability of human nature: People rarely stick to a straight, geometric path, but rather weave and wander, veering off to greet a friend or grab a coffee. In such an unpredictable environment, these robots tend to collide with people, or to avoid people so excessively that they appear to be pushed around the hallway.

 “The knock on robots in real situations is that they might be too cautious or aggressive,” Everett says. “People don’t find them to fit into the socially accepted rules, like giving people enough space or driving at acceptable speeds, and they get more in the way than they help.”

Training days

The team found a way around such limitations, enabling the robot to adapt to unpredictable pedestrian behavior while continuously moving with the flow and following typical social codes of pedestrian conduct.

They used reinforcement learning, a type of machine learning approach, in which they performed computer simulations to train a robot to take certain paths, given the speed and trajectory of other objects in the environment. The team also incorporated social norms into this offline training phase, in which they encouraged the robot in simulations to pass on the right, and penalized the robot when it passed on the left.
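
A minimal sketch of how such norm-aware reward shaping might look in simulation is shown below. The weights, the berth distance, and the passed_on_left flag are illustrative placeholders, not values from the paper.

    import math

    def social_reward(robot_xy, goal_xy, pedestrian_xy, collided, passed_on_left):
        """Toy reward: penalize collisions and left passes, encourage progress and spacing."""
        reward = 0.0
        if collided:
            reward -= 10.0                       # hard penalty for any collision
        if passed_on_left:
            reward -= 1.0                        # discourage violating the pass-on-the-right norm
        reward -= 0.1 * math.dist(robot_xy, goal_xy)      # shaping term: progress toward the goal
        if math.dist(robot_xy, pedestrian_xy) < 0.5:      # keep a respectable berth
            reward -= 2.0
        return reward

    print(social_reward((0.0, 0.0), (5.0, 0.0), (0.4, 0.2),
                        collided=False, passed_on_left=True))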

“We want it to be traveling naturally among people and not be intrusive,” Everett says. “We want it to be following the same rules as everyone else.”

The advantage to reinforcement learning is that the researchers can perform these training scenarios, which take extensive time and computing power, offline. Once the robot is trained in simulation, the researchers can program it to carry out the optimal paths, identified in the simulations, when the robot recognizes a similar scenario in the real world.

The researchers enabled the robot to assess its environment and adjust its path every one-tenth of a second. In this way, the robot can continue rolling through a hallway at a typical walking speed of 1.2 meters per second, without pausing to reprogram its route.

“We’re not planning an entire path to the goal — it doesn’t make sense to do that anymore, especially if you’re assuming the world is changing,” Everett says. “We just look at what we see, choose a velocity, do that for a tenth of a second, then look at the world again, choose another velocity, and go again. This way, we think our robot looks more natural, and is anticipating what people are doing.”
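
The sense-choose-act cycle Everett describes might be sketched as the loop below, where sense_world() and choose_velocity() are placeholders standing in for the robot’s perception stack and its trained policy.

    import time

    def sense_world():
        """Placeholder observation: pedestrians in view and the current goal."""
        return {"pedestrians": [], "goal": (10.0, 0.0)}

    def choose_velocity(observation):
        """Placeholder policy output: drive forward at walking speed."""
        return (1.2, 0.0)

    def execute(velocity, duration):
        """Placeholder for sending a command to the motor controller."""
        pass

    def navigation_loop(steps=50, dt=0.1):
        for _ in range(steps):
            obs = sense_world()          # look at what we see
            v = choose_velocity(obs)     # query the learned policy
            execute(v, dt)               # act for a tenth of a second
            time.sleep(dt)               # then look at the world again

    navigation_loop(steps=3)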

Crowd control

Everett and his colleagues test-drove the robot in the busy, winding halls of MIT’s Stata Center, where the robot was able to drive autonomously for 20 minutes at a time. It rolled smoothly with the pedestrian flow, generally keeping to the right of hallways, occasionally passing people on the left, and avoiding any collisions.

“We wanted to bring it somewhere where people were doing their everyday things, going to class, getting food, and we showed we were pretty robust to all that,” Everett says. “One time there was even a tour group, and it perfectly avoided them.”

Everett says going forward, he plans to explore how robots might handle crowds in a pedestrian environment.

“Crowds have a different dynamic than individual people, and you may have to learn something totally different if you see five people walking together,” Everett says. “There may be a social rule of, ‘Don’t move through people, don’t split people up, treat them as one mass.’ That’s something we’re looking at in the future.”

This research was funded by Ford Motor Company.  

Custom robots in a matter of minutes

Interactive Robogami enables the fabrication of a wide range of robot designs. Photo: MIT CSAIL

Even as robots become increasingly common, they remain incredibly difficult to make. From designing and modeling to fabricating and testing, the process is slow and costly: Even one small change can mean days or weeks of rethinking and revising important hardware.

But what if there were a way to let non-experts craft different robotic designs — in one sitting?

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) are getting closer to doing exactly that. In a new paper, they present a system called “Interactive Robogami” that lets you design a robot in minutes, and then 3-D print and assemble it in as little as four hours.
 

One of the key features of the system is that it allows designers to determine both the robot’s movement (“gait”) and shape (“geometry”), capabilities that are often treated separately in design systems.

“Designing robots usually requires expertise that only mechanical engineers and roboticists have,” says PhD student and co-lead author Adriana Schulz. “What’s exciting here is that we’ve created a tool that allows a casual user to design their own robot by giving them this expert knowledge.”

The paper, which is being published in the new issue of the International Journal of Robotics Research, was co-led by PhD graduate Cynthia Sung alongside MIT professors Wojciech Matusik and Daniela Rus.

The other co-authors include PhD student Andrew Spielberg, former master’s student Wei Zhao, former undergraduate Robin Cheng, and Columbia University professor Eitan Grinspun. (Sung is now an assistant professor at the University of Pennsylvania.)

How it works

3-D printing has transformed the way that people can turn ideas into real objects, allowing users to move away from more traditional manufacturing. Despite these developments, current design tools still have space and motion limitations, and there’s a steep learning curve to understanding the various nuances.

Interactive Robogami aims to be much more intuitive. It uses simulations and interactive feedback with algorithms for design composition, allowing users to focus on high-level conceptual design. Users can choose from a library of over 50 different bodies, wheels, legs, and “peripherals,” as well as a selection of different steps (“gaits”).

Importantly, the system is able to guarantee that a design is actually possible, analyzing factors such as speed and stability to make suggestions and ensure that, for example, the user doesn’t create a robot so top-heavy that it can’t move without tipping over.
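
A crude stand-in for that kind of feasibility check is sketched below; the tipping heuristic and threshold are invented for illustration and are far simpler than the analysis the system actually performs.

    def is_stable(center_of_mass_height, support_width, max_ratio=1.5):
        """Flag top-heavy designs: a made-up height-to-support-width bound as a tipping proxy."""
        return center_of_mass_height / support_width <= max_ratio

    print(is_stable(center_of_mass_height=0.12, support_width=0.10))  # True: acceptable
    print(is_stable(center_of_mass_height=0.30, support_width=0.10))  # False: likely to tip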

Once designed, the robot is then fabricated. The team’s origami-inspired “3-D print and fold” technique involves printing the design as flat faces connected at joints, and then folding the design into the final shape, combining the most effective parts of 2-D and 3-D printing.  

“3-D printing lets you print complex, rigid structures, while 2-D fabrication gives you lightweight but strong structures that can be produced quickly,” Sung says. “By 3-D printing 2-D patterns, we can leverage these advantages to develop strong, complex designs with lightweight materials.”

Results

To test the system, the team used eight subjects who were given 20 minutes of training and asked to perform two tasks.

One task involved creating a mobile, stable car design in just 10 minutes. In a second task, users were given a robot design and asked to create a trajectory to navigate the robot through an obstacle course in the least amount of travel time.

The team fabricated a total of six robots, each of which took 10 to 15 minutes to design, three to seven hours to print and 30 to 90 minutes to assemble. The team found that their 3-D print-and-fold method reduced printing time by 73 percent and the amount of material used by 70 percent. The robots also demonstrated a wide range of movement, like using single legs to walk, using different step sequences, and using legs and wheels simultaneously.

“You can quickly design a robot that you can print out, and that will help you do these tasks very quickly, easily, and cheaply,” says Sung. “It’s lowering the barrier to have everyone design and create their own robots.”

Rus hopes people will be able to incorporate robots to help with everyday tasks, and that similar systems with rapid printing technologies will enable large-scale customization and production of robots.

“These tools enable new approaches to teaching computational thinking and creating,” says Rus. “Students can not only learn by coding and making their own robots, but by bringing to life conceptual ideas about what their robots can actually do.”

While the current version focuses on designs that can walk, the team hopes that in the future, the robots can take flight. Another goal is to have the user be able to go into the system and define the behavior of the robot in terms of tasks it can perform.

“This tool enables rapid exploration of dynamic robots at an early stage in the design process,” says Moritz Bächer, a research scientist and head of the computational design and manufacturing group at Disney Research. “The expert defines the building blocks, with constraints and composition rules, and paves the way for non-experts to make complex robotic systems. This system will likely inspire follow-up work targeting the computational design of even more intricate robots.”

This research was supported by the National Science Foundation’s Expeditions in Computing program.

Using machine learning to improve patient care

Credit: Shutterstock / MIT

Doctors are often deluged with information from charts, test results, and other metrics that they must keep track of. It can be difficult to integrate and monitor all of these data for multiple patients while making real-time treatment decisions, especially when data is documented inconsistently across hospitals.

In a new pair of papers, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) explore ways for computers to help doctors make better medical decisions.

One team created a machine-learning approach called “ICU Intervene” that takes large amounts of intensive-care-unit (ICU) data, from vitals and labs to notes and demographics, to determine what kinds of treatments are needed for different symptoms. The system uses “deep learning” to make real-time predictions, learning from past ICU cases to make suggestions for critical care, while also explaining the reasoning behind these decisions.

“The system could potentially be an aid for doctors in the ICU, which is a high-stress, high-demand environment,” says PhD student Harini Suresh, lead author on the paper about ICU Intervene. “The goal is to leverage data from medical records to improve health care and predict actionable interventions.”

Another team developed an approach called “EHR Model Transfer” that can facilitate the application of predictive models on an electronic health record (EHR) system, despite being trained on data from a different EHR system. Specifically, using this approach the team showed that predictive models for mortality and prolonged length of stay can be trained on one EHR system and used to make predictions in another.

ICU Intervene was co-developed by Suresh, undergraduate student Nathan Hunt, postdoc Alistair Johnson, researcher Leo Anthony Celi, MIT Professor Peter Szolovits, and PhD student Marzyeh Ghassemi. It was presented this month at the Machine Learning for Healthcare Conference in Boston.

EHR Model Transfer was co-developed by lead authors Jen Gong and Tristan Naumann, both PhD students at CSAIL, as well as Szolovits and John Guttag, who is the Dugald C. Jackson Professor in Electrical Engineering. It was presented at the ACM’s Special Interest Group on Knowledge Discovery and Data Mining in Halifax, Canada.

Both models were trained using data from the critical care database MIMIC, which includes de-identified data from roughly 40,000 critical care patients and was developed by the MIT Lab for Computational Physiology.

ICU Intervene

Integrated ICU data is vital to automating the process of predicting patients’ health outcomes.

“Much of the previous work in clinical decision-making has focused on outcomes such as mortality (likelihood of death), while this work predicts actionable treatments,” Suresh says. “In addition, the system is able to use a single model to predict many outcomes.”

ICU Intervene focuses on hourly prediction of five different interventions that cover a wide variety of critical care needs, such as breathing assistance, improving cardiovascular function, lowering blood pressure, and fluid therapy.

At each hour, the system extracts values from the data that represent vital signs, as well as clinical notes and other data points. All of the data are represented as values indicating how far a patient deviates from the average, which the system then uses to evaluate what further treatment may be needed.
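
In other words, each hourly measurement is turned into a deviation-from-average score, in the spirit of the toy sketch below; the reference means and standard deviations shown here are made up.

    REFERENCE = {                 # hypothetical population mean and std per signal
        "heart_rate": (80.0, 12.0),
        "mean_bp": (85.0, 10.0),
        "spo2": (97.0, 2.0),
    }

    def standardize(hourly_measurements):
        """Convert raw vitals into 'how far from average' scores."""
        features = {}
        for name, value in hourly_measurements.items():
            mean, std = REFERENCE[name]
            features[name] = (value - mean) / std
        return features

    print(standardize({"heart_rate": 110.0, "mean_bp": 60.0, "spo2": 92.0}))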

Importantly, ICU Intervene can make predictions far into the future. For example, the model can predict whether a patient will need a ventilator six hours later rather than just 30 minutes or an hour later. The team also focused on providing reasoning for the model’s predictions, giving physicians more insight.

“Deep neural-network-based predictive models in medicine are often criticized for their black-box nature,” says Nigam Shah, an associate professor of medicine at Stanford University who was not involved in the paper. “However, these authors predict the start and end of medical interventions with high accuracy, and are able to demonstrate interpretability for the predictions they make.”

The team found that the system outperformed previous work in predicting interventions, and was especially good at predicting the need for vasopressors, a medication that tightens blood vessels and raises blood pressure.

In the future, the researchers will be trying to improve ICU Intervene to be able to give more individualized care and provide more advanced reasoning for decisions, such as why one patient might be able to taper off steroids, or why another might need a procedure like an endoscopy.

EHR Model Transfer

Another important consideration for leveraging ICU data is how it’s stored and what happens when that storage method gets changed. Existing machine-learning models need data to be encoded in a consistent way, so the fact that hospitals often change their EHR systems can create major problems for data analysis and prediction.

That’s where EHR Model Transfer comes in. The approach works across different versions of EHR platforms, using natural language processing to identify clinical concepts that are encoded differently across systems and then mapping them to a common set of clinical concepts (such as “blood pressure” and “heart rate”).
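
The mapping idea can be pictured with the small sketch below, in which invented site-specific labels are re-keyed onto a shared concept vocabulary so a single model can read data from either system.

    SITE_A_TO_COMMON = {"HR": "heart_rate", "NIBP_MEAN": "blood_pressure"}
    SITE_B_TO_COMMON = {"PULSE": "heart_rate", "ART_MEAN": "blood_pressure"}

    def to_common(record, mapping):
        """Re-key one patient's measurements onto the shared concept vocabulary."""
        return {mapping[k]: v for k, v in record.items() if k in mapping}

    site_a_record = {"HR": 92, "NIBP_MEAN": 71}
    site_b_record = {"PULSE": 88, "ART_MEAN": 69}

    # Both records now share a feature space that one predictive model can consume.
    print(to_common(site_a_record, SITE_A_TO_COMMON))
    print(to_common(site_b_record, SITE_B_TO_COMMON))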

For example, a patient in one EHR platform could be switching hospitals and would need their data transferred to a different type of platform. EHR Model Transfer aims to ensure that the model could still predict aspects of that patient’s ICU visit, such as their likelihood of a prolonged stay or even of dying in the unit.

“Machine-learning models in health care often suffer from low external validity, and poor portability across sites,” says Shah. “The authors devise a nifty strategy for using prior knowledge in medical ontologies to derive a shared representation across two sites that allows models trained at one site to perform well at another site. I am excited to see such creative use of codified medical knowledge in improving portability of predictive models.”

With EHR Model Transfer, the team tested their model’s ability to predict two outcomes: mortality and the need for a prolonged stay. They trained it on one EHR platform and then tested its predictions on a different platform. EHR Model Transfer was found to outperform baseline approaches and demonstrated better transfer of predictive models across EHR versions compared to using EHR-specific events alone.

In the future, the EHR Model Transfer team plans to evaluate the system on data and EHR systems from other hospitals and care settings.

Both papers were supported, in part, by the Intel Science and Technology Center for Big Data and the National Library of Medicine. The paper detailing EHR Model Transfer was additionally supported by the National Science Foundation and Quanta Computer, Inc.

New AI algorithm monitors sleep with radio waves

Researchers have devised a new way to monitor sleep stages without sensors attached to the body. Their device uses an advanced artificial intelligence algorithm to analyze the radio signals around the person and translate those measurements into sleep stages: light, deep, or rapid eye movement (REM). Image: Christine Daniloff/MIT

More than 50 million Americans suffer from sleep disorders, and diseases including Parkinson’s and Alzheimer’s can also disrupt sleep. Diagnosing and monitoring these conditions usually requires attaching electrodes and a variety of other sensors to patients, which can further disrupt their sleep.

To make it easier to diagnose and study sleep problems, researchers at MIT and Massachusetts General Hospital have devised a new way to monitor sleep stages without sensors attached to the body. Their device uses an advanced artificial intelligence algorithm to analyze the radio signals around the person and translate those measurements into sleep stages: light, deep, or rapid eye movement (REM).

“Imagine if your Wi-Fi router knows when you are dreaming, and can monitor whether you are having enough deep sleep, which is necessary for memory consolidation,” says Dina Katabi, the Andrew and Erna Viterbi Professor of Electrical Engineering and Computer Science, who led the study. “Our vision is developing health sensors that will disappear into the background and capture physiological signals and important health metrics, without asking the user to change her behavior in any way.”

Katabi worked on the study with Matt Bianchi, chief of the division of sleep medicine at MGH, and Tommi Jaakkola, the Thomas Siebel Professor of Electrical Engineering and Computer Science and a member of the Institute for Data, Systems, and Society at MIT. Mingmin Zhao, an MIT graduate student, is the paper’s first author, and Shichao Yue, another MIT graduate student, is also a co-author.

The researchers will present their new sensor at the International Conference on Machine Learning on Aug. 9.

Remote sensing

Katabi and members of her group in MIT’s Computer Science and Artificial Intelligence Laboratory have previously developed radio-based sensors that enable them to remotely measure vital signs and behaviors that can be indicators of health. These sensors consist of a wireless device, about the size of a laptop computer, that emits low-power radio frequency (RF) signals. As the radio waves reflect off of the body, any slight movement of the body alters the frequency of the reflected waves. Analyzing those waves can reveal vital signs such as pulse and breathing rate.

“It’s a smart Wi-Fi-like box that sits in the home and analyzes these reflections and discovers all of these changes in the body, through a signature that the body leaves on the RF signal,” Katabi says.

Katabi and her students have also used this approach to create a sensor called WiGait that can measure walking speed using wireless signals, which could help doctors predict cognitive decline, falls, certain cardiac or pulmonary diseases, or other health problems.

After developing those sensors, Katabi thought that a similar approach could also be useful for monitoring sleep, which is currently done while patients spend the night in a sleep lab hooked up to monitors such as electroencephalography (EEG) machines.

“The opportunity is very big because we don’t understand sleep well, and a high fraction of the population has sleep problems,” says Zhao. “We have this technology that, if we can make it work, can move us from a world where we do sleep studies once every few months in the sleep lab to continuous sleep studies in the home.”

To achieve that, the researchers had to come up with a way to translate their measurements of pulse, breathing rate, and movement into sleep stages. Recent advances in artificial intelligence have made it possible to train computer algorithms known as deep neural networks to extract and analyze information from complex datasets, such as the radio signals obtained from the researchers’ sensor. However, these signals have a great deal of information that is irrelevant to sleep and can be confusing to existing algorithms. The MIT researchers had to come up with a new AI algorithm based on deep neural networks, which eliminates the irrelevant information.
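
The end of that pipeline, mapping RF-derived features to a stage label, is caricatured in the sketch below with hand-written threshold rules; the real system learns this mapping with a deep neural network, and the thresholds here are invented and not physiologically validated.

    def classify_stage(breathing_rate_bpm, pulse_bpm, movement_level):
        """Illustrative rules only: return one of the three stages named above."""
        if breathing_rate_bpm > 16 and pulse_bpm > 70:
            return "REM"
        if breathing_rate_bpm < 12 and movement_level < 0.1:
            return "deep"
        return "light"

    print(classify_stage(breathing_rate_bpm=10, pulse_bpm=55, movement_level=0.05))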

“The surrounding conditions introduce a lot of unwanted variation in what you measure. The novelty lies in preserving the sleep signal while removing the rest,” says Jaakkola. Their algorithm can be used in different locations and with different people, without any calibration.

Using this approach in tests of 25 healthy volunteers, the researchers found that their technique was about 80 percent accurate, which is comparable to the accuracy of ratings determined by sleep specialists based on EEG measurements.

“Our device allows you not only to remove all of these sensors that you put on the person, and make it a much better experience that can be done at home, it also makes the job of the doctor and the sleep technologist much easier,” Katabi says. “They don’t have to go through the data and manually label it.”

Sleep deficiencies

Other researchers have tried to use radio signals to monitor sleep, but these systems are accurate only 65 percent of the time and mainly determine whether a person is awake or asleep, not what sleep stage they are in. Katabi and her colleagues were able to improve on that by training their algorithm to ignore wireless signals that bounce off of other objects in the room and include only data reflected from the sleeping person.

The researchers now plan to use this technology to study how Parkinson’s disease affects sleep.

“When you think about Parkinson’s, you think about it as a movement disorder, but the disease is also associated with very complex sleep deficiencies, which are not very well understood,” Katabi says.

The sensor could also be used to learn more about sleep changes produced by Alzheimer’s disease, as well as sleep disorders such as insomnia and sleep apnea. It may also be useful for studying epileptic seizures that happen during sleep, which are usually difficult to detect.

Automatic image retouching system

A new system can automatically retouch images in the style of a professional photographer. It can run on a cellphone and display retouched images in real-time. Courtesy of the researchers (edited by MIT News)

The data captured by today’s digital cameras is often treated as the raw material of a final image. Before uploading pictures to social networking sites, even casual cellphone photographers might spend a minute or two balancing color and tuning contrast, with one of the many popular image-processing programs now available.

This week at Siggraph, the premier digital graphics conference, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory and Google are presenting a new system that can automatically retouch images in the style of a professional photographer. It’s so energy-efficient, however, that it can run on a cellphone, and it’s so fast that it can display retouched images in real-time, so that the photographer can see the final version of the image while still framing the shot.

The same system can also speed up existing image-processing algorithms. In tests involving a new Google algorithm for producing high-dynamic-range images, which capture subtleties of color lost in standard digital images, the new system produced results that were visually indistinguishable from those of the algorithm in about one-tenth the time — again, fast enough for real-time display.

The system relies on machine learning, meaning that it learns to perform tasks by analyzing training data; in this case, for each new task it learned, it was trained on thousands of pairs of images, raw and retouched.

The work builds on an earlier project from the MIT researchers, in which a cellphone would send a low-resolution version of an image to a web server. The server would send back a “transform recipe” that could be used to retouch the high-resolution version of the image on the phone, reducing bandwidth consumption.

“Google heard about the work I’d done on the transform recipe,” says Michaël Gharbi, an MIT graduate student in electrical engineering and computer science and first author on both papers. “They themselves did a follow-up on that, so we met and merged the two approaches. The idea was to do everything we were doing before but, instead of having to process everything on the cloud, to learn it. And the first goal of learning it was to speed it up.”

Short cuts

In the new work, the bulk of the image processing is performed on a low-resolution image, which drastically reduces time and energy consumption. But this introduces a new difficulty, because the color values of the individual pixels in the high-res image have to be inferred from the much coarser output of the machine-learning system.

In the past, researchers have attempted to use machine learning to learn how to “upsample” a low-res image, or increase its resolution by guessing the values of the omitted pixels. During training, the input to the system is a low-res image, and the output is a high-res image. But this doesn’t work well in practice; the low-res image just leaves out too much data.

Gharbi and his colleagues — MIT professor of electrical engineering and computer science Frédo Durand and Jiawen Chen, Jon Barron, and Sam Hasinoff of Google — address this problem with two clever tricks. The first is that the output of their machine-learning system is not an image; rather, it’s a set of simple formulae for modifying the colors of image pixels. During training, the performance of the system is judged according to how well the output formulae, when applied to the original image, approximate the retouched version.

Taking bearings

The second trick is a technique for determining how to apply those formulae to individual pixels in the high-res image. The output of the researchers’ system is a three-dimensional grid, 16 by 16 by 8. The 16-by-16 faces of the grid correspond to pixel locations in the source image; the eight layers stacked on top of them correspond to different pixel intensities. Each cell of the grid contains formulae that determine modifications of the color values of the source images.

That means that each cell of one of the grid’s 16-by-16 faces has to stand in for thousands of pixels in the high-res image. But suppose that each set of formulae corresponds to a single location at the center of its cell. Then any given high-res pixel falls within a square defined by four sets of formulae.

Roughly speaking, the modification of that pixel’s color value is a combination of the formulae at the square’s corners, weighted according to distance. A similar weighting occurs in the third dimension of the grid, the one corresponding to pixel intensity.
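
That lookup amounts to trilinear interpolation over the grid, roughly as in the sketch below; for simplicity each cell here holds a single gain value rather than a full set of color formulae.

    import numpy as np

    GRID_W, GRID_H, GRID_D = 16, 16, 8
    grid = np.ones((GRID_H, GRID_W, GRID_D))  # placeholder: identity gains in every cell

    def slice_grid(x_norm, y_norm, intensity):
        """Blend the eight surrounding cells, weighted by distance along each axis.

        x_norm, y_norm, and intensity are all in [0, 1].
        """
        gx = x_norm * (GRID_W - 1)
        gy = y_norm * (GRID_H - 1)
        gz = intensity * (GRID_D - 1)
        x0, y0, z0 = int(gx), int(gy), int(gz)
        x1 = min(x0 + 1, GRID_W - 1)
        y1 = min(y0 + 1, GRID_H - 1)
        z1 = min(z0 + 1, GRID_D - 1)
        fx, fy, fz = gx - x0, gy - y0, gz - z0
        value = 0.0
        for xi, wx in ((x0, 1 - fx), (x1, fx)):
            for yi, wy in ((y0, 1 - fy), (y1, fy)):
                for zi, wz in ((z0, 1 - fz), (z1, fz)):
                    value += wx * wy * wz * grid[yi, xi, zi]
        return value

    print(slice_grid(0.5, 0.25, 0.8))   # gain applied to one high-res pixel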

The researchers trained their system on a data set created by Durand’s group and Adobe Systems, the creators of Photoshop. The data set includes 5,000 images, each retouched by five different photographers. They also trained their system on thousands of pairs of images produced by the application of particular image-processing algorithms, such as the one for creating high-dynamic-range (HDR) images. The software for performing each modification takes up about as much space in memory as a single digital photo, so in principle, a cellphone could be equipped to process images in a range of styles.

Finally, the researchers compared their system’s performance to that of a machine-learning system that processed images at full resolution rather than low resolution. During processing, the full-res version needed about 12 gigabytes of memory to execute its operations; the researchers’ version needed about 100 megabytes, or one-hundredth as much. The full-resolution version of the HDR system took about 10 times as long to produce an image as the original algorithm, or 100 times as long as the researchers’ system.

“This technology has the potential to be very useful for real-time image enhancement on mobile platforms,” says Barron. “Using machine learning for computational photography is an exciting prospect but is limited by the severe computational and power constraints of mobile phones. This paper may provide us with a way to sidestep these issues and produce new, compelling, real-time photographic experiences without draining your battery or giving you a laggy viewfinder experience.”

SMART trials self-driving wheelchair at hospital

Image: MIT CSAIL

Singapore and MIT have been at the forefront of autonomous vehicle development. First, there were self-driving golf buggies. Then, an autonomous electric car. Now, leveraging similar technology, MIT and Singaporean researchers have developed and deployed a self-driving wheelchair at a hospital.

Spearheaded by Daniela Rus, the Andrew (1956) and Erna Viterbi Professor of Electrical Engineering and Computer Science and director of MIT’s Computer Science and Artificial Intelligence Laboratory, this autonomous wheelchair is an extension of the self-driving scooter that launched at MIT last year — and it is a testament to the success of the Singapore-MIT Alliance for Research and Technology, or SMART, a collaboration between researchers at MIT and in Singapore.

Rus, who is also the principal investigator of the SMART Future Urban Mobility research group, says this newest innovation can help nurses focus more on patient care by relieving them of logistical work such as searching for wheelchairs and wheeling patients through a complex hospital network.

“When we visited several retirement communities, we realized that the quality of life is dependent on mobility. We want to make it really easy for people to move around,” Rus says.

Reshaping computer-aided design

Adriana Schulz, an MIT PhD student in the Computer Science and Artificial Intelligence Laboratory, demonstrates the InstantCAD computer-aided-design-optimizing interface. Photo: Rachel Gordon/MIT CSAIL

Almost every object we use is developed with computer-aided design (CAD). Ironically, while CAD programs are good for creating designs, using them is actually very difficult and time-consuming if you’re trying to improve an existing design to make the best possible product. Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Columbia University are trying to make the process faster and easier: In a new paper, they’ve developed InstantCAD, a tool that lets designers interactively edit, improve, and optimize CAD models using a more streamlined and intuitive workflow.

 InstantCAD integrates seamlessly with existing CAD programs as a plug-in, meaning that designers don’t have to learn new tools to use it.

“From more ergonomic desks to higher-performance cars, this is really about creating better products in less time,” says Department of Electrical Engineering and Computer Science PhD student and lead author Adriana Schulz, who will be presenting the paper at this month’s SIGGRAPH computer-graphics conference in Los Angeles. “We think this could be a real game changer for automakers and other companies that want to be able to test and improve complex designs in a matter of seconds to minutes, instead of hours to days.”

The paper was co-written by Associate Professor Wojciech Matusik, PhD student Jie Xu, and postdoc Bo Zhu of CSAIL, as well as Associate Professor Eitan Grinspun and Assistant Professor Changxi Zheng of Columbia University.

Traditional CAD systems are “parametric,” which means that when engineers design models, they can change properties like shape and size (“parameters”) based on different priorities. For example, when designing a wind turbine you might have to make trade-offs between how much airflow you can get versus how much energy it will generate.

InstantCAD enables designers to interactively edit, improve, and optimize CAD models using a more streamlined and intuitive workflow. Photo: Rachel Gordon/MIT CSAIL

However, it can be difficult to determine the absolute best design for what you want your object to do, because there are many different options for modifying the design. On top of that, the process is time-consuming because changing a single property means having to wait to regenerate the new design, run a simulation, see the result, and then figure out what to do next.

With InstantCAD, the process of improving and optimizing the design can be done in real-time, saving engineers days or weeks. After an object is designed in a commercial CAD program, it is sent to a cloud platform where multiple geometric evaluations and simulations are run at the same time.

With this precomputed data, you can instantly improve and optimize the design in two ways. With “interactive exploration,” a user interface provides real-time feedback on how design changes will affect performance, like how the shape of a plane wing impacts air pressure distribution. With “automatic optimization,” you simply tell the system to give you a design with specific characteristics, like a drone that’s as lightweight as possible while still being able to carry the maximum amount of weight.

The reason it’s hard to optimize an object’s design is because of the massive size of the design space (the number of possible design options).

“It’s too data-intensive to compute every single point, so we have to come up with a way to predict any point in this space from just a small number of sampled data points,” says Schulz. “This is called ‘interpolation,’ and our key technical contribution is a new algorithm we developed to take these samples and estimate points in the space.”
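
For readers who want a concrete picture, the sample-then-interpolate idea can be sketched in a few lines of Python. The simulator, parameter ranges, and objective below are made-up stand-ins rather than the authors’ pipeline; the point is only to show how a handful of precomputed samples can support both instant feedback and automatic optimization.

```python
# A minimal sketch of "precompute a few expensive simulations, then interpolate."
# Everything here is illustrative: the simulator is a toy function, not a real solver.
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.optimize import minimize

def simulate_design(params):
    """Stand-in for an expensive geometric evaluation or physics simulation."""
    x, y = params
    return (x - 0.3) ** 2 + 0.5 * np.sin(4 * y) + 0.1 * x * y

# 1) Precompute: run the costly simulation on a small set of sampled designs.
rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=(60, 2))           # 60 sampled parameter vectors
scores = np.array([simulate_design(p) for p in samples])

# 2) Interpolate: fit a cheap surrogate that predicts performance anywhere in the space.
surrogate = RBFInterpolator(samples, scores)

# Interactive exploration: instant feedback for any slider position.
print("predicted score at (0.5, 0.5):", surrogate([[0.5, 0.5]])[0])

# Automatic optimization: search the surrogate instead of the real simulator.
result = minimize(lambda p: surrogate([p])[0], x0=[0.5, 0.5],
                  bounds=[(0.0, 1.0), (0.0, 1.0)])
print("estimated best parameters:", result.x)
```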

Matusik says InstantCAD could be particularly helpful for intricate designs for objects like cars, planes, and robots, especially for industries like car manufacturing that care a lot about squeezing every little bit of performance out of a product.

“Our system doesn’t just save you time for changing designs, but has the potential to dramatically improve the quality of the products themselves,” says Matusik. “The more complex your design gets, the more important this kind of a tool can be.”

Because of the system’s productivity boosts and CAD integration, Schulz is confident that it will have immediate applications for industry. Down the line, she hopes that InstantCAD can also help lower the barrier to entry for casual users.

“In a world where 3-D printing and industrial robotics are making manufacturing more accessible, we need systems that make the actual design process more accessible, too,” Schulz says. “With systems like this that make it easier to customize objects to meet your specific needs, we hope to be paving the way to a new age of personal manufacturing and DIY design.”

The project was supported by the National Science Foundation.

Artificial intelligence suggests recipes based on food photos

Pic2Recipe, an artificial intelligence system developed at MIT, can take a photo of an entree and suggest a similar recipe to it. Photo: Jason Dorfman/MIT CSAIL

There are few things social media users love more than flooding their feeds with photos of food. Yet we seldom use these images for much more than a quick scroll on our cellphones. Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) believe that analyzing photos like these could help us learn recipes and better understand people’s eating habits.

In a new paper with the Qatar Computing Research Institute (QCRI), the team trained an artificial intelligence system called Pic2Recipe to look at a photo of food and be able to predict the ingredients and suggest similar recipes.

“In computer vision, food is mostly neglected because we don’t have the large-scale datasets needed to make predictions,” says Yusuf Aytar, an MIT postdoc who co-wrote a paper about the system with MIT Professor Antonio Torralba. “But seemingly useless photos on social media can actually provide valuable insight into health habits and dietary preferences.”

The paper will be presented later this month at the Computer Vision and Pattern Recognition conference in Honolulu. CSAIL graduate student Nick Hynes was lead author alongside Amaia Salvador of the Polytechnic University of Catalonia in Spain. Co-authors include CSAIL postdoc Javier Marin, as well as scientist Ferda Ofli and research director Ingmar Weber of QCRI.

How it works

The web has spurred a huge growth of research in the area of classifying food data, but the majority of it has used much smaller datasets, which often leads to major gaps in labeling foods.

In 2014 Swiss researchers created the “Food-101” dataset and used it to develop an algorithm that could recognize images of food with 50 percent accuracy. Future iterations only improved accuracy to about 80 percent, suggesting that the size of the dataset may be a limiting factor.

Even the larger datasets have often been somewhat limited in how well they generalize across populations. A database from the City University of Hong Kong has over 110,000 images and 65,000 recipes, each with ingredient lists and instructions, but only contains Chinese cuisine.

The CSAIL team’s project aims to build off of this work but dramatically expand in scope. Researchers combed websites like All Recipes and Food.com to develop “Recipe1M,” a database of over 1 million recipes that were annotated with information about the ingredients in a wide range of dishes. They then used that data to train a neural network to find patterns and make connections between the food images and the corresponding ingredients and recipes.

Given a photo of a food item, Pic2Recipe could identify ingredients like flour, eggs, and butter, and then suggest several recipes that it determined to be similar to images from the database. (The team has an online demo where people can upload their own food photos to test it out.)
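
The article doesn’t spell out the retrieval step, but one common way to implement it is nearest-neighbor search in a shared embedding space, roughly as sketched below. The random vectors stand in for embeddings that a trained model like Pic2Recipe would produce.

```python
# A minimal sketch of embedding-based recipe retrieval. The embeddings below are
# random stand-ins; in a real system they would come from the trained networks.
import numpy as np

def suggest_recipes(photo_vec, recipe_vecs, recipes, k=5):
    """Return the k recipes whose embeddings are closest (cosine similarity) to the photo."""
    photo_vec = photo_vec / np.linalg.norm(photo_vec)
    recipe_vecs = recipe_vecs / np.linalg.norm(recipe_vecs, axis=1, keepdims=True)
    sims = recipe_vecs @ photo_vec             # cosine similarity to every stored recipe
    top = np.argsort(-sims)[:k]                # indices of the k best matches
    return [(recipes[i], float(sims[i])) for i in top]

# Toy usage: 1,000 fake recipe embeddings and one fake photo embedding.
rng = np.random.default_rng(1)
recipes = [f"recipe_{i}" for i in range(1000)]
recipe_vecs = rng.normal(size=(1000, 128))
photo_vec = rng.normal(size=128)
print(suggest_recipes(photo_vec, recipe_vecs, recipes, k=3))
```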

“You can imagine people using this to track their daily nutrition, or to photograph their meal at a restaurant and know what’s needed to cook it at home later,” says Christoph Trattner, an assistant professor at MODUL University Vienna in the New Media Technology Department who was not involved in the paper. “The team’s approach works at a similar level to human judgement, which is remarkable.”

The system did particularly well with desserts like cookies or muffins, since that was a main theme in the database. However, it had difficulty determining ingredients for more ambiguous foods, like sushi rolls and smoothies.

It was also often stumped when there were similar recipes for the same dishes. For example, there are dozens of ways to make lasagna, so the team needed to make sure the system wouldn’t “penalize” recipes that are similar when trying to separate those that are different. (One way to solve this was by seeing if the ingredients in each are generally similar before comparing the recipes themselves.)
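
As a rough illustration of that ingredient check, one simple choice is the Jaccard overlap between two ingredient sets; both the measure and the threshold below are illustrative, not necessarily what the team used.

```python
# Check whether two recipes use "generally similar" ingredients before comparing
# them in detail, using Jaccard overlap of their ingredient sets.
def ingredients_similar(recipe_a, recipe_b, threshold=0.5):
    a, b = set(recipe_a), set(recipe_b)
    overlap = len(a & b) / len(a | b)        # Jaccard similarity in [0, 1]
    return overlap >= threshold

lasagna_1 = ["pasta", "tomato", "ricotta", "beef", "mozzarella"]
lasagna_2 = ["pasta", "tomato", "ricotta", "spinach", "mozzarella"]
print(ingredients_similar(lasagna_1, lasagna_2))   # True: treat them as variants of one dish
```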

In the future, the team hopes to be able to improve the system so that it can understand food in even more detail. This could mean being able to infer how a food is prepared (e.g., stewed versus diced) or distinguish different varieties of foods, like mushrooms or onions.

The researchers are also interested in potentially developing the system into a “dinner aide” that could figure out what to cook given a dietary preference and a list of items in the fridge.

“This could potentially help people figure out what’s in their food when they don’t have explicit nutritional information,” says Hynes. “For example, if you know what ingredients went into a dish but not the amount, you can take a photo, enter the ingredients, and run the model to find a similar recipe with known quantities, and then use that information to approximate your own meal.”

The project was funded, in part, by QCRI, as well as the European Regional Development Fund (ERDF) and the Spanish Ministry of Economy, Industry, and Competitiveness.

Bringing neural networks to cellphones

Image: Jose-Luis Olivares/MIT

In recent years, the best-performing artificial-intelligence systems — in areas such as autonomous driving, speech recognition, computer vision, and automatic translation — have come courtesy of software systems known as neural networks.

But neural networks take up a lot of memory and consume a lot of power, so they usually run on servers in the cloud, which receive data from desktop or mobile devices and then send back their analyses.

Last year, MIT associate professor of electrical engineering and computer science Vivienne Sze and colleagues unveiled a new, energy-efficient computer chip optimized for neural networks, which could enable powerful artificial-intelligence systems to run locally on mobile devices.

Now, Sze and her colleagues have approached the same problem from the opposite direction, with a battery of techniques for designing more energy-efficient neural networks. First, they developed an analytic method that can determine how much power a neural network will consume when run on a particular type of hardware. Then they used the method to evaluate new techniques for paring down neural networks so that they’ll run more efficiently on handheld devices.

The researchers describe the work in a paper they’re presenting next week at the Computer Vision and Pattern Recognition Conference. In the paper, they report that the methods offered as much as a 73 percent reduction in power consumption over the standard implementation of neural networks, and as much as a 43 percent reduction over the best previous method for paring the networks down.

Energy evaluator

Loosely based on the anatomy of the brain, neural networks consist of thousands or even millions of simple but densely interconnected information-processing nodes, usually organized into layers. Different types of networks vary according to their number of layers, the number of connections between the nodes, and the number of nodes in each layer.

The connections between nodes have “weights” associated with them, which determine how much a given node’s output will contribute to the next node’s computation. During training, in which the network is presented with examples of the computation it’s learning to perform, those weights are continually readjusted, until the output of the network’s last layer consistently corresponds with the result of the computation.
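
In code, that structure reduces to repeated weighted sums. The toy two-layer network below is only meant to make the description concrete; it is not any of the networks discussed here.

```python
# Each node computes a weighted sum of its inputs and passes the result through a
# simple "firing" function (here a ReLU). Weights are random for illustration.
import numpy as np

def layer(inputs, weights, biases):
    """One layer of nodes: weighted sums of the previous layer's outputs, then ReLU."""
    return np.maximum(0.0, weights @ inputs + biases)

x = np.array([0.2, 0.8, 0.5])                  # outputs of the input layer
w1, b1 = np.random.randn(4, 3), np.zeros(4)    # 4 nodes, each connected to the 3 inputs
w2, b2 = np.random.randn(2, 4), np.zeros(2)    # 2 output nodes
print(layer(layer(x, w1, b1), w2, b2))         # output of a tiny two-layer network
```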

“The first thing we did was develop an energy-modeling tool that accounts for data movement, transactions, and data flow,” Sze says. “If you give it a network architecture and the value of its weights, it will tell you how much energy this neural network will take. One of the questions that people had is ‘Is it more energy efficient to have a shallow network and more weights or a deeper network with fewer weights?’ This tool gives us better intuition as to where the energy is going, so that an algorithm designer could have a better understanding and use this as feedback. The second thing we did is that, now that we know where the energy is actually going, we started to use this model to drive our design of energy-efficient neural networks.”
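
Sze’s tool models data movement in much finer detail, but the flavor of the bookkeeping can be sketched with a crude proxy that counts multiply-accumulates and memory accesses per layer and weights them by assumed relative costs. The cost constants and the fully connected layers below are illustrative assumptions, not the authors’ model.

```python
# A crude relative-energy proxy for a chain of fully connected layers: count
# multiply-accumulates (MACs) and memory accesses, then weight them by assumed costs.
MAC_COST = 1.0        # assumed relative energy of one multiply-accumulate
MEM_COST = 6.0        # assumed relative energy of one memory access

def layer_energy(n_in, n_out):
    macs = n_in * n_out                      # one MAC per weight for a single input vector
    weight_reads = n_in * n_out              # each weight fetched once
    act_accesses = n_in + n_out              # read the inputs, write the outputs
    return macs * MAC_COST + (weight_reads + act_accesses) * MEM_COST

def network_energy(layer_sizes):
    """Estimate relative energy for a network given as a list of layer widths."""
    return sum(layer_energy(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

# The "shallow and wide" versus "deep and narrow" question from the quote above.
print(network_energy([784, 1200, 10]))       # wide, shallow
print(network_energy([784, 300, 300, 10]))   # narrower, deeper
```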

In the past, Sze explains, researchers attempting to reduce neural networks’ power consumption used a technique called “pruning.” Low-weight connections between nodes contribute very little to a neural network’s final output, so many of them can be safely eliminated, or pruned.
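
In its simplest form, magnitude pruning just zeroes out the connections whose weights are smallest in absolute value, as in the sketch below; the 60 percent pruning ratio is arbitrary.

```python
# Magnitude-based pruning: zero out the smallest-magnitude weights in a layer.
import numpy as np

def prune_smallest(weights, fraction=0.6):
    """Return a copy of `weights` with the smallest-magnitude entries set to zero."""
    threshold = np.quantile(np.abs(weights), fraction)
    pruned = weights.copy()
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned

w = np.random.randn(256, 128)                 # one layer's weight matrix
w_pruned = prune_smallest(w, fraction=0.6)
print("remaining connections:", np.count_nonzero(w_pruned), "of", w.size)
```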

Principled pruning

With the aid of their energy model, Sze and her colleagues — first author Tien-Ju Yang and Yu-Hsin Chen, both graduate students in electrical engineering and computer science — varied this approach. Although cutting even a large number of low-weight connections can have little effect on a neural net’s output, cutting all of them probably would, so pruning techniques must have some mechanism for deciding when to stop.

The MIT researchers thus begin pruning those layers of the network that consume the most energy. That way, the cuts translate to the greatest possible energy savings. They call this method “energy-aware pruning.”

Weights in a neural network can be either positive or negative, so the researchers’ method also looks for cases in which connections with weights of opposite sign tend to cancel each other out. The inputs to a given node are the outputs of nodes in the layer below, multiplied by the weights of their connections. So the researchers’ method looks not only at the weights but also at the way the associated nodes handle training data. Only if groups of connections with positive and negative weights consistently offset each other can they be safely cut. This leads to more efficient networks with fewer connections than earlier pruning methods did.
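
A schematic version of the ordering idea might look like the sketch below: estimate each layer’s energy, prune the most energy-hungry layers first, and undo a cut if accuracy falls below a budget. The energy estimate, the accuracy check, and the tiny two-layer “model” are placeholders, and the researchers’ sign-cancellation analysis is omitted entirely.

```python
# Energy-aware pruning, schematically: cut the highest-energy layers first so the
# earliest cuts yield the largest savings, and back out any cut that hurts accuracy.
import numpy as np

def prune_smallest(weights, fraction=0.5):
    threshold = np.quantile(np.abs(weights), fraction)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def energy_aware_prune(layers, estimate_energy, evaluate_accuracy,
                       min_accuracy=0.90, fraction=0.5):
    order = sorted(layers, key=lambda name: estimate_energy(layers[name]), reverse=True)
    for name in order:                        # highest estimated energy first
        saved = layers[name]
        layers[name] = prune_smallest(saved, fraction)
        if evaluate_accuracy(layers) < min_accuracy:
            layers[name] = saved              # undo the cut if accuracy drops too far
    return layers

# Toy usage with random weights, a size-based energy proxy, and a fake accuracy check.
rng = np.random.default_rng(3)
layers = {"conv1": rng.normal(size=(64, 27)), "fc1": rng.normal(size=(10, 256))}
pruned = energy_aware_prune(layers,
                            estimate_energy=lambda w: w.size,
                            evaluate_accuracy=lambda ls: 0.95)
print({name: int(np.count_nonzero(w)) for name, w in pruned.items()})
```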

“Recently, much activity in the deep-learning community has been directed toward development of efficient neural-network architectures for computationally constrained platforms,” says Hartwig Adam, the team lead for mobile vision at Google. “However, most of this research is focused on either reducing model size or computation, while for smartphones and many other devices energy consumption is of utmost importance because of battery usage and heat restrictions. This work is taking an innovative approach to CNN [convolutional neural net] architecture optimization that is directly guided by minimization of power consumption using a sophisticated new energy estimation tool, and it demonstrates large performance gains over computation-focused methods. I hope other researchers in the field will follow suit and adopt this general methodology to neural-network-model architecture design.”

Miniaturizing the brain of a drone


By Jennifer Chu

In recent years, engineers have worked to shrink drone technology, building flying prototypes that are the size of a bumblebee and loaded with even tinier sensors and cameras. Thus far, they have managed to miniaturize almost every part of a drone, except for the brains of the entire operation — the computer chip.

Standard computer chips for quadcopters and other similarly sized drones process an enormous amount of streaming data from cameras and sensors, and interpret that data on the fly to autonomously direct a drone’s pitch, speed, and trajectory. To do so, these computers use between 10 and 30 watts of power, supplied by batteries that would weigh down a much smaller, bee-sized drone.

Now, engineers at MIT have taken a first step in designing a computer chip that uses a fraction of the power of larger drone computers and is tailored for a drone as small as a bottlecap. They will present a new methodology and design, which they call “Navion,” at the Robotics: Science and Systems conference, held this week at MIT.

The team, led by Sertac Karaman, the Class of 1948 Career Development Associate Professor of Aeronautics and Astronautics at MIT, and Vivienne Sze, an associate professor in MIT’s Department of Electrical Engineering and Computer Science, developed a low-power algorithm, in tandem with pared-down hardware, to create a specialized computer chip.

The key contribution of their work is a new approach for designing the chip hardware and the algorithms that run on the chip. “Traditionally, an algorithm is designed, and you throw it over to a hardware person to figure out how to map the algorithm to hardware,” Sze says. “But we found by designing the hardware and algorithms together, we can achieve more substantial power savings.”

“We are finding that this new approach to programming robots, which involves thinking about hardware and algorithms jointly, is key to scaling them down,” Karaman says.

The new chip processes streaming images at 20 frames per second and automatically carries out commands to adjust a drone’s orientation in space. The streamlined chip performs all these computations while using just below 2 watts of power — making it an order of magnitude more efficient than current drone-embedded chips.

Karaman says the team’s design is the first step toward engineering “the smallest intelligent drone that can fly on its own.” He ultimately envisions disaster-response and search-and-rescue missions in which insect-sized drones flit in and out of tight spaces to examine a collapsed structure or look for trapped individuals. Karaman also foresees novel uses in consumer electronics.

“Imagine buying a bottlecap-sized drone that can integrate with your phone, and you can take it out and fit it in your palm,” he says. “If you lift your hand up a little, it would sense that, and start to fly around and film you. Then you open your hand again and it would land on your palm, and you could upload that video to your phone and share it with others.”

Karaman and Sze’s co-authors are graduate students Zhengdong Zhang and Amr Suleiman, and research scientist Luca Carlone.

From the ground up

Current minidrone prototypes are small enough to fit on a person’s fingertip and are extremely light, requiring only 1 watt of power to lift off from the ground. Their accompanying cameras and sensors use up an additional half a watt to operate.

“The missing piece is the computers — we can’t fit them in terms of size and power,” Karaman says. “We need to miniaturize the computers and make them low power.”

The group quickly realized that conventional chip design techniques would likely not produce a chip that was small enough and provided the required processing power to intelligently fly a small autonomous drone.

“As transistors have gotten smaller, there have been improvements in efficiency and speed, but that’s slowing down, and now we have to come up with specialized hardware to get improvements in efficiency,” Sze says.

The researchers decided to build a specialized chip from the ground up, developing algorithms to process data, and hardware to carry out that data-processing, in tandem.

Tweaking a formula

Specifically, the researchers made slight changes to an existing algorithm commonly used to determine a drone’s “ego-motion,” or awareness of its position in space. They then implemented various versions of the algorithm on a field-programmable gate array (FPGA), a very simple programmable chip. To formalize this process, they developed a method called iterative splitting co-design that could strike the right balance of achieving accuracy while reducing the power consumption and the number of gates.

A typical FPGA consists of hundreds of thousands of disconnected gates, which researchers can connect in desired patterns to create specialized computing elements. Reducing the number of gates with co-design allowed the team to choose an FPGA chip with fewer gates, leading to substantial power savings.
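
The article doesn’t detail the iterative splitting co-design procedure, but its general pattern can be sketched as follows: evaluate algorithm variants against both accuracy and estimated hardware cost, and keep the cheapest variant that is still accurate enough. The variants, the error evaluator, and the cost estimator below are toy placeholders, not the team’s actual method.

```python
# A schematic co-design loop: balance accuracy against estimated FPGA resources.
def co_design(variants, evaluate_error, estimate_cost, max_error=0.05):
    best, best_cost = None, float("inf")
    for variant in variants:
        error = evaluate_error(variant)      # e.g. ego-motion error on recorded flight data
        cost = estimate_cost(variant)        # e.g. estimated gate count or power
        if error <= max_error and cost < best_cost:
            best, best_cost = variant, cost
    return best, best_cost

# Toy usage: variants trade numerical precision (bits) against accuracy and gate count.
variants = [{"bits": b} for b in (8, 12, 16, 24, 32)]
chosen = co_design(variants,
                   evaluate_error=lambda v: 0.5 / v["bits"],   # more bits, less error
                   estimate_cost=lambda v: v["bits"] ** 2)     # more bits, more gates
print(chosen)
```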

“If we don’t need a certain logic or memory process, we don’t use them, and that saves a lot of power,” Karaman explains.

Each time the researchers tweaked the ego-motion algorithm, they mapped the version onto the FPGA’s gates and connected the chip to a circuit board. They then fed the chip data from a standard drone dataset — an accumulation of streaming images and accelerometer measurements from previous drone-flying experiments that had been carried out by others and made available to the robotics community.

“These experiments are also done in a motion-capture room, so you know exactly where the drone is, and we use all this information after the fact,” Karaman says.

Memory savings

For each version of the algorithm that was implemented on the FPGA chip, the researchers observed the amount of power that the chip consumed as it processed the incoming data and estimated its resulting position in space.

The team’s most efficient design processed images at 20 frames per second and accurately estimated the drone’s orientation in space, while consuming less than 2 watts of power.

The power savings came partly from modifications to the amount of memory stored in the chip. Sze and her colleagues found that they were able to shrink the amount of data that the algorithm needed to process, while still achieving the same outcome. As a result, the chip itself was able to store less data and consume less power.

“Memory is really expensive in terms of power,” Sze says. “Since we do on-the-fly computing, as soon as we receive any data on the chip, we try to do as much processing as possible so we can throw it out right away, which enables us to keep a very small amount of memory on the chip without accessing off-chip memory, which is much more expensive.”

In this way, the team was able to reduce the chip’s memory storage to 2 megabytes without using off-chip memory, compared to a typical embedded computer chip for drones, which uses off-chip memory on the order of a few gigabytes.
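
The memory-saving pattern itself is simple to illustrate: process each chunk of data as it arrives, fold it into a small running state, and let the raw chunk be discarded. In the toy sketch below, a running mean stands in for the chip’s actual ego-motion computation.

```python
# Streaming, on-the-fly processing: keep only a tiny running summary rather than
# buffering whole frames. The running mean is a stand-in for the real computation.
def streaming_summary(chunks):
    total, count = 0.0, 0
    for chunk in chunks:          # e.g. rows of pixels arriving from the camera
        total += sum(chunk)       # fold the chunk into a small running state...
        count += len(chunk)       # ...and let the chunk itself be discarded
    return total / count

print(streaming_summary([[1, 2, 3], [4, 5], [6]]))   # 3.5
```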

“Any which way you can reduce the power so you can reduce battery size or extend battery life, the better,” Sze says.

This summer, the team will mount the FPGA chip onto a drone to test its performance in flight. Ultimately, the team plans to implement the optimized algorithm on an application-specific integrated circuit, or ASIC, a more specialized hardware platform that allows engineers to design specific types of gates, directly onto the chip.

“We think we can get this down to just a few hundred milliwatts,” Karaman says. “With this platform, we can do all kinds of optimizations, which allows tremendous power savings.”

This research was supported, in part, by Air Force Office of Scientific Research and the National Science Foundation.

Peering into neural networks

Neural networks learn to perform computational tasks by analyzing large sets of training data. But once they’ve been trained, even their designers rarely have any idea what data elements they’re processing.
Image: Christine Daniloff/MIT

By Larry Hardesty

Neural networks, which learn to perform computational tasks by analyzing large sets of training data, are responsible for today’s best-performing artificial intelligence systems, from speech recognition systems, to automatic translators, to self-driving cars.

But neural nets are black boxes. Once they’ve been trained, even their designers rarely have any idea what they’re doing — what data elements they’re processing and how.

Two years ago, a team of computer-vision researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) described a method for peering into the black box of a neural net trained to identify visual scenes. The method provided some interesting insights, but it required data to be sent to human reviewers recruited through Amazon’s Mechanical Turk crowdsourcing service.

At this year’s Computer Vision and Pattern Recognition conference, CSAIL researchers will present a fully automated version of the same system. Where the previous paper reported the analysis of one type of neural network trained to perform one task, the new paper reports the analysis of four types of neural networks trained to perform more than 20 tasks, including recognizing scenes and objects, colorizing grey images, and solving puzzles. Some of the new networks are so large that analyzing any one of them would have been cost-prohibitive under the old method.

The researchers also conducted several sets of experiments on their networks that not only shed light on the nature of several computer-vision and computational-photography algorithms, but could also provide some evidence about the organization of the human brain.

Neural networks are so called because they loosely resemble the human nervous system, with large numbers of fairly simple but densely connected information-processing “nodes.” Like neurons, a neural net’s nodes receive information signals from their neighbors and then either “fire” — emitting their own signals — or don’t. And as with neurons, the strength of a node’s firing response can vary.

In both the new paper and the earlier one, the MIT researchers doctored neural networks trained to perform computer vision tasks so that they disclosed the strength with which individual nodes fired in response to different input images. Then they selected the 10 input images that provoked the strongest response from each node.
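
That probing step amounts to ranking images by how strongly they drive a single node. In the sketch below, node_response is a hypothetical stand-in for querying the instrumented network; only the top-10 selection is shown concretely.

```python
# Rank images by how strongly they make one node fire, and keep the top 10.
# node_response is a hypothetical stand-in for querying the instrumented network.
import heapq

def top_activating_images(images, node_response, k=10):
    """Return the k images that produce the strongest response from a given node."""
    return heapq.nlargest(k, images, key=node_response)

# Toy usage: the "images" are just numbers and the node responds to their magnitude.
print(top_activating_images(range(100), node_response=abs, k=10))
```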

In the earlier paper, the researchers sent the images to workers recruited through Mechanical Turk, who were asked to identify what the images had in common. In the new paper, they use a computer system instead.

“We catalogued 1,100 visual concepts — things like the color green, or a swirly texture, or wood material, or a human face, or a bicycle wheel, or a snowy mountaintop,” says David Bau, an MIT graduate student in electrical engineering and computer science and one of the paper’s two first authors. “We drew on several data sets that other people had developed, and merged them into a broadly and densely labeled data set of visual concepts. It’s got many, many labels, and for each label we know which pixels in which image correspond to that label.”

The paper’s other authors are Bolei Zhou, co-first author and fellow graduate student; Antonio Torralba, MIT professor of electrical engineering and computer science; Aude Oliva, CSAIL principal research scientist; and Aditya Khosla, who earned his PhD as a member of Torralba’s group and is now the chief technology officer of the medical-computing company PathAI.

The researchers also knew which pixels of which images corresponded to a given network node’s strongest responses. Today’s neural nets are organized into layers. Data are fed into the lowest layer, which processes them and passes them to the next layer, and so on. With visual data, the input images are broken into small chunks, and each chunk is fed to a separate input node.

For every strong response from a high-level node in one of their networks, the researchers could trace back the firing patterns that led to it, and thus identify the specific image pixels it was responding to. Because their system could frequently identify labels that corresponded to the precise pixel clusters that provoked a strong response from a given node, it could characterize the node’s behavior with great specificity.
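
The article doesn’t give the exact matching score, but one natural way to test whether a node’s strongly responding pixels line up with a labeled concept is to measure the overlap of the two pixel masks, for instance as an intersection-over-union ratio. The arrays below are toy placeholders.

```python
# Score how well a node's strongly-activated pixels overlap a concept's labeled pixels.
import numpy as np

def mask_overlap(activation_map, concept_mask, threshold=0.8):
    """Intersection over union between the node's top-activated pixels and the concept mask."""
    active = activation_map > np.quantile(activation_map, threshold)
    inter = np.logical_and(active, concept_mask).sum()
    union = np.logical_or(active, concept_mask).sum()
    return inter / union if union else 0.0

activation_map = np.random.rand(56, 56)        # toy per-pixel response of one node
concept_mask = np.zeros((56, 56), dtype=bool)
concept_mask[20:40, 20:40] = True              # toy labeled region, e.g. "bicycle wheel"
print(round(mask_overlap(activation_map, concept_mask), 3))
```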

The researchers organized the visual concepts in their database into a hierarchy. Each level of the hierarchy incorporates concepts from the level below, beginning with colors and working upward through textures, materials, parts, objects, and scenes. Typically, lower layers of a neural network would fire in response to simpler visual properties — such as colors and textures — and higher layers would fire in response to more complex properties.

But the hierarchy also allowed the researchers to quantify the emphasis that networks trained to perform different tasks placed on different visual properties. For instance, a network trained to colorize black-and-white images devoted a large majority of its nodes to recognizing textures. Another network, when trained to track objects across several frames of video, devoted a higher percentage of its nodes to scene recognition than it did when trained to recognize scenes; in that case, many of its nodes were in fact dedicated to object detection.

One of the researchers’ experiments could conceivably shed light on a vexed question in neuroscience. Research involving human subjects with electrodes implanted in their brains to control severe neurological disorders has seemed to suggest that individual neurons in the brain fire in response to specific visual stimuli. This hypothesis, originally called the grandmother-neuron hypothesis, is more familiar to a recent generation of neuroscientists as the Jennifer-Aniston-neuron hypothesis, after the discovery that several neurological patients had neurons that appeared to respond only to depictions of particular Hollywood celebrities.

Many neuroscientists dispute this interpretation. They argue that shifting constellations of neurons, rather than individual neurons, anchor sensory discriminations in the brain. Thus, the so-called Jennifer Aniston neuron is merely one of many neurons that collectively fire in response to images of Jennifer Aniston. And it’s probably part of many other constellations that fire in response to stimuli that haven’t been tested yet.

Because their new analytic technique is fully automated, the MIT researchers were able to test whether something similar takes place in a neural network trained to recognize visual scenes. In addition to identifying individual network nodes that were tuned to particular visual concepts, they also considered randomly selected combinations of nodes. Combinations of nodes, however, picked out far fewer visual concepts than individual nodes did — roughly 80 percent fewer.

“To my eye, this is suggesting that neural networks are actually trying to approximate getting a grandmother neuron,” Bau says. “They’re not trying to just smear the idea of grandmother all over the place. They’re trying to assign it to a neuron. It’s this interesting hint of this structure that most people don’t believe is that simple.”

Shrinking data for surgical training

Image: MIT News

Laparoscopy is a surgical technique in which a fiber-optic camera is inserted into a patient’s abdominal cavity to provide a video feed that guides the surgeon through a minimally invasive procedure. Laparoscopic surgeries can take hours, and the video generated by the camera — the laparoscope — is often recorded. Those recordings contain a wealth of information that could be useful for training both medical providers and computer systems that would aid with surgery, but because reviewing them is so time consuming, they mostly sit idle.

Researchers at MIT and Massachusetts General Hospital hope to change that, with a new system that can efficiently search through hundreds of hours of video for events and visual features that correspond to a few training examples.

In work they presented at the International Conference on Robotics and Automation this month, the researchers trained their system to recognize different stages of an operation, such as biopsy, tissue removal, stapling, and wound cleansing.

But the system could be applied to any analytical question that doctors deem worthwhile. It could, for instance, be trained to predict when particular medical instruments — such as additional staple cartridges — should be prepared for the surgeon’s use, or it could sound an alert if a surgeon encounters rare, aberrant anatomy.

“Surgeons are thrilled by all the features that our work enables,” says Daniela Rus, the Andrew (1956) and Erna Viterbi Professor of Electrical Engineering and Computer Science and senior author on the paper. “They are thrilled to have the surgical tapes automatically segmented and indexed, because now those tapes can be used for training. If we want to learn about phase two of a surgery, we know exactly where to go to look for that segment. We don’t have to watch every minute before that. The other thing that is extraordinarily exciting to the surgeons is that in the future, we should be able to monitor the progression of the operation in real-time.”

Joining Rus on the paper are first author Mikhail Volkov, who was a postdoc in Rus’ group when the work was done and is now a quantitative analyst at SMBC Nikko Securities in Tokyo; Guy Rosman, another postdoc in Rus’ group; and Daniel Hashimoto and Ozanan Meireles of Massachusetts General Hospital (MGH).

Representative frames

The new paper builds on previous work from Rus’ group on “coresets,” or subsets of much larger data sets that preserve their salient statistical characteristics. In the past, Rus’ group has used coresets to perform tasks such as deducing the topics of Wikipedia articles or recording the routes traversed by GPS-connected cars.

In this case, the coreset consists of a couple hundred or so short segments of video — just a few frames each. Each segment is selected because it offers a good approximation of the dozens or even hundreds of frames surrounding it. The coreset thus winnows a video file down to only about one-tenth its initial size, while still preserving most of its vital information.
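
The team’s coreset construction comes with formal approximation guarantees, but its gist can be caricatured with a greedy pass that keeps a frame only when the frames already kept approximate it poorly. The frame features and the distance threshold below are placeholders.

```python
# A greatly simplified frame-selection pass in the spirit of a video coreset:
# keep a frame only if it is poorly represented by the frames kept so far.
import numpy as np

def select_keyframes(frame_features, max_error=1.0):
    kept = [0]                                           # always keep the first frame
    for i in range(1, len(frame_features)):
        nearest = min(np.linalg.norm(frame_features[i] - frame_features[j]) for j in kept)
        if nearest > max_error:                          # poorly approximated so far...
            kept.append(i)                               # ...so add this frame to the summary
    return kept

# Toy usage: 500 frames of 32-dimensional features that drift slowly over time.
rng = np.random.default_rng(2)
frames = np.cumsum(rng.normal(scale=0.1, size=(500, 32)), axis=0)
kept = select_keyframes(frames, max_error=1.0)
print(f"kept {len(kept)} of {len(frames)} frames")
```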

For this research, MGH surgeons identified seven distinct stages in a procedure for removing part of the stomach, and the researchers tagged the beginnings of each stage in eight laparoscopic videos. Those videos were used to train a machine-learning system, which was in turn applied to the coresets of four laparoscopic videos it hadn’t previously seen. The system assigned each short video snippet in those coresets to the correct stage of surgery with 93 percent accuracy.

“We wanted to see how this system works for relatively small training sets,” Rosman explains. “If you’re in a specific hospital, and you’re interested in a specific surgery type, or even more important, a specific variant of a surgery — all the surgeries where this or that happened — you may not have a lot of examples.”

Selection criteria

The general procedure that the researchers used to extract the coresets is one they’ve previously described, but coreset selection always hinges on specific properties of the data it’s being applied to. The data included in the coreset — here, frames of video — must approximate the data being left out, and the degree of approximation is measured differently for different types of data.

Machine learning, however, can be thought of as a problem of approximation. In this case, the system had to learn to identify similarities between frames of video in separate laparoscopic feeds that denoted the same phases of a surgical procedure. The metric of similarity it arrived at also served to assess how well the video frames included in the coreset approximated those that were omitted.

“Interventional medicine — surgery in particular — really comes down to human performance in many ways,” says Gregory Hager, a professor of computer science at Johns Hopkins University who investigates medical applications of computer and robotic technologies. “As in many other areas of human endeavor, like sports, the quality of the human performance determines the quality of the outcome that you achieve, but we don’t know a lot about, if you will, the analytics of what creates a good surgeon. Work like what Daniela is doing and our work really goes to the question of: Can we start to quantify what the process in surgery is, and then within that process, can we develop measures where we can relate human performance to the quality of care that a patient receives?”

“Right now, efficiency” — of the kind provided by coresets — “is probably not that important, because we’re dealing with small numbers of these things,” Hager adds. “But you could imagine that, if you started to record every surgery that’s performed — we’re talking tens of millions of procedures in the U.S. alone — now it starts to be interesting to think about efficiency.”
