
Robots that learn to adapt

Figure 1: Our model-based meta reinforcement learning algorithm enables a legged robot to adapt online in the face of an unexpected system malfunction (note the broken front right leg).

By Anusha Nagabandi and Ignasi Clavera

Humans have the ability to seamlessly adapt to changes in their environments: adults can learn to walk on crutches in just a few seconds, people can adapt almost instantaneously to picking up an object that is unexpectedly heavy, and children who can walk on flat ground can quickly adapt their gait to walk uphill without having to relearn how to walk. This adaptation is critical for functioning in the real world.

Robots, on the other hand, are typically deployed with a fixed behavior (be it hard-coded or learned), allowing them to succeed in specific settings but fail in others: experiencing a system malfunction, encountering new terrain or environmental changes such as wind, or needing to cope with a payload or other unexpected perturbations. The idea behind our latest research is that the mismatch between predicted and observed recent states should inform the robot to update its model into one that more accurately describes the current situation. Noticing our car skidding on the road, for example, informs us that our actions are having a different effect than expected, and thus allows us to plan our consequent actions accordingly (Fig. 2). In order for our robots to be successful in the real world, it is critical that they have this ability to use their past experience to quickly and flexibly adapt. To this effect, we developed a model-based meta-reinforcement learning algorithm capable of fast adaptation.


Figure 2: The driver normally makes decisions based on his/her model of the world. Suddenly encountering a slippery road, however, leads to unexpected skidding. Online adaptation of the driver’s world model based on just a few of these observations of model mismatch allows for fast recovery.

Fast Adaptation

Prior work has used (a) trial-and-error adaptation approaches (Cully et al., 2015) as well as (b) model-free meta-RL approaches (Wang et al., 2016; Finn et al., 2017) to enable agents to adapt after a handful of trials. However, our work takes this adaptation ability to the extreme. Rather than adaptation requiring a few episodes of experience under the new settings, our adaptation happens online on the scale of just a few timesteps (i.e., milliseconds): so fast that it can hardly be noticed.

We achieve this fast adaptation through the use of meta-learning (discussed below) in a model-based learning setup. In the model-based setting, rather than adapting based on the rewards that are achieved during rollouts, data for updating the model is readily available at every timestep in the form of model prediction errors on recent experiences. This model-based approach enables the robot to meaningfully update the model using only a small amount of recent data.

Method Overview


Fig 3. The agent uses recent experience to fine-tune the prior model into an adapted one, which the planner then uses to perform its action selection. Note that we omit details of the update rule in this post, but we experiment with two such options in our work.

Our method follows the general formulation shown in Fig. 3 of using observations from recent data to perform adaptation of a model, and it is analogous to the overall framework of adaptive control (Sastry and Isidori, 1989; Åström and Wittenmark, 2013). The real challenge here, however, is how to successfully enable model adaptation when the models are complex, nonlinear, high-capacity function approximators (i.e., neural networks). Naively implementing SGD on the model weights is not effective, as neural networks require much larger amounts of data in order to perform meaningful learning.

Thus, we enable fast adaptation at test time by explicitly training with this adaptation objective during (meta-)training time, as explained in the following section. Once we meta-train across data from various settings in order to get this prior model (with weights denoted as $\theta$) that is good at adaptation, the robot can then adapt from this at each time step (Fig. 3) by using this prior in conjunction with recent experience to fine-tune its model to the current setting at hand, thus allowing for fast online adaptation.

Meta-training

At any given time step $t$, we are in state $s_t$, we take action $a_t$, and we end up in some resulting state $s_{t+1}$ according to the underlying dynamics function $f$, i.e. $s_{t+1} = f(s_t, a_t)$. The true dynamics are unknown to us, so we instead want to fit some learned dynamics model $\hat{f}_\theta$ that makes predictions as well as possible on observed data points of the form $(s_t, a_t, s_{t+1})$. Our planner can use this estimated dynamics model in order to perform action selection.
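One simple planner that pairs well with a learned dynamics model is random-shooting model-predictive control (MPC): sample many candidate action sequences, roll each out through the model, and execute only the first action of the best sequence before replanning. The following is a minimal illustrative sketch, not the authors' released code; `model(states, actions)` and `reward(states, actions)` are assumed to be given, batched functions.

```python
import torch

def plan(model, reward, s, horizon=10, n_candidates=1000, action_dim=2):
    # Sample random candidate action sequences with actions in [-1, 1].
    seqs = torch.rand(n_candidates, horizon, action_dim) * 2 - 1
    returns = torch.zeros(n_candidates)
    states = s.expand(n_candidates, -1)  # roll out all candidates in parallel
    for t in range(horizon):
        returns += reward(states, seqs[:, t])
        states = model(states, seqs[:, t])
    # Execute only the first action of the highest-return sequence (MPC).
    return seqs[returns.argmax(), 0]
```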

Assuming that any detail or setting could have changed at any time step along the rollout, we consider temporally-close time steps as being able to inform us about the “task” details of our current situation: operating in different parts of the state space, enduring disturbances, attempting new goals/rewards, experiencing a system malfunction, etc. Thus, in order for our model to be the most useful for planning, we want to first update it using our recently observed data.

At training time (Fig. 4), what this amounts to is selecting a consecutive sequence of (M+K) data points, using the first M to update our model weights from $\theta$ to $\theta'$, and then optimizing for this new $\theta'$ to be good at predicting the state transitions for the next K time steps. This newly formulated loss function represents prediction error on the future K points, after adapting the weights using information from the past M points:

$$\min_{\theta, \psi} \; \mathbb{E}_{\tau(t-M,\, t+K) \sim \mathcal{D}} \Big[ \mathcal{L}\big(\tau(t,\, t+K),\, \theta'\big) \Big]$$

where

$$\theta' = u_\psi\big(\tau(t-M,\, t-1),\, \theta\big)$$

Here, $\tau(a, b)$ denotes the trajectory segment of states and actions from time $a$ to time $b$, $\mathcal{D}$ is the meta-training dataset, and $u_\psi$ is the update rule that performs the adaptation. In other words, $\theta$ does not need to result in good dynamics predictions. Instead, $\theta$ needs to be such that it can use task-specific (i.e., recent) data points to quickly adapt itself into new weights $\theta'$ that do result in good dynamics predictions. See the MAML blog post for more intuition on this formulation.
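To make this objective concrete, here is a minimal PyTorch sketch of the gradient-based variant (GrBAL), in which the update rule $u_\psi$ is a single inner gradient step. The tiny architecture, dimensions, learning rates, and random placeholder data are all illustrative assumptions, not the authors' released code.

```python
import torch

S, A, H = 4, 2, 32  # state dim, action dim, hidden size (all illustrative)

def init(*shape):
    return (0.1 * torch.randn(*shape)).requires_grad_()

# theta: weights of a small one-hidden-layer dynamics model.
theta = [init(S + A, H), torch.zeros(H, requires_grad=True),
         init(H, S), torch.zeros(S, requires_grad=True)]

def predict(params, s, a):
    # Predict the next state as the current state plus a learned delta.
    W1, b1, W2, b2 = params
    h = torch.tanh(torch.cat([s, a], dim=-1) @ W1 + b1)
    return s + h @ W2 + b2

def loss(params, s, a, s_next):
    return ((predict(params, s, a) - s_next) ** 2).mean()

def meta_loss(params, past, future, inner_lr=0.01):
    # Inner step (adaptation): theta -> theta' using the past M transitions.
    grads = torch.autograd.grad(loss(params, *past), params, create_graph=True)
    adapted = [p - inner_lr * g for p, g in zip(params, grads)]
    # Outer objective: how well theta' predicts the future K transitions.
    return loss(adapted, *future)

# One meta-training step on a sampled (M+K)-step window (placeholder data).
M, K = 16, 16
past = (torch.randn(M, S), torch.randn(M, A), torch.randn(M, S))
future = (torch.randn(K, S), torch.randn(K, A), torch.randn(K, S))

opt = torch.optim.Adam(theta, lr=1e-3)
opt.zero_grad()
meta_loss(theta, past, future).backward()
opt.step()
```

At test time, the same inner gradient step is applied online to the most recent M transitions to produce the adapted weights that the planner uses; the paper also experiments with a learned recurrent update rule (ReBAL) in place of this gradient step.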


Fig 4. Meta-training procedure for obtaining a $\theta$ such that the adaptation of $\theta$ using the past $M$ timesteps of experience produces a model that performs well for the future $K$ timesteps.

Simulation Experiments

We conducted experiments on simulated robotic systems to test the ability of our method to adapt to sudden changes in the environment, as well as to generalize beyond the training environments. Note that we meta-trained all agents on some distribution of tasks/environments (see paper for details), but we then evaluated their adaptation ability on unseen and changing environments at test time. Figure 5 shows a cheetah robot that was trained on piers of varying random buoyancy, and then tested on a pier with sections of varying buoyancy in the water. This environment demonstrates the need for not only adaptation, but for fast/online adaptation. Figure 6 also demonstrates the need for online adaptation by showing an ant robot that was trained with different crippled legs, but tested on an unseen leg failure occurring part-way through a rollout. In these qualitative results below, we compare our gradient-based adaptive learner (‘GrBAL’) to a standard model-based learner (‘MB’) that was trained on the same variation of training tasks but has no explicit mechanism for adaptation.


Fig 5. Cheetah: Both methods are trained on piers of varying buoyancy. Ours is able to perform fast online adaptation at run-time to cope with changing buoyancy over the course of a new pier.


Fig 6. Ant: Both methods are trained on different joints being crippled. Ours is able to use its recent experiences to adapt its knowledge and cope with an unexpected and new malfunction in the form of a crippled leg (for a leg that was never seen as crippled during training).

The fast adaptation capabilities of this model-based meta-RL method allow our simulated robotic systems to attain substantial improvements in performance and/or sample efficiency over prior state-of-the-art methods, as well as over ablations of this method that toggle online adaptation, meta-training, and the dynamics model. Please refer to our paper for these quantitative comparisons.

Hardware Experiments


Fig 7. Our real dynamic legged millirobot, on which we successfully employ our model-based meta-reinforcement learning algorithm to enable online adaptation to disturbances and new settings such as traversing a slippery slope, accommodating payloads, accounting for pose miscalibration errors, and adjusting to a missing leg.

To highlight not only the sample efficiency of our meta reinforcement learning approach, but also the importance of fast online adaptation in the real world, we demonstrate our approach on a real dynamic legged millirobot (see Fig 7). This small 6-legged robot presents a modeling and control challenge in the form of highly stochastic and dynamic movement. This robot is an excellent candidate for online adaptation for many reasons: the rapid manufacturing techniques and numerous custom-design steps used to construct this robot make it impossible to reproduce the same dynamics each time, its linkages and other body parts deteriorate over time, and it moves very quickly and dynamically as a function of its terrain.

We meta-train this legged robot on various terrains, and we then test the agent’s learned ability to adapt online to new tasks (at run-time) including a missing leg, novel slippery terrains and slopes, miscalibration or errors in pose estimation, and new payloads to be pulled. Our hardware experiments compare our method to (a) standard model-based learning (‘MB’), with neither adaptation nor meta-learning, as well as (b) a dynamic-evaluation (‘MB+DE’) comparison that performs adaptation, but from a non-meta-learned prior. These results (Fig. 8-10) show the need for not only adaptation, but adaptation from an explicitly meta-learned prior.


Fig 8. Missing leg.


Fig 9. Payload.


Fig 10. Miscalibrated Pose.

By effectively adapting online, our method prevents drift from a missing leg, prevents sliding sideways down a slope, accounts for pose miscalibration errors, and adjusts to pulling payloads. Note that these tasks/environments share enough commonalities with the locomotion behaviors learned during the meta-training phase such that it would be useful to draw from that prior knowledge (rather than learn from scratch), but they are different enough that they do require effective online adaptation for success.


Fig 11. The ability to draw from prior knowledge as well as to learn from recent knowledge enables GrBAL (ours) to clearly outperform both MB and MB+DE when tested on environments that (1) require online adaptation and/or (2) were never seen during training.

Future Directions

This work enables online adaptation of high-capacity neural network dynamics models, through the use of meta-learning. By allowing local fine-tuning of a model starting from a meta-learned prior, we preclude the need for an accurate global model, as well as allow for fast adaptation to new situations such as unexpected environmental changes. Although we showed results of adaptation on various tasks in both simulation and hardware, there remain numerous relevant avenues for improvement.

First, although this setup of always fine-tuning from our pre-trained prior can be powerful, one limitation of this approach is that seeing a new setting numerous times results in the same performance as seeing it for the first time. In this follow-up work, we take steps to address precisely this issue of improving over time, while simultaneously not forgetting older skills as a consequence of experiencing new ones.

Another area for improvement includes formulating conditions or an analysis of the capabilities and limitations of this adaptation: what can or cannot be adapted to, given the knowledge contained in the prior? For example, consider two humans learning to ride a bicycle who suddenly experience a slippery road. Assume that neither of them has ridden a bike before, so neither has ever fallen off one. Human A might fall, break their wrist, and require months of physical therapy. Human B, on the other hand, might draw from his/her prior knowledge of martial arts and thus implement a good “falling” procedure (i.e., roll onto your back instead of trying to break a fall with the wrist). This is a case where both humans are trying to execute a new task, but other experiences from their prior knowledge significantly affect the result of their adaptation attempt. Thus, having some mechanism for understanding the limitations of adaptation, under the existing prior, would be interesting.


We would like to thank Sergey Levine and Chelsea Finn for their feedback during the preparation of this blog post. We would also like to thank our co-authors Simin Liu, Ronald Fearing, and Pieter Abbeel. This post is based on the following paper:

  • Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning
    A Nagabandi*, I Clavera*, S Liu, R Fearing, P Abbeel, S Levine, C Finn
    International Conference on Learning Representations (ICLR) 2019
    arXiv, Code, Project Page

This article was initially published on the BAIR blog, and appears here with the authors’ permission.

Using hydraulics for robots: Introduction

Hydraulic system diagram: from the Reservoir, fluid goes to the Pump, which has three connections: (1) the Accumulator (top), (2) the Relief Valve (bottom), and (3) the Control Valve. The Control Valve feeds the Cylinder, which returns through a Filter and then back to the Reservoir.

Hydraulics are sometimes looked at as an alternative to electric motors.

Some of the primary reasons for this include:

  • Linear motion
  • Very high torque applications
  • Small package for a given torque
  • A large number of motors can share the same reservoir/pump, which can increase volume efficiency
  • You can add damping for shock absorption

However, there are also some downsides to using hydraulics, including:

  • More parts are required (however they can be separated from the robot in some applications)
  • Less precise control (unless you use a proportional valve)
  • Hydraulic fluid (mess, leaks, mess, and more mess)

Hydraulic systems use an incompressible liquid (as opposed to pneumatics, which use a compressible gas) to transfer force from one place to another. Since the hydraulic system is a closed system (ignore relief valves for now), a force applied to one end of the system is transferred to other parts of that system. By manipulating the volume of fluid in different parts of the system you can change the forces in different parts of the system (remember Pascal’s Law from high school?).
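To put a number on Pascal's law: pressure (force per area) is the same throughout the confined fluid, so the output force scales with the ratio of piston areas. A quick illustrative sketch with made-up values:

```python
# Pascal's law: the same pressure acts on both pistons, so
# output force = input force * (output area / input area).
def output_force(input_force_n, input_area_m2, output_area_m2):
    pressure_pa = input_force_n / input_area_m2  # shared system pressure
    return pressure_pa * output_area_m2

# 100 N on a 1 cm^2 piston supports 1000 N on a 10 cm^2 piston.
print(output_force(100.0, 1e-4, 1e-3))  # -> 1000.0 N
```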

So here are some of the basic components used (or needed) to develop a hydraulic system.

Pump

The pump is the heart of your hydraulic system. The pump drives the flow of hydraulic fluid through your system, which is used for moving the actuators.

The size and speed of the pump determines the flow rate and the load at the actuator determines the pressure. For those familiar with electric motors the pressure in the system is like the voltage, and the flow rate is like the electrical current.
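Extending that analogy one step: hydraulic power is pressure times flow rate, just as electrical power is voltage times current. A quick sketch with illustrative values:

```python
# Hydraulic power = pressure * flow rate (the analogue of P = V * I).
pressure_pa = 10e6      # 10 MPa system pressure (illustrative)
flow_m3_per_s = 1e-4    # 0.1 litres per second (illustrative)
print(pressure_pa * flow_m3_per_s)  # -> 1000.0 W delivered to the fluid
```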

Pump Motor

We know what the pump is, but you need a way to “power” the pump so that it can pump the hydraulic fluid. Generally the way you power the pump is by connecting it to an electric motor or gas/diesel engine.

Hydraulic Fluid

Continuing the analogy where the pump is the heart, the hydraulic fluid is the blood of the system. The fluid is what is used to transfer the pressure from the pump to the motor.

Hydraulic Hoses (and fittings to connect things)

These are the arteries and veins of the system that allow for the transfer of hydraulic fluid.

Hydraulic Actuators – Motor/Cylinder

Cylinder [Source]
Motor [Source]

The actuator is generally the reason we are designing this hydraulic system. The motor is essentially the same as the pump; however, instead of taking a mechanical input and generating pressure, the motor converts pressure into mechanical motion.

Actuators can come in the form of linear motion (referred to as a hydraulic cylinder) or rotary motion (a hydraulic motor).

For cylinders, you generally apply pressure and the cylinder rod extends; release the pressure and the cylinder gets pushed back in (think of a car lift). This is the classic and most common use of hydraulics.

For rotary motors there are generally 3 connections on the motor.

  • A – Hydraulic fluid input/output line
  • B – Hydraulic fluid input/output line
  • Drain – Hydraulic fluid output line (generally only on motors, not cylinders)

Depending on the motor, A may serve only as the fluid input and B only as the fluid output, so the motor spins in just one direction; other motors can spin in either direction depending on whether A or B is used as the input or output of the hydraulic fluid.

The drain line is used so that when the system is turned off, the fluid has a way to get out of the motor (to deal with internal leakage and to not blow out seals). In some motors the drain line is connected to one of the A or B lines. Also, there are sometimes multiple drain lines so that you can route the hydraulic hoses from different locations.

Note: While the pump and motor are basically the same component, you usually cannot switch their roles, due to how they are designed to handle pressure and because pumps are usually not backdrivable.

There are some actuators that are designed to be leakless and to hold the fluid and pressure (using valves) so that the force from the actuator is held even without the pump. For example, these are used in things like automobile-carrying trucks that need to stack cars for transport.

Reservoir

This is essentially a bucket that holds the fluid. Reservoirs are usually a little fancier, with over-pressure relief valves, lids, filters, etc.

The reservoir is also often a place where the hydraulic fluid can cool down if it is getting hot within the system. As the fluid gets hotter it can get thinner, which can result in increased wear of your motor and pump.

Filter

The filter keeps your hydraulic fluid clean before it goes back into the reservoir. It is kind of like a person’s kidneys.

Valves (and Solenoids)

Valve (metal) with Solenoid (black) attached on top [Source]

Valves are things that open and close to control the flow of fluid. They can be operated by hand (i.e., manually) or, more often, by some other means.

One common method is to use a solenoid, a device that opens a valve when you apply a voltage. Some solenoids are latching, which means you briefly apply a voltage to open the valve, and then apply a voltage again (usually with reversed polarity) to close it.

There are many types of valves; I will detail a few below.

Check Valves (One Way Valve)

These are a type of valve that can be inline to allow the flow of hydraulic fluid in only one direction.

Relief Valve

These are a type of valve that automatically opens (and lets fluid out) when the pressure gets too high. This is a safety feature so you don’t damage other components and/or cause an explosion.

Pilot Valve

These are another special class of valve that uses a small pilot pressure to control a valve handling a much larger pressure.

Pressure & Flow-rate Sensors/Gauges 

You need sensors (with a gauge or computer output) to measure the pressure and/or flow rate so you know how the system is operating and whether it is behaving as you expect.

Accumulator

The accumulator is essentially a tank that holds fluid under pressure and has its own pressure source. It helps smooth out the pressure and absorbs any sudden loads from the motor by providing a pressure reserve. This is much like how capacitors are used in electrical power circuits.

The pressure source in the accumulator is often a weight, springs, or a gas.

There will often be a check valve to make sure the fluid in the accumulator does not go back to the pump.


I am not an expert on hydraulic systems, but I hope this quick introduction helps people.

How to tell whether machine-learning systems are robust enough for the real world

Adversarial examples are slightly altered inputs that cause neural networks to make classification mistakes they normally wouldn’t, such as classifying an image of a cat as a dog.
Image: MIT News Office

By Rob Matheson

MIT researchers have devised a method for assessing how robust machine-learning models known as neural networks are for various tasks, by detecting when the models make mistakes they shouldn’t.

Convolutional neural networks (CNNs) are designed to process and classify images for computer vision and many other tasks. But slight modifications that are imperceptible to the human eye — say, a few darker pixels within an image — may cause a CNN to produce a drastically different classification. Such modifications are known as “adversarial examples.” Studying the effects of adversarial examples on neural networks can help researchers determine how their models could be vulnerable to unexpected inputs in the real world.

For example, driverless cars can use CNNs to process visual input and produce an appropriate response. If the car approaches a stop sign, it would recognize the sign and stop. But a 2018 paper found that placing a certain black-and-white sticker on the stop sign could, in fact, fool a driverless car’s CNN into misclassifying the sign, which could potentially cause it to not stop at all.

However, there has been no way to fully evaluate a large neural network’s resilience to adversarial examples for all test inputs. In a paper they are presenting this week at the International Conference on Learning Representations, the researchers describe a technique that, for any input, either finds an adversarial example or guarantees that all perturbed inputs — that still appear similar to the original — are correctly classified. In doing so, it gives a measurement of the network’s robustness for a particular task.

Similar evaluation techniques do exist but have not been able to scale up to more complex neural networks. Compared to those methods, the researchers’ technique runs three orders of magnitude faster and can scale to more complex CNNs.

The researchers evaluated the robustness of a CNN designed to classify images in the MNIST dataset of handwritten digits, which comprises 60,000 training images and 10,000 test images. The researchers found that around 4 percent of test inputs can be perturbed slightly to generate adversarial examples that would lead the model to make an incorrect classification.

“Adversarial examples fool a neural network into making mistakes that a human wouldn’t,” says first author Vincent Tjeng, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL). “For a given input, we want to determine whether it is possible to introduce small perturbations that would cause a neural network to produce a drastically different output than it usually would. In that way, we can evaluate how robust different neural networks are, finding at least one adversarial example similar to the input or guaranteeing that none exist for that input.”

Joining Tjeng on the paper are CSAIL graduate student Kai Xiao and Russ Tedrake, a CSAIL researcher and a professor in the Department of Electrical Engineering and Computer Science (EECS).

CNNs process images through many computational layers containing units called neurons. For CNNs that classify images, the final layer consists of one neuron for each category. The CNN classifies an image based on the neuron with the highest output value. Consider a CNN designed to classify images into two categories: “cat” or “dog.” If it processes an image of a cat, the value for the “cat” classification neuron should be higher. An adversarial example occurs when a tiny modification to that image causes the “dog” classification neuron’s value to be higher.

The researchers’ technique checks all possible modifications to each pixel of the image. Basically, if the CNN assigns the correct classification (“cat”) to each modified image, no adversarial examples exist for that image.

Behind the technique is a modified version of “mixed-integer programming,” an optimization method where some of the variables are restricted to be integers. Essentially, mixed-integer programming is used to find a maximum of some objective function, given certain constraints on the variables, and can be designed to scale efficiently to evaluating the robustness of complex neural networks.

The researchers set limits allowing every pixel in each input image to be brightened or darkened by up to some set value. Given the limits, the modified image will still look remarkably similar to the original input image, meaning the CNN shouldn’t be fooled. Mixed-integer programming is used to find the smallest possible modification to the pixels that could potentially cause a misclassification.

The idea is that tweaking the pixels could cause the value of an incorrect classification to rise. If a cat image were fed into the pet-classifying CNN, for instance, the algorithm would keep perturbing the pixels to see if it can raise the value for the neuron corresponding to “dog” above that for “cat.”
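As a rough illustration of how this works under the hood, here is a toy sketch using the open-source PuLP MILP library on a two-pixel, two-class ReLU network. The weights, input, and perturbation budget are all made up; the ReLU units use a standard big-M encoding; and for simplicity it checks a fixed perturbation budget rather than directly minimizing the distortion. It is a sketch of the general technique, not the researchers' actual implementation.

```python
# Toy MILP robustness check for a tiny ReLU network (2 inputs, 2 hidden
# units, 2 classes). All numbers are hypothetical.
import pulp

W1 = [[1.0, -0.5], [0.3, 0.8]]   # hidden-layer weights
b1 = [0.1, -0.2]
W2 = [[0.9, -0.4], [-0.7, 0.6]]  # output-layer weights ("cat", "dog")
b2 = [0.0, 0.0]

x0 = [0.6, 0.4]  # original input, assumed to be classified "cat" (class 0)
eps = 0.1        # how much each "pixel" may be brightened or darkened
M = 100.0        # big-M constant; must upper-bound every pre-activation

prob = pulp.LpProblem("adversarial_search", pulp.LpMaximize)

# Perturbed input, confined to an L-infinity ball around x0.
x = [pulp.LpVariable(f"x{i}", x0[i] - eps, x0[i] + eps) for i in range(2)]

# Each ReLU h = max(0, z) becomes linear constraints plus a binary switch.
h = []
for j in range(2):
    z = pulp.lpSum(W1[j][i] * x[i] for i in range(2)) + b1[j]
    hj = pulp.LpVariable(f"h{j}", lowBound=0)
    dj = pulp.LpVariable(f"d{j}", cat="Binary")  # 1 iff the unit is active
    prob += hj >= z
    prob += hj <= z + M * (1 - dj)
    prob += hj <= M * dj
    h.append(hj)

logits = [pulp.lpSum(W2[k][j] * h[j] for j in range(2)) + b2[k] for k in range(2)]

# Maximize the margin of the wrong class ("dog") over the correct one ("cat").
prob += logits[1] - logits[0]
prob.solve(pulp.PULP_CBC_CMD(msg=False))

if pulp.value(prob.objective) > 0:
    print("adversarial example found:", [pulp.value(xi) for xi in x])
else:
    print("certified: no perturbation within eps flips the classification")
```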

If the algorithm succeeds, it has found at least one adversarial example for the input image. The algorithm can continue tweaking pixels to find the minimum modification needed to cause that misclassification. The larger the minimum modification — called the “minimum adversarial distortion” — the more resistant the network is to adversarial examples. If, however, the correct classification neuron wins out for all possible combinations of modified pixels, then the algorithm can guarantee that the image has no adversarial example.
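A fixed-budget check of the kind sketched above is enough to recover this minimum adversarial distortion, because the set of allowed perturbations only grows with the budget: bisecting on the budget converges to the smallest value that still admits an adversarial example. A hypothetical sketch, where `has_adversarial(eps)` is any verifier returning True if some perturbation within `eps` flips the classification:

```python
def min_adversarial_distortion(has_adversarial, lo=0.0, hi=1.0, tol=1e-4):
    # Monotonicity: an adversarial example within eps also exists within
    # any larger budget, so bisection on eps is valid.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if has_adversarial(mid):
            hi = mid  # minimum distortion is at most mid
        else:
            lo = mid  # certified robust up to mid
    return hi
```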

“Given one input image, we want to know if we can modify it in a way that it triggers an incorrect classification,” Tjeng says. “If we can’t, then we have a guarantee that we searched across the whole space of allowable modifications, and found that there is no perturbed version of the original image that is misclassified.”

In the end, this generates a percentage for how many input images have at least one adversarial example, and guarantees the remainder don’t have any adversarial examples. In the real world, CNNs have many neurons and will train on massive datasets with dozens of different classifications, so the technique’s scalability is critical, Tjeng says.

“Across different networks designed for different tasks, it’s important for CNNs to be robust against adversarial examples,” he says. “The larger the fraction of test samples where we can prove that no adversarial example exists, the better the network should perform when exposed to perturbed inputs.”

“Provable bounds on robustness are important as almost all [traditional] defense mechanisms could be broken again,” says Matthias Hein, a professor of mathematics and computer science at Saarland University, who was not involved in the study but has tried the technique. “We used the exact verification framework to show that our networks are indeed robust … [and] made it also possible to verify them compared to normal training.”

Concept Systems and ATI Deburring Tools Reshape Aluminum Manufacturing Processes

With ATI Deburring Tools, Concept Systems developed a robotic material removal solution for their customer that enables them to produce higher quality parts in less time. The solution reduces scrap and rework, and alleviates safety concerns associated with hand-grinding...

Does artificial intelligence deserve the same ethical protections we give to animals?

In the HBO show Westworld, robots designed to display emotion, feel pain, and die like humans populate a sprawling western-style theme park for wealthy guests who pay to act out their fantasies. As the show progresses, and the robots learn more about the world in which they live, they begin to realize that they are the playthings of the person who programmed them.

LIDAR: How Smart Active Alignment Mechanisms can Reduce Manufacturing Costs

Today, tantalizing futuristic transportation applications have motivated this mushrooming industry to develop an impressive array of clever new implementations, accompanied by a wave of investment from venture capitalists, software giants, and established players ...

#285: On Storytelling Robots for Children, with Hae Won Park


In this episode, Lauren Klein interviews Hae Won Park, a Research Scientist in the Personal Robots Group at the MIT Media Lab, about storytelling robots for children. Dr. Park elaborates on enabling robots to understand how children are learning, and how they can help children with literacy skills and encourage exploration.

Hae Won Park

Dr. Hae Won Park is a Research Scientist in the Personal Robots Group at the MIT Media Lab. She is leading the group’s project to enable long-term personalization of artificially intelligent systems, specifically in areas such as early childhood education, healthcare, eldercare, family interaction, and emotional wellness. Prior to her work at the Media Lab, Dr. Park received her PhD in the Human-Automation Systems (HumAnS) Laboratory at Georgia Tech, where she was advised by Professor Ayanna Howard. Dr. Park is also a co-founder of Zyrobotics, a company that uses technology to assist in childhood education.

