Archive 19.03.2021


RoboEYE: A semi-autonomous and gaze-guided wheelchair

Recent technological advancements have enabled the development of new tools to assist people with different types of disabilities, allowing them to move more freely in their surroundings and complete a number of everyday tasks. These include a broad range of smart technologies and devices, ranging from home assistants to mobile robots and bionic limbs.

Webinar: 8 Robotic & Automation Applications of Flexible 3D Printed Parts

Watch this webinar to learn how Lubrizol ESTANE® 3D TPU M95A and HP Multi Jet Fusion technology are enabling 3D printing of new and complex parts for robotic and automation applications. Learn why Forerunner 3D Printing uses flexible TPU in conjunction with rigid PA12 to provide customers with creative solutions to complex manufacturing problems.

HP Industrial 3D Printing – Robotics and End of arm tooling (EOAT)

From molds to final and spare parts, produce quality 3D printed parts with optimal mechanical properties without the long lead times. See how HP 3D Printing with HP Multi Jet Fusion helps these companies reinvent the design and manufacturing of custom robotics and grippers.

Chad Jenkins’ talk – That Ain’t Right: AI Mistakes and Black Lives (with video)

In this technical talk, Chad Jenkins from the University of Michigan posed the following question: “who will pay the cost for the likely mistakes and potential misuse of AI systems?” As he states, “we are increasingly seeing how AI is having a pervasive impact on our lives, both for good and for bad. So, how do we ensure equal opportunity in science and technology?”

Abstract

It would be great to talk about the many compelling ideas, innovations, and new questions emerging in robotics research. I am fascinated by the ongoing NeRF Explosion, prospects for declarative robot programming by demonstration, and potential for a reemergence of probabilistic generative inference. However, there is a larger issue facing our intellectual enterprise: who will pay the cost for the likely mistakes and potential misuse of AI systems? My nation is poised to invest billions of dollars to remain the leader in artificial intelligence as well as quantum computing. This investment is critically needed to reinvigorate the science that will shape our future. In order to get the most from this investment, we have to create an environment that will produce innovations that are not just technical advancements but will also benefit and uplift everybody in our society. We are increasingly seeing how AI is having a pervasive impact on our lives, both for good and for bad. So, how do we ensure equal opportunity in science and technology? It starts with how we invest in scientific research. Currently, when we make investments, we only think about technological advancement. Equal opportunity is a non-priority and, at best, a secondary consideration. The fix is simple really — and something we can do almost immediately: we must start enforcing existing civil rights statutes for how government funds are distributed in support of scientific advancement. This will mostly affect universities, as the wellspring that generates the intellectual foundation and workforce for other organizations that are leading the way in artificial intelligence. This talk will explore the causes of systemic inequality in AI, the impact of this inequity within the field of AI and across society today, and offer thoughts for the next wave of AI inference systems for robotics that could provide introspectability and accountability. Ideas explored build upon the BlackInComputing.org open letter and “Before we put $100 billion into AI…” opinion. Equal opportunity for anyone requires equal opportunity for everyone.

Biography

Odest Chadwicke Jenkins, Ph.D., is a Professor of Computer Science and Engineering and Associate Director of the Robotics Institute at the University of Michigan. Prof. Jenkins earned his B.S. in Computer Science and Mathematics at Alma College (1996), M.S. in Computer Science at Georgia Tech (1998), and Ph.D. in Computer Science at the University of Southern California (2003). He previously served on the faculty of Brown University in Computer Science (2004-15). His research addresses problems in interactive robotics and human-robot interaction, primarily focused on mobile manipulation, robot perception, and robot learning from demonstration. Prof. Jenkins has been recognized as a Sloan Research Fellow and is a recipient of the Presidential Early Career Award for Scientists and Engineers (PECASE). His work has also been supported by Young Investigator awards from the Office of Naval Research (ONR), the Air Force Office of Scientific Research (AFOSR) and the National Science Foundation (NSF). Prof. Jenkins is currently serving as Editor-in-Chief for the ACM Transactions on Human-Robot Interaction. He is a Fellow of the American Association for the Advancement of Science and Association for the Advancement of Artificial Intelligence, and Senior Member of the Association for Computing Machinery and the Institute of Electrical and Electronics Engineers. He is an alumnus of the Defense Science Study Group (2018-19).

Featuring Guest Panelists: Sarah Brown, Hadas Kress-Gazit, Aisha Walcott


The next technical talk will be delivered by Raia Hadsell from DeepMind, and it will take place on March 26 at 3pm EST. Keep an eye on this website to stay up to date.

Engineers utilize ‘swarmalation’ to design active materials for self-regulating soft robots

During the swarming of birds or fish, each entity coordinates its location relative to the others, so that the swarm moves as one larger, coherent unit. Fireflies, on the other hand, coordinate their temporal behavior: within a group, they eventually all flash on and off at the same time and thus act as synchronized oscillators.

Maximum Entropy RL (Provably) Solves Some Robust RL Problems

By Ben Eysenbach

Nearly all real-world applications of reinforcement learning involve some degree of shift between the training environment and the testing environment. However, prior work has observed that even small shifts in the environment cause most RL algorithms to perform markedly worse. As we aim to scale reinforcement learning algorithms and apply them in the real world, it is increasingly important to learn policies that are robust to changes in the environment.




Robust reinforcement learning maximizes reward on an adversarially-chosen environment.

Broadly, prior approaches to handling distribution shift in RL aim to maximize performance in either the average case or the worst case. The first set of approaches, such as domain randomization, train a policy on a distribution of environments and optimize the average performance of the policy across those environments. While these methods have been successfully applied to a number of areas (e.g., self-driving cars, robot locomotion and manipulation), their success rests critically on the design of the distribution of environments. Moreover, policies that do well on average are not guaranteed to get high reward on every environment: the policy that gets the highest reward on average might get very low reward on a small fraction of environments.

The second set of approaches, typically referred to as robust RL, focus on the worst-case scenarios. The aim is to find a policy that gets high reward on every environment within some set. Robust RL can equivalently be viewed as a two-player game between the policy and an environment adversary: the policy tries to get high reward, while the environment adversary tries to tweak the dynamics and reward function of the environment so that the policy gets lower reward. One important property of the robust approach is that, unlike domain randomization, it is invariant to the ratio of easy and hard tasks: whereas robust RL always evaluates a policy on the most challenging tasks in the set, domain randomization rates a policy more highly when the evaluation distribution contains more easy tasks.
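To make the distinction concrete, here is a minimal sketch of how the two objectives score the same policy on a family of environment variants. `make_env` and `evaluate_return` are hypothetical stand-ins for an environment constructor and a rollout routine, not functions from any particular library.

```python
import numpy as np

def compare_objectives(policy, make_env, evaluate_return, params):
    """Score one policy on a family of environment variants (one per parameter)."""
    returns = np.array([evaluate_return(policy, make_env(p)) for p in params])

    average_case = returns.mean()  # the quantity domain randomization optimizes
    worst_case = returns.min()     # the quantity robust RL optimizes

    return average_case, worst_case
```

Note that padding `params` with many easy variants typically raises the average-case score while leaving the worst-case score untouched, which is exactly the invariance property described above.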

Prior work has suggested a number of algorithms for solving robust RL problems. Generally, these algorithms all follow the same recipe: take an existing RL algorithm and add some additional machinery on top to make it robust. For example, robust value iteration uses Q-learning as the base RL algorithm, and modifies the Bellman update by solving a convex optimization problem in the inner loop of each Bellman backup. Similarly, Pinto ‘17 uses TRPO as the base RL algorithm and periodically updates the environment based on the behavior of the current policy. These prior approaches are often difficult to implement and, even once implemented correctly, they require tuning many additional hyperparameters. Might there be a simpler approach, one that does not require additional hyperparameters and additional lines of code to debug?
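As a rough illustration of that recipe, the sketch below runs tabular value iteration in which each Bellman backup takes a minimum over a small set of candidate transition models available to the adversary. This is a simplification for intuition only; it is not the convex-optimization-based backup of robust value iteration, nor the adversarial training scheme of Pinto ‘17.

```python
import numpy as np

def pessimistic_value_iteration(P_set, R, gamma=0.95, iters=200):
    """P_set: list of (S, A, S) transition arrays the adversary may choose from.
    R: (S, A) reward array. Returns a Q-table that is pessimistic w.r.t. P_set."""
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)  # greedy state values under the current Q
        # Back up under every candidate dynamics, then let the adversary pick the worst.
        backups = np.stack([R + gamma * (P @ V) for P in P_set])  # (num_models, S, A)
        Q = backups.min(axis=0)
    return Q
```

Each backup now costs a loop over candidate models, and the candidate set itself is a new design choice: exactly the kind of extra machinery and extra hyperparameters described above.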

To answer this question, we are going to focus on a type of RL algorithm known as maximum entropy RL, or MaxEnt RL for short (Todorov ‘06, Rawlik ‘08, Ziebart ‘10). MaxEnt RL is a slight variant of standard RL that aims to learn a policy that gets high reward while acting as randomly as possible; formally, MaxEnt RL maximizes an entropy-regularized return, i.e., the expected cumulative reward plus the entropy of the policy. Some prior work has observed empirically that MaxEnt RL algorithms appear to be robust to some disturbances in the environment. To the best of our knowledge, no prior work has actually proven that MaxEnt RL is robust to environmental disturbances.
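For reference, the entropy-regularized objective can be written as follows; this is the standard form, with $\alpha$ denoting an entropy temperature, and the exact weighting used in the paper may differ:

$$J_{\text{MaxEnt}}(\pi; p, r) \;=\; \mathbb{E}_{\pi, p}\left[\sum_{t=1}^{T} r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big)\right].$$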

In a recent paper, we prove that every MaxEnt RL problem corresponds to maximizing a lower bound on a robust RL problem. Thus, when you run MaxEnt RL, you are implicitly solving a robust RL problem. Our analysis provides a theoretically-justified explanation for the empirical robustness of MaxEnt RL, and proves that MaxEnt RL is itself a robust RL algorithm. In the rest of this post, we’ll provide some intuition into why MaxEnt RL should be robust and what sort of perturbations MaxEnt RL is robust to. We’ll also show some experiments demonstrating the robustness of MaxEnt RL.

Intuition

So, why would we expect MaxEnt RL to be robust to disturbances in the environment? Recall that MaxEnt RL trains policies to not only maximize reward, but to do so while acting as randomly as possible. In essence, the policy itself is injecting as much noise as possible into the environment, so it gets to “practice” recovering from disturbances. Thus, if the change in dynamics looks like just a disturbance in the original environment, our policy has already been trained on such data. Another way of viewing MaxEnt RL is as learning many different ways of solving the task (Kappen ‘05). For example, let’s look at the task shown in the videos below: we want the robot to push the white object to the green region. The top two videos show that standard RL always takes the shortest path to the goal, whereas MaxEnt RL takes many different paths to the goal. Now, let’s imagine that we add a new obstacle (red blocks) that wasn’t included during training. As shown in the videos in the bottom row, the policy learned by standard RL almost always collides with the obstacle, rarely reaching the goal. In contrast, the MaxEnt RL policy often chooses routes around the obstacle, continuing to reach the goal in a large fraction of trials.
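In code, "acting as randomly as possible" usually amounts to a single extra term in the policy objective. The sketch below shows a minimal discrete-action policy-gradient loss with an entropy bonus (written in PyTorch); it is illustrative only and is not the specific algorithm used for the experiments in this post.

```python
import torch
import torch.nn.functional as F

def maxent_pg_loss(logits, actions, returns, alpha=0.1):
    """logits: (batch, num_actions) policy logits; actions: (batch,) action indices;
    returns: (batch,) observed returns; alpha: entropy temperature."""
    log_probs = F.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)  # per-state policy entropy
    # Standard RL corresponds to alpha = 0; MaxEnt RL keeps the entropy term.
    return -(chosen * returns + alpha * entropy).mean()
```

Setting `alpha = 0` recovers the standard objective; increasing it pushes the policy toward the diverse, multi-path behavior shown in the videos below.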

Videos (left: standard RL; right: MaxEnt RL). Top row: trained and evaluated without the obstacle. Bottom row: trained without the obstacle, but evaluated with the obstacle.

Theory

We now formally describe the technical results from the paper. The aim here is not to provide a full proof (see the paper Appendix for that), but instead to build some intuition for what the technical results say. Our main result is that, when you apply MaxEnt RL with some reward function and some dynamics, you are actually maximizing a lower bound on the robust RL objective. To explain this result, we must first define the MaxEnt RL objective: $J_{MaxEnt}(\pi; p, r)$ is the entropy-regularized cumulative return of policy $\pi$ when evaluated using dynamics $p(s’ \mid s, a)$ and reward function $r(s, a)$. While we will train the policy using one dynamics $p$, we will evaluate the policy on a different dynamics, $\tilde{p}(s’ \mid s, a)$, chosen by the adversary. We can now formally state our main result as follows:
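The exact inequality, including how the reward is rescaled and where the constants sit, is given in the paper; schematically, it has the following shape:

$$\min_{\tilde{p} \in \tilde{\mathcal{P}}} J(\pi; \tilde{p}, r) \;\geq\; f\big(J_{\text{MaxEnt}}(\pi; p, r)\big),$$

where $f$ is a monotonically increasing function built from the $\exp(\cdots)$ and $\log T$ terms discussed next, so that maximizing the MaxEnt objective on the right also maximizes the robust objective on the left.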

The left-hand side is the robust RL objective. It says that the adversary gets to choose whichever dynamics function $\tilde{p}(s' \mid s, a)$ makes our policy perform as poorly as possible, subject to some constraints (as specified by the set $\tilde{\mathcal{P}}$). On the right-hand side we have the MaxEnt RL objective (note that $\log T$ is a constant, and the function $\exp(\cdots)$ is always increasing). Thus, the bound says that a policy with a high entropy-regularized return (right-hand side) is guaranteed to also get high reward when evaluated on adversarially-chosen dynamics (left-hand side).

The most important part of this equation is the set $\tilde{\mathcal{P}}$ of dynamics that the adversary can choose from. Our analysis describes precisely how this set is constructed and shows that, if we want a policy to be robust to a larger set of disturbances, all we have to do is increase the weight on the entropy term and decrease the weight on the reward term. Intuitively, the adversary must choose dynamics that are “close” to the dynamics on which the policy was trained. For example, in the special case where the dynamics are linear-Gaussian, this set corresponds to all perturbations where the original expected next state and the perturbed expected next state have a Euclidean distance less than $\epsilon$.
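Restating that special case in symbols (the notation here is chosen for illustration and is not copied from the paper), the adversary's constraint set is roughly

$$\tilde{\mathcal{P}} \;\approx\; \Big\{\, \tilde{p} \;:\; \big\lVert \mathbb{E}_{\tilde{p}(s' \mid s, a)}[s'] - \mathbb{E}_{p(s' \mid s, a)}[s'] \big\rVert_2 \le \epsilon \ \text{ for all } s, a \,\Big\}.$$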

More Experiments

Our analysis predicts that MaxEnt RL should be robust to many types of disturbances. The first set of videos in this post showed that MaxEnt RL is robust to static obstacles. MaxEnt RL is also robust to dynamic perturbations introduced in the middle of an episode. To demonstrate this, we took the same robotic pushing task and knocked the puck out of place in the middle of the episode. The videos below show that the policy learned by MaxEnt RL handles these perturbations more robustly, as predicted by our analysis.

Videos (standard RL vs. MaxEnt RL): the policy learned by MaxEnt RL is robust to dynamic perturbations of the puck (red frames).

Our theoretical results suggest that, even if we optimize the environment perturbations so the agent does as poorly as possible, MaxEnt RL policies will still be robust. To demonstrate this capability, we trained both standard RL and MaxEnt RL on the peg insertion task shown below. During evaluation, we changed the position of the hole to try to make each policy fail. If we only moved the hole position a little bit ($\le$ 1 cm), both policies always solved the task. However, if we moved the hole position up to 2 cm, the policy learned by standard RL almost never succeeded in inserting the peg, while the MaxEnt RL policy succeeded in 95% of trials. This experiment validates our theoretical finding that MaxEnt RL really is robust to (bounded) adversarial disturbances in the environment.

Videos (standard RL vs. MaxEnt RL), evaluated on adversarial perturbations: MaxEnt RL is robust to adversarial perturbations of the hole (where the robot inserts the peg).

Conclusion

In summary, our paper shows that a commonly-used type of RL algorithm, MaxEnt RL, is already solving a robust RL problem. We do not claim that MaxEnt RL will outperform purpose-designed robust RL algorithms. However, the striking simplicity of MaxEnt RL compared with other robust RL algorithms suggests that it may be an appealing alternative for practitioners hoping to equip their RL policies with an ounce of robustness.

Acknowledgements
Thanks to Gokul Swamy, Dibya Ghosh, Colin Li, and Sergey Levine for feedback on drafts of this post, and to Chloe Hsu and Daniel Seita for help with the blog.


This post is based on the following paper:
