Robot reinforcement learning: safety in real-world applications
How can we make a robot learn in the real world while ensuring safety? In this work, we show how to approach this problem. The key idea is to exploit domain knowledge and use the constraint definition to our advantage. Following our approach, it’s possible to implement learning robotic agents that can explore and learn in an arbitrary environment while ensuring safety at the same time.
Safety and learning in robots
Safety is a fundamental feature in real-world robotics applications: robots should not damage the environment or themselves, and they must ensure the safety of people operating around them. To ensure safety when we deploy a new application, we want to avoid constraint violations at any time. These stringent safety constraints are difficult to enforce in a reinforcement learning setting, which is why it is hard to deploy learning agents in the real world. Classical reinforcement learning agents use random exploration, such as Gaussian policies, to act in the environment and extract useful knowledge to improve task performance. However, random exploration may cause constraint violations, which must be avoided at all costs on robotic platforms, as they often result in a major system failure.
While the robotic setting is challenging, it is also a well-known and well-studied problem, so we can exploit some key results and knowledge from the field. Indeed, a robot’s kinematics and dynamics are often known and can be exploited by the learning system. Also, physical constraints, e.g., avoiding collisions and enforcing joint limits, can be written in analytical form. All this information can be exploited by the learning robot.
Our approach
Many reinforcement learning approaches try to solve the safety problem by incorporating the constraint information in the learning process. This often results in slower learning, without being able to ensure safety during the whole learning process. Instead, we present a novel point of view on the problem, introducing ATACOM (Acting on the TAngent space of the COnstraint Manifold). Unlike other state-of-the-art approaches, ATACOM creates a safe action space in which every action is inherently safe. To do so, we need to construct the constraint manifold and exploit the basic domain knowledge of the agent. Once we have the constraint manifold, we define our action space as the tangent space to the constraint manifold.
We can construct the constraint manifold using arbitrary differentiable constraints. The only requirement is that the constraint function must depend only on controllable variables i.e. the variables that we can directly control with our control action. An example could be the robot joint positions and velocities.
We can support both equality and inequality constraints. Inequality constraints are particularly important as they can be used to avoid specific areas of the state space or to enforce the joint limits. However, they don’t define a manifold. To obtain a manifold, we transform the inequality constraints into equality constraints by introducing slack variables.
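The slack-variable idea can be illustrated with a minimal sketch. It assumes a single upper joint limit and a quadratic slack term, which is one common choice and not necessarily the one used in the paper; the function names and the value of `q_max` are hypothetical.

```python
import numpy as np

def slack_for(c_value):
    """Slack mu that places a feasible point (c <= 0) on the manifold c + 0.5*mu^2 = 0."""
    if c_value > 0:
        raise ValueError("point violates the inequality constraint")
    return np.sqrt(-2.0 * c_value)

def equality_constraint(q, mu, q_max):
    """k(q, mu) = (q - q_max) + 0.5*mu^2; it is zero exactly on the constraint manifold."""
    return (q - q_max) + 0.5 * mu**2

q_max = 1.0   # hypothetical joint limit [rad]
q = 0.2       # a feasible joint position
mu = slack_for(q - q_max)
residual = equality_constraint(q, mu, q_max)
print(f"mu = {mu:.4f}, residual = {residual:.2e}")
```

Because the slack term is never negative, any pair (q, mu) that satisfies the equality also satisfies the original inequality q <= q_max.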
With ATACOM, we can ensure safety by taking action on the tangent space of the constraint manifold. An intuitive way to see why this is true is to consider the motion on the surface of a sphere: any point with a velocity tangent to the sphere itself will keep moving on the surface of the sphere. The same idea can be extended to more complex robotic systems, considering the acceleration of system variables (or the generalized coordinates, when considering a mechanical system) instead of velocities.
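The sphere intuition can be checked numerically. The sketch below is an illustration under simple assumptions (a unit sphere, a velocity-controlled point, and a tangent basis obtained from the null space of the constraint Jacobian via SVD); the names and the action values are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def constraint(x):
    """c(x) = x.x - 1, which is zero on the unit sphere."""
    return np.array([x @ x - 1.0])

def jacobian(x):
    """Jacobian of c with respect to x, shape (1, 3)."""
    return 2.0 * x.reshape(1, -1)

def tangent_basis(J):
    """Orthonormal basis of the null space of J, computed via SVD."""
    _, s, vt = np.linalg.svd(J)
    rank = int(np.sum(s > 1e-10))
    return vt[rank:].T  # shape (n, n - rank)

x = np.array([0.0, 0.0, 1.0])    # a point on the sphere
N = tangent_basis(jacobian(x))   # two tangent directions, shape (3, 2)
alpha = np.array([0.3, -0.5])    # hypothetical agent action in tangent coordinates
velocity = N @ alpha

# The constraint does not change to first order along this velocity.
print(jacobian(x) @ velocity)
```

Whatever value the agent picks for `alpha`, the resulting velocity is orthogonal to the constraint gradient, which is what makes every action in this space safe in continuous time.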
The above-mentioned framework only works if we consider continuous-time systems, where the control action is the instantaneous velocity or acceleration. Unfortunately, the vast majority of robotic controllers and reinforcement learning approaches operate in discrete time. Thus, even taking the tangent direction of the constraint manifold will result in a constraint violation. It is always possible to reduce the violations by increasing the control frequency. However, error accumulates over time, causing a drift from the constraint manifold. To solve this issue, we introduce an error correction term that ensures that the system stays on the constraint manifold. In our work, we implement this term as a simple proportional controller.
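The effect of discretization and of a proportional correction can be seen in a small simulation. This is a sketch under stated assumptions: a point moving on a unit circle with Euler integration, and a correction of the form -K J⁺ c(x), which is one plausible reading of "a simple proportional controller". The gain, time step and function names are hypothetical.

```python
import numpy as np

def constraint(x):
    """c(x) = x.x - 1, which is zero on the unit circle."""
    return np.array([x @ x - 1.0])

def jacobian(x):
    return 2.0 * x.reshape(1, -1)

def tangent_basis(J):
    """Orthonormal basis of the null space of J, computed via SVD."""
    _, s, vt = np.linalg.svd(J)
    rank = int(np.sum(s > 1e-10))
    return vt[rank:].T

def rollout(gain, steps=1000, dt=0.01):
    """Euler-integrate a unit-speed tangent action and return the final |c(x)|."""
    x = np.array([1.0, 0.0])
    alpha = np.array([1.0])
    for _ in range(steps):
        J = jacobian(x)
        tangent_velocity = tangent_basis(J) @ alpha
        # Proportional term pulling the state back toward c(x) = 0.
        correction = -gain * np.linalg.pinv(J) @ constraint(x)
        x = x + dt * (tangent_velocity + correction)
    return abs(constraint(x)[0])

drift_without = rollout(gain=0.0)
drift_with = rollout(gain=10.0)
print(f"no correction: {drift_without:.4f}, with correction: {drift_with:.4f}")
```

Without the correction, the violation grows with every step; with it, the violation settles at a small value that shrinks further as the time step decreases.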
Finally, many robotics systems cannot be controlled directly by velocity or accelerations. However, if an inverse dynamics model or a tracking controller is available, we can use it and compute the correct control action.
Results
We tested ATACOM on a simulated air hockey task, using two different types of robots. The first one is a planar robot. In this task, we enforce joint velocity limits and we avoid collisions of the end-effector with the table boundaries.
The second robot is a KUKA iiwa 14 arm. In this scenario, we constrain the end-effector to move on the planar surface and we ensure that no collision occurs between the robot arm and the table.
In both experiments, we can learn a safe policy using the Soft Actor-Critic algorithm as a learning algorithm in combination with the ATACOM framework. With our approach, we are able to learn good policies fast and we can ensure low constraint violations at any timestep. Unfortunately, the constraint violation cannot be zero due to discretization, but it can be reduced to be arbitrarily small. This is not a major issue in real-world systems, as they are affected by noisy measurements and non-ideal actuation.
Is the safety problem solved now?
The key question is whether ATACOM can provide safety guarantees of any kind. Unfortunately, it cannot in general. What we can enforce are state constraints at each timestep. This includes a wide class of constraints, such as fixed obstacle avoidance, joint limits, and surface constraints. We can extend our method to constraints involving variables that are not (directly) controllable. While we can ensure safety to a certain extent in this scenario too, we cannot ensure that the constraints will not be violated during the whole trajectory. Indeed, if the non-controllable variables act in an adversarial way, they might find a long-term strategy that eventually causes a constraint violation. An easy example is a prey-predator scenario: even if we ensure that the prey avoids each predator, a group of predators can follow a high-level strategy and trap the agent in the long term.
Thus, with ATACOM we can ensure safety at a step level, but we are not able to ensure long-term safety, which requires reasoning at trajectory level. To ensure this kind of safety, more advanced techniques will be needed.
Find out more
The authors were best paper award finalists at CoRL this year, for their work: Robot reinforcement learning on the constraint manifold.
- Read the paper.
- The GitHub page for the work is here.
- Read more about the winning and shortlisted papers for the CoRL awards here.
Robot density nearly doubled globally

The use of industrial robots in factories around the world is accelerating at a high rate: 126 robots per 10,000 employees is the new average of global robot density in the manufacturing industries – nearly double the number five years ago (2015: 66 units). This is according to the 2021 World Robot Report.
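As a quick check of the "nearly double" claim, the two global figures quoted above can be compared directly. The sketch below also derives an implied average annual growth rate, assuming smooth compounding over the five years from 2015 to 2020; that rate is a derived illustration, not a number taken from the report, and the function names are hypothetical.

```python
def ratio(new, old):
    """How many times larger the new value is than the old one."""
    return new / old

def implied_annual_growth(new, old, years):
    """Constant yearly growth rate that would turn `old` into `new` over `years`."""
    return (new / old) ** (1.0 / years) - 1.0

global_ratio = ratio(126, 66)                       # robots per 10,000 employees
global_growth = implied_annual_growth(126, 66, 5)   # 2015 to 2020

print(f"ratio: {global_ratio:.2f}x, implied growth: {global_growth:.1%} per year")
```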
By regions, the average robot density in Asia/Australia is 134 units, in Europe 123 units and in the Americas 111 units. The top 5 most automated countries in the world are: South Korea, Singapore, Japan, Germany, and Sweden.
“Robot density is the barometer to track the degree of automation adoption in the manufacturing industry around the world,” says Milton Guerry, President of the International Federation of Robotics.
Asia
The development of robot density in China is the most dynamic worldwide: Due to the significant growth of robot installations, the density rate rose from 49 units in 2015 to 246 units in 2020. Today, China’s robot density ranks 9th globally compared to 25th just five years ago.
Asia is also the home of the country with the world’s highest robot density in the manufacturing industry: the Republic of Korea has held this position since 2010. The country’s robot density exceeds the global average seven-fold (932 units per 10,000 workers). Robot density has been increasing by 10% on average each year since 2015. With its globally recognized electronics industry and a distinct automotive industry, the Korean economy is based on the two largest areas for industrial robots.
Singapore takes second place with a rate of 605 robots per 10,000 employees in 2020. Singapore’s robot density has been growing by 27% on average each year since 2015.
Japan ranked third in the world: In 2020, 390 robots were installed per 10,000 employees in the manufacturing industry. Japan is the world’s predominant industrial robot manufacturer: The production capacity of Japanese suppliers reached 174,000 units in 2020. Today, Japan’s manufacturers deliver 45% of the global robot supply.
North America
Robot density in the United States rose from 176 units in 2015 to 255 units in 2020. The country ranks seventh in the world – ahead of Chinese Taipei (248 units) and China (246 units). The modernization of domestic production facilities has boosted robot sales in the United States. The use of industrial robots also helps to achieve decarbonization targets, e.g. in the cost-efficient production of solar panels and in the continued transition towards electric vehicles. Several car manufacturers have announced investments to further equip their factories for new electric drive car models or to increase capacity for battery production. These major projects will create demand for industrial robots in the next few years.
Europe
Europe’s most automated country is Germany – ranking 4th worldwide with 371 units. The annual supply had a share of 33% of total robot sales in Europe in 2020 – 38% of Europe’s operational stock is in Germany. The German robotics industry is recovering, mainly driven by strong overseas business rather than by the domestic or European market. Robot demand in Germany is expected to grow slowly, mainly supported by demand for low-cost robots in the general industries and outside traditional manufacturing.
France has a robot density of 194 units (ranking 16th in the world), which is well above the global average of 126 robots and relatively similar compared to other EU countries like Spain (203 units), Austria (205 units) or The Netherlands (209 units). EU members like Sweden (289 units), Denmark (246 units) or Italy (224 units), have a significantly higher degree of automation in the manufacturing segment.
The UK is the only G7 country with a robot density below the world average of 126 units: it has 101 units, ranking 24th. Five years ago, the UK’s robot density was 71 units. The exodus of foreign labor after Brexit increased the demand for robots in 2020. This situation is expected to persist in the near future. The modernization of the UK manufacturing industry will also be boosted by massive tax incentives, the “super-deduction”: from April 2021 until March 2023, companies can claim 130% of capital allowances as a tax relief for plant and machinery investments.
Call for robot holiday videos 2021

That’s right! You better not run, you better not hide, you better watch out for brand new robot holiday videos on Robohub!
Drop your submissions down our chimney at daniel.carrillozapata@robohub.org and share the spirit of the season.
For inspiration, here are our first two videos:
Giving bug-like bots a boost

MIT researchers have pioneered a new fabrication technique that enables them to produce low-voltage, power-dense, high endurance soft actuators for an aerial microrobot. Credits: Courtesy of the researchers
By Adam Zewe | MIT News Office
When it comes to robots, bigger isn’t always better. Someday, a swarm of insect-sized robots might pollinate a field of crops or search for survivors amid the rubble of a collapsed building.
MIT researchers have demonstrated diminutive drones that can zip around with bug-like agility and resilience, which could eventually perform these tasks. The soft actuators that propel these microrobots are very durable, but they require much higher voltages than similarly-sized rigid actuators. The featherweight robots can’t carry the necessary power electronics that would allow them to fly on their own.
Now, these researchers have pioneered a fabrication technique that enables them to build soft actuators that operate with 75 percent lower voltage than current versions while carrying 80 percent more payload. These soft actuators are like artificial muscles that rapidly flap the robot’s wings.

The artificial muscles vastly improve the robot’s payload and allow it to achieve best-in-class hovering performance. Image: Kevin Chen
This new fabrication technique produces artificial muscles with fewer defects, which dramatically extends the lifespan of the components and increases the robot’s performance and payload.
“This opens up a lot of opportunity in the future for us to transition to putting power electronics on the microrobot. People tend to think that soft robots are not as capable as rigid robots. We demonstrate that this robot, weighing less than a gram, flies for the longest time with the smallest error during a hovering flight. The take-home message is that soft robots can exceed the performance of rigid robots,” says Kevin Chen, who is the D. Reid Weedon, Jr. ’41 assistant professor in the Department of Electrical Engineering and Computer Science, the head of the Soft and Micro Robotics Laboratory in the Research Laboratory of Electronics (RLE), and the senior author of the paper.
Chen’s coauthors include Zhijian Ren and Suhan Kim, co-lead authors and EECS graduate students; Xiang Ji, a research scientist in EECS; Weikun Zhu, a chemical engineering graduate student; Farnaz Niroui, an assistant professor in EECS; and Jing Kong, a professor in EECS and principal investigator in RLE. The research has been accepted for publication in Advanced Materials and is included in the journal’s Rising Stars series, which recognizes outstanding works from early-career researchers.
Making muscles
The rectangular microrobot, which weighs less than one-fourth of a penny, has four sets of wings that are each driven by a soft actuator. These muscle-like actuators are made from layers of elastomer that are sandwiched between two very thin electrodes and then rolled into a squishy cylinder. When voltage is applied to the actuator, the electrodes squeeze the elastomer, and that mechanical strain is used to flap the wing.

The rectangular microrobot, which weighs less than one-fourth of a penny, has four sets of wings that are each driven by a soft actuator. Credits: Courtesy of the researchers
The more surface area the actuator has, the less voltage is required. So, Chen and his team build these artificial muscles by alternating between as many ultrathin layers of elastomer and electrode as they can. As elastomer layers get thinner, they become more unstable.
For the first time, the researchers were able to create an actuator with 20 layers, each of which is 10 micrometers in thickness (about the diameter of a red blood cell). But they had to reinvent parts of the fabrication process to get there.
One major roadblock came from the spin coating process. During spin coating, an elastomer is poured onto a flat surface and rapidly rotated, and the centrifugal force pulls the film outward to make it thinner.
“In this process, air comes back into the elastomer and creates a lot of microscopic air bubbles. The diameter of these air bubbles is barely 1 micrometer, so previously we just sort of ignored them. But when you get thinner and thinner layers, the effect of the air bubbles becomes stronger and stronger. That is traditionally why people haven’t been able to make these very thin layers,” Chen explains.
He and his collaborators found that if they perform a vacuuming process immediately after spin coating, while the elastomer is still wet, it removes the air bubbles. Then, they bake the elastomer to dry it.
Removing these defects increases the power output of the actuator by more than 300 percent and significantly improves its lifespan, Chen says.
The researchers also optimized the thin electrodes, which are composed of carbon nanotubes, super-strong rolls of carbon that are about 1/50,000 the diameter of human hair. Higher concentrations of carbon nanotubes increase the actuator’s power output and reduce voltage, but dense layers also contain more defects.
For instance, the carbon nanotubes have sharp ends and can pierce the elastomer, which causes the device to short out, Chen explains. After much trial and error, the researchers found the optimal concentration.
Another problem comes from the curing stage — as more layers are added, the actuator takes longer and longer to dry.
“The first time I asked my student to make a multilayer actuator, once he got to 12 layers, he had to wait two days for it to cure. That is totally not sustainable, especially if you want to scale up to more layers,” Chen says.
They found that baking each layer for a few minutes immediately after the carbon nanotubes are transferred to the elastomer cuts down the curing time as more layers are added.
Best-in-class performance
After using this technique to create a 20-layer artificial muscle, they tested it against their previous six-layer version and state-of-the-art, rigid actuators.
During liftoff experiments, the 20-layer actuator, which requires less than 500 volts to operate, exerted enough power to give the robot a lift-to-weight ratio of 3.7 to 1, so it could carry items that are nearly three times its weight.

“We demonstrate that this robot, weighing less than a gram, flies for the longest time with the smallest error during a hovering flight,” says Kevin Chen. Credits: Courtesy of the researchers
They also demonstrated a 20-second hovering flight, which Chen says is the longest ever recorded by a sub-gram robot. Their hovering robot held its position more stably than any of the others. The 20-layer actuator was still working smoothly after being driven for more than 2 million cycles, far outpacing the lifespan of other actuators.
“Two years ago, we created the most power-dense actuator and it could barely fly. We started to wonder, can soft robots ever compete with rigid robots? We observed one defect after another, so we kept working and we solved one fabrication problem after another, and now the soft actuator’s performance is catching up. They are even a little bit better than the state-of-the-art rigid ones. And there are still a number of fabrication processes in material science that we don’t understand. So, I am very excited to continue to reduce actuation voltage,” he says.
Chen looks forward to collaborating with Niroui to build actuators in a clean room at MIT.nano and leverage nanofabrication techniques. For now, his team is limited in how thin they can make the layers by dust in the air and by a maximum spin coating speed. Working in a clean room would eliminate this problem and allow them to use methods, such as doctor blading, that are more precise than spin coating.
While Chen is thrilled about producing 10-micrometer actuator layers, his hope is to reduce the thickness to only 1 micrometer, which would open the door to many applications for these insect-sized robots.
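Why thinner layers should need less voltage can be seen with a back-of-the-envelope sketch. In the standard idealized model of a dielectric elastomer actuator, the electrostatic pressure depends on the electric field across a layer (voltage divided by thickness), so at a fixed field the required voltage scales linearly with layer thickness. The snippet below assumes that idealization and uses the figures quoted in this article (under 500 volts, 10-micrometer layers). The function names are hypothetical, and the result is an illustration, not a number reported by the researchers.

```python
def field_v_per_um(voltage_v, thickness_um):
    """Electric field across one elastomer layer, in volts per micrometer."""
    return voltage_v / thickness_um

def voltage_for_same_field(field, thickness_um):
    """Voltage needed to reach the same field across a layer of another thickness."""
    return field * thickness_um

# 500 V is the upper bound quoted for the 20-layer actuator with 10 um layers.
field_now = field_v_per_um(500.0, 10.0)
voltage_at_1um = voltage_for_same_field(field_now, 1.0)

print(f"field: {field_now} V/um, voltage for 1 um layers: {voltage_at_1um} V")
```

Under this simplification, a tenfold reduction in layer thickness would cut the drive voltage by the same factor, which helps explain the interest in thinner layers.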
This work is supported, in part, by the MIT Research Laboratory of Electronics and a MathWorks Graduate Fellowship.