
Self-supervised policy adaptation during deployment





Our method learns a task in a fixed, simulated environment and quickly adapts
to new environments (e.g. the real world) solely from online interaction during
deployment.

The ability of humans to generalize their knowledge and experiences to new situations is remarkable, yet poorly understood. For example, imagine a driver who has only ever driven around their city in clear weather. Even though they have never encountered true diversity in driving conditions, they have acquired the fundamental skill of driving, and can adapt reasonably quickly to driving in neighboring cities, in rainy or windy weather, or even to driving a different car, without much practice or additional driving lessons. While humans excel at adaptation, building intelligent systems with common-sense knowledge and the ability to quickly adapt to new situations remains a long-standing problem in artificial intelligence.



A robot trained to perform a given task in a lab environment may not generalize
to other environments, e.g. an environment with moving disco lights, even
though the task itself remains the same.

In recent years, learning both perception and behavioral policies in an end-to-end framework by deep Reinforcement Learning (RL) has been widely successful, achieving impressive results such as superhuman performance on Atari games played directly from screen pixels. Impressive as this is, it has become commonly understood that such policies fail to generalize to even subtle changes in the environment – changes that humans easily adapt to. For this reason, RL has shown limited success beyond the game or environment in which it was originally trained, which presents a significant challenge for deploying policies trained by RL in our diverse and unstructured real world.

Generalization by Randomization

In applications of RL, practitioners have sought to improve the generalization ability of policies by introducing randomization into the training environment (e.g. a simulation), also known as domain randomization. By randomizing elements of the training environment that are also expected to vary at test-time, it is possible to learn policies that are invariant to certain factors of variation. For autonomous driving, we may for example want our policy to be robust to changes in lighting, weather, and road conditions, as well as car models, nearby buildings, different city layouts, and so forth. While the randomization quickly evolves into an elaborate engineering challenge as more and more factors of variation are considered, the learning problem itself also becomes harder, greatly decreasing the sample efficiency of learning algorithms. It is therefore natural to ask: rather than learning a policy robust to all conceivable environmental changes, can we instead adapt a pre-trained policy to the new environment through interaction?




Left: training in a fixed environment. Right: training with
domain randomization.

Policy Adaptation

A naïve way to adapt a policy to new environments is to fine-tune its parameters using a reward signal. In real-world deployments, however, obtaining a reward signal often requires human feedback or careful engineering, neither of which is a scalable solution.

In recent work from our lab, we show that it is possible to adapt a pre-trained policy to unseen environments without any reward signal or human supervision. A key insight is that, in many deployments of RL, the fundamental goal of the task remains the same even though there may be a mismatch in both visuals and underlying dynamics compared to the training environment, e.g. a simulation. When a policy is trained in simulation and deployed in the real world (sim2real), there are often differences in dynamics due to imperfections in the simulation, and visual inputs captured by a camera are likely to differ from renderings of the simulation. Hence, the source of these errors often lies in an imperfect world understanding rather than a misspecification of the task itself, and an agent's interactions with a new environment can therefore provide valuable information about the disparity between its world understanding and reality.





Illustration of our framework for adaptation. Left: training before
deployment. The RL objective is optimized together with a self-supervised
objective. Right: adaptation during deployment. We optimize only the
self-supervised objective, using observations collected through interaction
with the environment.

To take advantage of this information, we turn to the literature on self-supervised learning. We propose PAD, a general framework for adaptation of policies during deployment that uses self-supervision as a proxy for the absent reward signal. A given policy network $\pi$ parameterized by a collection of parameters $\theta$ is split sequentially into an encoder $\pi_{e}$ and a policy head $\pi_{a}$ such that $a_{t} = \pi(s_{t}; \theta) = \pi_{a}(\pi_{e}(s_{t}; \theta_{e}); \theta_{a})$ for a state $s_{t}$ and action $a_{t}$ at time $t$. We then let $\pi_{s}$ be a self-supervised task head that shares the encoder $\pi_{e}$ with the policy head. During training, we optimize the self-supervised objective jointly with the RL objective, with the two tasks sharing part of a neural network. During deployment, we can no longer assume access to a reward signal and are unable to optimize the RL objective. However, we can still continue to optimize the self-supervised objective using observations collected through interaction with the new environment. At every step in the new environment, we update the policy through self-supervision, using only the most recently collected observation:

$$s_t \sim p(s_t \mid a_{t-1}, s_{t-1}) \\
\theta_{e}(t) = \theta_{e}(t-1) - \nabla_{\theta_{e}} L(s_{t}; \theta_{s}(t-1), \theta_{e}(t-1))$$

where $L$ is a self-supervised objective. Assuming that gradients of the self-supervised objective are sufficiently correlated with those of the RL objective, any adaptation in the self-supervised task may also influence and correct errors in the perception and decision-making of the policy.

In practice, we use an inverse dynamics model $a_{t} = \pi_{s}( \pi_e(s_{t}), \pi_e(s_{t+1}))$, predicting the action taken in between two consecutive observations. Because an inverse dynamics model connects observations directly to actions, the policy can be adjusted for disparities both in visuals and dynamics (e.g. lighting conditions or friction) between training and test environments, solely through interaction with the new environment.
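
To make this concrete, below is a minimal sketch of the architecture split and the deployment-time update in PyTorch-style Python. The network sizes, learning rate, and mean-squared-error loss (assuming continuous actions) are illustrative assumptions, not the settings used in the paper.

import torch
import torch.nn as nn

obs_dim, action_dim, feat_dim = 50, 6, 32   # illustrative sizes

pi_e = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())  # shared encoder
pi_a = nn.Linear(feat_dim, action_dim)                         # policy head (RL)
pi_s = nn.Linear(2 * feat_dim, action_dim)                     # inverse dynamics head

# during deployment only the encoder parameters theta_e are updated
opt = torch.optim.Adam(pi_e.parameters(), lr=1e-3)

def adapt_step(s_t, a_t, s_t1):
    # inverse dynamics: predict the action taken between consecutive observations
    pred_a = pi_s(torch.cat([pi_e(s_t), pi_e(s_t1)], dim=-1))
    loss = ((pred_a - a_t) ** 2).mean()  # self-supervised objective L
    opt.zero_grad()
    loss.backward()  # gradients flow back into the shared encoder
    opt.step()       # theta_e is updated; pi_a reuses the adapted features

# at each step during deployment:
#   a_t = pi_a(pi_e(s_t)); execute a_t, observe s_{t+1};
#   adapt_step(s_t, a_t, s_{t+1})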

Adapting policies to the real world

We demonstrate the effectiveness of self-supervised policy adaptation (PAD) by training policies for robotic manipulation tasks in simulation and adapting them to the real world during deployment on a physical robot, taking observations directly from an uncalibrated camera. We evaluate generalization to a real robot environment that resembles the simulation, as well as two more challenging settings: a tablecloth with increased friction, and continuously moving disco lights. In the demonstration below, we consider a Soft Actor-Critic (SAC) agent trained with an Inverse Dynamics Model (IDM), with and without the PAD adaptation mechanism.



Transferring a policy from simulation to the real world. SAC+IDM is a
Soft Actor-Critic (SAC) policy trained with an Inverse Dynamics Model (IDM),
and SAC+IDM (PAD) is the same policy but with the addition of policy
adaptation during deployment on the robot.

PAD adapts to changes in both visuals and dynamics, and nearly recovers the original success rate of the simulated environment. Policy adaptation is especially effective when the test environment differs from the training environment in multiple ways, e.g. where both visuals and physical properties such as object dimensions and friction differ. Because it is often difficult to formally specify the elements that vary between a simulation and the real world, policy adaptation may be a promising alternative to domain randomization techniques in such settings.

Benchmarking generalization

Simulations provide a good platform for more comprehensive evaluation of RL algorithms. Together with PAD, we release the DMControl Generalization Benchmark, a new benchmark for generalization in RL based on the DeepMind Control Suite, a popular benchmark for continuous control from images. In the DMControl Generalization Benchmark, agents are trained in a fixed environment and deployed in new environments with e.g. randomized colors or continuously changing video backgrounds. We consider an SAC agent trained with an IDM, with and without adaptation, and compare to CURL, a contrastive method discussed in a previous post. We compare the generalization ability of these methods in the visualization below, and generally find that PAD can adapt even in non-stationary environments, a challenging problem setting where non-adaptive methods tend to fail. While CURL is found to generalize no better than the non-adaptive SAC trained with an IDM, agents can still benefit from the training signal that CURL provides during the training phase. Algorithms that learn both during training and deployment, and from multiple training signals, may therefore be preferred.



Generalization to an environment with video background. CURL is a
contrastive method, SAC+IDM is a Soft Actor-Critic (SAC) policy trained
with an Inverse Dynamics Model (IDM), and SAC+IDM (PAD) is the same
policy but with the addition of policy adaptation during deployment.

Summary

Previous work addresses the problem of generalization in RL by randomization, which requires anticipating environmental changes and is known not to scale well. We formulate an alternative problem setting in vision-based RL: can we instead adapt a pre-trained policy to unseen environments, without any rewards or human feedback? We find that adapting policies through a self-supervised objective – solely from interactions in the new environment – is a promising alternative to domain randomization when the target environment is truly unknown. In the future, we ultimately envision agents that continuously learn and adapt to their surroundings, and that are capable of learning both from explicit human feedback and through unsupervised interaction with the environment.

This post is based on the following paper:

  • Self-Supervised Policy Adaptation during Deployment
    Nicklas Hansen, Rishabh Jangir, Yu Sun, Guillem Alenyá, Pieter Abbeel, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang
    Ninth International Conference on Learning Representations (ICLR), 2021
    arXiv, Project Website, Code

#329: Robots-as-a-Service, with Afshin Doust

Image credit: Techcouver

In this episode, Lilly interviews Afshin Doust, CEO of Advanced Intelligent Systems. Doust explains the company’s modular, robots-as-a-service subscription business model. They discuss robotic solutions for the agricultural industry, disinfecting robots to combat COVID-19, and other exciting new developments at AIS.

Afshin Doust

Afshin Doust is the Chief Executive Officer of Advanced Intelligent Systems (AIS). He is a seasoned entrepreneur with professional experience in finance, sales, business consulting, and strategic management, and a keen interest in assembling teams to resolve business challenges. Afshin took on the role of CEO at AIS in 2016, with the goal of leading the team towards its vision of creating innovative autonomous robotic solutions for a wide range of applications.

Links


Robots4Humanity in next Society, Robots and Us

Speakers in tonight’s Society, Robots and Us at 6pm PST, Tuesday Feb 23, include Henry Evans, mute quadriplegic and founder of Robots4Humanity, and Aaron Edsinger, founder of Hello Robot. We’ll also be talking about robots for people with disabilities with Disability Advocate Adriana Mallozi, founder of Puffin Innovations, and Daniel Seita, who is a deaf roboticist. The event is free and open to the public.

As a result of a sudden stroke, Henry Evans went from being a Silicon Valley tech builder to searching, as the founder of Robots4Humanity, for technologies and robots that would improve his life and the lives of his family and caregivers. Since then, Henry has shaved himself with the help of the PR2 robot and spoken on the TED stage with Chad Jenkins via a Suitable Tech Beam. Now he’s working with Aaron Edsinger and the Stretch robot, a very affordable household robot and teleoperation platform.

We’ll also be hearing from Adriana Mallozi, Disability Advocate and founder of Puffin Innovations, a woman-owned assistive technology startup with a diverse team focused on developing solutions for people with disabilities to lead more inclusive and independent lives. The team at Puffin Innovations is dedicated to leveling the playing field for people with disabilities using Smart Assistive Technology (SAT), which incorporates internet-of-things connectivity, machine learning, and artificial intelligence to provide maximum access with the greatest of ease. By tailoring everything it does, from user interfaces to its portable, durable, and affordable products, Puffin Innovations aims to provide the much-needed solutions that the disabled community has been longing for.

This continues our monthly exploration of Inclusive Robotics from CITRIS People and Robots Lab at the Universities of California, in partnership with Silicon Valley Robotics. On January 19, we discussed diversity with guest speakers Dr Michelle Johnson from the GRASP Lab at UPenn, Dr Ariel Anders from Women in Robotics and first technical hire at Robust.ai, Alka Roy from The Responsible Innovation Project, and Kenechukwu C. Mbanesi and Kenya Andrews from Black in Robotics, with discussion moderated by Dr Ken Goldberg, artist, roboticist and Director of the CITRIS People and Robots Lab, and Andra Keay from Silicon Valley Robotics.

You can see the full playlist of all the Society, Robots and Us conversations on the Silicon Valley Robotics YouTube channel.

A robot that allows users to virtually navigate remote environments

Two students who graduated from VR Siddartha Engineering College in Kanuru, India, have created a virtual telepresence robot that allows users to see what is happening in a remote location as if they were actually there. Their project, supervised by Professor V.N. Prudhvi Raj, provides a valuable example of how robots can be used to capture video data in real time and monitor places that are momentarily or permanently inaccessible to humans.

How to build a robotics startup: the product idea

In this podcast series of episodes, we are going to explain, step by step, how to create a robotics startup.

We are going to learn how to select your co-founders and your team, how to look for investors, how to test your ideas, how to get customers, how to reach your market, and how to build your product… Starting from zero: how to build a successful robotics startup.

I’m Ricardo Tellez, CEO and co-founder of The Construct, a robotics startup where we deliver the best learning experience for becoming a ROS developer, that is, for learning how to program robots with ROS.

Our company is now five years old, with a team of 10 people working around the world. We have more than 100,000 students, and tens of universities around the world use our online academy to provide a teaching environment for their students.

We have bootstrapped our startup, but we also (unsuccessfully) tried to get investors. We have done a few pivots and finally arrived at the point where we are right now.

With all this experience, I’m going to teach you how to build your own startup. We are going to go through the process by creating another startup ourselves, so you can see along the way how to create your own, and witness the creation of a robotics startup.

This episode is about deciding the product your startup will produce.

Related links

Subscribe to the podcast using any of the following methods

Or watch the video

The post 89. How to build a robotics startup: the product idea appeared first on The Construct.

The appearance of robots affects our perception of the morality of their decisions

'Moralities of Intelligent Machines' is a project that investigates people's attitudes towards moral choices made by artificial intelligence. In the latest study completed under the project, participants read short narratives in which either a robot, a somewhat humanoid robot known as iRobot, a robot with a strongly humanoid appearance called iClooney, or a human being encounters a moral problem along the lines of the trolley dilemma and makes a specific decision. The participants were also shown images of these agents, after which they assessed the morality of their decisions. The study was funded by the Jane and Aatos Erkko Foundation and the Academy of Finland.

Back to Robot Coding part 2: the ethical black box

In the last few days I started some serious coding – the first for 20 years, in fact, since I built the software for the BRL LinuxBots. (The coding I did six months ago doesn’t really count, as I was only writing or modifying small fragments of Python.)

My coding project is to start building an ethical black box (EBB), or, to be more accurate, a module that will allow a software EBB to be incorporated into a robot. Conceptually the EBB is very simple: it is a data logger – the robot equivalent of an aircraft Flight Data Recorder or an automotive Event Data Recorder. Nearly five years ago I made the case, with Marina Jirotka, that all robots (and AIs) should be fitted with an EBB as standard. Our argument is very simple: without an EBB it will be more or less impossible to investigate robot accidents or near-misses, and in a recent paper on Robot Accident Investigation we argue that, with the increasing use of social robots, accidents are inevitable and will need to be investigated.

Developing and demonstrating the EBB is a foundational part of our 5-year EPSRC funded project RoboTIPS, so it’s great to be doing some hands-on practical research – something I’ve not done for a while.

Here is a block diagram showing the EBB and its relationship with a robot controller.

Box diagram of sensor, embedded artificial intelligence and actuation data being logged by the ethical black box

As shown here, the data flows from the robot controller to the EBB are strictly one-way. The EBB cannot and must not interfere with the operation of the robot. Coding an EBB for a particular robot would be straightforward, but I have set myself a tougher goal: a generic EBB module (i.e. a library of functions) that would – with some inevitable customisation – apply to any robot. I have also set myself the additional challenge of coding in Python, making use of skills learned from the excellent online Codecademy Python 2 course.

There are two elements of the EBB that must be customised for a particular robot. The first is the data structure used to fetch and save the sensor, actuator and decision data in the diagram above. Here is an example from my first stab at an EBB framework, using the Python dictionary structure:

# This dictionary structure serves as both
# 1 a specification of the type of robot, and each data field that
#   will be logged for this robot, &
# 2 the data structure we use to deliver live data to the EBB

# for this model let us create a minimal spec for an ePuck robot
epuckSpec = {
    # the first field *always* identifies the type of robot
    # plus version and serial nos
    "robot" : ["ePuck", "v1", "SN123456"],
    # the remaining fields are data we will log,
    # starting with the motors
    # ..of which the ePuck has just 2: left and right
    "motors" : [0, 0],
    # then 8 infra red sensors
    "irSensors" : [0, 0, 0, 0, 0, 0, 0, 0],
    # ..note the ePuck has more sensors: accelerometer, camera etc,
    # but this will do for now
    # ePuck battery level
    "batteryLevel" : [0],
    # then 1 decision code - i.e. what the robot is doing now
    # what these codes mean will be specific to both the robot
    # and the application
    "decisionCode" : [0]
    }

Whether a dictionary is the best way of doing this I’m not 100% sure, being new to Python (any thoughts from experienced Pythonistas welcome).

The idea is that all robot EBBs will need to define a data structure like this. All must contain the first field "robot", which names the robot’s type, its version number and serial number. The following fields must use keywords from a standard menu, as needed. As shown in this example, each keyword is followed by a list of placeholder values, in which the number of values in the list reflects the specification of the actual robot. The ePuck robot, for instance, has 2 motors and 8 infra-red sensors.

The final field in the data structure is "decisionCode". The values stored in this field will be both robot- and application-specific; for the ePuck robot these might be 1 = ‘stop’, 2 = ‘turn left’, 3 = ‘turn right’ and so on. We could add another value for a parameter, so that the robot might decide, for instance, to turn left 40 degrees: "decisionCode" : [2, 40]. We could also add a ‘reason’ field, which would save the high-level reason for the decision, as in "decisionCode" : [2, 40, "avoid obstacle right"], noting that the reason could be a string, as shown here, or a numeric code.
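
To make this concrete, here is how those hypothetical codes would be written into the spec; the codes are just the illustrative ones above:

# using the hypothetical ePuck decision codes:
# 1 = 'stop', 2 = 'turn left', 3 = 'turn right'
epuckSpec["decisionCode"] = [2, 40]                          # turn left 40 degrees
epuckSpec["decisionCode"] = [2, 40, "avoid obstacle right"]  # ..with a reason added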

As I hope I have shown here, the design of this data structure and its fields is at the heart of the EBB.

The second element of the EBB library that must be written for the particular robot and application is the function that fetches data from the robot:

# Get data from the robot and store it in the data structure spec
def getRobotData(spec):
    # implementation is robot- and application-specific
    ...

How this function is implemented will vary hugely between robots and robot applications. For our Linux-enhanced ePucks with WiFi connections, this is likely to be via a TCP/IP client-server, with the server running on the robot and sending data following a request from the client’s getRobotData(epuckSpec). For simpler setups in which the EBB module is folded into the robot controller, accessing the required data within getRobotData() should be very straightforward.
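
As an illustration only, a socket-based implementation might look something like the sketch below; the host, port, and JSON wire format are my assumptions for the sake of a runnable example, not part of any EBB specification.

import json
import socket

# hypothetical connection details for a WiFi-connected ePuck
EPUCK_HOST = "192.168.1.10"
EPUCK_PORT = 50007

# Get data from the robot and store it in the data structure spec
def getRobotData(spec):
    # connect to the server running on the robot
    with socket.create_connection((EPUCK_HOST, EPUCK_PORT), timeout=1.0) as sock:
        # request the fields we log (everything except the fixed "robot" field)
        request = json.dumps({"fields": [k for k in spec if k != "robot"]})
        sock.sendall(request.encode("utf-8"))
        # read one JSON reply (assumed to fit in a single 4 KB recv)
        reply = json.loads(sock.recv(4096).decode("utf-8"))
    # copy the reply values into the spec's placeholder lists
    for key, values in reply.items():
        if key in spec:
            spec[key] = values
    return spec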

The generic part of the EBB module will define the class EBB, with methods for both initialising the EBB and saving a new data record to the EBB. I will cover that in another blog post.
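
Without preempting that post, here is a minimal sketch of the shape such a class might take; the method names, the timestamping, and the fixed-capacity ring buffer are my guesses rather than a finished design.

from collections import deque
from copy import deepcopy
from datetime import datetime, timezone

class EBB:
    # a bounded log of timestamped records - the robot equivalent
    # of a flight data recorder's fixed-capacity memory
    def __init__(self, spec, maxRecords=10000):
        self.spec = spec
        # once full, the oldest records are silently discarded
        self.records = deque(maxlen=maxRecords)

    def save(self, data):
        # snapshot the data so later mutation cannot alter the log
        self.records.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "data": deepcopy(data)})

# usage: log one record per robot control cycle
ebb = EBB(epuckSpec)
ebb.save(getRobotData(epuckSpec))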

Before closing, let me add that it is our intention to publish the specification of the EBB, together with the model EBB code, once it has been fully tested, as open source.

Any comments or feedback would be much appreciated.


Link to the original post here.
