Archive 28.02.2021


Digital Innovation Hubs: €1.5 billion network to support green and digital transformation starts to take shape

A two-day discussion on the future of European Digital Innovation Hubs (EDIH) took place last month, as over 2000 stakeholders and policy-makers gathered online for the first time to take stock of the upcoming hubs network.

European Digital Innovation Hubs are one-stop shops where companies and public sector organisations can access and test digital innovations, gain the required digital skills, get advice on financing support and ultimately accomplish their digital transformation in the context of the twin green and digital transition, which is at the core of European industrial policy.

The EDIHs will become a network spread across the entire EU, sharing best practices and specialist knowledge and ready to support companies and public administrations in any region and economic sector. EDIHs will also play a brokering role between public administrations and companies providing e-government technologies, and are an important tool in Europe’s industrial and SME policies as they are close to local companies and ‘speak their language’.

Member States have already been invited to designate potential hubs. These potential hubs met for the first time on Tuesday the 26th of January at the Digital Innovation Hubs annual conference, organised in collaboration with the Luxembourg Ministry of Economy. Keynote speakers included Commissioner for the Internal Market Thierry Breton and Luxembourg Prime Minister Xavier Bettel. The 2000+ participants were provided with a unique opportunity to better prepare their submissions for the upcoming call and connect for further collaboration in building up the network of EDIHs.

Commissioner for the Internal Market Thierry Breton said:

Our vision is a resilient EU which is at the forefront of digital, where every citizen is digitally literate and where every single company can use the technology they need. Digital technology will help to provide the best public services in the world, and even small companies will be able to do business everywhere in Europe. This is the Europe we want, and the work of the European Digital Innovation Hubs will contribute to its construction.

The event was a discussion forum for potential EDIH candidates, policy-makers and regional managers, where the important role of the hubs was placed in the context of Europe's economic recovery and practical questions about how to set up a hub were answered. The conference is a crucial step in kick-starting the network and is critical to building EU capacity to accelerate the digital innovation of businesses.

Half of the €1.5 billion investment in the EDIH network will come from the Digital Europe Programme and the rest through national and regional funds, financing the approximately 200 hubs from 2021 to 2027. They will act as a multiplier and widely diffuse the use of all the digital capacities built up under the different objectives of the Digital Europe Programme. Member States have worked in close collaboration with the European Commission to prepare the call.

Next steps

The potential hubs designated by Member States will be invited to submit their proposals for the call, which will select the hubs participating in the initial network of European Digital Innovation Hubs. Projects will be launched after summer 2021 and will immediately start working towards the further digitalisation of all sectors of the European economy, making a strong contribution to the recovery from the COVID-19 crisis.

Researchers develop all-round grippers for contact-free society

The Korea Institute of Machinery and Materials (KIMM) has successfully developed an all-round gripper technology that enables robots to hold objects of various shapes and stiffnesses. With the new technology, a single gripper can handle objects as different as screwdrivers, bulbs, and coffee pots, and even food with delicate surfaces such as tofu, strawberries, and raw chicken. The technology is expected to expand robot applications in contact-free services such as household chores, cooking, serving, packaging, and manufacturing.

Can a robot operate effectively underwater?

If you've ever watched Planet Earth, you know the ocean is a wild place to live. The water is teeming with different ecosystems and organisms varying in complexity from an erudite octopus to a sea star. Unexpectedly, it is the sea star, a simple organism characterized by a decentralized nervous system, that offers insights into advanced adaptation to hydrodynamic forces—the forces created by water pressure and flow.

Self-supervised policy adaptation during deployment

Our method learns a task in a fixed, simulated environment and quickly adapts to new environments (e.g. the real world) solely from online interaction during deployment.

The human ability to generalize knowledge and experience to new situations is remarkable, yet poorly understood. For example, imagine a driver who has only ever driven around their city in clear weather. Even though they have never encountered true diversity in driving conditions, they have acquired the fundamental skill of driving, and can adapt reasonably fast to driving in neighboring cities, in rainy or windy weather, or even to driving a different car, without much practice or additional lessons. While humans excel at adaptation, building intelligent systems with common-sense knowledge and the ability to quickly adapt to new situations is a long-standing problem in artificial intelligence.

A robot trained to perform a given task in a lab environment may not generalize to other environments, e.g. an environment with moving disco lights, even though the task itself remains the same.

In recent years, learning both perception and behavioral policies in an end-to-end framework by deep Reinforcement Learning (RL) has been widely successful, achieving impressive results such as superhuman performance on Atari games played directly from screen pixels. Despite these successes, it is now commonly understood that such policies fail to generalize to even subtle changes in the environment, changes that humans adapt to with ease. For this reason, RL has shown limited success beyond the game or environment in which a policy was originally trained, which presents a significant challenge for deploying RL-trained policies in our diverse and unstructured real world.

Generalization by Randomization

In applications of RL, practitioners have sought to improve the generalization ability of policies by introducing randomization into the training environment (e.g. a simulation), also known as domain randomization. By randomizing elements of the training environment that are also expected to vary at test-time, it is possible to learn policies that are invariant to certain factors of variation. For autonomous driving, we may for example want our policy to be robust to changes in lighting, weather, and road conditions, as well as car models, nearby buildings, different city layouts, and so forth. While the randomization quickly evolves into an elaborate engineering challenge as more and more factors of variation are considered, the learning problem itself also becomes harder, greatly decreasing the sample efficiency of learning algorithms. It is therefore natural to ask: rather than learning a policy robust to all conceivable environmental changes, can we instead adapt a pre-trained policy to the new environment through interaction?
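Concretely, domain randomization amounts to resampling environment parameters at the start of every training episode. The sketch below illustrates the idea; the parameter names, ranges, and the `make_env` constructor in the comments are illustrative assumptions, not details from any particular system.

```python
# A minimal sketch of domain randomization: a fresh set of environment
# parameters is drawn for every training episode.
import random
from dataclasses import dataclass

@dataclass
class EnvParams:
    light_intensity: float   # a visual factor of variation
    friction: float          # a dynamics factor of variation
    texture_id: int          # a visual factor of variation

def sample_params() -> EnvParams:
    """Draw new environment parameters for one training episode."""
    return EnvParams(
        light_intensity=random.uniform(0.5, 1.5),
        friction=random.uniform(0.8, 1.2),
        texture_id=random.randrange(10),
    )

# Training loop (schematic): the policy sees a newly randomized environment
# every episode, encouraging invariance to these factors of variation.
#   for episode in range(num_episodes):
#       env = make_env(sample_params())   # hypothetical constructor
#       run_episode(policy, env)
```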

Left: training in a fixed environment. Right: training with domain randomization.

Policy Adaptation

A naïve way to adapt a policy to new environments is by fine-tuning parameters using a reward signal. In real-world deployments, however, obtaining a reward signal often requires human feedback or careful engineering, neither of which are scalable solutions.

In recent work from our lab, we show that it is possible to adapt a pre-trained policy to unseen environments without any reward signal or human supervision. A key insight is that, in many deployments of RL, the fundamental goal of the task remains the same, even though there may be a mismatch in both visuals and underlying dynamics compared to the training environment, e.g. a simulation. When training a policy in simulation and deploying it in the real world (sim2real), there are often differences in dynamics due to imperfections in the simulation, and visual inputs captured by a camera are likely to differ from renderings of the simulation. Hence, the source of these errors often lies in an imperfect world understanding rather than a misspecification of the task itself, and an agent's interactions with a new environment can therefore provide valuable information about the disparity between its world understanding and reality.

Illustration of our framework for adaptation. Left: training before deployment. The RL objective is optimized together with a self-supervised objective. Right: adaptation during deployment. We optimize only the self-supervised objective, using observations collected through interaction with the environment.

To take advantage of this information, we turn to the literature on self-supervised learning. We propose PAD, a general framework for adaptation of policies during deployment, using self-supervision as a proxy for the absent reward signal. A given policy network $\pi$ parameterized by a collection of parameters $\theta$ is split sequentially into an encoder $\pi_{e}$ and a policy head $\pi_{a}$ such that $a_{t} = \pi(s_{t}; \theta) = \pi_{a}(\pi_{e}(s_{t}; \theta_{e}); \theta_{a})$ for a state $s_{t}$ and action $a_{t}$ at time $t$. We then let $\pi_{s}$ be a self-supervised task head that shares the encoder $\pi_{e}$ with the policy head. During training, we optimize the self-supervised objective jointly with the RL objective, with the two tasks sharing part of the neural network. During deployment, we can no longer assume access to a reward signal and are unable to optimize the RL objective. However, we can continue to optimize the self-supervised objective using observations collected through interaction with the new environment. At every step in the new environment, we update the policy through self-supervision, using only the most recently collected observation:

$$s_t \sim p(s_t | a_{t-1}, s_{t-1}) \\
\theta_{e}(t) = \theta_{e}(t-1) - \nabla_{\theta_{e}} L(s_{t}; \theta_{s}(t-1), \theta_{e}(t-1))$$

where $L$ is the self-supervised objective. Assuming that gradients of the self-supervised objective are sufficiently correlated with those of the RL objective, any adaptation on the self-supervised task may also influence and correct errors in the perception and decision-making of the policy.
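As a minimal PyTorch sketch of this update rule (module sizes, the flat image encoding, and all names below are illustrative assumptions; the actual architecture in the paper is convolutional):

```python
import torch
import torch.nn as nn

latent_dim, action_dim = 50, 4

# Shared encoder pi_e and the two heads, pi_a (policy) and pi_s
# (self-supervised), matching the decomposition in the text.
pi_e = nn.Sequential(nn.Flatten(), nn.Linear(3 * 84 * 84, latent_dim), nn.ReLU())
pi_a = nn.Linear(latent_dim, action_dim)
pi_s = nn.Linear(2 * latent_dim, action_dim)

# During deployment only the encoder parameters theta_e are updated,
# mirroring the update rule above; the optimizer therefore holds only
# the encoder's parameters.
optimizer = torch.optim.SGD(pi_e.parameters(), lr=1e-3)

def adapt_step(ss_loss: torch.Tensor) -> None:
    """One gradient step on the self-supervised objective; no reward used."""
    optimizer.zero_grad()
    ss_loss.backward()
    optimizer.step()
```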

In practice, we use an inverse dynamics model $a_{t} = \pi_{s}( \pi_e(s_{t}), \pi_e(s_{t+1}))$, predicting the action taken in between two consecutive observations. Because an inverse dynamics model connects observations directly to actions, the policy can be adjusted for disparities both in visuals and dynamics (e.g. lighting conditions or friction) between training and test environments, solely through interaction with the new environment.
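Continuing the sketch above, the inverse dynamics head and a schematic deployment loop could look as follows; the environment interface in the comments is a hypothetical placeholder.

```python
import torch
import torch.nn.functional as F

def idm_loss(s_t: torch.Tensor, s_t1: torch.Tensor, a_t: torch.Tensor) -> torch.Tensor:
    """Self-supervised loss: predict the action taken between two
    consecutive observations (continuous actions, hence MSE)."""
    z = torch.cat([pi_e(s_t), pi_e(s_t1)], dim=-1)
    a_pred = pi_s(z)  # a_t = pi_s(pi_e(s_t), pi_e(s_{t+1}))
    return F.mse_loss(a_pred, a_t)

# Schematic deployment loop: act, observe, then adapt on the most recent
# transition before acting again.
#   a_t  = pi_a(pi_e(s_t))
#   s_t1 = env.step(a_t)                      # hypothetical environment API
#   adapt_step(idm_loss(s_t, s_t1, a_t))
```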

Adapting policies to the real world

We demonstrate the effectiveness of self-supervised policy adaptation (PAD) by training policies for robotic manipulation tasks in simulation and adapting them to the real world during deployment on a physical robot, taking observations directly from an uncalibrated camera. We evaluate generalization to a real robot environment that resembles the simulation, as well as two more challenging settings: a tablecloth with increased friction, and continuously moving disco lights. In the demonstration below, we consider a Soft Actor-Critic (SAC) agent trained with an Inverse Dynamics Model (IDM), with and without the PAD adaptation mechanism.

Transferring a policy from simulation to the real world. SAC+IDM is a Soft Actor-Critic (SAC) policy trained with an Inverse Dynamics Model (IDM), and SAC+IDM (PAD) is the same policy but with the addition of policy adaptation during deployment on the robot.

PAD adapts to changes in both visuals and dynamics, and nearly recovers the original success rate of the simulated environment. Policy adaptation is especially effective when the test environment differs from the training environment in multiple ways, e.g. when both visuals and physical properties such as object dimensions and friction differ. Because it is often difficult to formally specify the elements that vary between a simulation and the real world, policy adaptation may be a promising alternative to domain randomization techniques in such settings.

Benchmarking generalization

Simulations provide a good platform for more comprehensive evaluation of RL algorithms. Together with PAD, we release the DMControl Generalization Benchmark, a new benchmark for generalization in RL based on the DeepMind Control Suite, a popular benchmark for continuous control from images. In the DMControl Generalization Benchmark, agents are trained in a fixed environment and deployed in new environments with, e.g., randomized colors or continuously changing video backgrounds. We consider a SAC agent trained with an IDM, with and without adaptation, and compare to CURL, a contrastive method discussed in a previous post. We compare the generalization ability of these methods in the visualization below and generally find that PAD can adapt even in non-stationary environments, a challenging setting in which non-adaptive methods tend to fail. While CURL is found to generalize no better than the non-adaptive SAC trained with an IDM, agents can still benefit from the training signal that CURL provides during the training phase. Algorithms that learn both during training and deployment, and from multiple training signals, may therefore be preferred.

Generalization to an environment with video background. CURL is a contrastive method, SAC+IDM is a Soft Actor-Critic (SAC) policy trained with an Inverse Dynamics Model (IDM), and SAC+IDM (PAD) is the same policy but with the addition of policy adaptation during deployment.
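For orientation, a bare evaluation episode on the underlying DeepMind Control Suite looks as follows. The benchmark's own wrappers for randomized colors and video backgrounds are not shown here, and the random policy is a stand-in for a trained agent.

```python
import numpy as np
from dm_control import suite

# Load one of the underlying DeepMind Control Suite tasks; the benchmark
# perturbs the visuals of such environments at deployment time.
env = suite.load(domain_name='walker', task_name='walk')
spec = env.action_spec()

time_step = env.reset()
while not time_step.last():
    # Render pixel observations, as in image-based RL.
    pixels = env.physics.render(height=84, width=84, camera_id=0)
    # Placeholder random policy; a PAD agent would instead encode `pixels`,
    # act with its policy head, and take a self-supervised gradient step.
    action = np.random.uniform(spec.minimum, spec.maximum, size=spec.shape)
    time_step = env.step(action)
```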

Summary

Previous work addresses the problem of generalization in RL by randomization, which requires anticipating environmental changes and is known not to scale well. We formulate an alternative problem setting in vision-based RL: can we instead adapt a pre-trained policy to unseen environments, without any rewards or human feedback? We find that adapting policies through a self-supervised objective, solely from interactions in the new environment, is a promising alternative to domain randomization when the target environment is truly unknown. In the future, we ultimately envision agents that continuously learn and adapt to their surroundings, capable of learning both from explicit human feedback and through unsupervised interaction with the environment.

This post is based on the following paper:

  • Self-Supervised Policy Adaptation during Deployment
Nicklas Hansen, Rishabh Jangir, Yu Sun, Guillem Alenyà, Pieter Abbeel, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang
    Ninth International Conference on Learning Representations (ICLR), 2021
    arXiv, Project Website, Code

#329: Robots-as-a-Service, with Afshin Doust

Image credit: Techcouver

In this episode, Lilly interviews Afshin Doust, CEO of Advanced Intelligent Systems (AIS). Doust explains the company's modular, robots-as-a-service subscription business model. They discuss robotic solutions for the agricultural industry, disinfecting robots to combat COVID-19, and other exciting new developments at AIS.

Afshin Doust

Afshin Doust is the Chief Executive Officer of Advanced Intelligent Systems (AIS). He is a seasoned entrepreneur with professional experience in finance, sales, business consulting, and strategic management, and a keen interest in assembling teams to resolve business challenges. He took on the role of CEO at AIS in 2016 with the goal of leading the team towards its vision of creating innovative autonomous robotic solutions for a wide range of applications.

