MODEX 2018: Old and new have never been so far apart
MODEX, ProMat and CeMAT are the biggest global material handling and logistics supply chain tech trade shows. But as I walked the corridors of this year’s MODEX in Atlanta, I was particularly aware of the widening disparity between the old and new.
Stats: Material Handling and Logistics
The 2018 MHI Annual Industry Report found that the 2018 adoption rate for driverless vehicles in material handling was only 10% and the adoption rate for AI was just 5% while the rate for robotics and automation was 35%.
The report indicated the top technologies expected to be a source of either disruption or competitive advantage within the next 3-5 years to be: (shown in order of importance)
- Robotics and Automation – picking, packing, sorting orders; loading, unloading, stacking; receiving and put-away; assembly operations; QC and inspection processing
- Predictive Analytics – manage lead times; synchronize links; avoid missteps
- Internet-of-Things (IoT) – enable real-time info flow; predictive analytics; and QC
- Artificial Intelligence – faster deliveries; reduced redundancies; improved analytics
- Driverless Vehicles – autonomous vehicles (and conversion kits for existing AGVs, tows and lifts); SLAM and point-to-point navigation; some with manipulators
The report also listed the key barriers to adoption of driverless vehicles and AI: (shown in order of importance)
- lack of a clear business case
- lack of adequate talent
- lack of understanding of the technology landscape
- lack of access to capital to make investments
- cybersecurity
Other research reports were more optimistic in their forecasts:
- A $3,500 report from QY Research made rosy forecasts for global parcel sorting robots and last mile delivery robots
- A $10,000 report from Interact Analysis forecast five years of double-digit growth for AMRs and AGVs converted to AMRs
- A $4,450 report from Grand View Research forecasts significant growth through 2024 driven by increasing demand for autonomous and safe point-to-point material handling equipment
Transforming lifts, tows, carts and AGVs to AMRs and VGVs
Human-operated AGVs, tows, lifts and other warehouse and factory vehicles have been a staple in material movement for decades. Now, with low-cost cameras, sensors and advanced vision systems, they are slowly transitioning to more flexible autonomous mobile robots (AMRs) that can tow, lift and carry. AMRs can be human-operated, autonomous, or a combination of both.
- Vendors providing conversion systems for existing lifts and carts to Vision Guided Vehicles (VGVs) for line-side replenishment, pallet movement, etc. include:
- Vendors providing grasping capabilities in addition to autonomous mobility include:
- Vendors providing high speed random grasping from moving conveyors or bins:
- RightHand Robotics
- Universal Logic
- Kinema Systems
- Vendors providing AMRs, VGVs and AIVs (Autonomous Intelligent Vehicles) for goods-to-person, load transfer, restocking, etc. include:
Navigation systems have changed as well and often don’t require floor grid markings, barcodes or extensive indoor localization and segregation systems such as those used by Kiva Systems (and subsequently Amazon). SLAM and combinations of floor grids, SLAM, path planning, and collision avoidance systems are adding flexibility to swarms of point-to-point mobile robots.
Kiva look-alikes emerge
In March 2012, in an effort to make their fulfillment centers as efficient as possible, Amazon acquired Kiva Systems for $775 million and almost immediately took them in-house, leaving a disgruntled set of Kiva customers who couldn’t expand and a larger group of prospective clients who were left with a technological gap and no solutions. I wrote about this gap and about the whole community of new providers that had sprung up to fill the void and were beginning to offer and demonstrate their solutions. Many of those new providers are listed above.
Recently, another set of competitors has emerged in this space:
- Companies started in China who copied the Kiva Systems formula to provide Kiva-like goods-to-person robot services and dynamic free-form warehousing to the major Chinese e-commerce vendors such as:
- Now some of those companies are expanding outside of China and SE Asia to Europe and America:
Bottom Line
There are many forms of warehousing. But the area where NextGen tools are needed the most is high-turn distribution and fulfillment centers.
The rate of acceptance of e-commerce is changing warehousing forever, particularly distribution and fulfillment centers. Total e-commerce sales for 2017 were $453.5 billion, an increase of 16.0% from 2016. E-commerce sales in 2017 accounted for 8.9% of total sales versus 8.0% of total sales in 2016 according to the U.S. Department of Commerce.
Consequently, flexibility and an ability to handle an ever-increasing number of parcels are paramount, and fixed costs for conveyors, elevators and old-style AS/RS systems have become anathema to warehouse executives worldwide. Hence the need to invest in NextGen supply chain methods such as those on display at shows like MODEX.
Although 31,000 people went to this year’s MODEX, I wonder how many share my view about the disparity between the old and new shown at the show. Certainly there were enough new tech vendors offering “a mixed fleet of intelligent, collaborative mobile robots and fully-autonomous, zero-infrastructure AGVs designed specifically for safe and flexible material flows in dynamic, human-centric environments.” Yet the emphasis at the show – and the favorable booth space placement – went to the old-line vendors rather than the NextGen companies listed above.
Go figure!
Shared autonomy via deep reinforcement learning
By Siddharth Reddy
Imagine a drone pilot remotely flying a quadrotor, using an onboard camera to navigate and land. Unfamiliar flight dynamics, terrain, and network latency can make this system challenging for a human to control. One approach to this problem is to train an autonomous agent to perform tasks like patrolling and mapping without human intervention. This strategy works well when the task is clearly specified and the agent can observe all the information it needs to succeed. Unfortunately, many real-world applications that involve human users do not satisfy these conditions: the user’s intent is often private information that the agent cannot directly access, and the task may be too complicated for the user to precisely define. For example, the pilot may want to track a set of moving objects (e.g., a herd of animals) and change object priorities on the fly (e.g., focus on individuals who unexpectedly appear injured). Shared autonomy addresses this problem by combining user input with automated assistance; in other words, augmenting human control instead of replacing it.
A blind, autonomous pilot (left), suboptimal human pilot (center), and combined human-machine team (right) play the Lunar Lander game.
Background
The idea of combining human and machine intelligence in a shared-control system goes back to the early days of Ray Goertz’s master-slave manipulator in 1949, Ralph Mosher’s Hardiman exoskeleton in 1969, and Marvin Minsky’s call for telepresence in 1980. After decades of research in robotics, human-computer interaction, and artificial intelligence, interfacing between a human operator and a remote-controlled robot remains a challenge. According to a review of the 2015 DARPA Robotics Challenge, “the most cost effective research area to improve robot performance is Human-Robot Interaction. The biggest enemy of robot stability and performance in the DRC was operator errors. Developing ways to avoid and survive operator errors is crucial for real-world robotics. Human operators make mistakes under pressure, especially without extensive training and practice in realistic conditions.”
Master-slave robotic manipulator (Goertz, 1949)
Brain-computer interface for neural prosthetics (Shenoy & Carmena, 2014)
Formalism for model-based shared autonomy (Javdani et al., 2015)
One research thrust in shared autonomy approaches this problem by inferring the user’s goals and autonomously acting to achieve them. Chapter 5 of Shervin Javdani’s Ph.D. thesis contains an excellent review of the literature. Such methods have made progress toward better driver assist, brain-computer interfaces for prosthetic limbs, and assistive teleoperation, but tend to require prior knowledge about the world; specifically, (1) a dynamics model that predicts the consequences of taking a given action in a given state of the environment, (2) the set of possible goals for the user, and (3) an observation model that describes the user’s behavior given their goal. Model-based shared autonomy algorithms are well-suited to domains in which this knowledge can be directly hard-coded or learned, but are challenged by unstructured environments with ill-defined goals and unpredictable user behavior. We approached this problem from a different angle, using deep reinforcement learning to implement model-free shared autonomy.
Deep reinforcement learning uses neural network function approximation to tackle the curse of dimensionality in high-dimensional, continuous state and action spaces, and has recently achieved remarkable success in training autonomous agents from scratch to play video games, defeat human world champions at Go, and control robots. We have taken preliminary steps toward answering the following question: can deep reinforcement learning be useful for building flexible and practical assistive systems?
Model-Free RL with a Human in the Loop
To enable shared-control teleoperation with minimal prior assumptions, we devised a model-free deep reinforcement learning algorithm for shared autonomy. The key idea is to learn an end-to-end mapping from environmental observation and user input to agent action, with task reward as the only form of supervision. From the agent’s perspective, the user acts like a prior policy that can be fine-tuned, and an additional sensor generating observations from which the agent can implicitly decode the user’s private information. From the user’s perspective, the agent behaves like an adaptive interface that learns a personalized mapping from user commands to actions that maximizes task reward.
Fig. 1: An overview of our human-in-the-loop deep Q-learning algorithm for model-free shared autonomy
One of the core challenges in this work was adapting standard deep RL techniques to leverage control input from a human without significantly interfering with the user’s feedback control loop or tiring them with a long training period. To address these issues, we used deep Q-learning to learn an approximate state-action value function that computes the expected future return of an action given the current environmental observation and the user’s input. Equipped with this value function, the assistive agent executes the closest high-value action to the user’s control input. The reward function for the agent is a combination of known terms computed for every state, and a terminal reward provided by the user upon succeeding or failing at the task. See Fig. 1 for a high-level schematic of this process.
Learning to Assist
Prior work has formalized shared autonomy as a partially-observable Markov decision process (POMDP) in which the user’s goal is initially unknown to the agent and must be inferred in order to complete the task. Existing methods tend to assume the following components of the POMDP are known ex-ante: (1) the dynamics of the environment, or the state transition distribution $T$; (2) the set of possible goals for the user, or the goal space $\mathcal{G}$; and (3) the user’s control policy given their goal, or the user model $\pi_h$. In our work, we relaxed these three standard assumptions. We introduced a model-free deep reinforcement learning method that is capable of providing assistance without access to this knowledge, but can also take advantage of a user model and goal space when they are known.
In our problem formulation, the transition distribution $T$, the user’s policy $\pi_h$, and the goal space $\mathcal{G}$ are no longer all necessarily known to the agent. The reward function, which depends on the user’s private information, is
$$
R(s, a, s') = \underbrace{R_{\text{general}}(s, a, s')}_{\text{known}} + \underbrace{R_{\text{feedback}}(s, a, s')}_{\text{unknown, but observed}}.
$$
This decomposition follows a structure typically present in shared autonomy: there are some terms in the reward that are known, such as the need to avoid collisions. We capture these in $R_{\text{general}}$. $R_{\text{feedback}}$ is user-generated feedback that depends on their private information. We do not know this function. We merely assume the agent is informed when the user provides feedback (e.g., by pressing a button). In practice, the user might simply indicate once per trial whether the agent succeeded or not.
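The decomposition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' released code: the `collided` flag, the penalty of -1.0, and the function names are hypothetical, and the user's feedback is modeled as an optional number that is present only on the step where the user presses the button.

```python
def general_reward(state, action, next_state):
    """Known term R_general: a hypothetical collision penalty.

    Assumes next_state is a dict that may carry a boolean 'collided' flag.
    """
    return -1.0 if next_state.get("collided", False) else 0.0


def total_reward(state, action, next_state, user_feedback=None):
    """R = R_general + R_feedback.

    R_feedback is never modeled; it is only observed. When the user gives
    no signal on this step, user_feedback is None and contributes 0.
    """
    feedback = 0.0 if user_feedback is None else float(user_feedback)
    return general_reward(state, action, next_state) + feedback


# A step that ends in a collision, with no user signal.
print(total_reward({}, 0, {"collided": True}))
# A terminal step where the user signals success (value chosen arbitrarily).
print(total_reward({}, 0, {}, user_feedback=10.0))
```

The agent never needs to know why the user gave the feedback, only that it arrived and what value it carried.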
Incorporating User Input
Our method jointly embeds the agent’s observation of the environment $s_t$ with the information from the user $u_t$ by simply concatenating them. Formally,
$$
\tilde{s}_t = \left[ \begin{array}{c} s_t \\ u_t \end{array} \right].
$$
The particular form of $u_t$ depends on the available information. When we do not know the set of possible goals $\mathcal{G}$ or the user’s policy given their goal $\pi_h$, as is the case for most of our experiments, we set $u_t$ to the user’s action $a^h_t$. When we know the goal space $\mathcal{G}$, we set $u_t$ to the inferred goal $\hat{g}_t$. In particular, for problems with known goal spaces and user models, we found that using maximum entropy inverse reinforcement learning to infer $\hat{g}_t$ led to improved performance. For problems with known goal spaces but unknown user models, we found that under certain conditions we could improve performance by training an LSTM recurrent neural network to predict $\hat{g}_t$ given the sequence of user inputs using a training set of rollouts produced by the unassisted user.
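As a rough sketch of the concatenation step for the case where $u_t$ is the user's discrete action $a^h_t$: the one-hot encoding, the four-dimensional observation, and the six-action space below are assumptions made for illustration, since the text does not specify how the user's action is encoded.

```python
def one_hot(index, size):
    """Encode a discrete action index as a one-hot vector (assumed encoding)."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def augment_state(s_t, u_t):
    """Build the augmented state by stacking the observation s_t on top of u_t."""
    return list(s_t) + list(u_t)


# Hypothetical 4-dimensional observation and a user action chosen from 6 options.
s_t = [0.1, -0.3, 0.0, 0.5]
u_t = one_hot(2, 6)
s_tilde = augment_state(s_t, u_t)
print(len(s_tilde))  # 4 observation entries followed by 6 user-input entries
```

When the goal space is known, the same `augment_state` call would take an encoding of the inferred goal $\hat{g}_t$ in place of the one-hot action.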
Q-Learning with User Control
Model-free reinforcement learning with a human in the loop poses two challenges: (1) maintaining informative user input and (2) minimizing the number of interactions with the environment. If the user input is a suggested control, consistently ignoring the suggestion and taking a different action can degrade the quality of user input, since humans rely on feedback from their actions to perform real-time control tasks. Popular on-policy algorithms like TRPO are difficult to deploy in this setting since they give no guarantees on how often the user’s input is ignored. They also tend to require a large number of interactions with the environment, which is impractical for human users. Motivated by these two criteria, we turned to deep Q-learning.
Q-learning is an off-policy algorithm, enabling us to address (1) by modifying the behavior policy used to select actions given their expected returns and the user’s input. Drawing inspiration from the minimal intervention principle embodied in recent work on parallel autonomy and outer-loop stabilization, we execute a feasible action closest to the user’s suggestion, where an action is feasible if it isn’t that much worse than the optimal action. Formally,
$$
\pi_{\alpha}(a \mid \tilde{s}, a^h) = \delta\left(a = \mathop{\arg\max}\limits_{\{a : Q'(\tilde{s}, a) \geq (1 - \alpha) Q'(\tilde{s}, a^\ast)\}} f(a, a^h)\right),
$$
where $f$ is an action-similarity function, $a^\ast$ is the action with the highest Q value, and $Q'(\tilde{s}, a) = Q(\tilde{s}, a) - \min_{a' \in \mathcal{A}} Q(\tilde{s}, a')$ shifts the Q values to be non-negative so that the comparison remains meaningful when Q values are negative. The constant $\alpha \in [0, 1]$ is a hyperparameter that controls the tolerance of the system to suboptimal human suggestions, or equivalently, the amount of assistance.
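For a discrete action space, the behavior policy above can be sketched as follows. This is a minimal illustration rather than the released implementation: the Q values are passed in as a plain list, `same_action` is a hypothetical similarity function $f$, and breaking ties in favor of the higher Q value is an assumption that the formula itself does not specify.

```python
def assisted_action(q_values, user_action, alpha, similarity):
    """Pick the feasible action most similar to the user's suggestion.

    q_values[a] is Q(s_tilde, a) for each discrete action a.
    An action is feasible if Q'(a) >= (1 - alpha) * Q'(a*), where Q' is Q
    shifted by its minimum so that every value is non-negative.
    """
    q_min = min(q_values)
    shifted = [q - q_min for q in q_values]  # Q'(s_tilde, a)
    best = max(shifted)                      # Q'(s_tilde, a*)
    feasible = [a for a, q in enumerate(shifted) if q >= (1 - alpha) * best]
    # Assumption: ties in similarity are broken by the higher Q value.
    return max(feasible, key=lambda a: (similarity(a, user_action), shifted[a]))


def same_action(a, a_h):
    """Hypothetical similarity f: 1 if the action matches the suggestion, else 0."""
    return 1.0 if a == a_h else 0.0


q = [-5.0, -1.0, -2.0, -4.0]  # made-up Q values for four actions
print(assisted_action(q, user_action=0, alpha=0.0, similarity=same_action))  # 1
print(assisted_action(q, user_action=0, alpha=1.0, similarity=same_action))  # 0
print(assisted_action(q, user_action=2, alpha=0.5, similarity=same_action))  # 2
```

With $\alpha = 0$ only the optimal action is feasible, so the user's suggestion is overridden; with $\alpha = 1$ every action is feasible, so the suggestion is always executed; intermediate values accept the suggestion only when it is close enough in value to the best action.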
Mindful of (2), we note that off-policy Q-learning tends to be more sample-efficient than policy gradient and Monte Carlo value-based methods. The structure of our behavior policy also speeds up learning when the user is approximately optimal: for appropriately large $\alpha$, the agent learns to fine-tune the user’s policy instead of learning to perform the task from scratch. In practice, this means that during the early stages of learning, the combined human-machine team performs at least as well as the unassisted human instead of performing at the level of a random policy.
User Studies
We applied our method to two real-time assistive control problems: the Lunar Lander game and a quadrotor landing task. Both tasks involved controlling motion using a discrete action space and low-dimensional state observations that include position, orientation, and velocity information. In both tasks, the human pilot had private information that was necessary to complete the task, but wasn’t capable of succeeding on their own.
The Lunar Lander Game
The objective of the game was to land the vehicle between the flags without crashing or flying out of bounds using two lateral thrusters and a main engine. The assistive copilot could observe the lander’s position, orientation, and velocity, but not the position of the flags.
Human Pilot (Solo): The human pilot can’t stabilize and keeps crashing.
Human Pilot + RL Copilot: The copilot improves stability while giving the pilot enough freedom to land between the flags.
Humans rarely beat the Lunar Lander game on their own, but with a copilot they did much better.
Fig. 2a: Success and crash rates averaged over 30 episodes.
Fig. 2b-c: Trajectories followed by human pilots with and without a copilot on Lunar Lander. Red trajectories end in a crash or out of bounds, green in success, and gray in neither. The landing pad is marked by a star. For the sake of illustration, we only show data for a landing site on the left boundary.
In simulation experiments with synthetic pilot models (not shown here), we also observed a significant benefit to explicitly inferring the goal (i.e., the location of the landing pad) instead of simply adding the user’s raw control input to the agent’s observations, suggesting that goal spaces and user models can and should be taken advantage of when they are available.
One of the drawbacks of analyzing Lunar Lander is that the game interface and physics do not reflect the complexity and unpredictability of a real-world robotic shared autonomy task. To evaluate our method in a more realistic environment, we formulated a task for a human pilot flying a real quadrotor.
Quadrotor Landing Task
The objective of the task was to land a Parrot AR-Drone 2 on a small, square landing pad at some distance from its initial take-off position, such that the drone’s first-person camera was pointed at a random object in the environment (e.g., a red chair), without flying out of bounds or running out of time. The pilot used a keyboard to control velocity, and was blocked from getting a third-person view of the drone so that they had to rely on the drone’s first-person camera feed to navigate and land. The assistive copilot observed position, orientation, and velocity, but did not know which object the pilot wanted to look at.
Human Pilot (Solo): The pilot’s display only showed the drone’s first-person view, so pointing the camera was easy but finding the landing pad was hard.
Human Pilot + RL Copilot: The copilot didn’t know where the pilot wanted to point the camera, but it knew where the landing pad was. Together, the pilot and copilot succeeded at the task.
Humans found it challenging to simultaneously point the camera at the desired scene and navigate to the precise location of a feasible landing pad under time constraints.
The assistive copilot had little trouble navigating to and landing on the landing pad, but did not know where to point the camera because it did not know what the human wanted to observe after landing. Together, the human could focus on pointing the camera and the copilot could focus on landing precisely on the landing pad.
Fig. 3a: Success and crash rates averaged over 20 episodes.
Fig. 3b-c: A bird’s-eye view of trajectories followed by human pilots with and without a copilot on the quadrotor landing task. Red trajectories end in a crash or out of bounds, green in success, and gray in neither. The landing pad is marked by a star.
Our results showed that combined pilot-copilot teams significantly outperform individual pilots and copilots.
What’s Next?
Our method has a major weakness: model-free deep reinforcement learning typically requires lots of training data, which can be burdensome for human users operating physical robots. We mitigated this issue in our experiments by pretraining the copilot in simulation without a human pilot in the loop. Unfortunately, this is not always feasible for real-world applications due to the difficulty of building high-fidelity simulators and designing rich user-agnostic reward functions $R_{\text{general}}$. We are currently exploring different approaches to this problem.
If you want to learn more, check out our pre-print on arXiv: Siddharth Reddy, Anca Dragan, Sergey Levine, Shared Autonomy via Deep Reinforcement Learning, arXiv, 2018.
The paper will appear at Robotics: Science and Systems 2018 from June 26-30. To encourage replication and extensions, we have released our code. Additional videos are available through the project website.
This article was initially published on the BAIR blog, and appears here with the authors’ permission.
ROBOTT-NET use case: Danfoss automated assembly line
Short delivery time, high flexibility and reduced costs for handling parts before assembly. These are the main goals that Danfoss Drives wanted to achieve by creating an automated assembly line. But while the goals were clear, the way to achieve them was cloudier.
“How to do it and with what technology, we haven’t decided yet. And that’s what we’re seeking help for”, says Technology Engineer Peter Lund Andersen from Danfoss Drives.
To find out which technologies and solutions are suitable for an automated assembly line, Danfoss Drives received assistance from the Danish Technological Institute’s Center for Robot Technology.
Danfoss Drives is one of the Danish companies that have received a so-called “voucher” through ROBOTT-NET, which offers a network of the leading European technological service institutes in robotics.
With the voucher, Danfoss Drives has easy access to high-technology solutions and robot experts outside of Denmark.
The challenge for Danfoss Drives has been that all their products are delivered in many different forms of packaging. They now want to pick the products automatically.
“Having more technological service institutes involved in the project means that we can draw on the core competence within each service institute and thereby combine each competence into one joint, great solution”, says Peter Lund Andersen, adding that “we have given quite a few of our tasks to English MTC, that specializes in mechanical construction. In Odense at the Danish Technological Institute they are experts in vision technology, so they take care of that part”.
You can check out Danfoss Drives’ voucher page here and watch the video of the use case below.
The main purpose of ROBOTT-NET is to gather and share the latest knowledge about robot technology that can improve production in European companies.
Note: ROBOTT-NET will be at HANNOVER MESSE from April 24-27, 2018. If you are there, make sure you pass by Stand G46 in Hall 6 by the European Commission and see project results from EU-funded projects like nextgenio, ultraSURFACE, covr, fed4sae, DiFiCIL, IPP4CPPS, Smart Anything Everywhere (SAE), RADICLE, cloudSME, BEinCPPS, CloudiFacturing & Fortissimo.
Robot Launch 2018 in full swing – like Tennibot!
With the Robot Launch 2018 competition in full swing – deadline May 15 for entries wanting to compete on stage in Brisbane at ICRA 2018 – we thought it was time to look at last year’s Robot Launch finalists. And a very successful bunch they are too!
Tennibot won the CES 2018 Innovation Award and was covered in media like Times, Discovery Channel and LA Times. Tennibot also won $40,000 from the Alabama Launchpad competition and is launching a crowdfunding campaign today!
Tennibot uses computer vision and artificial intelligence to locate/pick up tennis balls and navigate on the court. Tennibot is the world’s first autonomous tennis ball collector. The Tennibot team has already won the Tennis Industry Innovation Challenge. So, if you think that Tennis + Robots = Your kind of sport – then head over to Tennibot.com to learn more and purchase your Tennibot before it’s too late!
Other 2017 finalists include
- Semio, from California, has a software platform for developing and deploying social robot skills.
- Apellix, from Florida, provides software-controlled aerial robotic systems that use tethered and untethered drones to move workers out of harm’s way.
- Mothership Aeronautics, from Silicon Valley, has a solar-powered drone capable of ‘infinity cruise’, where more power is generated than consumed.
- Kinema Systems, with an impressive approach to logistical challenges from the original Silicon Valley team that developed ROS.
- BotsAndUs, a highly awarded UK startup with a beautifully designed social robot for retail.
- Fotokite, a smart team from ETH Zurich with a unique approach to using drones in large-scale venues.
- C2RO, from Canada, is creating an expansive cloud-based AI platform for service robots.
- krtkl, from Silicon Valley, makes a high-end embedded board designed for both prototyping and deployment.
Apellix also won the Automate 2017 startup competition. Mothership has raised a $1.25 million seed round from the likes of Draper Ventures. Kinema Systems has just won the NVIDIA Inception Challenge out of more than 200 entrants and splits the $1 million prize money with two other AI startups. BotsAndUs has trialled Bo in more than 11,000 customer service interactions. krtkl is focused on revenue, not fundraising, and C2RO is building partnerships with companies like Qihan. And Fotokite just won the $1 million Genius NY competition.
You can watch the pitch presentation here: https://youtu.be/BzcrREvD8k0
You’ll also see some other familiar names from the shortlist for 2017, not to mention lots of success for our 2016 top startups. We can’t wait to see who will be finalists in 2018!
The Robot Launch startup competition has been running since 2014 and has helped robotics startups reach investors, build a reputation and grow their markets. We’ve had entries from all over the world, and one of the significant trends has been how rapidly the stage of startup entrants has advanced. We now judge startups in three divisions: Preseed, Seed and PostSeed (or Pre Series A).
Do you have a startup idea, a prototype or a seed stage startup in robotics, sensors or AI?
Submit your entries by May 15, 2018 if you want to be selected to pitch on the main stage of ICRA 2018 on May 22 in Brisbane, Australia for a chance to win a $3000 AUD prize from QUT bluebox!
The top 10 startups will pitch live on stage to a panel of investors and mentors including:
- Martin Duursma, Main Sequence Ventures
- Chris Moehle, The Robotics Hub Fund
- Yotam Rosenbaum, QUT bluebox
- Roland Siegwart, ETH Zurich
Entries are also in the running for a place in the QUT bluebox accelerator*, the Silicon Valley Robotics Accelerator*, mentorship from all the VC judges and potential investment of up to $250,000 from The Robotics Hub Fund*. (*conditions apply – details on application)
CONDITIONS:
Pre Seed category consists of an idea and proof of concept or prototype – customer validation is also desirable.
Seed category consists of a startup younger than 24 months, with less than $250k previous investment.
Post Seed category consists of a startup younger than 36 months, with less than $2.5m previous investment.
CAN’T MAKE IT TO AUSTRALIA?
No problems, mate! We’ll be continuing the Robot Launch competition with additional rounds in the US and in Europe throughout the summer. Go ahead and enter now anyway!
Enter the Robot Launch Startup Competition at ICRA 2018 here.
FOR YOUR GUIDE ON GOOD PITCH DOCUMENTS:
A sample Investor One Pager can be seen here. And your pitch should cover the content described in Nathan Gold’s 13 slide format.