#289: On Design in Human-Robot Interaction, with Bilge Mutlu
In this episode, Audrow Nash interviews Bilge Mutlu, Associate Professor at the University of Wisconsin–Madison, about design thinking in human-robot interaction. Professor Mutlu discusses design thinking at a high level and how design relates to science, and he speaks about the main areas of his work: the design space, the evaluation space, and how features are used within a context. He also gives advice on how to apply a design-oriented mindset.
Bilge Mutlu
Bilge Mutlu is an Associate Professor of Computer Science, Psychology, and Industrial Engineering at the University of Wisconsin–Madison. He directs the Wisconsin HCI Laboratory and organizes the WHCI+D Group. He received his PhD degree from Carnegie Mellon University’s Human-Computer Interaction Institute.
The world’s smallest autonomous racing drone
Racing team 2018-2019: Christophe De Wagter, Guido de Croon, Shuo Li, Phillipp Dürnay, Jiahao Lin, Simon Spronk
Autonomous drone racing
Drone racing is becoming a major e-sport. Enthusiasts – and now also professionals – transform drones into seriously fast racing platforms. Expert drone racers can reach speeds up to 190 km/h. They fly by looking at a first-person view (FPV) of their drone, which has a camera transmitting images mounted on the front.
In recent years, advances in areas such as artificial intelligence, computer vision, and control have raised the question of whether drones could fly faster than human pilots. The advantage for the drone could be that it can sense much more than the human pilot (like accelerations and rotation rates with its inertial sensors) and process all image data more quickly on board. Moreover, its intelligence could be shaped for only one goal: racing as fast as possible.
In the quest for a fast-flying, autonomous racing drone, multiple autonomous racing drone competitions have been organized in the academic community. These “IROS” drone races (where IROS stands for one of the most well-known world-wide robotics conferences) have been held from 2016 on. Over these years, the speed of the drones has been gradually improving, with the fastest drones in the competition now moving at ~2 m/s.
Smaller
Most of the autonomous racing drones are equipped with high-performance processors, with multiple, high-quality cameras and sometimes even with laser scanners. This allows these drones to use state-of-the-art solutions to visual perception, like building maps of the environment or tracking accurately how the drone is moving over time. However, it also makes the drones relatively heavy and expensive.
At the Micro Air Vehicle laboratory (MAVLab) of TU Delft, we aim to make light-weight and cheap autonomous racing drones. Such drones could be used by many drone racing enthusiasts to train with or fly against. If the drone becomes small enough, it could even be used for racing at home. Aiming for “small” means serious limitations to the sensors and processing that can be carried onboard. This is why in the IROS drone races we have always focused on monocular vision (a single camera) and on software algorithms for vision, state estimation, and control that are computationally highly efficient.
A 72-gram autonomous racing drone
Here, we report on how we made a tiny autonomous racing drone fly through a racing track at an average speed of 2 m/s, which is competitive with other, larger state-of-the-art autonomous racing drones.
The drone, which is a modified Eachine “Trashcan”, is 10 cm in diameter and weighs 72 grams. This weight includes a 17-gram JeVois smart camera, which consists of a single rolling-shutter CMOS camera, a 4-core ARM v7 1.34 GHz processor with 256 MB RAM, and a 2-core Mali GPU. Although this is limited compared to the processors used on other drones, we consider it more than powerful enough: with the algorithms we explain below, the drone actually uses only a single CPU core. The JeVois camera communicates with a 4.2-gram Crazybee F4 Pro flight controller running the Paparazzi autopilot via the MAVLink communication protocol. Both the JeVois code and the Paparazzi code are open source and available to the community.
An important characteristic of our approach to drone racing is that we do not rely on accurate, but computationally expensive methods for visual Simultaneous Localization And Mapping (SLAM) or Visual Inertial Odometry (VIO). Instead, we focus on having the drone predict its motion as well as possible with an efficient prediction model and correct any drift of the model with vision-based gate detections.
Prediction
A typical prediction model would involve the integration of the accelerometer readings. However, on small drones the Inertial Measurement Unit (IMU) is subject to a lot of vibration, leading to noisier accelerometer readings. Integrating such noisy measurements quickly leads to an enormous drift in both the velocity and position estimates of the drone. Therefore, we have opted for a simpler solution, in which the IMU is only used to determine the attitude of the drone. This attitude can then be used to predict the forward acceleration, as illustrated in the figure below. If one assumes the drone to fly at a constant height, the force in the z-direction has to equal the gravity force. Given a specific pitch angle, this relation leads to a specific forward force due to the thrust. The prediction model then updates the velocity based on this predicted forward force and the expected drag force given the estimated velocity.
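As a rough illustration of the prediction step described above, here is a minimal sketch. It assumes a point-mass drone, a drag force that is linear in velocity, and plain Euler integration; the drag coefficient `k_drag`, the time step, and the function name are illustrative assumptions and are not taken from the paper.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def predict_forward_velocity(v, pitch_rad, dt, k_drag=0.5):
    """One Euler step of a constant-height, drag-based velocity model.

    At constant height the vertical thrust component balances gravity:
        T * cos(pitch) = m * g
    so the forward component is T * sin(pitch) = m * g * tan(pitch).
    Drag is modelled as linear in velocity (an assumption of this sketch),
    with k_drag expressed per unit mass (1/s).
    """
    accel = G * math.tan(pitch_rad) - k_drag * v
    return v + accel * dt


# Hold a constant 0.1 rad pitch for 20 s at 100 Hz, starting from rest.
v = 0.0
for _ in range(2000):
    v = predict_forward_velocity(v, pitch_rad=0.1, dt=0.01)

# With linear drag the velocity settles where thrust and drag cancel.
terminal_velocity = G * math.tan(0.1) / 0.5
```

Only the attitude enters the model, so vibration noise on the accelerometer is never integrated; the cost is that the drag term has to be identified for the specific drone.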
Vision-based corrections
The prediction model is corrected with the help of vision-based position measurements. First, a snake-gate algorithm is used to detect the colored gate in the image. This algorithm is extremely efficient, as it only processes a small portion of the image’s pixels. It samples random image locations and when it finds the right color, it starts following it around to determine the shape. After a detection, the known size of the gate is used to determine the drone’s relative position to the gate (see the figure below). This is a standard perspective-N-point problem. The output of this process is a relative position to a gate. Subsequently, we figure out which gate on the racing track is most likely in our view, and transform the relative position to the gate to a global position measurement. Since our vision process often outputs quite precise position estimates but sometimes also produces significant outliers, we do not use a Kalman filter but a Moving Horizon Estimator for the state estimation. This leads to much more robust position and velocity estimates in the presence of outliers.
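To make the idea of recovering a relative position from a gate of known size concrete, here is a deliberately simplified stand-in for the perspective-n-point step. It assumes an ideal pinhole camera and a square gate seen head-on, which a real PnP solver does not require; the function name, the focal length, and the pixel coordinates are hypothetical.

```python
def gate_relative_position(corners_px, gate_size_m, focal_px, cx, cy):
    """Estimate the gate centre in the camera frame from four pixel corners.

    Assumes a pinhole camera with focal length `focal_px` (in pixels) and
    principal point (cx, cy), and a square gate facing the camera, so that
    apparent size is inversely proportional to depth.
    Returns (x_right, y_down, depth) in metres.
    """
    us = [u for u, _ in corners_px]
    vs = [v for _, v in corners_px]
    # Average the apparent width and height to get one side length in pixels.
    side_px = 0.5 * ((max(us) - min(us)) + (max(vs) - min(vs)))
    depth = focal_px * gate_size_m / side_px
    # Back-project the gate centre through the pinhole model.
    u_c = sum(us) / len(us)
    v_c = sum(vs) / len(vs)
    x = (u_c - cx) * depth / focal_px
    y = (v_c - cy) * depth / focal_px
    return x, y, depth


# A 1 m gate that appears 100 px wide with a 300 px focal length is 3 m away.
centred = [(110, 70), (210, 70), (210, 170), (110, 170)]
x, y, depth = gate_relative_position(centred, 1.0, 300.0, cx=160, cy=120)

# The same gate shifted 50 px to the right in the image.
shifted = [(u + 50, v) for u, v in centred]
x_s, y_s, depth_s = gate_relative_position(shifted, 1.0, 300.0, cx=160, cy=120)
```

Adding the known position of the matched gate on the track to this relative measurement gives the global position measurement that the estimator consumes.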
Racing performance and future steps
The drone used the newly developed algorithms to race along a 4-gate race track in TU Delft’s Cyberzoo. It can fly multiple laps at an average speed of 2 m/s, which is competitive with larger, state-of-the-art autonomous racing drones (see the video at the top). Thanks to the central role of gate detections in the drone’s algorithms, the drone can cope with moderate displacements of the gates.
Possible future directions of research are to make the drone smaller and fly faster. In principle, being small is an advantage, since the gates are relatively bigger. This allows the drone to choose its trajectory more freely than a big drone, which may allow for faster trajectories. In order to better exploit this characteristic, we would have to fit optimal control algorithms into the onboard processing. Moreover, we want to make the vision algorithms more robust – as the current color-based snake gate algorithm is quite dependent on lighting conditions. An obvious option here is to start using deep neural networks, which would have to fit within the dual-core Mali GPU on the JeVois.
arXiv article: Visual Model-predictive Localization for Computationally Efficient Autonomous Racing of a 72-gram Drone, Shuo Li, Erik van der Horst, Philipp Duernay, Christophe De Wagter, Guido C.H.E. de Croon.
The little robot that could
iRobot Corp. announced its acquisition of Root Robotics, Inc., whose educational Root coding robot got its start as a summer research project at the Wyss Institute for Biologically Inspired Engineering in 2011 and subsequently developed into a robust learning tool that is being used in over 500 schools to teach children between the ages of four and twelve how to code in an engaging, intuitive way. iRobot plans to incorporate the Root robot into its growing portfolio of educational robot products, and continue the work of scaling up production and expanding Root’s programming content that began when Root Robotics was founded by former Wyss Institute members in 2017.
“We’re honored that we got to see a Wyss Institute technology go from its earliest stages to where we are today, with the opportunity to make a gigantic impact on the world,” said Zivthan Dubrovsky, former Bioinspired Robotics Platform Lead at the Wyss Institute and co-founder of Root Robotics who is now the General Manager of Educational Robots at iRobot. “We’re excited to see how this new chapter in Root’s story can further amplify our mission of making STEM education accessible to students of any age in any classroom around the world.”
Root began in the lab of Wyss Core Faculty Member and Bioinspired Robotics Platform co-lead Radhika Nagpal, Ph.D., who was investigating the idea of robots that could climb metal structures using magnetic wheels. “Most whiteboards in classrooms are backed with metal, so I thought it would be wonderful if a robot could automatically erase the whiteboard as I was teaching – ironically, we referred to it as a ‘Roomba® for whiteboards,’ because many aspects were directly inspired by iRobot’s Roomba at the time,” said Nagpal, who is also the Fred Kavli Professor of Computer Science at Harvard’s John A. Paulson School of Engineering and Applied Sciences (SEAS). “Once we had a working prototype, the educational potential of this robot was immediately obvious. If it could be programmed to detect ink, navigate to it, and erase it, then it could be used to teach students about coding algorithms of increasing complexity.”
That prototype was largely built by Raphael Cherney, first as a Research Engineer in Nagpal’s group at Harvard in 2011, and then beginning in 2013 when he was hired to work on developing Root full-time along with Dubrovsky and other members of the Wyss Institute. “When Raphael and Radhika pitched me the idea of Root, I fell in love with it immediately,” said Dubrovsky. “My three daughters were all very young at the time and I wanted them to have exposure to STEM concepts like coding and engineering, but I was frustrated by the lack of educational systems that were designed for children their age. The idea of being able to create that for them was really what motivated me to throw all my weight behind the project.”
Under Cherney and Dubrovsky’s leadership, Root’s repertoire expanded to include drawing shapes on the whiteboard as it wheeled around, navigating through obstacles drawn on the whiteboard, playing music, and more. The team also developed Root’s coding interface, which has three levels of increasing complexity that are designed to help students from preschool to high school easily grasp the concepts of programming and use them to create their own projects. “The tangible nature of a robot really brings the code to life, because the robot is ‘real’ in a way that code isn’t – you can watch it physically carrying out the instructions that you’ve programmed into it,” said Cherney, who co-founded Root Robotics and is now a Principal Systems Engineer at iRobot. “It helps turn coding into a social activity, especially for kids, as they learn to work in teams and see coding as a fun and natural thing to do.”
Over the next three years the team iterated on Root’s prototype and began testing it in classrooms in and around Boston, getting feedback from students and teachers to get the robot closer to its production-ready form. “Robots are very hard to build, and the support we had from the Wyss Institute let us do it right, instead of just fast,” said Cherney. “We were able to develop Root from a prototype to a product that worked in schools and was doing what we envisioned, and the whole process was much smoother than it would have been if we had just been a team working in a garage.”
Spotting objects amid clutter
Image: Christine Daniloff, MIT
A new MIT-developed technique enables robots to quickly identify objects hidden in a three-dimensional cloud of data, reminiscent of how some people can make sense of a densely patterned “Magic Eye” image if they observe it in just the right way.
Robots typically “see” their environment through sensors that collect and translate a visual scene into a matrix of dots. Think of the world of, well, “The Matrix,” except that the 1s and 0s seen by the fictional character Neo are replaced by dots — lots of dots — whose patterns and densities outline the objects in a particular scene.
Conventional techniques that try to pick out objects from such clouds of dots, or point clouds, can do so with either speed or accuracy, but not both.
With their new technique, the researchers say a robot can accurately pick out an object, such as a small animal, that is otherwise obscured within a dense cloud of dots, within seconds of receiving the visual data. The team says the technique can be used to improve a host of situations in which machine perception must be both speedy and accurate, including driverless cars and robotic assistants in the factory and the home.
“The surprising thing about this work is, if I ask you to find a bunny in this cloud of thousands of points, there’s no way you could do that,” says Luca Carlone, assistant professor of aeronautics and astronautics and a member of MIT’s Laboratory for Information and Decision Systems (LIDS). “But our algorithm is able to see the object through all this clutter. So we’re getting to a level of superhuman performance in localizing objects.”
Carlone and graduate student Heng Yang will present details of the technique later this month at the Robotics: Science and Systems conference in Germany.
“Failing without knowing”
Robots currently attempt to identify objects in a point cloud by comparing a template object (a 3-D dot representation of an object, such as a rabbit) with a point cloud representation of the real world that may contain that object. The template image includes “features,” or collections of dots that indicate characteristic curvatures or angles of that object, such as the bunny’s ear or tail. Existing algorithms first extract similar features from the real-life point cloud, then attempt to match those features and the template’s features, and ultimately rotate and align the features to the template to determine if the point cloud contains the object in question.
But the point cloud data that streams into a robot’s sensor invariably includes errors, in the form of dots that are in the wrong position or incorrectly spaced, which can significantly confuse the process of feature extraction and matching. As a consequence, robots can make a huge number of wrong associations, or what researchers call “outliers” between point clouds, and ultimately misidentify objects or miss them entirely.
Carlone says state-of-the-art algorithms are able to sift the bad associations from the good once features have been matched, but they do so in “exponential time,” meaning that even a cluster of processing-heavy computers, sifting through dense point cloud data with existing algorithms, would not be able to solve the problem in a reasonable time. Such techniques, while accurate, are impractical for analyzing larger, real-life datasets containing dense point clouds.
Other algorithms that can quickly identify features and associations do so hastily, creating a huge number of outliers or misdetections in the process, without being aware of these errors.
“That’s terrible if this is running on a self-driving car, or any safety-critical application,” Carlone says. “Failing without knowing you’re failing is the worst thing an algorithm can do.”
A relaxed view
Yang and Carlone instead devised a technique that prunes away outliers in “polynomial time,” meaning that it can do so quickly, even for increasingly dense clouds of dots. The technique can thus quickly and accurately identify objects hidden in cluttered scenes.
The MIT-developed technique quickly and smoothly matches objects to those hidden in dense point clouds (left), versus existing techniques (right) that produce incorrect, disjointed matches. Gif: Courtesy of the researchers
The researchers first used conventional techniques to extract features of a template object from a point cloud. They then developed a three-step process to match the size, position, and orientation of the object in a point cloud with the template object, while simultaneously identifying good from bad feature associations.
The team developed an “adaptive voting scheme” algorithm to prune outliers and match an object’s size and position. For size, the algorithm makes associations between template and point cloud features, then compares the relative distance between features in a template and corresponding features in the point cloud. If, say, the distance between two features in the point cloud is five times that of the corresponding points in the template, the algorithm assigns a “vote” to the hypothesis that the object is five times larger than the template object.
The algorithm does this for every feature association. Then, the algorithm selects those associations that fall under the size hypothesis with the most votes, and identifies those as the correct associations, while pruning away the others. In this way, the technique simultaneously reveals the correct associations and the relative size of the object represented by those associations. The same process is used to determine the object’s position.
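The size-voting idea can be sketched in a few lines. This is a simplified illustration with fixed-width histogram bins rather than the researchers’ adaptive scheme; the function name, the bin width, and the sample points are assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations


def vote_for_scale(template_pts, cloud_pts, bin_width=0.1):
    """Vote on the object's scale from putative feature associations.

    template_pts[i] is assumed to be matched to cloud_pts[i]. Each pair of
    associations votes for the ratio of their distance in the cloud to their
    distance in the template. Associations that take part in the winning
    hypothesis are kept; the rest are pruned as outliers.
    """
    counts = Counter()
    pair_bins = {}
    for i, j in combinations(range(len(template_pts)), 2):
        d_template = math.dist(template_pts[i], template_pts[j])
        d_cloud = math.dist(cloud_pts[i], cloud_pts[j])
        if d_template == 0.0:
            continue  # coincident template features carry no scale information
        scale_bin = round((d_cloud / d_template) / bin_width)
        counts[scale_bin] += 1
        pair_bins[(i, j)] = scale_bin
    best_bin, _ = counts.most_common(1)[0]
    inliers = sorted(
        {k for pair, b in pair_bins.items() if b == best_bin for k in pair}
    )
    return best_bin * bin_width, inliers


template = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
# The first four associations are the template scaled by 5; the last is wrong.
cloud = [(0, 0, 0), (5, 0, 0), (0, 5, 0), (0, 0, 5), (40, -3, 7)]
scale, inliers = vote_for_scale(template, cloud)
```

Because pairwise distances do not change under rotation or translation, the scale can be voted on before the pose is known, which is what lets the size and the good associations fall out of the same pass.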
The researchers developed a separate algorithm for rotation, which finds the orientation of the template object in three-dimensional space.
To do this is an incredibly tricky computational task. Imagine holding a mug and trying to tilt it just so, to match a blurry image of something that might be that same mug. There are any number of angles you could tilt that mug, and each of those angles has a certain likelihood of matching the blurry image.
Existing techniques handle this problem by considering each possible tilt or rotation of the object as a “cost” — the lower the cost, the more likely that that rotation creates an accurate match between features. Each rotation and associated cost is represented in a topographic map of sorts, made up of multiple hills and valleys, with lower elevations associated with lower cost.
But Carlone says this can easily confuse an algorithm, especially if there are multiple valleys and no discernible lowest point representing the true, exact match between a particular rotation of an object and the object in a point cloud. Instead, the team developed a “convex relaxation” algorithm that simplifies the topographic map, with one single valley representing the optimal rotation. In this way, the algorithm is able to quickly identify the rotation that defines the orientation of the object in the point cloud.
With their approach, the team was able to quickly and accurately identify three different objects — a bunny, a dragon, and a Buddha — hidden in point clouds of increasing density. They were also able to identify objects in real-life scenes, including a living room, in which the algorithm quickly was able to spot a cereal box and a baseball hat.
Carlone says that because the approach is able to work in “polynomial time,” it can be easily scaled up to analyze even denser point clouds, resembling the complexity of sensor data for driverless cars, for example.
“Navigation, collaborative manufacturing, domestic robots, search and rescue, and self-driving cars is where we hope to make an impact,” Carlone says.
This research was supported in part by the Army Research Laboratory, the Office of Naval Research, and the Google Daydream Research Program.
Tackling sustainability and urbanization with AI-enabled furniture
At the turn of the twentieth century, the swelling populations of newly arrived immigrants in New York City’s Lower East Side reached a boiling point, forcing the City to pass the 1901 Tenement House Act. Recalling this legislation, New York City’s Mayor’s Office recently responded to its own modern housing crisis by enabling developers for the first time to build affordable micro-studio apartments of 400 square feet. One of the primary drivers of allocating tens of thousands of new micro-units is the adoption of innovative design and construction technologies that enable modular and flexible housing options. As Mayor de Blasio affirmed, “Housing New York 2.0 commits us to creating 25,000 affordable homes a year and 300,000 homes by 2026. Making New York a fairer city for today and for future generations depends on it.”
Urban space density is not just a New York City problem, but a world health concern. According to the United Nations, more than half of the Earth’s population currently resides in cities and this is projected to climb to close to three-quarters by 2050. In response to this alarming trend the UN drafted the 2030 Agenda for Sustainable Development. Stressing the importance of such an effort, UN Deputy Secretary Amina J. Mohammed declared, “It is clear that it is in cities where the battle for sustainability will be won or lost. Cities are the organizing mechanisms of the twenty-first century. They are where young people in all parts of the world flock to develop new skills, attain new jobs, and find opportunities in which to innovate and create their futures. The 2030 Agenda for Sustainable Development is the most ambitious agenda ever set forth for humanity.”
Absent from the UN study is any use of mechatronics to address the challenges of urbanization. For example, robots have been deployed on construction sites in China to rapidly print building materials. There are also a handful of companies utilizing machines to cost-effectively produce modular homes with the goal of replacing mud huts and sheet-metal shanties. However, the progress of automating low-to-middle income housing had been slow until this week. IKEA, the world’s largest furniture retailer, which specializes in low-cost decorating solutions, announced on Tuesday the launch of Rognan – a morphing robotic furniture system for the micro-home. Collaborating with the Swedish design powerhouse is hardware startup Ori Living. The MIT spin-out first introduced its chameleon-changing furniture platform two years ago with an expandable wardrobe that quickly shifted from bookcase/home office to walk-in closet at the touch of a button. Today such systems can be bought through the company’s website for a price upwards of $5,000. It is expected that the partnership with IKEA will bring enormous economies of scale with the mass production of its products.
The first markets targeted by IKEA next year for Rognan are the cramped neighborhoods of Hong Kong and Japan, where the average citizen lives in 160 square feet. Seana Strawn, IKEA’s product developer for new innovations, explains, “Instead of making the furniture smaller, we transform the furniture to the function that you need at that time. When you sleep, you do not need your sofa. When you use your wardrobe, you do not need your bed etc.”
Ori founder Hasier Larrea elaborates on his use of machine learning to size the space to the occupant’s requirements. “Every floor is different, so you need a product that’s smart enough to know this, and make a map of the floor,” describes Larrea. By using sensors to create an image of the space, the robot seamlessly transforms from closet to bed to desk to media center. To better understand the marketability of such a system, I polled a close friend who analyzes such innovations for a large Wall Street bank. This potential customer remarked that he will wait to purchase his own Rognan until it can anticipate his living habits and automatically sense when it is time for work, play or bed.
Ori’s philosophy is enabling people to “live large in a small footprint.” Larrea professes that the only way to combat urbanization is thinking differently. As the founder exclaims, “We cannot keep designing spaces the same way we’ve been designing spaces 20 years ago, or keep using all the same furniture we were using in homes that were twice the size, or three times the size. We need to start thinking about furniture that adapts to us, and not the other way around.” Larrea’s credo can be heard in the words of former Tesla technologist Sankarshan Murthy, who aims to revolutionize interior design with robotic, ceiling-dropping storage furniture. Murthy’s startup, Bubblebee Spaces, made news last April with the announcement of a $4.3 million funding round led by Loup Ventures. Similar to a Broadway set change, Bubblebee lowers and hoists up wooden cases on an as-needed basis, complete with an iPhone or iPad controller. “Instead of square feet you start looking at real estate in volume. You are already paying for all this air and ceiling space you are not using. We unlock that for you,” brags Murthy. Ori is also working on a modern Murphy bed that lowers from the ceiling. The company’s press release last November described its newest product as “a bed that seamlessly lowers from the ceiling, or lifts to the ceiling to reveal a stylish sofa,” all at the beck and call of one’s Alexa device.
In 1912, William Murphy received his patent for the “Disappearing Bed.” Today’s robotic furniture, now validated by the likes of IKEA, could be the continuation of his vision. Several years ago, MIT student Daniel Leithinger first unveiled a shape-shifting table. As Leithinger reminisces, “We were inspired by those pinscreen toys where you press your hand on one end, and it shows on the other side.” While it was never intended to be commercialized, the inventor was blown away by the emails he received. “One person said we should apply it to musical interfaces and another person said it would be great to use to help blind children understand art and other things. These are things we didn’t even think about,” shares Leithinger. As Ori and Bubblebee work diligently to replace old couch springs with new gears and actuators, the benefits of such technology are sure to go beyond just better storage as we enter the new age of the AI Home.
1000x faster data augmentation
In this blog post we introduce Population Based Augmentation (PBA), an algorithm that quickly and efficiently learns a state-of-the-art approach to augmenting data for neural network training. PBA matches the previous best result on CIFAR and SVHN but uses one thousand times less compute, enabling researchers and practitioners to effectively learn new augmentation policies using a single workstation GPU. You can use PBA broadly to improve deep learning performance on image recognition tasks.
We discuss the PBA results from our recent paper and then show how to easily run PBA for yourself on a new data set in the Tune framework.
Why should you care about data augmentation?
Recent advances in deep learning models have been largely attributed to the quantity and diversity of data gathered in recent years. Data augmentation is a strategy that enables practitioners to significantly increase the diversity of data available for training models, without actually collecting new data. Data augmentation techniques such as cropping, padding, and horizontal flipping are commonly used to train large neural networks. However, most approaches used in training neural networks only use basic types of augmentation. While neural network architectures have been investigated in depth, less focus has been put into discovering strong types of data augmentation and data augmentation policies that capture data invariances.
An image of the number “3” in original form and with basic augmentations applied.
Recently, Google has been able to push the state-of-the-art accuracy on datasets such as CIFAR-10 with AutoAugment, a new automated data augmentation technique. AutoAugment has shown that prior work, which applies just a fixed set of transformations like horizontal flipping or padding and cropping, leaves potential performance on the table. AutoAugment introduces 16 geometric and color-based transformations, and formulates an augmentation policy that selects up to two transformations at certain magnitude levels to apply to each batch of data. These higher performing augmentation policies are learned by training models directly on the data using reinforcement learning.
What’s the catch?
AutoAugment is a very expensive algorithm which requires training 15,000 models to convergence to generate enough samples for a reinforcement learning based policy. No computation is shared between samples, and it costs 15,000 NVIDIA Tesla P100 GPU hours to learn an ImageNet augmentation policy and 5,000 GPU hours to learn a CIFAR-10 one. For example, if using Google Cloud on-demand P100 GPUs, it would cost about $7,500 to discover a CIFAR policy, and $37,500 to discover an ImageNet one! Therefore, a more common use case when training on a new dataset would be to transfer a pre-existing published policy, which the authors show works relatively well.
Population Based Augmentation
Our formulation of data augmentation policy search, Population Based Augmentation (PBA), reaches similar levels of test accuracy on a variety of neural network models while utilizing three orders of magnitude less compute. We learn an augmentation policy by training several copies of a small model on CIFAR-10 data, which takes five hours using an NVIDIA Titan XP GPU. This policy exhibits strong performance when used for training from scratch on larger model architectures and with CIFAR-100 data.
Relative to the several days it takes to train large CIFAR-10 networks to convergence, the cost of running PBA beforehand is marginal and significantly enhances results. For example, training a PyramidNet model on CIFAR-10 takes over 7 days on an NVIDIA V100 GPU, so learning a PBA policy adds only 2% precompute training time overhead. This overhead would be even lower, under 1%, for SVHN.
CIFAR-10 test set error between PBA, AutoAugment, and the baseline which only uses horizontal flipping, padding, and cropping, on WideResNet, Shake-Shake, and PyramidNet+ShakeDrop models. PBA is significantly better than the baseline and on-par with AutoAugment.
PBA leverages the Population Based Training algorithm to generate an augmentation policy schedule which can adapt based on the current epoch of training. This is in contrast to a fixed augmentation policy that applies the same transformations independent of the current epoch number.
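To illustrate the difference between a fixed policy and an epoch-dependent schedule, here is a minimal sketch. The schedule format (a sorted list of start epochs with hyperparameter dictionaries), the function name, and the `rotate_magnitude` key are hypothetical and chosen only for this example; they are not the format PBA saves to disk.

```python
import bisect


def policy_at_epoch(schedule, epoch):
    """Return the augmentation hyperparameters active at `epoch`.

    `schedule` is a list of (start_epoch, hparams) tuples sorted by
    start_epoch; each entry stays active until the next one begins.
    """
    starts = [start for start, _ in schedule]
    idx = bisect.bisect_right(starts, epoch) - 1
    if idx < 0:
        raise ValueError("epoch precedes the first schedule entry")
    return schedule[idx][1]


# A fixed policy is the special case of a schedule with a single entry.
fixed = [(0, {"rotate_magnitude": 0.3})]

# A schedule can start with no augmentation and ramp up later in training.
learned = [
    (0, {"rotate_magnitude": 0.0}),
    (20, {"rotate_magnitude": 0.3}),
    (60, {"rotate_magnitude": 0.6}),
]

early = policy_at_epoch(learned, 5)
middle = policy_at_epoch(learned, 25)
late = policy_at_epoch(learned, 150)
```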
This allows an ordinary workstation user to easily experiment with the search algorithm and augmentation operations. One interesting use case would be to introduce new augmentation operations, perhaps targeted towards a particular dataset or image modality, and be able to quickly produce a tailored, high performing augmentation schedule. Through ablation studies, we have found that the learned hyperparameters and schedule order are important for good results.
How is the augmentation schedule learned?
We use Population Based Training with a population of 16 small WideResNet models. Each worker in the population will learn a different candidate hyperparameter schedule. We transfer the best performing schedule to train larger models from scratch, from which we derive our test error metrics.
Overview of Population Based Training, which discovers hyperparameter schedules by training a population of neural networks. It combines random search (explore) with the copying of model weights from high performing workers (exploit). Source
The population models are trained on the target dataset of interest starting with all augmentation hyperparameters set to 0 (no augmentations applied). At frequent intervals, an “exploit-and-explore” process “exploits” high performing workers by copying their model weights to low performing workers, and then “explores” by perturbing the hyperparameters of the worker. Through this process, we are able to share compute heavily between the workers and target different augmentation hyperparameters at different regions of training. Thus, PBA is able to avoid the cost of training thousands of models to convergence in order to reach high performance.
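The exploit-and-explore step can be sketched without any framework. The version below is a generic Population Based Training step with a multiplicative perturbation, not PBA’s custom explore function (which has to handle hyperparameters that start at 0); the worker dictionary layout, the 25% truncation fraction, and the perturbation factors are assumptions made for illustration.

```python
import copy
import random


def exploit_and_explore(population, rng, factors=(0.8, 1.2), frac=0.25):
    """One exploit-and-explore step over a population of workers.

    Each worker is a dict with "weights", "hparams", and "score". The
    bottom `frac` of workers copy weights from a randomly chosen top
    worker (exploit) and then take a perturbed copy of that worker's
    hyperparameters (explore). Scores are left for the next evaluation.
    """
    ranked = sorted(population, key=lambda w: w["score"], reverse=True)
    n = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:n], ranked[-n:]
    for worker in bottom:
        donor = rng.choice(top)
        # Exploit: reuse the donor's training progress.
        worker["weights"] = copy.deepcopy(donor["weights"])
        # Explore: perturb the donor's hyperparameters.
        worker["hparams"] = {
            name: value * rng.choice(factors)
            for name, value in donor["hparams"].items()
        }
    return population


rng = random.Random(0)
population = [
    {"weights": [0.1 * i], "hparams": {"magnitude": 1.0 + i}, "score": s}
    for i, s in enumerate([0.60, 0.90, 0.75, 0.40])
]
exploit_and_explore(population, rng)
best, worst = population[1], population[3]
```

Because a low-performing worker inherits weights instead of restarting, the population as a whole only ever trains for one run’s worth of epochs per worker, which is where the compute saving comes from.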
Example and Code
We leverage Tune’s built-in implementation of PBT to make it straightforward to use PBA.
import ray
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining


def explore(config):
    """Custom PBA function to perturb augmentation hyperparameters."""
    ...


ray.init()
# reward_attr follows the Ray release this post was written against;
# later Ray releases use metric/mode arguments instead.
pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    reward_attr="val_acc",
    perturbation_interval=3,
    custom_explore_fn=explore)
train_spec = {...}  # Things like file paths, model func, compute.
tune.run_experiments({"PBA": train_spec}, scheduler=pbt)
We call Tune’s implementation of PBT with our custom exploration function. This will create 16 copies of our WideResNet model and train them time-multiplexed. The policy schedule used by each copy is saved to disk and can be retrieved after termination to use for training new models.
You can run PBA by following the README at: https://github.com/arcelien/pba. On a Titan XP, it only requires one hour to learn a high performing augmentation policy schedule on the SVHN dataset. It is also easy to use PBA on a custom dataset: simply define a new dataloader and everything else falls into place.
Big thanks to Daniel Rothchild, Ashwinee Panda, Aniruddha Nrusimha, Daniel Seita, Joseph Gonzalez, and Ion Stoica for helpful feedback while writing this post. Feel free to get in touch with us on GitHub!
This post is based on the following paper to appear in ICML 2019 as an oral presentation: