By Laura Rosado | MIT News correspondent
Austen Roberson’s favorite class at MIT is 2.S007 (Design and Manufacturing I-Autonomous Machines), in which students design, build, and program a fully autonomous robot to accomplish tasks laid out on a themed game board.
“The best thing about that class is everyone had a different idea,” says Roberson. “We all had the same game board and the same instructions given to us, but the robots that came out of people’s minds were so different.”
The game board was Mars-themed, with a model shuttle that could be lifted to score points. Roberson’s robot, nicknamed Tank Evans after a character from the movie “Surf’s Up,” employed a clever strategy to accomplish this task. Instead of spinning the gears that would raise the entire mechanism, Roberson realized a claw gripper could wrap around the outside of the shuttle and lift it manually.
“That wasn’t the intended way,” says Roberson, but his outside-of-the-box strategy ended up winning him the competition at the conclusion of the class, which was part of the New Engineering Education Transformation (NEET) program. “It was a really great class for me. I get a lot of gratification out of building something with my hands and then using my programming and problem-solving skills to make it move.”
Roberson, a senior, is majoring in aerospace engineering with a minor in computer science. As his winning robot demonstrates, he thrives at the intersection of both fields. He references the Mars Curiosity Rover as the type of project that inspires him; he even keeps a Lego model of Curiosity on his desk.
“You really have to trust that the hardware you’ve made is up to the task, but you also have to trust your software equally as much,” says Roberson, referring to the challenges of operating a rover from millions of miles away. “Is the robot going to continue to function after we’ve put it into space? Both of those things have to come together in such a perfect way to make this stuff work.”
Outside of formal classwork, Roberson has pursued multiple research opportunities at MIT that blend his academic interests. He’s worked on satellite situational awareness with the Space Systems Laboratory, tested drone flight in different environments with the Aerospace Controls Laboratory, and is currently working on zero-shot machine learning for anomaly detection in big datasets with the Mechatronics Research Laboratory.
Even while tackling these challenging technical problems head-on, Roberson is also actively thinking about the social impact of his work. He takes classes in the Program on Science, Technology, and Society, which has taught him not only how societal change throughout history has been driven by technological advancements, but also how to be a thoughtful engineer in his own career.
“Learning about the social implications of the technology you’re working on is really important,” says Roberson, acknowledging that his work in automation and machine learning needs to address these questions. “Sometimes, we get caught up in technology for technology’s sake. How can we take these same concepts and bring them to people to help in a tangible, physical way? How have we come together as a scientific community to really affect social change, and what can we do in the future to continue affecting that social change?”
Roberson is already working through what these questions mean for him personally. He’s been a member of the National Society of Black Engineers (NSBE) throughout his entire college experience, which includes serving on the executive board for two years. He’s helped to organize workshops focused on everything from interview preparation to financial literacy, as well as social events to build community among members.
“The mission of the organization is to increase the number of culturally responsible Black engineers that excel academically, succeed professionally, and positively impact the community,” says Roberson. “My goal with NSBE was to be able to provide a resource to help everybody get to where they wanted to be, to be the vehicle to really push people to be their best, and to provide the resources that people needed and wanted to advance themselves professionally.”
In fact, one of his most memorable MIT experiences is the first conference he attended as a member of NSBE.
“Being able to see all these different people from all of these different schools able to come together as a family and just talk to each other, it’s a very rewarding experience,” Roberson says. “It’s important to be able to surround yourself with people who have similar professional goals and share similar backgrounds and experiences with you. It’s definitely the proudest I’ve been of any club at MIT.”
Looking toward his own career, Roberson wants to find a way to work on fast-paced, cutting-edge technologies that move society forward in a positive way.
“Whether that be space exploration or something else, all I can hope for is that I’m making an impact, and that I’m making a difference in people’s lives,” says Roberson. “I think learning about space is learning about ourselves as well. The more you can learn about the stuff that’s out there, you can take those lessons to reflect on what’s down here as well.”
Teleoperation is one of the longest-standing application fields in robotics. While full autonomy is still a work in progress, the ability to operate a robot remotely has already opened up scenarios where humans can act in risky environments without endangering their own safety, such as when defusing explosives or decommissioning nuclear waste. It also allows an operator to be present and act even at great distance: underwater, in space, or inside a patient miles away from the surgeon. These are all critical applications, in which skilled and qualified operators control the robot after receiving specific training to use the system safely.
Teleoperation for everyone?
The recent pandemic has made the need for immersive telepresence and remote action even more apparent, including for non-expert users: not only could teleoperated robots take vitals or bring drugs to infectious patients, but we could also assist elderly relatives living far away with chores such as moving heavy objects or cooking. In addition, numerous physical jobs could be carried out from home.
The recent ANA-Xprize finals have shown how far teleoperation can go, but in such situations both the perceptual and the control load lie entirely on the operator. This can be quite taxing on a cognitive level: both perception and action are mediated, by cameras and robotic arms respectively, reducing the user’s situation awareness and natural eye-hand coordination. While robot sensing capabilities and actuators have made considerable technological progress, the interface with the user still lacks intuitive solutions that make the operator’s job easier (Rea & Seo, 2022).
Human and robot joining forces
Shared control has gained popularity in recent years as an approach championing human-machine cooperation: low-level motor control is carried out by the robot, while the human focuses on high-level action planning. To achieve such a blend, the robotic system still needs a timely way to infer the operator’s intention, so that it can assist with the execution. Usually, motor intentions are inferred by tracking arm movements or motion control commands (if the robot is operated by means of a joystick), but especially during object manipulation the hand closely follows information collected by the gaze. Over the last few decades, increasing evidence from eye-hand coordination studies has shown that gaze reliably anticipates the hand movement target (Hayhoe et al., 2012), providing an early cue about human intention.
Gaze and motion features to estimate intentions
In a contribution presented at IROS 2022 last month (Belardinelli et al., 2022), we introduced an intention estimation model that relies on both gaze and motion features. We collected pick-and-place sequences in a virtual environment, where participants could operate two robotic grippers to grasp objects on a cluttered table. Motion controllers were used to track arm motions and to grasp objects by button press. Eye movements were tracked by the eye-tracker embedded in the virtual reality headset.
Gaze features were computed by defining a Gaussian distribution centered at the gaze position and taking, for each object, the likelihood that it was the target of visual attention, given by the portion of the cumulative distribution falling within the object’s bounding box. For the motion features, the hand pose and velocity were used to estimate the hand’s current trajectory, which was compared to an estimated optimal trajectory to each object. The normalized similarity between the two trajectories defined the likelihood of each object being the target of the current movement.
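The two feature types described above can be illustrated with a minimal Python sketch. It is not the implementation from the paper: it assumes a 2D scene, an isotropic Gaussian with a hypothetical width `sigma`, axis-aligned bounding boxes, and, for the motion feature, it swaps in a simple stand-in (the cosine between the hand velocity and the straight line to each object) for the trajectory comparison. All function names and values are illustrative.

```python
import math


def normal_cdf(x, mean, sigma):
    """Cumulative distribution of a 1D Gaussian evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def gaze_mass_in_box(gaze, box, sigma):
    """Probability mass of an isotropic 2D Gaussian centered at the gaze
    point that falls inside an axis-aligned box (x_min, y_min, x_max, y_max)."""
    gx, gy = gaze
    x_min, y_min, x_max, y_max = box
    mass_x = normal_cdf(x_max, gx, sigma) - normal_cdf(x_min, gx, sigma)
    mass_y = normal_cdf(y_max, gy, sigma) - normal_cdf(y_min, gy, sigma)
    return mass_x * mass_y


def gaze_features(gaze, boxes, sigma=0.05):
    """One gaze likelihood per object; sigma is an assumed value."""
    return {name: gaze_mass_in_box(gaze, box, sigma) for name, box in boxes.items()}


def motion_features(hand_pos, hand_vel, targets):
    """Simplified stand-in for trajectory similarity: how well the current
    velocity points along the straight line to each object, normalized so
    the scores over all objects sum to 1."""
    speed = math.hypot(hand_vel[0], hand_vel[1])
    scores = {}
    for name, (tx, ty) in targets.items():
        dx, dy = tx - hand_pos[0], ty - hand_pos[1]
        dist = math.hypot(dx, dy)
        if speed == 0.0 or dist == 0.0:
            scores[name] = 0.0
            continue
        cosine = (hand_vel[0] * dx + hand_vel[1] * dy) / (speed * dist)
        scores[name] = max(0.0, cosine)  # ignore objects behind the motion
    total = sum(scores.values())
    if total == 0.0:
        # No directional evidence: fall back to a uniform distribution.
        return {name: 1.0 / len(scores) for name in scores}
    return {name: score / total for name, score in scores.items()}


# Hypothetical scene with two objects.
boxes = {"cup": (0.4, 0.4, 0.6, 0.6), "bottle": (0.8, 0.4, 1.0, 0.6)}
gaze = gaze_features(gaze=(0.5, 0.5), boxes=boxes)
motion = motion_features(hand_pos=(0.0, 0.0), hand_vel=(1.0, 0.0),
                         targets={"cup": (1.0, 0.0), "bottle": (0.0, 1.0)})
```

Because the Gaussian is isotropic and the box is axis-aligned, the mass inside the box factorizes into two 1D CDF differences, which keeps the sketch free of numerical integration.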
Figure 1: Gaze features (top) and motion features (bottom) used for intention estimation. In both videos the object highlighted in green is the most likely target of visual attention and of hand movement, respectively.
These features, along with the binary grasping state, were used to train two Gaussian Hidden Markov Models, one on pick sequences and one on place sequences. For 12 different intentions (picking 6 different objects and placing at 6 different locations), the overall accuracy (F1 score) was above 80%, even for occluded objects. Importantly, for both actions, a prediction with over 90% accuracy was already available 0.5 seconds before the end of the movement for at least 70% of the observations. This would allow an assisting plan to be instantiated and executed by the robot.
We also conducted an ablation study to determine the contribution of different feature combinations. While the models with gaze, motion, and grasping features performed better in the cross-validation, the improvement over using only gaze and grasping state was minimal. In fact, even when participants first checked nearby obstacles, their gaze was already on the target before the hand trajectory became sufficiently discriminative.
We also ascertained that our models could generalize from one hand to the other (when fed the corresponding hand motion features), so the same models could be used to estimate each hand’s intention concurrently. By feeding each hand’s prediction to a simple rule-based framework, basic bimanual intentions could also be recognized. For example, reaching for an object with the left hand while the right hand is going to place the same object on the left hand is considered a bimanual handover.
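To make the online use of such a model more concrete, here is a minimal Python sketch of forward filtering in a Gaussian HMM, which yields a posterior over hidden states after every new observation. It makes several assumptions that the text does not confirm: each hidden state is taken to stand for one candidate target, emissions use diagonal covariances, and all parameters are hand-picked for illustration rather than trained.

```python
import math


def log_gauss(x, mean, var):
    """Log-density of a 1D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def forward_filter(observations, start_prob, trans_prob, means, variances):
    """Return, for every time step, the posterior over hidden states given
    all observations so far. Probabilities must be strictly positive."""
    n_states = len(start_prob)
    log_alpha = None
    posteriors = []
    for obs in observations:
        # Diagonal-covariance Gaussian emission log-likelihood per state.
        emit = [
            sum(log_gauss(x, m, v)
                for x, m, v in zip(obs, means[s], variances[s]))
            for s in range(n_states)
        ]
        if log_alpha is None:
            log_alpha = [math.log(start_prob[s]) + emit[s]
                         for s in range(n_states)]
        else:
            log_alpha = [
                emit[s] + logsumexp([log_alpha[r] + math.log(trans_prob[r][s])
                                     for r in range(n_states)])
                for s in range(n_states)
            ]
        norm = logsumexp(log_alpha)
        posteriors.append([math.exp(a - norm) for a in log_alpha])
    return posteriors


# Illustrative two-state example: state 0 = "pick A", state 1 = "pick B".
# The single feature is the gaze likelihood of object A over time.
posteriors = forward_filter(
    observations=[[0.5], [0.7], [0.9]],
    start_prob=[0.5, 0.5],
    trans_prob=[[0.9, 0.1], [0.1, 0.9]],
    means=[[0.8], [0.2]],
    variances=[[0.02], [0.02]],
)
```

In this toy run the first observation is equally compatible with both states, and the posterior shifts toward "pick A" as the gaze feature rises, which mirrors the idea of a confident prediction being available before the movement ends.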
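The handover rule given as an example can be written down as a small Python sketch. The data structure, the label strings, and the location names `left_hand` and `right_hand` are hypothetical; the actual rule set of the framework is not described in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HandIntention:
    """Per-hand prediction; field names and values are illustrative."""
    action: str                     # "pick" or "place"
    obj: str                        # object the intention refers to
    location: Optional[str] = None  # destination, for "place" only


def bimanual_intention(left: HandIntention, right: HandIntention) -> str:
    """Combine two single-hand predictions with one simple rule:
    one hand reaches for an object that the other hand is about
    to place onto it."""
    pairs = (
        (left, right, "left_hand"),   # right hand gives to left hand
        (right, left, "right_hand"),  # left hand gives to right hand
    )
    for receiver, giver, receiver_location in pairs:
        if (receiver.action == "pick"
                and giver.action == "place"
                and receiver.obj == giver.obj
                and giver.location == receiver_location):
            return "HANDOVER"
    return "INDEPENDENT"


# The example from the text: the left hand reaches for the object that the
# right hand is about to place on the left hand.
label = bimanual_intention(
    left=HandIntention("pick", "cup"),
    right=HandIntention("place", "cup", location="left_hand"),
)
```

Further rules (for instance, both hands picking the same large object) could be added as extra branches, but which bimanual intentions the original framework covers is left open here.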
Figure 2: Online intention estimation: the red frame denotes the current right-hand intention prediction, the green frame the left-hand prediction. Above the scene, the bimanual intention is shown in capital letters.
An intention estimation model of this kind could help an operator execute these manipulations without having to select the parameters for the exact motor execution of the pick and place, something we don’t usually do consciously in natural eye-hand coordination, since we have automated these cognitive processes. For example, once a grasping intention is estimated with enough confidence, the robot could autonomously select the best grasp and grasping position and execute the grasp, relieving the operator of carefully monitoring a grasp without tactile feedback and possibly with inaccurate depth estimation.
Further, even if in our setup motion features were not decisive for early intention prediction, they might play a larger role in more complex settings and when extending the spectrum of bimanual manipulations.
Combined with suitable shared control policies and feedback visualizations, such systems could also enable untrained operators to control robotic manipulators transparently and effectively for longer periods, reducing the overall mental workload of remote operation.
Belardinelli, A., Kondapally, A. R., Ruiken, D., Tanneberg, D., & Watabe, T. (2022). Intention estimation from gaze and motion features for human-robot shared-control object manipulation. 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
Hayhoe, M. M., McKinney, T., Chajka, K., & Pelz, J. B. (2012). Predictive eye movements in natural vision. Experimental Brain Research, 217(1), 125-136.
Rea, D. J., & Seo, S. H. (2022). Still Not Solved: A Call for Renewed Focus on User-Centered Teleoperation Interfaces. Frontiers in Robotics and AI, 9.
Earlier this month, Candy Crush celebrated its tenth birthday by hosting a free party in lower Manhattan. The event culminated in a drone light display in which 500 unmanned aerial vehicles (UAVs) illustrated the whimsical characters of the popular mobile game over the Hudson. Rather than applauding the decision, New York lawmakers banished the aerial wonders to New Jersey. In the words of Democratic State Senator Brad Hoylman, “Nobody owns New York City’s skyline – it is a public good and to allow a private company to reap profits off it is in itself offensive.” The complimentary event followed the model of Macy’s New York fireworks, which have illuminated the Hudson skies since 1958. Unlike the department store’s pyrotechnics, which release dangerous greenhouse gases into the atmosphere, drones are a quiet, climate-friendly choice. Still, Luddite politicians plan to introduce legislation to ban the technology as a public nuisance, citing its impact on migratory birds, which are often more spooked by skyscrapers in Hoylman’s district.
Beyond aerial tricks, drones are now being deployed in novel ways to fill the labor gap of menial jobs that have not returned since the pandemic. Founded in 2018, Andrew Ashur’s Lucid Drones has been power-washing buildings throughout the United States for close to five years. As the founder told me: “I saw window washers hanging off the side of the building on a swing stage and it was a mildly windy day. You saw this platform get caught in the wind and all of a sudden the platform starts slamming against the side of the building. The workers were up there, hanging on for dear life, and I remember having two profound thoughts in this moment. The first one, thank goodness that’s not me up there. And then the second one was how can we leverage technology to make this a safer, more efficient job?” At the time, Ashur was a junior at Davidson College playing baseball. The self-starter knew he was on to a big market opportunity.
Each year in the United States, falls from ladders cause more than 160,000 emergency room injuries and 300 deaths. Entrepreneurs like Ashur understood that drones were uniquely qualified to free humans from such dangerous work. This first required building a sturdy tethered quadcopter, capable of spraying at 300 psi and connected to a tank for power and cleaning fluid, for less than the annual salary of one window cleaner. After overcoming the technical hurdle, the even harder task was gaining sales traction. Unlike many hardware companies that set out to disrupt the market and sell directly to end customers, Lucid partnered with existing building maintenance operators. “Our primary focus is on existing cleaning companies. And the way to think about it is we’re now the shiniest tool in their toolkit that helps them do more jobs with less time and less liability to make more revenue,” explains Ashur. This relationship was further enhanced this past month with the announcement of a partnership with Sunbelt Rentals, servicing its 1,000 locations throughout California, Florida, and Texas. Lucid’s drones are now within driving distance of the majority of the 86,000 facade cleaning companies in America.
According to the Commercial Buildings Energy Consumption Survey, there are 5.9 million commercial office buildings in the United States, with an average height of 16 floors. This means there is room for many robot cleaning providers. Competing directly with Lucid are several other drone operators, including Apellix, Aquiline Drones, Alpha Drones, and a handful of local upstarts. In addition, there are several winch-powered companies, such as Skyline Robotics, HyCleaner, Serbot, Erlyon, Kite Robotics, and SkyPro. Facade cleaning is ripe for automation, as it is a dangerous, costly, repetitive task that can be safely accomplished by an uncrewed system. As Ashur boasts, “You improve that overall profitability because it’s fewer labor hours. You’ve got lower insurance on ground cleaner versus an above ground cleaner as well as the other equipment.” Tethered, operated from the ground, and requiring no ladders, his system is the safest way to power wash a multistory office building. He elaborated further on the cost savings: “It lowers insurance cost, especially when you look at how workers comp is calculated… we had a customer, one of their workers missed the bottom rung of the ladder, the bottom rung, he shattered his ankle. OSHA classifies it as a hazardous workplace injury. Third workers comp rates are projected to increase by an annual $25,000 over the next five years. So it’s a six-figure expense for just that one business from missing one single bottom rung of the ladder and unfortunately, you hear stories of people falling off a roof or other terrible accidents that are life changing or in some cases life lost. So that’s the number one thing you get to eliminate with the drone by having people on the ground.”