Classical Indian dance inspires new ways to teach robots how to use their hands

Researchers at the University of Maryland, Baltimore County (UMBC) have extracted the building blocks of precise hand gestures used in the classical Indian dance form Bharatanatyam—and found a richer "alphabet" of movement compared to natural grasps. The work could improve how we teach hand movements to robots and offer humans better tools for physical therapy.

Enterprise AI World 2025 Notes from the Field: Evolving AI from Chatbots to Colleagues That Make An Impact

Enterprise AI World 2025, co-located with KMWorld 2025, offered a clear signal this year: the era of “drop a chatbot on the intranet and call it transformation” is over. The conversations shifted toward AI that sits inside real work—capturing tacit […]

The post Enterprise AI World 2025 Notes from the Field: Evolving AI from Chatbots to Colleagues That Make An Impact appeared first on TechSpective.

‘OCTOID,’ a soft robot that changes color and moves like an octopus

Underwater octopuses change their body color and texture in the blink of an eye to blend perfectly into their surroundings when evading predators or capturing prey. They transform their bodies to match the colors of nearby corals or seaweed, turning blue or red, and move by softly curling their arms or snatching prey.

Teaching robot policies without new demonstrations: interview with Jiahui Zhang and Jesse Zhang

The ReWiND method consists of three phases: learning a reward function, pre-training, and using the reward function and pre-trained policy to learn a new language-specified task online.

In their paper ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations, which was presented at CoRL 2025, Jiahui Zhang, Yusen Luo, Abrar Anwar, Sumedh A. Sontakke, Joseph J. Lim, Jesse Thomason, Erdem Bıyık and Jesse Zhang introduce a framework for learning robot manipulation tasks solely from language instructions without per-task demonstrations. We asked Jiahui Zhang and Jesse Zhang to tell us more.

What is the topic of the research in your paper, and what problem were you aiming to solve?

Our research addresses the problem of enabling robot manipulation policies to solve novel, language-conditioned tasks without collecting new demonstrations for each task. We begin with a small set of demonstrations in the deployment environment, train a language-conditioned reward model on them, and then use that learned reward function to fine-tune the policy on unseen tasks, with no additional demonstrations required.

Tell us about ReWiND – what are the main features and contributions of this framework?

ReWiND is a simple and effective three-stage framework designed to adapt robot policies to new, language-conditioned tasks without collecting new demonstrations. Its main features and contributions are:

  1. Reward function learning in the deployment environment
    We first learn a reward function using only five demonstrations per task from the deployment environment.

    • The reward model takes a sequence of images and a language instruction, and predicts per-frame progress from 0 to 1, giving us a dense reward signal instead of sparse success/failure.
    • To expose the model to both successful and failed behaviors without having to collect demonstrations of failed behavior, we introduce a video rewind augmentation: for a video segment V(1:t), we choose an intermediate point t1. We reverse the segment V(t1:t) to create V(t:t1) and append it to the original sequence. This generates a synthetic sequence that resembles “making progress then undoing progress,” effectively simulating failed attempts.
    • This allows the reward model to learn a smoother and more accurate dense reward signal, improving generalization and stability during policy learning.
  2. Policy pre-training with offline RL
    Once we have the learned reward function, we use it to relabel the small demonstration dataset with dense progress rewards. We then train a policy offline using these relabeled trajectories.
  3. Policy fine-tuning in the deployment environment
    Finally, we adapt the pre-trained policy to new, unseen tasks in the deployment environment. We freeze the reward function and use it as the feedback for online reinforcement learning. After each episode, the newly collected trajectory is relabeled with dense rewards from the reward model and added to the replay buffer. This iterative loop allows the policy to continually improve and adapt to new tasks without requiring any additional demonstrations.
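
The video rewind augmentation described in stage 1 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `rewind_augment`, the linear progress labels, and the choice to drop the duplicated frame t from the reversed segment are all assumptions made here for clarity.

```python
def rewind_augment(frames, t1):
    """Build one rewound clip and its per-frame progress targets.

    frames: list holding the clip V(1:t), one entry per frame.
    t1: 1-based index of the intermediate point, with 1 <= t1 < t.
    """
    frames = list(frames)
    t = len(frames)
    if not 1 <= t1 < t:
        raise ValueError("t1 must satisfy 1 <= t1 < t")

    # Assumed labels: progress rises linearly from 0 (first frame) to 1 (last frame).
    forward_progress = [k / (t - 1) for k in range(t)]

    # Reverse V(t1:t) to get V(t:t1). Frame t is dropped from the reversed
    # part so it does not appear twice in a row (an assumption of this sketch).
    rewound_frames = frames[t1 - 1:t][::-1][1:]
    rewound_progress = forward_progress[t1 - 1:t][::-1][1:]

    # Progress targets fall again over the rewound part: "undoing progress".
    return frames + rewound_frames, forward_progress + rewound_progress


clip = ["f1", "f2", "f3", "f4", "f5"]
augmented, targets = rewind_augment(clip, t1=3)
```

Here `augmented` plays forward to the last frame and then back to frame 3, and `targets` rises to 1 and falls again, which is the kind of signal a progress-predicting reward model could be trained on.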

Could you talk about the experiments you carried out to test the framework?

We evaluate ReWiND in both the MetaWorld simulation environment and the Koch real-world setup. Our analysis focuses on two aspects: the generalization ability of the reward model and the effectiveness of policy learning. We also compare how well different policies adapt to new tasks under our framework, demonstrating significant improvements over state-of-the-art methods.

(Q1) Reward generalization – MetaWorld analysis
We collect a MetaWorld dataset covering 20 training tasks, with 5 demos per task, plus 17 related but unseen tasks for evaluation. We train the reward function on this MetaWorld dataset and a subset of the OpenX dataset.

We compare ReWiND to LIV[1], LIV-FT, RoboCLIP[2], VLC[3], and GVL[4]. For generalization to unseen tasks, we use video–language confusion matrices. We feed the reward model video sequences paired with different language instructions and expect the correctly matched video–instruction pairs to receive the highest predicted rewards. In the confusion matrix, this corresponds to the diagonal entries having the strongest (darkest) values, indicating that the reward function reliably identifies the correct task description even for unseen tasks.

Video-language reward confusion matrix. See the paper for more information.
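
As a rough sketch of how such a confusion matrix could be assembled, the snippet below scores every video against every instruction and checks how often the matching pair wins. The scorer `toy_reward` is a placeholder based on word overlap, and the helper names are hypothetical; a real evaluation would call the trained reward model instead.

```python
def reward_confusion_matrix(videos, instructions, reward_fn):
    """Rows are videos, columns are instructions; each cell is one reward."""
    return [[reward_fn(video, text) for text in instructions] for video in videos]


def diagonal_accuracy(matrix):
    """Fraction of rows whose largest entry sits on the diagonal."""
    hits = 0
    for i, row in enumerate(matrix):
        best = max(range(len(row)), key=row.__getitem__)
        hits += int(best == i)
    return hits / len(matrix)


def toy_reward(video, text):
    # Placeholder scorer (assumption): each "video" is stood in for by a
    # text tag, and the reward is the share of instruction words it contains.
    words = set(text.split())
    return len(set(video.split()) & words) / len(words)


instructions = ["open the door", "close the drawer", "push the button"]
videos = list(instructions)  # video i is assumed to show task i
matrix = reward_confusion_matrix(videos, instructions, toy_reward)
accuracy = diagonal_accuracy(matrix)
```

A reward function that generalizes well should give a `diagonal_accuracy` close to 1 on unseen tasks, which corresponds to the dark diagonal in the figure.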

For demo alignment, we measure the correlation between the reward model’s predicted progress and the actual time steps in successful trajectories using Pearson r and Spearman ρ. For policy rollout ranking, we evaluate whether the reward function correctly ranks failed, near-success, and successful rollouts. Across these metrics, ReWiND significantly outperforms all baselines—for example, it achieves 30% higher Pearson correlation and 27% higher Spearman correlation than VLC on demo alignment, and delivers about 74% relative improvement in reward separation between success categories compared with the strongest baseline LIV-FT.
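
To make the demo-alignment metrics concrete, here is a small pure-Python sketch of Pearson r and Spearman ρ between time steps and predicted progress. The progress values are invented for illustration and are not results from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


time_steps = [0, 1, 2, 3, 4]
predicted_progress = [0.0, 0.1, 0.5, 0.6, 1.0]  # made-up example values
r = pearson_r(time_steps, predicted_progress)
rho = spearman_rho(time_steps, predicted_progress)
```

Because the example predictions always increase with time, `rho` is 1, while `r` falls slightly below 1 because the increase is not perfectly linear.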

(Q2) Policy learning in simulation (MetaWorld)
We pre-train on the same 20 tasks and then evaluate RL on 8 unseen MetaWorld tasks for 100k environment steps.

Using ReWiND rewards, the policy achieves an interquartile mean (IQM) success rate of approximately 79%, representing a ~97.5% improvement over the best baseline. It also demonstrates substantially better sample efficiency, achieving higher success rates much earlier in training.
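
For readers unfamiliar with the metric, the interquartile mean averages only the middle half of the scores, which makes it less sensitive to outlier runs than a plain mean. The sketch below uses a simplified trimming rule (drop the lowest and highest quarter of runs, rounded down), and the success rates are invented. The 40% baseline in the last lines is back-calculated here from the quoted ~79% and ~97.5% figures; it is not a number reported in the interview.

```python
def interquartile_mean(scores):
    """Mean of the middle 50% of scores (simplified trimming rule)."""
    ordered = sorted(scores)
    cut = len(ordered) // 4  # runs dropped from each end
    middle = ordered[cut:len(ordered) - cut]
    return sum(middle) / len(middle)


def relative_improvement(new, old):
    """Improvement of `new` over `old`, as a fraction of `old`."""
    return (new - old) / old


# Made-up per-run success rates, for illustration only.
run_success = [0.0, 0.2, 0.7, 0.8, 0.8, 0.9, 1.0, 1.0]
iqm = interquartile_mean(run_success)

# Assumed baseline of 0.40, inferred from the quoted figures, not reported.
gain = relative_improvement(0.79, 0.40)
```

With these inputs the two lowest and two highest runs are discarded before averaging, and a jump from an assumed 40% to 79% corresponds to a relative gain of about 97.5%.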

(Q3) Policy learning on a real robot (Koch bimanual arms)
Setup: a real-world tabletop bimanual Koch v1.1 system with five tasks, including in-distribution, visually cluttered, and spatial-language generalization tasks.
We use 5 demos for the reward model and 10 demos for the policy in this more challenging setting. With about 1 hour of real-world RL (~50k env steps), ReWiND improves average success from 12% → 68% (≈5× improvement), while VLC only goes from 8% → 10%.

Are you planning future work to further improve the ReWiND framework?

Yes, we plan to extend ReWiND to larger models and further improve the accuracy and generalization of the reward function across a broader range of tasks. In fact, we already have a workshop paper extending ReWiND to larger-scale models.

In addition, we aim to make the reward model capable of directly predicting success or failure, without relying on the environment’s success signal during policy fine-tuning. Currently, even though ReWiND provides dense rewards, we still rely on the environment to indicate whether an episode has been successful. Our goal is to develop a fully generalizable reward model that can provide both accurate dense rewards and reliable success detection on its own.

References

[1] Yecheng Jason Ma et al. “LIV: Language-image representations and rewards for robotic control.” International Conference on Machine Learning. PMLR, 2023.
[2] Sumedh Sontakke et al. “RoboCLIP: One demonstration is enough to learn robot policies.” Advances in Neural Information Processing Systems 36 (2023): 55681-55693.
[3] Minttu Alakuijala et al. “Video-language critic: Transferable reward functions for language-conditioned robotics.” arXiv:2405.19988 (2024).
[4] Yecheng Jason Ma et al. “Vision language models are in-context value learners.” The Thirteenth International Conference on Learning Representations. 2024.

About the authors

Jiahui Zhang is a Ph.D. student in Computer Science at the University of Texas at Dallas, advised by Prof. Yu Xiang. He received his M.S. degree from the University of Southern California, where he worked with Prof. Joseph Lim and Prof. Erdem Bıyık.

Jesse Zhang is a postdoctoral researcher at the University of Washington, advised by Prof. Dieter Fox and Prof. Abhishek Gupta. He completed his Ph.D. at the University of Southern California, advised by Prof. Jesse Thomason and Prof. Erdem Bıyık at USC, and Prof. Joseph J. Lim at KAIST.

Aerial microrobot can fly as fast as a bumblebee

In the future, tiny flying robots could be deployed to aid in the search for survivors trapped beneath the rubble after a devastating earthquake. Like real insects, these robots could flit through tight spaces larger robots can't reach, while simultaneously dodging stationary obstacles and pieces of falling rubble.

New control system teaches soft robots the art of staying safe

Imagine having a continuum soft robotic arm bend around a bunch of grapes or broccoli, adjusting its grip in real time as it lifts the object. Unlike traditional rigid robots that generally aim to avoid contact with the environment as much as possible and stay far away from humans for safety reasons, this arm senses subtle forces, stretching and flexing in ways that mimic more of the compliance of a human hand. Its every motion is calculated to avoid excessive force while achieving the task efficiently.

New robotic eyeball could enhance visual perception of embodied AI

Embodied artificial intelligence (AI) systems are robotic agents that rely on machine learning algorithms to sense their surroundings, plan their actions and execute them. A key component of these systems is the visual perception module, which allows them to analyze and interpret images captured by cameras.

Researchers develop new method for modeling complex sensor systems

A research team at Kumamoto University (Japan) has unveiled a new mathematical framework that makes it possible to accurately model systems using multiple sensors that operate at different sensing rates. This breakthrough could pave the way for safer autonomous vehicles, smarter robots, and more reliable sensor networks.

Optimizing Wheel Drives for AGVs and AMRs: What OEMs Need to Know About Motion Control

The motor and actuator selection behind each wheel can make or break the success of the entire system. In this post, we’ll explore the core challenges in mobile robot drive systems and how customized motion control solutions from DINGS' Motion USA can help you meet them.

AUCTION – FACILITY CLOSURE – MAJOR ROBOTICS AUTOMATION COMPANY

BTM Industrial is a leading asset disposition company assisting manufacturing companies with their surplus asset needs. Founded in 2011, it is a fully licensed-and-regulated, commission-based auction and liquidation company. The company’s full asset disposition programs provide customers with the ability to efficiently manage all aspects of their surplus and achieve higher value.

Artificial tendons give muscle-powered robots a boost

Our muscles are nature's actuators. The sinewy tissue is what generates the forces that make our bodies move. In recent years, engineers have used real muscle tissue to actuate "biohybrid robots" made from both living tissue and synthetic parts. By pairing lab-grown muscles with synthetic skeletons, researchers are engineering a menagerie of muscle-powered crawlers, walkers, swimmers, and grippers.