
‘OCTOID,’ a soft robot that changes color and moves like an octopus

Underwater octopuses change their body color and texture in the blink of an eye to blend perfectly into their surroundings when evading predators or capturing prey. They transform their bodies to match the colors of nearby corals or seaweed, turning blue or red, and move by softly curling their arms or snatching prey.

Teaching robot policies without new demonstrations: interview with Jiahui Zhang and Jesse Zhang

The ReWiND method, which consists of three phases: learning a reward function, pre-training, and using the reward function and pre-trained policy to learn a new language-specified task online.

In their paper ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations, which was presented at CoRL 2025, Jiahui Zhang, Yusen Luo, Abrar Anwar, Sumedh A. Sontakke, Joseph J. Lim, Jesse Thomason, Erdem Bıyık and Jesse Zhang introduce a framework for learning robot manipulation tasks solely from language instructions without per-task demonstrations. We asked Jiahui Zhang and Jesse Zhang to tell us more.

What is the topic of the research in your paper, and what problem were you aiming to solve?

Our research addresses the problem of enabling robot manipulation policies to solve novel, language-conditioned tasks without collecting new demonstrations for each task. We begin with a small set of demonstrations in the deployment environment, train a language-conditioned reward model on them, and then use that learned reward function to fine-tune the policy on unseen tasks, with no additional demonstrations required.

Tell us about ReWiND – what are the main features and contributions of this framework?

ReWiND is a simple and effective three-stage framework designed to adapt robot policies to new, language-conditioned tasks without collecting new demonstrations. Its main features and contributions are:

  1. Reward function learning in the deployment environment
    We first learn a reward function using only five demonstrations per task from the deployment environment.

    • The reward model takes a sequence of images and a language instruction, and predicts per-frame progress from 0 to 1, giving us a dense reward signal instead of sparse success/failure.
    • To expose the model to both successful and failed behaviors without having to collect failure demonstrations, we introduce a video rewind augmentation: for a video segment V(1:t), we choose an intermediate point t1, reverse the segment V(t1:t) to obtain V(t:t1), and append it back to the original sequence. This produces a synthetic sequence that resembles “making progress, then undoing it,” effectively simulating failed attempts.
    • This allows the reward model to learn a smoother and more accurate dense reward signal, improving generalization and stability during policy learning.
  2. Policy pre-training with offline RL
    Once we have the learned reward function, we use it to relabel the small demonstration dataset with dense progress rewards. We then train a policy offline using these relabeled trajectories.
  3. Policy fine-tuning in the deployment environment
    Finally, we adapt the pre-trained policy to new, unseen tasks in the deployment environment. We freeze the reward function and use it as the feedback for online reinforcement learning. After each episode, the newly collected trajectory is relabeled with dense rewards from the reward model and added to the replay buffer. This iterative loop allows the policy to continually improve and adapt to new tasks without requiring any additional demonstrations.
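The rewind augmentation described in stage 1 can be sketched in a few lines. This is an illustrative reimplementation, not the authors' code; the function name, the linear progress labels, and the array layout are assumptions:

```python
import numpy as np

def rewind_augment(frames, rewind_point=None, rng=None):
    """Turn a successful video segment V(1:t) into a synthetic failure.

    Reverses the tail V(t1:t) and appends it, so the clip 'makes
    progress, then undoes it'. Illustrative sketch only: the linear
    progress labels and array layout are assumptions, not the paper's
    exact implementation.
    """
    rng = rng if rng is not None else np.random.default_rng()
    T = len(frames)
    t1 = rewind_point if rewind_point is not None else int(rng.integers(1, T - 1))
    rewound = frames[t1:][::-1]                 # V(t:t1), undoing progress
    augmented = np.concatenate([frames, rewound], axis=0)

    # Per-frame progress targets: rise linearly to 1, then mirror back down.
    up = np.linspace(0.0, 1.0, T)
    labels = np.concatenate([up, up[t1:][::-1]])
    return augmented, labels
```

Training the reward model on both the original clip (rising labels) and the rewound clip (rising-then-falling labels) is what exposes it to failure-like behavior without any failure data collection.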

Could you talk about the experiments you carried out to test the framework?

We evaluate ReWiND in both the MetaWorld simulation environment and the Koch real-world setup. Our analysis focuses on two aspects: the generalization ability of the reward model and the effectiveness of policy learning. We also compare how well different policies adapt to new tasks under our framework, demonstrating significant improvements over state-of-the-art methods.

(Q1) Reward generalization – MetaWorld analysis
We collect a MetaWorld dataset covering 20 training tasks, with 5 demonstrations per task, plus 17 related but unseen tasks for evaluation. We train the reward function on this MetaWorld data together with a subset of the OpenX dataset.

We compare ReWiND to LIV[1], LIV-FT, RoboCLIP[2], VLC[3], and GVL[4]. For generalization to unseen tasks, we use video–language confusion matrices. We feed the reward model video sequences paired with different language instructions and expect the correctly matched video–instruction pairs to receive the highest predicted rewards. In the confusion matrix, this corresponds to the diagonal entries having the strongest (darkest) values, indicating that the reward function reliably identifies the correct task description even for unseen tasks.
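As a sketch, a video–language confusion matrix of this kind can be built by scoring every video under every instruction; `reward_fn` here is a hypothetical stand-in for the learned reward model, and using the final frame's prediction as the episode score is an assumption:

```python
import numpy as np

def reward_confusion_matrix(videos, instructions, reward_fn):
    """Score every video under every instruction with a reward model.

    reward_fn(video, instruction) -> per-frame progress sequence; the
    final frame's prediction is used as the episode-level score. The
    interface is a hypothetical stand-in for the learned model. A
    well-generalizing reward puts each row's largest score on the
    diagonal, i.e. on the matched instruction.
    """
    n = len(videos)
    M = np.zeros((n, n))
    for i, video in enumerate(videos):
        for j, instruction in enumerate(instructions):
            M[i, j] = reward_fn(video, instruction)[-1]
    return M

def diagonal_accuracy(M):
    """Fraction of videos whose matched instruction scores highest."""
    return float(np.mean(np.argmax(M, axis=1) == np.arange(len(M))))
```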

Video-language reward confusion matrix. See the paper for more information.

For demo alignment, we measure the correlation between the reward model’s predicted progress and the actual time steps in successful trajectories using Pearson r and Spearman ρ. For policy rollout ranking, we evaluate whether the reward function correctly ranks failed, near-success, and successful rollouts. Across these metrics, ReWiND significantly outperforms all baselines. For example, it achieves 30% higher Pearson correlation and 27% higher Spearman correlation than VLC on demo alignment, and delivers roughly a 74% relative improvement in reward separation between success categories compared with the strongest baseline, LIV-FT.
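The demo-alignment metrics above can be computed directly; below is a pure-NumPy sketch (ties in the Spearman ranking are ignored for simplicity):

```python
import numpy as np

def alignment_scores(predicted_progress):
    """Correlate predicted per-frame progress with the frame index.

    For a successful demo, predicted progress should grow over time,
    so both Pearson r (linear correlation) and Spearman rho (rank
    correlation) against the time index should approach 1. Pure-NumPy
    sketch; ties in the rank computation are ignored for simplicity.
    """
    y = np.asarray(predicted_progress, dtype=float)
    t = np.arange(len(y), dtype=float)

    pearson_r = np.corrcoef(t, y)[0, 1]

    # Spearman rho = Pearson correlation of the ranks.
    ranks = np.argsort(np.argsort(y)).astype(float)
    spearman_rho = np.corrcoef(t, ranks)[0, 1]
    return pearson_r, spearman_rho
```

A monotonically increasing progress prediction yields Spearman ρ of exactly 1 even when the rise is nonlinear, while Pearson r rewards predictions that grow linearly with time.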

(Q2) Policy learning in simulation (MetaWorld)
We pre-train on the same 20 tasks and then run online RL on 8 unseen MetaWorld tasks for 100k environment steps.

Using ReWiND rewards, the policy achieves an interquartile mean (IQM) success rate of approximately 79%, representing a ~97.5% improvement over the best baseline. It also demonstrates substantially better sample efficiency, achieving higher success rates much earlier in training.
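For reference, the interquartile mean aggregates per-task, per-seed success rates by averaging only the middle 50% of scores; a minimal sketch (simple quartile cut, no bootstrap confidence intervals):

```python
import numpy as np

def interquartile_mean(scores):
    """Interquartile mean (IQM): mean of the middle 50% of scores.

    A robust aggregate commonly used to report RL results across
    tasks and seeds; less sensitive to outlier runs than the plain
    mean. Minimal sketch: drops the bottom and top quartiles.
    """
    x = np.sort(np.asarray(scores, dtype=float))
    n = len(x)
    k = n // 4                   # size of each dropped quartile
    return float(x[k:n - k].mean())
```

For example, a single catastrophic seed barely moves the IQM, whereas it can drag the plain mean down substantially.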

(Q3) Policy learning in real robot (Koch bimanual arms)
Setup: a real-world tabletop bimanual Koch v1.1 system with five tasks, including in-distribution, visually cluttered, and spatial-language generalization tasks.
We use 5 demos for the reward model and 10 demos for the policy in this more challenging setting. With about 1 hour of real-world RL (~50k env steps), ReWiND improves average success from 12% → 68% (≈5× improvement), while VLC only goes from 8% → 10%.

Are you planning future work to further improve the ReWiND framework?

Yes, we plan to extend ReWiND to larger models and further improve the accuracy and generalization of the reward function across a broader range of tasks. In fact, we already have a workshop paper extending ReWiND to larger-scale models.

In addition, we aim to make the reward model capable of directly predicting success or failure, without relying on the environment’s success signal during policy fine-tuning. Currently, even though ReWiND provides dense rewards, we still rely on the environment to indicate whether an episode has been successful. Our goal is to develop a fully generalizable reward model that can provide both accurate dense rewards and reliable success detection on its own.

References

[1] Yecheng Jason Ma et al. “Liv: Language-image representations and rewards for robotic control.” International Conference on Machine Learning. PMLR, 2023.
[2] Sumedh Sontakke et al. “Roboclip: One demonstration is enough to learn robot policies.” Advances in Neural Information Processing Systems 36 (2023): 55681-55693.
[3] Minttu Alakuijala et al. “Video-language critic: Transferable reward functions for language-conditioned robotics.” arXiv:2405.19988 (2024).
[4] Yecheng Jason Ma et al. “Vision language models are in-context value learners.” The Thirteenth International Conference on Learning Representations. 2025.

About the authors

Jiahui Zhang is a Ph.D. student in Computer Science at the University of Texas at Dallas, advised by Prof. Yu Xiang. He received his M.S. degree from the University of Southern California, where he worked with Prof. Joseph Lim and Prof. Erdem Bıyık.

Jesse Zhang is a postdoctoral researcher at the University of Washington, advised by Prof. Dieter Fox and Prof. Abhishek Gupta. He completed his Ph.D. at the University of Southern California, advised by Prof. Jesse Thomason and Prof. Erdem Bıyık at USC, and Prof. Joseph J. Lim at KAIST.

Aerial microrobot can fly as fast as a bumblebee

In the future, tiny flying robots could be deployed to aid in the search for survivors trapped beneath the rubble after a devastating earthquake. Like real insects, these robots could flit through tight spaces larger robots can't reach, while simultaneously dodging stationary obstacles and pieces of falling rubble.

New control system teaches soft robots the art of staying safe

Imagine having a continuum soft robotic arm bend around a bunch of grapes or broccoli, adjusting its grip in real time as it lifts the object. Unlike traditional rigid robots that generally aim to avoid contact with the environment as much as possible and stay far away from humans for safety reasons, this arm senses subtle forces, stretching and flexing in ways that mimic more of the compliance of a human hand. Its every motion is calculated to avoid excessive force while achieving the task efficiently.

New robotic eyeball could enhance visual perception of embodied AI

Embodied artificial intelligence (AI) systems are robotic agents that rely on machine learning algorithms to sense their surroundings, plan their actions and execute them. A key aspect of these systems are visual perception modules, which allow them to analyze images captured by cameras and interpret them.

Researchers develop new method for modeling complex sensor systems

A research team at Kumamoto University (Japan) has unveiled a new mathematical framework that makes it possible to accurately model systems using multiple sensors that operate at different sensing rates. This breakthrough could pave the way for safer autonomous vehicles, smarter robots, and more reliable sensor networks.

Optimizing Wheel Drives for AGVs and AMRs: What OEMs Need to Know About Motion Control

The motor and actuator selection behind each wheel can make or break the success of the entire system. In this post, we’ll explore the core challenges in mobile robot drive systems and how customized motion control solutions from DINGS' Motion USA can help you meet them.

AUCTION – FACILITY CLOSURE – MAJOR ROBOTICS AUTOMATION COMPANY

BTM Industrial is a leading asset disposition company assisting manufacturing companies with their surplus asset needs. Founded in 2011, it is a fully licensed-and-regulated, commission-based auction and liquidation company. The company’s full asset disposition programs provide customers with the ability to efficiently manage all aspects of their surplus and achieve higher value.

Artificial tendons give muscle-powered robots a boost

Our muscles are nature's actuators. The sinewy tissue is what generates the forces that make our bodies move. In recent years, engineers have used real muscle tissue to actuate "biohybrid robots" made from both living tissue and synthetic parts. By pairing lab-grown muscles with synthetic skeletons, researchers are engineering a menagerie of muscle-powered crawlers, walkers, swimmers, and grippers.

Why companies don’t share AV crash data – and how they could

Illustration: Anton Grabolle / Autonomous Driving / Licensed under CC-BY 4.0

By Susan Kelley

Autonomous vehicles (AVs) have been tested as taxis for years in San Francisco, Pittsburgh and around the world, and trucking companies have enormous incentives to adopt them.

But AV companies rarely share the crash- and safety-related data that is crucial to improving the safety of their vehicles – mostly because they have little incentive to do so.

Is AV safety data an auto company’s intellectual asset or a public good? It can be both – with a little tweaking, according to a team of Cornell researchers.

The team has created a roadmap outlining the barriers and opportunities to encourage AV companies to share the data to make AVs safer, from untangling public versus private data knowledge, to regulations to creating incentive programs.

“The core of AV market competition involves who has that crash data, because once you have that data, it’s much easier for you to train your AI to not make that error. The hope is to first make this data transparent and then use it for public good, and not just profit,” said Hauke Sandhaus, M.S. ’24, a doctoral candidate at Cornell Tech and co-author of “My Precious Crash Data,” published Oct. 16 in Proceedings of the ACM on Human-Computer Interaction and presented at the ACM SIGCHI Conference on Computer-Supported Cooperative Work & Social Computing.

His co-authors are Qian Yang, assistant professor at the Cornell Ann S. Bowers College of Computing and Information Science; Wendy Ju, associate professor of information science and design tech at Cornell Tech, the Cornell Ann S. Bowers College of Computing and Information Science and the Jacobs Technion-Cornell Institute; and Angel Hsing-Chi Hwang, a former postdoctoral associate at Cornell and now assistant professor of communication at the University of Southern California, Annenberg.

The team interviewed 12 AV company employees who work on safety in AV design and deployment, to understand how they currently manage and share safety data, the data sharing challenges and concerns they face, and their ideal data-sharing practices.

The interviews revealed the AV companies have a surprising diversity of approaches, Sandhaus said. “Everyone really has some niche, homegrown data set, and there’s really not a lot of shared knowledge between these companies,” he said. “I expected there would be much more commonality.”

The research team discovered two key barriers to sharing data – both underscoring a lack of incentives. First, crash and safety data includes information about the machine-learning models and infrastructure that the company uses to improve safety. “Data sharing, even within a company, is political and fraught,” the team wrote in the paper. Second, the interviewees believed AV safety knowledge is private and brings their company a competitive edge. “This perspective leads them to view safety knowledge embedded in data as a contested space rather than public knowledge for social good,” the team wrote.

And U.S. and European regulations are not helping. They require only information such as the month when the crash occurred, the manufacturer and whether there were injuries. That doesn’t capture the underlying unexpected factors that often cause accidents, such as a person suddenly running onto the street, drivers violating traffic rules, extreme weather conditions or lost cargo blocking the road.

To encourage more data-sharing, it’s crucial to untangle safety knowledge from proprietary data, the researchers said. For example, AV companies could share information about the accident, but not raw video footage that would reveal the company’s technical infrastructure.

Companies could also come up with “exam questions” that AVs would have to pass in order to take the road. “If you have pedestrians coming from one side and vehicles from the other side, then you can use that as a test case that other AVs also have to pass,” Sandhaus said.

Academic institutions could act as data intermediaries through which AV companies could pursue strategic collaborations. Independent research institutions and other civic organizations have set precedents working with industry partners’ public knowledge. “There are arrangements, collaboration, patterns for higher ed to contribute to this without necessarily making the entire data set public,” Yang said.

The team also proposes standardizing AV safety assessment via more effective government regulations. For example, a federal policymaking agency could create a virtual city as a testing ground, with busy traffic intersections and pedestrian-heavy roads that every AV algorithm would have to be able to navigate, she said.

Federal regulators could encourage car companies to contribute scenarios to the testing environment. “The AV companies might say, ‘I want to put my test cases there, because my car probably has passed those tests.’ That can be a mechanism for encouraging safer vehicle development,” Yang said. “Proposing policy changes always feels a little bit distant, but I do think there are near-future policy solutions in this space.”

The research was funded by the National Science Foundation and Schmidt Sciences.

“Cleanest Prose I’ve Ever Seen”

One Writer’s Take on Gemini 3.0

Extensive creative writing tests by ‘The Nerdy Novelist’ – a channel known for its take-no-prisoners evaluations of AI writing – have found that Gemini 3.0 is head and shoulders above all others as the go-to AI for writers.

Essentially, the author behind the channel – Jason Hamilton – found that no other AI even came close to delivering Gemini 3.0’s exquisite prose when he put each through its paces.

For an in-depth look at how Hamilton came up with his Gemini 3.0 recommendation, check out this 36-minute video.

In other news and analysis on AI writing:

*ChatGPT Voice: Now Even Easier to Use: ChatGPT’s maker is out with an upgrade to its voice mode, which enables you to talk with ChatGPT without leaving the ChatGPT interface.

Previously, voice users needed to interact with a separate screen if they wanted to use voice.

Interestingly, voice mode still relies on an older – and some say more creative – model to talk: GPT-4o.

*Killer Image App Nano Banana Gets an Upgrade: Fresh off its take-the-world-by-storm campaign as the globe’s most preferred image editor, ‘Nano Banana’ is out with a new ‘Pro’ version.

Officially known as ‘Gemini 3 Pro Image,’ the tool has grabbed the AI image-making crown with its ability to create extremely detailed images, engage in extremely precise editing – and do it all with incredible speed.

Observes writer Abner Li: “The new model is also coming to AI Mode for subscribers in the U.S., while it’s available to paid NotebookLM users globally. Nano Banana Pro will be available in Flow with Google AI Ultra.”

*AI Research Tool Perplexity Adds AI Assistance With Memory: Perplexity is out with a major new feature to its AI research tool, which embeds AI assistants – with memory – into its research mix.

Like many AI tools, Perplexity now remembers key details of your chats on its service in an effort to ensure responses are sharper and more personalized.

The new feature is optional and can be turned off at any time.

*ChatGPT Competitor Releases Major Upgrade: Anthropic is out with a major update of one of its key AI engines: Claude Opus, now in version 4.5.

Framed as an inexpensive alternative that offers infinite chats, the AI engine has also scored high marks with amped-up reasoning skills.

Anthropic’s AI primarily targets the enterprise market and is known for killer coding capabilities.

*New AI Singer Number One on Christian Music Chart: Add virtual AI singer Solomon Ray to the increasing number of AI artists who are minting number one song hits.

Marketed as a ‘soul singer,’ the AI has a full album, dubbed “A Soulful Christmas,” with tunes like “Soul To the World” and “Jingle Bell Soul.”

Other AI singers have also been crowding out mere fleshbags lately with number one hits on the Country and R&B charts.

*AI Can Already Eliminate 12% of U.S. Workforce: A new study from MIT finds that AI can already eliminate 12% of everyday jobs.

Dubbed the “Iceberg Index,” the study simulated AI’s ability to handle – or partially handle – nearly 1,000 occupations currently held by more than 150 million workers in the U.S.

Observes writer Megan Cerullo: “AI is also already doing some of the entry-level jobs that have historically been reserved for recent college graduates or relatively inexperienced workers.”

*He’s No Tool: Show Your New AI ‘Colleague’ Some Respect: A new study finds that 76% of business leaders now see AI as your office ‘colleague’ – and not a tool.

Specifically, those leaders are referring to agentic AI – an advanced form of the tech that can ideally perform a number of tasks to complete a mission without the need of human supervision.

Even so, real-world tests show agents regularly hallucinate, mis-route data or misinterpret a mission’s goals on their way from here to there.

*U.S. Congress Seeks Answers on Alleged Chinese AI Cyberattack: The CEO of a major competitor of ChatGPT – Anthropic – will be testifying before the U.S. Congress this month about a recent cyberattack that relied on Anthropic AI to infiltrate finance and government servers.

The attack – allegedly orchestrated by Chinese state actors – hacked Anthropic AI’s agentic abilities to penetrate the servers.

Observes writer Sam Sabin: “As AI rapidly intensifies the cyber threat landscape, lawmakers are just starting to wrap their heads around the problem.”

*AI Big Picture: This Generation’s Manhattan Project: The Genesis Mission: The Trump Administration has embraced AI as a key defense initiative in what it is calling “The Genesis Mission.”

Observes writer Chuck Brooks: “This mission is not merely another government program: it represents a bold strategic move that aligns with my belief that science, data, and computing should be regarded as essential components of our national strength rather than optional extras.

“For too long, we have considered science and technology to be secondary to our national strategy. The Genesis Mission reverses that idea.”

Joe Dysart is editor of RobotWritersAI.com and a tech journalist with 20+ years experience. His work has appeared in 150+ publications, including The New York Times and the Financial Times of London.

