
Mission-ready AI: Radio intelligence at the edge

This post explores how the joint solution from DataRobot and Deepwave — powered by NVIDIA — delivers a secure, high-performance AI stack purpose-built for air-gapped, on-premises, and high-security deployments, so agencies can achieve genuine data sovereignty and operational excellence.

The need for autonomous intelligence

AI is evolving rapidly, transforming from simple tools into autonomous agents that can reason, plan, and act. This shift is critical for high-stakes, mission-critical applications such as signals intelligence (SIGINT), where vast RF data streams demand real-time analysis.

Deploying these advanced agents for public and government programs requires a new level of security, speed, and accuracy that traditional RF analysis solutions cannot provide.

Program leaders often find themselves choosing between underperforming, complex solutions that generate technical debt and single-vendor lock-in. Meanwhile, the pressure to deliver next-generation RF intelligence does not subside, leaving operations leaders with few viable deployment options.

The challenge of radio intelligence

Signals intelligence, the real-time collection and analysis of radio frequency (RF) signals, spans both communications (COMINT) and emissions from electronic systems (ELINT). In practice, this often means extracting the content of RF signals — audio, video, or data streams — a process that presents significant challenges for federal agencies.

  • Modern RF signals are highly dynamic and require equally nimble analysis capabilities to keep up.
  • Operations often take place at the edge in contested environments, where manual analysis is too slow and not scalable. 
  • High data rates and signal complexity make RF data extraordinarily difficult to use, and dynamically changing signals require an analysis platform that can adapt in real-time. 

The mission-critical need is for an automated and highly reconfigurable solution that can quickly extract actionable intelligence from these vast amounts of data, ensuring timely, potentially life-saving decision-making and reasoning.

Introducing the Radio Intelligence Agent

To meet this critical need, the Radio Intelligence Agent (RIA) was engineered as an autonomous, proactive intelligence system that transforms raw RF signals into a constantly evolving, context-driven resource. The solution is designed to serve as a smart team member, providing new insights and recommendations that are far beyond search engine capabilities.

What truly sets the RIA apart from current technology is its integrated reasoning capability. Powered by NVIDIA Nemotron reasoning models, the system is capable of synthesizing patterns, flagging anomalies, and recommending actionable responses, effectively bridging the gap between mere information retrieval and operational intelligence.

Developed jointly by DataRobot and Deepwave, and powered by NVIDIA, this AI solution transforms raw RF signals into conversational intelligence, with its entire lifecycle orchestrated by the trusted, integrated control plane of the DataRobot Agent Workforce Platform.

Federal use cases and deployment

The Radio Intelligence Agent is engineered specifically for the stringent demands of federal operations, with every component built for security, compliance, and deployment flexibility.

The power of the RIA solution lies in performing a significant amount of processing at the edge within Deepwave’s AirStack Edge ecosystem. This architecture ensures high-performance processing while maintaining essential security and regulatory compliance. 

The Radio Intelligence Agent solution moves operations teams from simple data collection and analysis to proactive, context-aware intelligence, enabling event prevention instead of event management. This is a step change in public safety capabilities.

  • Event response optimization: The solution goes beyond simple alerts by acting as a digital advisor during unfolding situations. It analyzes incoming data in real-time, identifies relevant entities and locations, and recommends next-best actions to reduce response time and improve outcomes.
  • Operational awareness: The solution enhances visibility across multiple data streams, including audio and video feeds, as well as sensor inputs, to create a unified view of activity in real-time. This broad monitoring capability reduces cognitive burden and helps teams focus on strategic decision-making rather than manual data analysis.
  • Other applications: RIA’s core capabilities apply to any scenario requiring fast, secure, and accurate analysis of massive data streams, including public safety, first response, and related missions.

This solution is also portable, supporting local development and testing, with the ability to transition seamlessly into private cloud or FedRAMP-authorized DataRobot-hosted environments for secure production in federal missions.


A deeper dive into the Radio Intelligence Agent

Imagine receiving complex RF signal analysis that is trusted, high-fidelity, and actionable within seconds, simply by asking a question.

DataRobot, Deepwave, and NVIDIA teamed up to make this a reality. 

First, Deepwave’s AIR-T edge sensors receive and digitize the RF signals using AirStack software, powered by embedded NVIDIA GPUs.

Then, the newest AirStack component, AirStack Edge, introduces a secure API with FIPS-grade encryption, enabling the deployment of signal processing applications and NVIDIA Riva Speech and Translation AI models directly on AIR-T devices.

This end-to-end process runs securely and in real-time, delivering extracted data content into the agent-based workflows orchestrated by DataRobot.
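
To make the hand-off concrete, here is a rough Python sketch of the flow from a digitized capture to a transcript chunk ready for downstream agent workflows. Every type and function in it (RFCapture, TranscriptChunk, transcribe, ingest) is a hypothetical placeholder rather than a Deepwave, NVIDIA, or DataRobot API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class RFCapture:
    """A digitized RF capture produced at the edge (hypothetical structure)."""
    center_freq_hz: float
    sample_rate_hz: float
    received_at: datetime
    iq_samples: bytes  # raw I/Q data from the SDR front end

@dataclass
class TranscriptChunk:
    """Speech content extracted from a capture, plus the metadata the agent searches on."""
    text: str
    language: str
    center_freq_hz: float
    received_at: datetime

def transcribe(capture: RFCapture) -> TranscriptChunk:
    # Placeholder for on-device speech recognition (a Riva ASR call in the real system).
    text = "<transcribed audio content>"
    return TranscriptChunk(text, "en", capture.center_freq_hz, capture.received_at)

def ingest(chunk: TranscriptChunk) -> None:
    # Placeholder for the secure hand-off into the downstream agentic workflow.
    print(f"[{chunk.received_at.isoformat()}] {chunk.center_freq_hz/1e6:.3f} MHz: {chunk.text}")

if __name__ == "__main__":
    capture = RFCapture(462.5625e6, 2.4e6, datetime.now(timezone.utc), b"")
    ingest(transcribe(capture))
```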

The solution’s agentic capability is rooted in a two-part system that leverages NVIDIA Llama-3_1-Nemotron-Ultra-253B-v1 to interpret context and generate sophisticated responses; a minimal sketch of this division of labor follows the list below.

  • Query Interpreter: This component is responsible for understanding the user’s initial intent, translating the natural language question into a defined information need.
  • Information Retriever: This agent executes the necessary searches, retrieves relevant transcript chunks, and synthesizes the final, cohesive answer by connecting diverse data points and applying reasoning to the retrieved text.
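
The sketch below illustrates that two-part flow in Python. It assumes a generic search callable and stubs the language-model steps; the InfoNeed fields, function names, and example data are illustrative only and do not reflect the actual DataRobot or NVIDIA interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class InfoNeed:
    """Structured information need distilled from the user's question (illustrative fields)."""
    topic: str
    time_range: tuple[str, str] | None = None
    frequencies_hz: list[float] = field(default_factory=list)

def interpret_query(question: str) -> InfoNeed:
    """Query Interpreter: turn natural language into a defined information need.
    A real system would use an LLM for this extraction; here it is stubbed."""
    return InfoNeed(topic=question.strip().rstrip("?"))

def retrieve_and_answer(need: InfoNeed, search) -> str:
    """Information Retriever: run the searches, gather transcript chunks, synthesize an answer."""
    chunks = search(need)                      # e.g., a vector-database query
    evidence = "\n".join(c["text"] for c in chunks)
    # A real system would pass the evidence to a reasoning model; we concatenate for illustration.
    return f"Answer about '{need.topic}', supported by {len(chunks)} transcript chunk(s):\n{evidence}"

# Example usage with a stubbed search backend.
fake_search = lambda need: [{"text": "Unit two reports activity near the north gate."}]
print(retrieve_and_answer(interpret_query("What happened near the north gate?"), fake_search))
```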


This functionality is delivered through the NVIDIA Streaming Data to RAG solution, which enables real-time ingestion and processing of live RF data streams using GPU-accelerated pipelines.

By leveraging NVIDIA’s optimized vector search and context synthesis, the system allows for fast, secure, and context-driven retrieval and reasoning over radio-transcribed data while ensuring both operational speed and regulatory compliance.

The agent first consults a vector database, which stores semantic embeddings of transcribed audio and sensor metadata, to find the most relevant information before generating a coherent response. The sensor metadata is customizable and contains critical information about signals, including frequency, location, and reception time of the data.
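
Conceptually, that retrieval step is a similarity search constrained by the sensor metadata. The toy sketch below uses an in-memory index and a stand-in embed() function purely for illustration; the real system relies on GPU-accelerated vector search and production embedding models, and the metadata fields shown are assumptions.

```python
import math
from datetime import datetime, timezone

def embed(text: str, dim: int = 8) -> list[float]:
    # Stand-in embedding: hash characters into a small vector (NOT a real model).
    vec = [0.0] * dim
    for i, ch in enumerate(text.lower()):
        vec[i % dim] += ord(ch)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

# Each record pairs an embedding of transcribed audio with customizable sensor metadata.
index = [
    {"text": "Convoy reported on highway 9", "vector": embed("Convoy reported on highway 9"),
     "freq_hz": 155.5e6, "received_at": datetime(2025, 10, 20, 14, 3, tzinfo=timezone.utc)},
    {"text": "Weather update, winds increasing", "vector": embed("Weather update, winds increasing"),
     "freq_hz": 162.4e6, "received_at": datetime(2025, 10, 21, 9, 17, tzinfo=timezone.utc)},
]

def search(question: str, min_freq_hz: float = 0.0, top_k: int = 1):
    q = embed(question)
    candidates = [r for r in index if r["freq_hz"] >= min_freq_hz]   # metadata filter
    return sorted(candidates, key=lambda r: cosine(q, r["vector"]), reverse=True)[:top_k]

for hit in search("Any vehicle convoys mentioned?", min_freq_hz=150e6):
    print(hit["received_at"], f'{hit["freq_hz"]/1e6:.1f} MHz:', hit["text"])
```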

The solution is equipped with several specialized tools that enable this advanced workflow:

  • RF orchestration: The solution can utilize Deepwave’s AirStack Edge orchestration layer to actively recollect new RF intelligence by running new models, recording signals, or broadcasting signals.
  • Search tools: It performs sub-second semantic searches across massive volumes of transcript data.
  • Time parsing tools: These convert human-friendly temporal expressions (e.g., “3 weeks ago”) into precise, searchable timestamps, leveraging the sub-10 nanosecond accuracy published in the metadata (a minimal sketch of this tool and the audit trail follows this list).
  • Audit trail: The system maintains a complete audit trail of all queries, tool usage, and data sources, ensuring full traceability and accountability.
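
As a concrete, simplified illustration of the time parsing and audit trail tools named above, the sketch below resolves a relative expression such as "3 weeks ago" to a timestamp and appends an audit record for the call. The parsing rules and record fields are assumptions for illustration, not the product’s implementation.

```python
import re
from datetime import datetime, timedelta, timezone

AUDIT_LOG: list[dict] = []

def parse_relative_time(expression: str, now: datetime | None = None) -> datetime:
    """Convert a human-friendly expression such as '3 weeks ago' into a precise timestamp."""
    now = now or datetime.now(timezone.utc)
    match = re.fullmatch(r"(\d+)\s+(minute|hour|day|week)s?\s+ago", expression.strip().lower())
    if not match:
        raise ValueError(f"Unsupported expression: {expression!r}")
    amount, unit = int(match.group(1)), match.group(2)
    result = now - timedelta(**{unit + "s": amount})
    # Audit trail: record the query, the tool used, and the resolved value.
    AUDIT_LOG.append({"tool": "time_parser", "input": expression,
                      "output": result.isoformat(), "logged_at": now.isoformat()})
    return result

print(parse_relative_time("3 weeks ago"))
print(AUDIT_LOG[-1])
```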

The NVIDIA Streaming Data to RAG Blueprint enables the workflow to move from simple data lookup to autonomous, proactive intelligence. The GPU-accelerated software-defined radio (SDR) pipeline continuously captures, transcribes, and indexes RF signals in real-time, unlocking continuous situational awareness.


DataRobot Agent Workforce Platform: The integrated control plane

The DataRobot Agent Workforce Platform, co-developed with NVIDIA, serves as the agentic pipeline and orchestration layer — the control plane that manages the entire lifecycle. This ensures agencies maintain full visibility and control over every layer of the stack and can enforce compliance automatically.

Key functions of the platform include:

  • End-to-end control: Automates the entire AI lifecycle, from development and deployment to monitoring and governance, allowing agencies to field new capabilities faster and more reliably.
  • Design architecture: Purpose-built on the NVIDIA Enterprise AI Factory architecture, ensuring the entire stack is validated and production-ready from day one.
  • Data sovereignty: DataRobot’s solution is purpose-built for high-security environments, deploying directly into the agency’s air-gapped or on-premises infrastructure. All processing occurs within the security perimeter, ensuring complete data sovereignty and guaranteeing the agency retains sole control and ownership of its data and operations.

    Crucially, this provides operational autonomy (or sovereignty) over the entire AI stack, as it requires no external providers for the operational hardware or models. This ensures the full AI capability remains within the agency’s controlled domain, free from external dependencies or third-party access.
Radio Intelligence Agent Infrastructure Diagram

Specialized collaborations

The solution is a collaboration built on a co-developed and enterprise-grade architecture.


Deepwave: RF AI at the edge

DataRobot integrates with highly skilled, specialized partners like Deepwave, who provide the critical AI edge processing to convert raw RF signal content into RF intelligence and securely share it with DataRobot’s data pipelines. The Deepwave platform extends this solution’s capabilities by enabling the next steps in RF intelligence gathering through the orchestration and automation of RF AI edge tasks.

  • Edge AI processing: The agent uses Deepwave’s high-performance edge computing and AI models to intercept and process RF signals.
  • Reduced infrastructure: Instead of backhauling raw RF data, the solution runs AI models at the edge to extract only the critical information. This reduces network backhaul needs by several orders of magnitude — from 4 Gbps down to just 150 bps per channel — dramatically improving mobility and simplifying the required edge infrastructure.
  • Security: Deepwave’s AirStack Edge leverages the latest FIPS mode encryption to report this data to the DataRobot Agent Workforce Platform securely.
  • Orchestration: Deepwave’s AirStack Edge software orchestrates and automates networks of RF AI edge devices. This enables low-latency responses to RF scenarios, such as detecting and jamming unwanted signals.


NVIDIA: Foundational trust and performance

NVIDIA provides the high-performance and secure foundation necessary for federal missions.

  • Security: AI agents are built with production-ready NVIDIA NIM™ microservices. These microservices are built from a trusted, STIG-ready base layer and support FIPS mode encryption, making them the essential, pre-validated building blocks for achieving a FedRAMP deployment quickly and securely.

    DataRobot provides an NVIDIA NIM gallery, which enables rapid consumption of accelerated AI models across multiple modalities and domains — including LLM, VLM, CV, and embedding models — with direct integration into agentic AI solutions that can be deployed anywhere (a minimal sketch of calling such a model endpoint follows this list).
  • Reasoning: The agent’s core intelligence is powered by NVIDIA Nemotron models. These AI models with open weights, datasets, and recipes, combined with leading efficiency and accuracy, provide the high-level reasoning and planning capabilities for the agent, enabling it to excel at complex reasoning and instruction-following. It goes beyond simple lookups to connect complex data points, delivering true intelligence, not just data retrieval.
  • Speech & Translation: NVIDIA Riva Speech and Translation enables real-time speech recognition, translation, and synthesis directly at the edge. By deploying Riva alongside AIR-T and AirStack Edge, audio content extracted from RF signals can be transcribed and translated on-device with low latency. This capability allows SIGINT agents to turn intercepted voice traffic into actionable, multilingual data streams that seamlessly flow into DataRobot’s agentic AI workflows.
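
As referenced above, the sketch below shows what querying a locally hosted, OpenAI-compatible model endpoint (the interface style exposed by NIM LLM microservices) might look like from inside the security perimeter. The base URL, credential handling, and model identifier are placeholders, not a documented DataRobot or NVIDIA configuration.

```python
# Minimal sketch: query a locally hosted, OpenAI-compatible LLM endpoint.
# The URL, API key handling, and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",   # hypothetical in-enclave endpoint
    api_key="not-needed-for-local",        # placeholder; real deployments manage credentials securely
)

response = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-ultra-253b-v1",  # illustrative model identifier
    messages=[
        {"role": "system", "content": "You are an RF intelligence analysis assistant."},
        {"role": "user", "content": "Summarize any convoy-related traffic from the last 3 weeks."},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```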

A collaborative approach to mission-critical AI

The combined strengths of DataRobot, NVIDIA, and Deepwave create a comprehensive, secure, production-ready solution:

  • DataRobot: End-to-end AI lifecycle orchestration and control.
  • NVIDIA: Accelerated GPU infrastructure, optimized software frameworks, validated designs, and secure, performant foundation models and microservices.
  • Deepwave: RF sensors with embedded GPU edge processing, secure datalinks, and streamlined orchestration software.

Together, these capabilities power the Radio Intelligence Agent solution, demonstrating how agentic AI, built on the DataRobot Agent Workforce Platform, can bring real-time intelligence to the edge. The result is a trusted, production-ready path to data sovereignty and autonomous, proactive intelligence for the federal mission.

For more information on using RIA to turn RF data into real-time insights, visit deepwave.ai/ria.

To learn more about how we can help advance your agency’s AI ambitions, connect with DataRobot federal experts.

The post Mission-ready AI: Radio intelligence at the edge appeared first on DataRobot.

PILLARS OF TOMORROW – A hopeful vision of Government, Technology, and Human Dignity in harmony

For centuries, visions of the future have been dominated by dystopian warnings—cautionary tales of power gone wrong. Five Pillars of Tomorrow offers an alternative: a governance model where transparency, collective wisdom, and advanced AI safeguard human dignity.

Breakthrough optical processor lets AI compute at the speed of light

Researchers at Tsinghua University developed the Optical Feature Extraction Engine (OFE2), an optical engine that processes data at 12.5 GHz using light rather than electricity. Its integrated diffraction and data preparation modules enable unprecedented speed and efficiency for AI tasks. Demonstrations in imaging and trading showed improved accuracy, lower latency, and reduced power demand. This innovation pushes optical computing toward real-world, high-performance AI.

Teen builds advanced robotic hand from LEGO parts

A talented teenager from the UK has built a four-fingered robotic hand from standard Lego parts that performs almost as well as research-grade robotic hands. The anthropomorphic device can grasp, move and hold objects with remarkable versatility and human-like adaptability.

Guarding the Digital God: The Race to Secure Artificial Intelligence

For the past several years, the world has been mesmerized by the creative and intellectual power of artificial intelligence (AI). We have watched it generate art, write code, and discover new medicines. Now, as of October 2025, we are handing […]

The post Guarding the Digital God: The Race to Secure Artificial Intelligence appeared first on TechSpective.

Social media round-up from #IROS2025

The 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025) took place from October 19 to 25, 2025 in Hangzhou, China. The programme included plenary and keynote talks, workshops, tutorials, forums, competitions, and a debate. There was also an exhibition where companies and institutions were able to showcase their latest hardware and software.

We cast an eye over the social media platforms to see what participants got up to during the week.

📢 This week, we are participating in the IEEE/RSJ International Conference on Intelligent Robots and Systems #IROS2025 in Hangzhou #China

📸 (IRI researchers right-left): @juliaborrassol.bsky.social, David Blanco-Mulero and Anais Garrell

#IRI #IROSHangzho


— IRI-Institut de Robòtica i Informàtica Industrial (@iri-robotics.bsky.social) 24 October 2025 at 01:33

Truly enjoyed discussing the consolidation of specialist and generalist approaches to physical AI at #IROS2025.

Hoping to visit Hangzhou in physical rather than digital form myself in the not too distant future – second IROS AC dinner missed in a row.

#Robotics #physicalAI


— Markus Wulfmeier (@mwulfmeier.bsky.social) 20 October 2025 at 11:25

At #IROS2025 General Chair Professor Hesheng Wang and Program Chair Professor Yi Guo share what makes this year’s conference unique, from the inspiring location to the latest research shaping the future of intelligent robotics. youtu.be/_JzGoH7wilU


— WebsEdge Science (@websedgescience.bsky.social) 23 October 2025 at 21:32

From Hangzhou, #IROS2025 unites the brightest minds in #Robotics, #AI & intelligent systems to explore the Human–Robotics Frontier. Watch IROS TV for highlights, interviews, and a behind-the-scenes look at the labs shaping our robotic future. youtu.be/SojyPncpH1g


— WebsEdge Science (@websedgescience.bsky.social) 23 October 2025 at 21:25

Impressive live demonstration by @unitreerobotics.bsky.social #G1 at the #IROS2025 conference! It really works!


— Davide Scaramuzza (@davidescaramuzza.bsky.social) 23 October 2025 at 15:26

ChatGPT Now Works Inside Gmail, Google Docs

Write With ChatGPT – Without Ever Leaving the Gmail Compose Box

ChatGPT’s maker has just rolled out a powerful new feature for email that lets you use the AI to write and edit an email directly inside the Gmail compose box.

Already available for Mac users, the new feature – part of the new ChatGPT-powered browser ‘Atlas’ – is promised for Windows users in a few weeks.

A boon for people who spend considerable time cranking out emails each day, the new capability eliminates the need to jump back-and-forth between ChatGPT and Gmail when composing an email with the AI.

Instead, users can simply click open a Gmail compose box to start an email, then click on a tiny ChatGPT logo that appears in the upper left-hand corner to create an email using ChatGPT.

Essentially: no more opening a Gmail compose box in one tab, logging into ChatGPT in a second tab, and then cutting and pasting text back and forth between the two to come up with the email you want to send.

Instead, everything is done for you inline, in a single Gmail compose box.

Even better: You can also use the new feature to highlight text you’ve already created in a Gmail compose box — then edit that text with ChatGPT and send it when you’re satisfied with the results.

Plus, ChatGPT’s Atlas ups the ante even further by enabling you to use the same write-in-the-app capability in Google Docs.

And it works the same way: Simply click on a tiny ChatGPT logo that appears when you hover in the top left-hand corner of a Google Doc, enter in a prompt you want the AI to use to write text for you, click enter and ChatGPT writes exactly what you’re looking for – without you ever being forced to go outside the Google Doc for help.

In a phrase: Once word of this stellar new convenience spreads across the Web, it seems certain that there will be a stampede of people embracing the idea of using ChatGPT without ever needing to leave Gmail or Google Docs.

That should especially be the case given that the new feature is currently available on the Mac to all tier levels of ChatGPT, including the ChatGPT free level – with availability to Windows users promised soon.

Here’s how the new auto-writing feature works, step by step:

To Create New Text in Gmail Using ChatGPT In-App:

  1. Open a new compose box in Gmail
  2. Hover over the blinking cursor in the upper left-hand corner in the compose box until a tiny ChatGPT logo appears
  3. Click on the tiny ChatGPT logo
  4. A tiny prompt box appears
  5. Enter in a writing prompt – the same kind of writing prompt you’d ordinarily use to trigger ChatGPT to write an email for you
  6. A drop-down window appears, showcasing the text that ChatGPT just wrote for you
  7. Click Insert to accept the text that ChatGPT wrote for you into your Gmail
  8. Click ‘Send’ to send your email

To Edit Text You’ve Already Written in a Gmail Compose Box:

  1. Highlight the text you’ve already created in the Gmail compose box
  2. A tiny ChatGPT logo appears in the upper left-hand corner of the Gmail compose box
  3. Click on the tiny ChatGPT logo
  4. A prompt box appears
  5. Type your instructions for editing the email in the prompt window
  6. Click Enter
  7. ChatGPT’s edit of your email appears in a drop-down box
  8. Read over the text ChatGPT has edited for you
  9. Click Update to add the edited text to your Gmail
  10. Click Send to send your Gmail.

To Create/Edit Text in Google Docs:

  1. Follow the same prompts above to create or edit in Google Docs

For an excellent, clear video demo of the steps above, click through to this video by The Tech Girl and advance to timestamp 4:58 to see what the step-by-step looks like on a PC screen.

Groundbreaking in its own right, the new write-in-the-app capability is one of a flurry of features that come with the new ChatGPT-powered browser Atlas, released a few days ago.

With the release of Atlas, ChatGPT’s maker OpenAI is hoping to capitalize on the 800 million visits made each week to the ChatGPT Web site.

Those visitors represent a motivated, ChatGPT-inspired audience. And OpenAI is hoping that by making its own AI-powered browser available to those people, they’ll abandon Google Chrome and start using Atlas to surf the Web.

Like Perplexity Comet – another new, AI-powered browser looking to carve into the Google Chrome market – Atlas is designed to work like an everyday browser that’s supercharged with AI at its core.

In practice, that means the new Atlas browser — demoed by ChatGPT maker OpenAI last week — is designed to:

–Turbo-charge many browser actions with ChatGPT
–Offer a left sidebar featuring a history of all your chats with ChatGPT
–Enable you to search your search history using ChatGPT
–Get to know you better by remembering what you’ve done with Atlas in the past
–Offer suggested links for you, based on your previous searches with Atlas
–Work as an AI agent for you and complete multi-step tasks, such as finding news for you on the Web and summarizing each news item with a hotlink to the original news source
–Engage in Deep Research
–Pivot into using a traditional search engine view while searching
–Enable you to open, say, 20 Web sites, then analyze and summarize those Web sites with ChatGPT
–Integrate with apps like Canva, Spotify and more
–Auto-summarize a YouTube video for you without the need to generate a transcript of that video

Share a Link:  Please consider sharing a link to https://RobotWritersAI.com from your blog, social media post, publication or emails. More links leading to RobotWritersAI.com helps everyone interested in AI-generated writing.

Joe Dysart is editor of RobotWritersAI.com and a tech journalist with 20+ years experience. His work has appeared in 150+ publications, including The New York Times and the Financial Times of London.


The post ChatGPT Now Works Inside Gmail, Google Docs appeared first on Robot Writers AI.

Upcoming Major Events List

International Robot Safety Conference (IRSC) – 3-5 November 2025 – Houston, USA – https://www.automate.org/events/international-robot-safety-conference
RobotWorld 2025 – 5-8 November 2025 – Goyang (KINTEX), South Korea – http://eng.robotworld.or.kr/
Get Together for Robotics 2025 – 5-6 November 2025 – Nuremberg/Erlangen, Germany – https://www.profibus.com/get-together-for-robotics/
ICRAI 2025 (11th International Conference on Robotics & Artificial Intelligence) – 19-21 December 2025 […]

A common language to describe and assess human–agent teams

Understanding how humans and AI or robotic agents can work together effectively requires a shared foundation for experimentation. A University of Michigan-led team developed a new taxonomy to serve as a common language among researchers, then used it to evaluate current testbeds used to study how human-agent teams will perform.

Agentic AI is a Force Multiplier for the Best Employees

Like it or not, your staff are already using AI. Walk around any modern office, and you’ll likely see Copilot or ChatGPT tucked behind a spreadsheet, an AI summarizer pulling key takeaways from a meeting transcript, or an AI-powered scheduling […]

The post Agentic AI is a Force Multiplier for the Best Employees appeared first on TechSpective.

Using generative AI to diversify virtual training grounds for robots

The “steerable scene generation” system creates digital scenes of things like kitchens, living rooms, and restaurants that engineers can use to simulate lots of real-world robot interactions and scenarios. Image credit: Generative AI image, courtesy of the researchers. See an animated version of the image here.

By Alex Shipps

Chatbots like ChatGPT and Claude have experienced a meteoric rise in usage over the past three years because they can help you with a wide range of tasks. Whether you’re writing Shakespearean sonnets, debugging code, or need an answer to an obscure trivia question, artificial intelligence systems seem to have you covered. The source of this versatility? Billions, or even trillions, of textual data points across the internet.

Those data aren’t enough to teach a robot to be a helpful household or factory assistant, though. To understand how to handle, stack, and place various arrangements of objects across diverse environments, robots need demonstrations. You can think of robot training data as a collection of how-to videos that walk the systems through each motion of a task. Collecting these demonstrations on real robots is time-consuming and not perfectly repeatable, so engineers have created training data by generating simulations with AI (which don’t often reflect real-world physics), or tediously handcrafting each digital environment from scratch.

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Toyota Research Institute may have found a way to create the diverse, realistic training grounds robots need. Their “steerable scene generation” approach creates digital scenes of things like kitchens, living rooms, and restaurants that engineers can use to simulate lots of real-world interactions and scenarios. Trained on over 44 million 3D rooms filled with models of objects such as tables and plates, the tool places existing assets in new scenes, then refines each one into a physically accurate, lifelike environment.

Steerable scene generation creates these 3D worlds by “steering” a diffusion model — an AI system that generates a visual from random noise — toward a scene you’d find in everyday life. The researchers used this generative system to “in-paint” an environment, filling in particular elements throughout the scene. You can imagine a blank canvas suddenly turning into a kitchen scattered with 3D objects, which are gradually rearranged into a scene that imitates real-world physics. For example, the system ensures that a fork doesn’t pass through a bowl on a table — a common glitch in 3D graphics known as “clipping,” where models overlap or intersect.

How exactly steerable scene generation guides its creation toward realism, however, depends on the strategy you choose. Its main strategy is “Monte Carlo tree search” (MCTS), where the model creates a series of alternative scenes, filling them out in different ways toward a particular objective (like making a scene more physically realistic, or including as many edible items as possible). It’s used by the AI program AlphaGo to beat human opponents in Go (a game similar to chess), as the system considers potential sequences of moves before choosing the most advantageous one.

“We are the first to apply MCTS to scene generation by framing the scene generation task as a sequential decision-making process,” says MIT Department of Electrical Engineering and Computer Science (EECS) PhD student Nicholas Pfaff, who is a CSAIL researcher and a lead author on a paper presenting the work. “We keep building on top of partial scenes to produce better or more desired scenes over time. As a result, MCTS creates scenes that are more complex than what the diffusion model was trained on.”
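
To make the idea concrete, here is a toy Python sketch of MCTS applied to scene construction as a sequential decision process. The object vocabulary, scoring function, and random rollout are stand-ins for the paper's diffusion model and objectives, but the selection, expansion, rollout, and backpropagation loop follows the standard algorithm.

```python
import math, random
from dataclasses import dataclass, field

OBJECTS = ["plate", "fork", "bowl", "cup", "apple"]
MAX_ITEMS = 4

def score(scene: tuple) -> float:
    """Toy objective standing in for physical realism / task alignment:
    reward distinct objects, lightly penalize duplicates."""
    return len(set(scene)) - 0.5 * (len(scene) - len(set(scene)))

def actions(scene):
    return OBJECTS if len(scene) < MAX_ITEMS else []

@dataclass
class Node:
    scene: tuple
    parent: "Node | None" = None
    children: dict = field(default_factory=dict)  # action -> Node
    visits: int = 0
    value: float = 0.0

def ucb(child, parent, c=1.4):
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)

def rollout(scene):
    scene = list(scene)
    while len(scene) < MAX_ITEMS:
        scene.append(random.choice(OBJECTS))   # stand-in for sampling the scene diffusion model
    return score(tuple(scene))

def mcts(iterations=500):
    root = Node(scene=())
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded.
        while actions(node.scene) and len(node.children) == len(actions(node.scene)):
            node = max(node.children.values(), key=lambda ch: ucb(ch, node))
        # Expansion: try an untried action, if any remain.
        untried = [a for a in actions(node.scene) if a not in node.children]
        if untried:
            a = random.choice(untried)
            child = Node(scene=node.scene + (a,), parent=node)
            node.children[a] = child
            node = child
        # Simulation + backpropagation.
        reward = rollout(node.scene)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Follow the most-visited children to report the best scene found.
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
    return node.scene, score(node.scene)

random.seed(0)
print(mcts())
```

Swapping the toy rollout for samples from a scene diffusion model, and the toy score for a physical-realism or task objective, recovers the shape of the approach described above.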

In one particularly telling experiment, MCTS added the maximum number of objects to a simple restaurant scene. It featured as many as 34 items on a table, including massive stacks of dim sum dishes, after training on scenes with only 17 objects on average.

Steerable scene generation also allows you to generate diverse training scenarios via reinforcement learning — essentially, teaching a diffusion model to fulfill an objective by trial-and-error. After you train on the initial data, your system undergoes a second training stage, where you outline a reward (basically, a desired outcome with a score indicating how close you are to that goal). The model automatically learns to create scenes with higher scores, often producing scenarios that are quite different from those it was trained on.
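
The reward in that second training stage can be pictured as a scalar scoring function over generated scenes. The sketch below is a hypothetical example that rewards edible items and penalizes overlapping ("clipping") objects; the actual rewards used in the work may differ.

```python
EDIBLE = {"apple", "bread", "dim sum"}

def scene_reward(scene: list[dict]) -> float:
    """Hypothetical reward: +1 per edible item, -2 for each pair of overlapping objects.
    The model would then be fine-tuned to produce scenes that score highly."""
    reward = sum(1.0 for obj in scene if obj["name"] in EDIBLE)
    for i, a in enumerate(scene):
        for b in scene[i + 1:]:
            # Treat objects as circles on the table; overlap approximates "clipping".
            dist = ((a["x"] - b["x"]) ** 2 + (a["y"] - b["y"]) ** 2) ** 0.5
            if dist < a["radius"] + b["radius"]:
                reward -= 2.0
    return reward

scene = [
    {"name": "apple", "x": 0.0, "y": 0.0, "radius": 0.05},
    {"name": "bowl",  "x": 0.03, "y": 0.0, "radius": 0.12},   # overlaps the apple -> penalty
    {"name": "bread", "x": 0.5, "y": 0.2, "radius": 0.08},
]
print(scene_reward(scene))
```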

Users can also prompt the system directly by typing in specific visual descriptions (like “a kitchen with four apples and a bowl on the table”). Then, steerable scene generation can bring your requests to life with precision. For example, the tool accurately followed users’ prompts at rates of 98 percent when building scenes of pantry shelves, and 86 percent for messy breakfast tables. Both marks are at least a 10 percent improvement over comparable methods like “MiDiffusion” and “DiffuScene.”

The system can also complete specific scenes via prompting or light directions (like “come up with a different scene arrangement using the same objects”). You could ask it to place apples on several plates on a kitchen table, for instance, or put board games and books on a shelf. It’s essentially “filling in the blank” by slotting items in empty spaces, but preserving the rest of a scene.

According to the researchers, the strength of their project lies in its ability to create many scenes that roboticists can actually use. “A key insight from our findings is that it’s OK for the scenes we pre-trained on to not exactly resemble the scenes that we actually want,” says Pfaff. “Using our steering methods, we can move beyond that broad distribution and sample from a ‘better’ one. In other words, generating the diverse, realistic, and task-aligned scenes that we actually want to train our robots in.”

Such vast scenes became the testing grounds where they could record a virtual robot interacting with different items. The machine carefully placed forks and knives into a cutlery holder, for instance, and rearranged bread onto plates in various 3D settings. Each simulation appeared fluid and realistic, resembling the adaptable, real-world robots that steerable scene generation could one day help train.

While the system could be an encouraging path forward in generating lots of diverse training data for robots, the researchers say their work is more of a proof of concept. In the future, they’d like to use generative AI to create entirely new objects and scenes, instead of using a fixed library of assets. They also plan to incorporate articulated objects that the robot could open or twist (like cabinets or jars filled with food) to make the scenes even more interactive.

To make their virtual environments even more realistic, Pfaff and his colleagues may incorporate real-world objects by using a library of objects and scenes pulled from images on the internet and using their previous work on “Scalable Real2Sim.” By expanding how diverse and lifelike AI-constructed robot testing grounds can be, the team hopes to build a community of users that’ll create lots of data, which could then be used as a massive dataset to teach dexterous robots different skills.

“Today, creating realistic scenes for simulation can be quite a challenging endeavor; procedural generation can readily produce a large number of scenes, but they likely won’t be representative of the environments the robot would encounter in the real world. Manually creating bespoke scenes is both time-consuming and expensive,” says Jeremy Binagia, an applied scientist at Amazon Robotics who wasn’t involved in the paper. “Steerable scene generation offers a better approach: train a generative model on a large collection of pre-existing scenes and adapt it (using a strategy such as reinforcement learning) to specific downstream applications. Compared to previous works that leverage an off-the-shelf vision-language model or focus just on arranging objects in a 2D grid, this approach guarantees physical feasibility and considers full 3D translation and rotation, enabling the generation of much more interesting scenes.”

“Steerable scene generation with post training and inference-time search provides a novel and efficient framework for automating scene generation at scale,” says Toyota Research Institute roboticist Rick Cory SM ’08, PhD ’10, who also wasn’t involved in the paper. “Moreover, it can generate ‘never-before-seen’ scenes that are deemed important for downstream tasks. In the future, combining this framework with vast internet data could unlock an important milestone towards efficient training of robots for deployment in the real world.”

Pfaff wrote the paper with senior author Russ Tedrake, the Toyota Professor of Electrical Engineering and Computer Science, Aeronautics and Astronautics, and Mechanical Engineering at MIT; a senior vice president of large behavior models at the Toyota Research Institute; and CSAIL principal investigator. Other authors were Toyota Research Institute robotics researcher Hongkai Dai SM ’12, PhD ’16; team lead and Senior Research Scientist Sergey Zakharov; and Carnegie Mellon University PhD student Shun Iwase. Their work was supported, in part, by Amazon and the Toyota Research Institute. The researchers presented their work at the Conference on Robot Learning (CoRL) in September.
