
Talk to My Docs: A new AI agent for multi-source knowledge 

Navigating a sea of documents, scattered across various platforms, can be a daunting task, often leading to slow decision-making and missed insights. As organizational knowledge and data multiplies, teams that can’t centralize or surface the right information quickly will struggle to make decisions, innovate, and stay competitive.

This blog explores how the new Talk to My Docs (TTMDocs) agent provides a solution to the steep costs of knowledge fragmentation.

The high cost of knowledge fragmentation

Knowledge fragmentation is not just an inconvenience — it’s a hidden cost to productivity, actively robbing your team of time and insight.

  • A survey by Starmind across 1,000+ knowledge workers found that employees only tap into 38% of their available knowledge/expertise because of this fragmentation.
  • Another study by McKinsey & Company found that knowledge workers spend over a quarter of their time searching for the information they need across different platforms such as Google Drive, Box, or local systems.

The constraints of existing solutions

While there are a few options on the market designed to ease the process of querying across key documents and materials living in a variety of places, many have significant constraints in what they can actually deliver. 

For example:

  • Vendor lock-in can severely hinder the promised experience. Unless you are strictly using the supported integrations of your vendor of choice, which in most instances is unrealistic, you end up with a limited subset of information repositories you can connect to and interact with.
  • Security and compliance considerations add another layer of complexity. Access rarely maps cleanly across platforms: a user who is authorized to see documents in one system may not be authorized to see them in another, and any misstep or missed vulnerability can expose your organization to risk.

Talk to My Docs takes a different approach

DataRobot’s new Talk to My Docs agent represents a different approach. We provide the developer tools and support you need to build AI solutions that actually work in enterprise contexts. Not as a vendor-controlled service, but as a customizable open-source template you can tailor to your needs.

The differentiation isn’t subtle. With TTMDocs you get:

  • Enterprise security and compliance built in from day one
  • Multi-source connectivity instead of vendor lock-in
  • Zero-trust access control (respects existing permissions)
  • Complete observability through DataRobot platform integration
  • Multi-agent architecture that scales with complexity
  • Full code access and customizability instead of black box APIs
  • Modern infrastructure-as-code for repeatable deployments

What makes Talk to My Docs different

Talk to My Docs is an open-source application template that gives you the intuitive, familiar chat-style experience that modern knowledge workers have come to expect, coupled with the control and customizability you actually need.

This isn’t a SaaS product you subscribe to, but a developer-friendly template you can deploy, modify, and make your own.

Multi-source integration and real security

TTMDocs connects to Google Drive, Box, and your local filesystem out of the box, with SharePoint and JIRA integrations coming soon.

  • Preserve existing controls: We provide out-of-the-box OAuth integration to handle authentication securely through existing credentials. You’re not creating a parallel permission structure to manage: if you don’t have permission to see a document in Google Drive, you won’t see it in TTMDocs either (see the sketch after this list).
  • Meet data where it lives: Unlike vendor-locked solutions, you’re not forced to migrate your document ecosystem. You can use files stored in the structured and unstructured connectors available on the DataRobot platform, such as Google Drive, Box, Confluence, and SharePoint, or upload files from your local machine.
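
To make the permission model concrete, here is a minimal sketch of the idea: each connector is built with the signed-in user’s own OAuth token, so the source system itself filters what that user can see. The class and function names here are hypothetical illustrations, not the template’s actual code.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class UserSession:
    """OAuth tokens obtained when the user signs in (hypothetical shape)."""
    user_id: str
    google_token: Optional[str] = None
    box_token: Optional[str] = None


class DocumentSource(Protocol):
    """Any connector that lists documents on behalf of a specific user."""
    def list_documents(self, query: str) -> list[dict]: ...


class GoogleDriveSource:
    """Hypothetical Google Drive connector.

    The key point: API calls are made with the *user's* token, so Drive
    itself enforces sharing permissions -- no parallel ACLs to maintain.
    """
    def __init__(self, user_token: str):
        self.user_token = user_token

    def list_documents(self, query: str) -> list[dict]:
        # A real connector would call the Drive API with
        # "Authorization: Bearer <self.user_token>"; stubbed out here.
        return []


def sources_for(session: UserSession) -> list[DocumentSource]:
    """Build connectors only for the platforms the user is authenticated with."""
    sources: list[DocumentSource] = []
    if session.google_token:
        sources.append(GoogleDriveSource(session.google_token))
    # Box, local filesystem, etc. would be added the same way.
    return sources
```

The point of the sketch is the flow: credentials travel with the user, so there is never a second set of permissions to keep in sync.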

Multi-agent architecture that scales

TTMDocs uses CrewAI for multi-agent orchestration, so you can have specialized agents handling different aspects of a query.

  • Modular & flexible: The modular architecture means you can also swap in your preferred agentic framework, whether that’s LangGraph, LlamaIndex, or any other, if it better suits your needs.
  • Customizable: Want to change how agents interpret queries? Adjust the prompts. Need custom tools for domain-specific tasks? Add them. Have compliance requirements? Build those guardrails directly into the code.
  • Scalable: As your document collection grows and use cases become more complex, you can add agents with specialized tools and prompts rather than trying to make one agent do everything. For example, one agent might retrieve financial documents, another handle technical specifications, and a third synthesize cross-functional insights, as sketched below.
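
As a rough illustration of that pattern (not the template’s actual agent code, which lives in agent_retrieval_agent/custom_model/agent.py), here is a minimal CrewAI sketch with two specialized retrieval agents and a synthesizer; the roles, goals, and example question are placeholder assumptions.

```python
from crewai import Agent, Crew, Process, Task

# Placeholder agents for illustration only.
finance_agent = Agent(
    role="Financial document specialist",
    goal="Retrieve and summarize relevant financial documents",
    backstory="You know the company's financial reporting inside out.",
)
tech_agent = Agent(
    role="Technical documentation specialist",
    goal="Retrieve and summarize relevant technical specifications",
    backstory="You are fluent in the product's technical documentation.",
)
synthesizer = Agent(
    role="Cross-functional analyst",
    goal="Combine findings from the other agents into one grounded answer",
    backstory="You write concise, well-cited summaries for decision makers.",
)

question = "Are the Q3 cost projections consistent with the new hardware specs?"

tasks = [
    Task(description=f"Find financial material relevant to: {question}",
         expected_output="Bullet list of findings with document names",
         agent=finance_agent),
    Task(description=f"Find technical material relevant to: {question}",
         expected_output="Bullet list of findings with document names",
         agent=tech_agent),
    Task(description="Synthesize both sets of findings into a single answer",
         expected_output="A short, cited answer to the user's question",
         agent=synthesizer),
]

crew = Crew(agents=[finance_agent, tech_agent, synthesizer],
            tasks=tasks, process=Process.sequential)
result = crew.kickoff()
print(result)
```

Adding a new specialty is then a matter of defining one more Agent and one more Task, rather than overloading a single prompt.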

Enterprise platform integration

Another key aspect of Talk to My Docs is that it integrates with your existing DataRobot infrastructure.

  • Guarded RAG & LLM access: The template includes a Guarded RAG LLM Model for controlled document retrieval and LLM Gateway integration for access to 80+ open and closed-source LLMs.
  • Full observability: Every query is logged. Every retrieval is tracked. Every error is captured. This means you have full tracing and observability through the DataRobot platform, allowing you to actually troubleshoot when something goes wrong (a simplified sketch of the logging pattern follows below).
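
The snippet below is a simplified sketch of that idea rather than DataRobot’s tracing implementation: a decorator that logs the start, duration, and any error of each pipeline step under a shared trace ID. The retrieval and LLM-call functions are stand-ins.

```python
import functools
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("ttmdocs")


def traced(step: str):
    """Log start, duration, and errors of each pipeline step (sketch only)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, trace_id: str, **kwargs):
            start = time.perf_counter()
            log.info("trace=%s step=%s status=started", trace_id, step)
            try:
                result = fn(*args, trace_id=trace_id, **kwargs)
            except Exception as exc:
                log.error("trace=%s step=%s status=error error=%r", trace_id, step, exc)
                raise
            log.info("trace=%s step=%s status=ok duration=%.2fs",
                     trace_id, step, time.perf_counter() - start)
            return result
        return wrapper
    return decorator


@traced("retrieval")
def retrieve(query: str, trace_id: str) -> list[str]:
    return ["doc-1 excerpt", "doc-2 excerpt"]  # stand-in for guarded RAG retrieval


@traced("llm_call")
def answer(query: str, context: list[str], trace_id: str) -> str:
    return f"Answer based on {len(context)} documents"  # stand-in for an LLM call


trace_id = str(uuid.uuid4())
docs = retrieve("Q3 travel policy changes", trace_id=trace_id)
print(answer("What changed in the Q3 travel policy?", docs, trace_id=trace_id))
```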

Modern, modular components

The template is organized into clean, independent pieces that can be developed and deployed separately or as part of the full stack:

  • agent_retrieval_agent: Multi-agent orchestration using CrewAI. Core agent logic and query routing.
  • core: Shared Python logic, common utilities, and functions.
  • frontend_web: React and Vite web frontend for the user interface.
  • web: FastAPI backend. Manages API endpoints, authentication, and communication.
  • infra: Pulumi infrastructure-as-code for provisioning cloud resources.

The power of specialization: Talk to My Docs use cases

The pattern: productionized, specialized agents working together across your existing document sources, with security and observability built in.

Here are a few examples of how this is applied in the enterprise:

  • M&A due diligence: Cross-reference financial statements (Box), legal contracts (Google Drive), and technical documentation (local files). The permission structure ensures only the deal team sees sensitive materials.
  • Clinical trial documentation: Verify trial protocols align with regulatory guidelines across hundreds of documents, flagging inconsistencies before submission.
  • Legal discovery: Search across years of emails, contracts, and memos scattered across platforms, identifying relevant and privileged materials while respecting strict access controls.
  • Product launch readiness: Verify marketing materials, regulatory approvals, and supply chain documentation are aligned across regions and backed by certifications.
  • Insurance claims investigation: Pull policy documents, adjuster notes, and third-party assessments to cross-reference coverage terms and flag potential fraud indicators.
  • Research grant compliance: Cross-reference budget documents, purchase orders, and grant agreements to flag potential compliance issues before audits.

Use case: Clinical trial documentation

The challenge

A biotech company preparing an FDA submission is drowning in documentation spread across multiple systems: FDA guidance in Google Drive, trial protocols in SharePoint, lab reports in Box, and quality procedures locally. The core problem is ensuring consistency across all documents (protocols, safety, quality) before a submission or inspection, which demands a quick, unified view.

How TTMDocs helps

The company deploys a customized healthcare regulatory solution: a unified system of specialized agents that can answer complex compliance questions across all document sources.

  • Regulatory agent: Identifies applicable FDA submission requirements for the specific drug candidate.
  • Clinical review agent: Reviews trial protocols against industry standards for patient safety and research ethics.
  • Safety compliance agent: Checks that safety monitoring and adverse event reporting procedures meet FDA timelines.

The result

A regulatory team member asks: “What do we need for our submission, and are our safety monitoring procedures up to standard?”

Instead of spending days gathering documents and cross-referencing requirements, they get a structured response within minutes. The system identifies their submission pathway, flags three high-priority gaps in their safety procedures, notes two issues with their quality documentation, and provides a prioritized action plan with specific timelines.

Where to look: The code that makes it happen

The best way to understand TTMDocs is to look at the actual code. The repository is completely open source and available on GitHub.

Here are the key places to start exploring:

  • Agent architecture (agent_retrieval_agent/custom_model/agent.py): See how CrewAI coordinates different agents, how prompts are structured, and where you can inject custom behavior.
  • Tool integration (agent_retrieval_agent/custom_model/tool.py): Shows how agents interact with external systems. This is where you’d add custom tools for querying an internal API or processing domain-specific file formats (see the sketch after this list).
  • OAuth and security (web/app/auth/oauth.py): See exactly how authentication works with Google Drive and Box and how your user permissions are preserved throughout the system.
  • Web backend (web/app/): The FastAPI application that ties everything together. You’ll see how the frontend communicates with agents, and how conversations are managed.
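
If you want a feel for what adding a custom tool might look like, here is a hedged sketch of a CrewAI-style tool for a hypothetical internal ticketing API. The endpoint and response fields are invented for illustration, and depending on your CrewAI version the BaseTool import comes from crewai.tools or the separate crewai_tools package.

```python
import requests
from crewai.tools import BaseTool  # in some versions: from crewai_tools import BaseTool


class InternalTicketSearch(BaseTool):
    """Hypothetical tool that lets an agent query an internal ticketing API."""

    name: str = "internal_ticket_search"
    description: str = (
        "Search the internal ticketing system for tickets matching a query. "
        "Returns a short plain-text summary of the top results."
    )

    def _run(self, query: str) -> str:
        # Invented endpoint for illustration; replace with your own service.
        resp = requests.get(
            "https://tickets.internal.example.com/api/search",
            params={"q": query, "limit": 5},
            timeout=10,
        )
        resp.raise_for_status()
        tickets = resp.json().get("results", [])
        if not tickets:
            return "No matching tickets found."
        return "\n".join(f"{t['id']}: {t['title']}" for t in tickets)
```

An instance of the tool would then be passed to an agent via its tools argument so the agent can decide when to call it.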

The future of enterprise AI is open

Enterprise AI is at an inflection point. The gap between what end-user AI tools can do and what enterprises actually need is growing. Companies are realizing that “good enough” consumer AI products create more problems than they solve when you cannot compromise on enterprise requirements like security, compliance, and integration.

The future isn’t about choosing between convenience and control. It’s about having both. Talk to My Docs puts both the power and the flexibility into your hands, delivering results you can trust.

The code is yours. The possibilities are endless.

Experience the difference. Start building today.

With DataRobot application templates, you’re never locked into rigid black-box systems. Gain a flexible foundation that lets you adapt, experiment, and innovate on your terms. Whether refining existing workflows or creating new AI-powered applications, DataRobot gives you the clarity and confidence to move forward.

Start exploring what’s possible with a free 14-day trial.


Robot Talk Episode 136 – Making driverless vehicles smarter, with Shimon Whiteson

Claire chatted to Shimon Whiteson from Waymo about machine learning for autonomous vehicles.

Shimon Whiteson is a Professor of Computer Science at the University of Oxford and a Senior Staff Research Scientist at Waymo UK. His research focuses on deep reinforcement learning and imitation learning, with applications in robotics and video games. He completed his doctorate at the University of Texas at Austin in 2007. He spent eight years as an Assistant and then an Associate Professor at the University of Amsterdam before joining Oxford as an Associate Professor in 2015. His spin-out company Latent Logic was acquired by Waymo in 2019.

Classical Indian dance inspires new ways to teach robots how to use their hands

Researchers at the University of Maryland, Baltimore County (UMBC) have extracted the building blocks of precise hand gestures used in the classical Indian dance form Bharatanatyam—and found a richer "alphabet" of movement compared to natural grasps. The work could improve how we teach hand movements to robots and offer humans better tools for physical therapy.

Enterprise AI World 2025 Notes from the Field: Evolving AI from Chatbots to Colleagues That Make An Impact

Enterprise AI World 2025, co-located with KMWorld 2025, offered a clear signal this year: the era of “drop a chatbot on the intranet and call it transformation” is over. The conversations shifted toward AI that sits inside real work—capturing tacit […]


‘OCTOID,’ a soft robot that changes color and moves like an octopus

Underwater octopuses change their body color and texture in the blink of an eye to blend perfectly into their surroundings when evading predators or capturing prey. They transform their bodies to match the colors of nearby corals or seaweed, turning blue or red, and move by softly curling their arms or snatching prey.

Teaching robot policies without new demonstrations: interview with Jiahui Zhang and Jesse Zhang

The ReWiND method consists of three phases: learning a reward function, pre-training a policy, and using the reward function and pre-trained policy to learn a new language-specified task online.

In their paper ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations, which was presented at CoRL 2025, Jiahui Zhang, Yusen Luo, Abrar Anwar, Sumedh A. Sontakke, Joseph J. Lim, Jesse Thomason, Erdem Bıyık and Jesse Zhang introduce a framework for learning robot manipulation tasks solely from language instructions without per-task demonstrations. We asked Jiahui Zhang and Jesse Zhang to tell us more.

What is the topic of the research in your paper, and what problem were you aiming to solve?

Our research addresses the problem of enabling robot manipulation policies to solve novel, language-conditioned tasks without collecting new demonstrations for each task. We begin with a small set of demonstrations in the deployment environment, train a language-conditioned reward model on them, and then use that learned reward function to fine-tune the policy on unseen tasks, with no additional demonstrations required.

Tell us about ReWiND – what are the main features and contributions of this framework?

ReWiND is a simple and effective three-stage framework designed to adapt robot policies to new, language-conditioned tasks without collecting new demonstrations. Its main features and contributions are:

  1. Reward function learning in the deployment environment
    We first learn a reward function using only five demonstrations per task from the deployment environment.

    • The reward model takes a sequence of images and a language instruction, and predicts per-frame progress from 0 to 1, giving us a dense reward signal instead of sparse success/failure.
    • To expose the model to both successful and failed behaviors without having to collect demonstrations of failures, we introduce a video rewind augmentation: for a video segment V(1:t), we choose an intermediate point t1, reverse the segment V(t1:t) to create V(t:t1), and append it back to the original sequence. This generates a synthetic sequence that resembles “making progress then undoing progress,” effectively simulating failed attempts (see the sketch after this list).
    • This allows the reward model to learn a smoother and more accurate dense reward signal, improving generalization and stability during policy learning.
  2. Policy pre-training with offline RL
    Once we have the learned reward function, we use it to relabel the small demonstration dataset with dense progress rewards. We then train a policy offline using these relabeled trajectories.
  3. Policy fine-tuning in the deployment environment
    Finally, we adapt the pre-trained policy to new, unseen tasks in the deployment environment. We freeze the reward function and use it as the feedback for online reinforcement learning. After each episode, the newly collected trajectory is relabeled with dense rewards from the reward model and added to the replay buffer. This iterative loop allows the policy to continually improve and adapt to new tasks without requiring any additional demonstrations.
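
To make the rewind augmentation and dense-reward relabeling concrete, here is a minimal NumPy sketch under the assumptions stated above (per-frame progress targets in [0, 1], and a reward model that maps a frame sequence plus an instruction to per-frame progress). It illustrates the idea rather than reproducing the authors’ implementation.

```python
import numpy as np


def rewind_augment(frames: np.ndarray, t1: int) -> tuple[np.ndarray, np.ndarray]:
    """Video rewind augmentation (sketch).

    frames: array of shape (T, H, W, C) for one successful demonstration.
    t1: intermediate index; the segment frames[t1:] is reversed and appended,
    simulating "making progress, then undoing it" without collecting failures.
    Returns the augmented frames and per-frame progress targets in [0, 1].
    """
    T = len(frames)
    rewound = frames[t1:][::-1]                 # reversed tail segment
    aug_frames = np.concatenate([frames, rewound], axis=0)

    forward = np.linspace(0.0, 1.0, T)          # progress rises to 1 ...
    backward = forward[t1:][::-1]               # ... then falls back down
    targets = np.concatenate([forward, backward], axis=0)
    return aug_frames, targets


def relabel_with_dense_rewards(trajectory_frames, instruction, reward_model):
    """Relabel a trajectory with dense rewards from the learned reward model.

    reward_model(frames, instruction) is assumed to return per-frame progress.
    Rewards are defined here as progress deltas between frames (one reasonable
    choice for a sketch; the paper may use the progress value directly).
    """
    progress = np.asarray(reward_model(trajectory_frames, instruction))  # (T,)
    rewards = np.diff(progress, prepend=progress[0])                     # dense signal
    return rewards
```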

Could you talk about the experiments you carried out to test the framework?

We evaluate ReWiND in both the MetaWorld simulation environment and the Koch real-world setup. Our analysis focuses on two aspects: the generalization ability of the reward model and the effectiveness of policy learning. We also compare how well different policies adapt to new tasks under our framework, demonstrating significant improvements over state-of-the-art methods.

(Q1) Reward generalization – MetaWorld analysis
We collect a MetaWorld dataset covering 20 training tasks, with 5 demonstrations per task, and 17 related but unseen tasks for evaluation. We train the reward function on the MetaWorld dataset together with a subset of the OpenX dataset.

We compare ReWiND to LIV[1], LIV-FT, RoboCLIP[2], VLC[3], and GVL[4]. For generalization to unseen tasks, we use video–language confusion matrices. We feed the reward model video sequences paired with different language instructions and expect the correctly matched video–instruction pairs to receive the highest predicted rewards. In the confusion matrix, this corresponds to the diagonal entries having the strongest (darkest) values, indicating that the reward function reliably identifies the correct task description even for unseen tasks.

Video-language reward confusion matrix. See the paper for more information.
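
A rough sketch of that evaluation, assuming a reward model that returns per-frame progress for a (video, instruction) pair: build the matrix of predicted rewards for every video against every instruction and check that the diagonal dominates. The helper names are hypothetical.

```python
import numpy as np


def reward_confusion_matrix(videos, instructions, reward_model) -> np.ndarray:
    """M[i, j]: predicted reward of video i paired with instruction j.

    reward_model(video, instruction) is assumed to return per-frame progress;
    each pair is scored here by the predicted progress at the final frame.
    """
    n = len(videos)
    M = np.zeros((n, n))
    for i, video in enumerate(videos):
        for j, instruction in enumerate(instructions):
            M[i, j] = reward_model(video, instruction)[-1]
    return M


def diagonal_accuracy(M: np.ndarray) -> float:
    """Fraction of videos whose matching instruction receives the highest reward."""
    return float(np.mean(np.argmax(M, axis=1) == np.arange(len(M))))
```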

For demo alignment, we measure the correlation between the reward model’s predicted progress and the actual time steps in successful trajectories using Pearson r and Spearman ρ. For policy rollout ranking, we evaluate whether the reward function correctly ranks failed, near-success, and successful rollouts. Across these metrics, ReWiND significantly outperforms all baselines—for example, it achieves 30% higher Pearson correlation and 27% higher Spearman correlation than VLC on demo alignment, and delivers about 74% relative improvement in reward separation between success categories compared with the strongest baseline LIV-FT.
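
The demo-alignment metrics can be sketched with SciPy as follows, under the assumption that ground-truth progress in a successful demonstration is simply normalized time.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def demo_alignment(predicted_progress) -> tuple[float, float]:
    """Correlate predicted per-frame progress with normalized time for a successful demo."""
    predicted_progress = np.asarray(predicted_progress, dtype=float)
    ground_truth = np.linspace(0.0, 1.0, len(predicted_progress))  # progress should rise with time
    r, _ = pearsonr(predicted_progress, ground_truth)
    rho, _ = spearmanr(predicted_progress, ground_truth)
    return r, rho
```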

(Q2) Policy learning in simulation (MetaWorld)
We pre-train on the same 20 tasks and then evaluate RL on 8 unseen MetaWorld tasks for 100k environment steps.

Using ReWiND rewards, the policy achieves an interquartile mean (IQM) success rate of approximately 79%, representing a ~97.5% improvement over the best baseline. It also demonstrates substantially better sample efficiency, achieving higher success rates much earlier in training.
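
For reference, the interquartile mean (IQM) is the mean of the middle 50% of runs, which makes it robust to outlier seeds. A simple sketch is below; exact trimming conventions vary, and papers often compute it with the rliable library.

```python
import numpy as np


def interquartile_mean(success_rates) -> float:
    """Mean of the middle 50% of values (a simple IQM sketch)."""
    v = np.sort(np.asarray(success_rates, dtype=float))
    n = len(v)
    lo, hi = int(np.floor(0.25 * n)), int(np.ceil(0.75 * n))
    return float(v[lo:hi].mean())


print(interquartile_mean([0.2, 0.7, 0.8, 0.85, 0.9, 1.0]))  # mean of the middle four values
```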

(Q3) Policy learning in real robot (Koch bimanual arms)
Setup: a real-world tabletop bimanual Koch v1.1 system with five tasks, including in-distribution, visually cluttered, and spatial-language generalization tasks.
We use 5 demos for the reward model and 10 demos for the policy in this more challenging setting. With about 1 hour of real-world RL (~50k env steps), ReWiND improves average success from 12% → 68% (≈5× improvement), while VLC only goes from 8% → 10%.

Are you planning future work to further improve the ReWiND framework?

Yes, we plan to extend ReWiND to larger models and further improve the accuracy and generalization of the reward function across a broader range of tasks. In fact, we already have a workshop paper extending ReWiND to larger-scale models.

In addition, we aim to make the reward model capable of directly predicting success or failure, without relying on the environment’s success signal during policy fine-tuning. Currently, even though ReWiND provides dense rewards, we still rely on the environment to indicate whether an episode has been successful. Our goal is to develop a fully generalizable reward model that can provide both accurate dense rewards and reliable success detection on its own.

References

[1] Yecheng Jason Ma et al. “LIV: Language-image representations and rewards for robotic control.” International Conference on Machine Learning. PMLR, 2023.
[2] Sumedh Sontakke et al. “RoboCLIP: One demonstration is enough to learn robot policies.” Advances in Neural Information Processing Systems 36 (2023): 55681-55693.
[3] Minttu Alakuijala et al. “Video-Language Critic: Transferable reward functions for language-conditioned robotics.” arXiv:2405.19988 (2024).
[4] Yecheng Jason Ma et al. “Vision language models are in-context value learners.” The Thirteenth International Conference on Learning Representations, 2025.

About the authors

Jiahui Zhang is a Ph.D. student in Computer Science at the University of Texas at Dallas, advised by Prof. Yu Xiang. He received his M.S. degree from the University of Southern California, where he worked with Prof. Joseph Lim and Prof. Erdem Bıyık.

Jesse Zhang is a postdoctoral researcher at the University of Washington, advised by Prof. Dieter Fox and Prof. Abhishek Gupta. He completed his Ph.D. at the University of Southern California, advised by Prof. Jesse Thomason and Prof. Erdem Bıyık at USC, and Prof. Joseph J. Lim at KAIST.

Aerial microrobot can fly as fast as a bumblebee

In the future, tiny flying robots could be deployed to aid in the search for survivors trapped beneath the rubble after a devastating earthquake. Like real insects, these robots could flit through tight spaces larger robots can't reach, while simultaneously dodging stationary obstacles and pieces of falling rubble.

New control system teaches soft robots the art of staying safe

Imagine having a continuum soft robotic arm bend around a bunch of grapes or broccoli, adjusting its grip in real time as it lifts the object. Unlike traditional rigid robots that generally aim to avoid contact with the environment as much as possible and stay far away from humans for safety reasons, this arm senses subtle forces, stretching and flexing in ways that mimic more of the compliance of a human hand. Its every motion is calculated to avoid excessive force while achieving the task efficiently.

New robotic eyeball could enhance visual perception of embodied AI

Embodied artificial intelligence (AI) systems are robotic agents that rely on machine learning algorithms to sense their surroundings, plan their actions and execute them. A key component of these systems is the visual perception module, which allows them to analyze and interpret images captured by cameras.

Researchers develop new method for modeling complex sensor systems

A research team at Kumamoto University (Japan) has unveiled a new mathematical framework that makes it possible to accurately model systems using multiple sensors that operate at different sensing rates. This breakthrough could pave the way for safer autonomous vehicles, smarter robots, and more reliable sensor networks.