Talk to My Docs: A new AI agent for multi-source knowledge 

Navigating a sea of documents, scattered across various platforms, can be a daunting task, often leading to slow decision-making and missed insights. As organizational knowledge and data multiplies, teams that can’t centralize or surface the right information quickly will struggle to make decisions, innovate, and stay competitive.

This blog explores how the new Talk to My Docs (TTMDocs) agent provides a solution to the steep costs of knowledge fragmentation.

The high cost of knowledge fragmentation

Knowledge fragmentation is not just an inconvenience — it’s a hidden cost to productivity, actively robbing your team of time and insight.

  • A survey by Starmind across 1,000+ knowledge workers found that employees only tap into 38% of their available knowledge/expertise because of this fragmentation.
  • Another study by McKinsey & Company found that knowledge workers spend over a quarter of their time searching for the information they need across different platforms such as Google Drive, Box, or local systems.

The constraints of existing solutions

While there are a few options on the market designed to ease the process of querying across key documents and materials living in a variety of places, many have significant constraints in what they can actually deliver. 

For example:

  • Vendor lock-in can severely hinder the promised experience. Unless you are strictly using the supported integrations of your vendor of choice, which in most instances is unrealistic, you end up with a limited subset of information repositories you can connect to and interact with.
  • Security and compliance considerations add another layer of complexity. Access to one platform or set of documents should not imply access to another, and any misstep or missed vulnerability can expose your organization to risk.

Talk to My Docs takes a different approach

DataRobot’s new Talk to My Docs agent represents a different approach. We provide the developer tools and support you need to build AI solutions that actually work in enterprise contexts. Not as a vendor-controlled service, but as a customizable open-source template you can tailor to your needs.

The differentiation isn’t subtle. With TTMDocs you get:

  • Enterprise security and compliance built in from day one
  • Multi-source connectivity instead of vendor lock-in
  • Zero-trust access control that respects existing permissions
  • Complete observability through DataRobot platform integration
  • Multi-agent architecture that scales with complexity
  • Full code access and customizability instead of black box APIs
  • Modern infrastructure-as-code for repeatable deployments

What makes Talk to My Docs different

Talk to My Docs is an open-source application template that gives you the intuitive, familiar chat-style experience that modern knowledge workers have come to expect, coupled with the control and customizability you actually need.

This isn’t a SaaS product you subscribe to, but rather a developer-friendly template you can deploy, modify, and make your own.

Multi-source integration and real security

TTMDocs connects to Google Drive, Box, and your local filesystems out of the box, with SharePoint and Jira integrations coming soon.

  • Preserve existing controls: We provide out-of-the-box OAuth integration to handle authentication securely through existing credentials. You’re not creating a parallel permission structure to manage—if you don’t have permission to see a document in Google Drive, you won’t see it in TTMDocs either.
  • Meet data where it lives: Unlike vendor-locked solutions, you’re not forced to migrate your document ecosystem. You can work with files stored in sources such as Google Drive, Box, Confluence, and SharePoint through the structured and unstructured data connectors available on the DataRobot platform, or upload files from your local machine.
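
As a rough illustration of the "preserve existing controls" idea, the sketch below filters candidate documents by the access that the source system already grants the signed-in user. It is not taken from the TTMDocs repository: `Document`, `SourceClient`, and `visible_documents` are hypothetical names, and the in-memory access lists stand in for whatever an OAuth-scoped connector would actually return.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Document:
    doc_id: str
    source: str  # e.g. "google_drive" or "box"
    title: str


@dataclass
class SourceClient:
    """Hypothetical stand-in for an OAuth-scoped connector to one source."""
    name: str
    # Maps a user ID to the document IDs the source itself lets them read.
    acl: Dict[str, Set[str]] = field(default_factory=dict)

    def can_read(self, user: str, doc_id: str) -> bool:
        return doc_id in self.acl.get(user, set())


def visible_documents(
    user: str, candidates: List[Document], clients: Dict[str, SourceClient]
) -> List[Document]:
    """Keep only documents the user may already read in the originating source."""
    allowed = []
    for doc in candidates:
        client = clients.get(doc.source)
        # Unknown source or missing grant: drop the document (deny by default).
        if client is not None and client.can_read(user, doc.doc_id):
            allowed.append(doc)
    return allowed


# Example data (all invented for illustration).
clients = {
    "google_drive": SourceClient("google_drive", {"ana": {"d1"}}),
    "box": SourceClient("box", {"raj": {"b1"}}),
}
docs = [
    Document("d1", "google_drive", "Q3 plan"),
    Document("b1", "box", "Contract"),
]
ana_titles = [d.title for d in visible_documents("ana", docs, clients)]
```

The design choice worth noting is deny by default: a document is shown only when the source confirms access, so no second permission model has to be kept in sync.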

Multi-agent architecture that scales

TTMDocs uses CrewAI for multi-agent orchestration, so you can have specialized agents handling different aspects of a query.

  • Modular & flexible: The modular architecture means you can also swap in your preferred agentic framework, whether that’s LangGraph, LlamaIndex, or any other, if it better suits your needs.
  • Customizable: Want to change how agents interpret queries? Adjust the prompts. Need custom tools for domain-specific tasks? Add them. Have compliance requirements? Build those guardrails directly into the code.
  • Scalable: As your document collection grows and use cases become more complex, you can add agents with specialized tools and prompts rather than trying to make one agent do everything. For example, one agent might retrieve financial documents, another handle technical specifications, and a third synthesize cross-functional insights.
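
The "add specialized agents rather than overload one" point can be sketched without any framework. The example below is a deliberately simplified illustration: it does not use the CrewAI API and does not reflect how the template routes queries. `SpecialistAgent`, `route`, and the keyword lists are hypothetical, and a real deployment would delegate to LLM-backed agents instead of keyword matching.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class SpecialistAgent:
    name: str
    keywords: FrozenSet[str]  # hypothetical trigger terms for this specialty

    def matches(self, query: str) -> bool:
        # Compare whole words so that "api" does not match inside "capital".
        words = set(re.findall(r"[a-z]+", query.lower()))
        return bool(words & self.keywords)


AGENTS = [
    SpecialistAgent("financial", frozenset({"revenue", "budget", "invoice"})),
    SpecialistAgent("technical", frozenset({"specification", "api", "architecture"})),
]


def route(query: str, agents: List[SpecialistAgent]) -> List[str]:
    """Return the names of the steps that should handle the query, in order.

    When more than one specialist matches, a synthesis step is appended so
    the individual answers can be combined into one response.
    """
    selected = [agent.name for agent in agents if agent.matches(query)]
    if not selected:
        return ["generalist"]  # fallback when no specialty applies
    if len(selected) > 1:
        selected.append("synthesis")
    return selected
```

Adding a new specialty in this sketch means appending one entry to `AGENTS`, which mirrors the idea of growing the system by adding agents instead of widening a single prompt.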

Enterprise platform integration

Another key aspect of Talk to My Docs is that it integrates with your existing DataRobot infrastructure.

  • Guarded RAG & LLM access: The template includes a Guarded RAG LLM Model for controlled document retrieval and LLM Gateway integration for access to 80+ open and closed-source LLMs.
  • Full observability: Every query is logged. Every retrieval is tracked. Every error is captured. This means you have full tracing and observability through the DataRobot platform, allowing you to actually troubleshoot when something goes wrong.
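
To make "every query is logged, every error is captured" concrete, here is a generic tracing sketch built on Python's standard `logging` module. It does not use DataRobot's tracing APIs: `traced`, `retrieve`, and the logger name are hypothetical, and the platform integration described above would take the place of the plain logger.

```python
import functools
import logging
import time

logger = logging.getLogger("ttmdocs.sketch")  # hypothetical logger name


def traced(step):
    """Log the duration of each call to the wrapped function, and any error it raises."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
            except Exception:
                # logger.exception records the traceback alongside the message.
                logger.exception("%s failed", step)
                raise  # re-raise so callers still see the original error
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("%s ok in %.1f ms", step, elapsed_ms)
            return result
        return wrapper
    return decorator


@traced("retrieval")
def retrieve(query):
    """Toy retrieval step: rejects empty queries, otherwise echoes a hit."""
    if not query.strip():
        raise ValueError("empty query")
    return ["doc matching: " + query]
```

Wrapping each step separately keeps failures attributable: a log line names the step that broke, which is the property that makes troubleshooting practical.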

Modern, modular components

The template is organized into clean, independent pieces that can be developed and deployed separately or as part of the full stack:

Component               Description
agent_retrieval_agent   Multi-agent orchestration using CrewAI. Core agent logic and query routing.
core                    Shared Python logic, common utilities, and functions.
frontend_web            React and Vite web frontend for the user interface.
web                     FastAPI backend. Manages API endpoints, authentication, and communication.
infra                   Pulumi infrastructure-as-code for provisioning cloud resources.

The power of specialization: Talk to My Docs use cases

The underlying pattern is consistent: production-ready, specialized agents working together across your existing document sources, with security and observability built in.

Here are a few examples of how this is applied in the enterprise:

  • M&A due diligence: Cross-reference financial statements (Box), legal contracts (Google Drive), and technical documentation (local files). The permission structure ensures only the deal team sees sensitive materials.
  • Clinical trial documentation: Verify trial protocols align with regulatory guidelines across hundreds of documents, flagging inconsistencies before submission.
  • Legal discovery: Search across years of emails, contracts, and memos scattered across platforms, identifying relevant and privileged materials while respecting strict access controls.
  • Product launch readiness: Verify marketing materials, regulatory approvals, and supply chain documentation are aligned across regions and backed by certifications.
  • Insurance claims investigation: Pull policy documents, adjuster notes, and third-party assessments to cross-reference coverage terms and flag potential fraud indicators.
  • Research grant compliance: Cross-reference budget documents, purchase orders, and grant agreements to flag potential compliance issues before audits.

Use case: Clinical trial documentation

The challenge

A biotech company preparing an FDA submission is drowning in documentation spread across multiple systems: FDA guidance in Google Drive, trial protocols in SharePoint, lab reports in Box, and quality procedures locally. The core problem is ensuring consistency across all documents (protocols, safety, quality) before a submission or inspection, which demands a quick, unified view.

How TTMDocs helps

The company deploys a customized healthcare regulatory agent, a unified system that can answer complex compliance questions across all document sources. 

Regulatory agent:

Identifies applicable FDA submission requirements for the specific drug candidate.

Clinical review agent:

Reviews trial protocols against industry standards for patient safety and research ethics.

Safety compliance agent:

Checks that safety monitoring and adverse event reporting procedures meet FDA timelines.

The result

A regulatory team member asks: “What do we need for our submission, and are our safety monitoring procedures up to standard?”

Instead of spending days gathering documents and cross-referencing requirements, they get a structured response within minutes. The system identifies their submission pathway, flags three high-priority gaps in their safety procedures, notes two issues with their quality documentation, and provides a prioritized action plan with specific timelines.

Where to look: The code that makes it happen

The best way to understand TTMDocs is to look at the actual code. The repository is completely open source and available on GitHub.

Here are the key places to start exploring:

  • Agent architecture (agent_retrieval_agent/custom_model/agent.py): See how CrewAI coordinates different agents, how prompts are structured, and where you can inject custom behavior.
  • Tool integration (agent_retrieval_agent/custom_model/tool.py): Shows how agents interact with external systems. This is where you’d add custom tools for querying an internal API or processing domain-specific file formats.
  • OAuth and security (web/app/auth/oauth.py): See exactly how authentication works with Google Drive and Box and how your user permissions are preserved throughout the system.
  • Web backend (web/app/): The FastAPI application that ties everything together. You’ll see how the frontend communicates with agents, and how conversations are managed.
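
For a sense of what a custom tool for a domain-specific file format might look like, here is a small, framework-free sketch. It is an assumption-laden illustration, not code from `tool.py`: the `TOOLS` registry, the `tool` decorator, and `parse_lab_header` are hypothetical, and in the template tools would be registered through the agent framework, which is not shown here.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a tool name to the function that implements it.
TOOLS: Dict[str, Callable[[str], Dict[str, str]]] = {}


def tool(name):
    """Register a function under a name an agent could refer to."""
    def register(func):
        TOOLS[name] = func
        return func
    return register


@tool("parse_lab_header")
def parse_lab_header(text: str) -> Dict[str, str]:
    """Parse 'Key: value' lines from the top of a report, up to the first blank line."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            break  # the header ends at the first blank line
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip().lower()] = value.strip()
    return fields


# Invented sample input for illustration.
sample = "Study: ABC-123\nSite: Boston\n\nBody text: ignored"
header = TOOLS["parse_lab_header"](sample)
```

Keeping each tool a plain function with a clear input and output makes it easy to test in isolation before handing it to an agent.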

The future of enterprise AI is open

Enterprise AI is at an inflection point. The gap between what end-user AI tools can do and what enterprises actually need is growing. Companies are realizing that “good enough” consumer AI products create more problems than they solve when enterprise requirements like security, compliance, and integration cannot be compromised.

The future isn’t about choosing between convenience and control. It’s about having both. Talk to My Docs puts both the power and the flexibility into your hands, delivering results you can trust.

The code is yours. The possibilities are endless.

Experience the difference. Start building today.

With DataRobot application templates, you’re never locked into rigid black-box systems. Gain a flexible foundation that lets you adapt, experiment, and innovate on your terms. Whether refining existing workflows or creating new AI-powered applications, DataRobot gives you the clarity and confidence to move forward.

Start exploring what’s possible with a free 14-day trial.

The post Talk to My Docs: A new AI agent for multi-source knowledge appeared first on DataRobot.
