
SAP AI Agents: How Enterprises Are Deploying Agentic AI on SAP

The Problem That Brought You Here

Your SAP environment runs the core of the business — procurement, inventory, production planning, finance. And now leadership is asking what AI can actually do on top of it. Not a demo. Not a proof of concept. Something that runs in production and solves a real bottleneck.

SAP AI agents are the answer a growing number of enterprise IT and operations teams are landing on. This article explains what they are, where they are being deployed today, and what it takes to put one into a live SAP environment.

USM Business Systems is a specialized SAP AI delivery partner based in Ashburn, VA. We place SAP BTP AI developers, AI Core engineers, and enterprise LLM integration specialists inside enterprises and system integrators executing SAP AI programs.

What Is a SAP AI Agent?

An AI agent is software that perceives its environment, reasons about a goal, takes actions, and checks results — without a human directing each step. When that environment is SAP, the agent reads SAP data, calls SAP APIs or workflows, interprets the output, and acts again.

SAP has built AI agent infrastructure directly into its platform. SAP Joule, the AI copilot embedded across S/4HANA, BTP, and SAP Analytics Cloud, uses an agentic architecture under the hood. Developers can extend it using SAP AI Core, the managed AI runtime where custom models and agents are deployed and governed at enterprise scale.

The practical result is an agent that can, for example, monitor a supplier’s delivery performance in SAP, flag an anomaly, cross-reference historical data, draft a purchase order adjustment, and route it for approval — without a procurement analyst touching it.
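In code, that loop reduces to perceive, reason, act, and check. The sketch below is illustrative only: the function names and in-memory data are hypothetical stand-ins for SAP OData calls and approval workflows, not SAP APIs.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Delivery:
    sku: str
    days_late: int

# Hypothetical stand-ins for SAP API calls; a real agent would read
# delivery history via OData services and route drafts through workflow.
def fetch_recent_deliveries(supplier_id: str) -> list:
    return [Delivery("SKU-100", 4), Delivery("SKU-100", 5), Delivery("SKU-100", 3)]

def draft_po_adjustment(supplier_id: str, avg_delay: float) -> dict:
    return {"supplier": supplier_id, "action": "extend_lead_time",
            "by_days": round(avg_delay)}

def monitor_supplier(supplier_id: str, threshold_days: float = 2.0) -> Optional[dict]:
    """One pass of the perceive -> reason -> act loop."""
    deliveries = fetch_recent_deliveries(supplier_id)                   # perceive
    avg_delay = sum(d.days_late for d in deliveries) / len(deliveries)  # reason
    if avg_delay > threshold_days:                                      # decide
        return draft_po_adjustment(supplier_id, avg_delay)              # act (routed for approval)
    return None

print(monitor_supplier("SUP-42"))
```

The human stays in the loop at the end: the agent drafts the adjustment, it does not post it.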

Where Enterprises Are Deploying SAP AI Agents Today

  • Procurement and Supplier Intelligence

Agents monitor supplier delivery windows, contract compliance, and pricing variances inside SAP Ariba and S/4HANA. When a pattern signals risk — a supplier consistently shipping 4 days late on a specific SKU category — the agent flags it, pulls the relevant contract terms, and surfaces a recommended action. Procurement teams report 60-70% reductions in manual monitoring time after deploying these agents [Gartner, 2024 Supply Chain AI Survey].

  • Production Scheduling and Capacity Planning

In manufacturing environments, agents integrated with SAP PP (Production Planning) adjust schedules dynamically based on real-time inventory levels, machine availability, and demand signals from SAP IBP. The agent doesn’t replace the planner — it does the 45 minutes of data gathering and cross-referencing that used to happen before every planning decision.

  • Finance and Accounts Payable Automation

Agents working in SAP Finance match invoices against purchase orders, flag discrepancies above a defined threshold, and route exceptions to the right reviewer. Companies using this pattern report 80%+ straight-through processing rates on standard invoices within 90 days of deployment [McKinsey, 2024 Finance AI Report].
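The core matching rule is simple to express. A minimal sketch of the routing decision, with an illustrative tolerance rather than an SAP default:

```python
def route_invoice(invoice_amount: float, po_amount: float,
                  tolerance_pct: float = 2.0) -> str:
    """Two-way match: post automatically when the invoice is within
    tolerance of the PO, otherwise route to a reviewer.
    The 2% tolerance is illustrative, not an SAP default."""
    if po_amount <= 0:
        return "review"
    variance_pct = abs(invoice_amount - po_amount) / po_amount * 100
    return "auto_post" if variance_pct <= tolerance_pct else "review"

print(route_invoice(1010.0, 1000.0))  # 1% variance -> auto_post
print(route_invoice(1150.0, 1000.0))  # 15% variance -> review
```

The hard part in production is not this rule; it is keeping the exception queue small enough that reviewers trust it.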

  • Inventory and Demand Signal Processing

Agents read point-of-sale signals, seasonal demand patterns, and supplier lead times from SAP, then recommend reorder quantities and safety stock adjustments. This is particularly high-value in food production and retail distribution where demand volatility is high and the cost of stockouts is immediate.
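The recommendation itself often reduces to the textbook reorder-point formula; the agent's value is keeping its inputs current. A minimal sketch with illustrative numbers:

```python
import math

def reorder_point(avg_daily_demand, demand_std, lead_time_days, z=1.65):
    """Textbook reorder-point formula (z = 1.65 is roughly a 95%
    service level). An agent would feed these inputs from SAP demand
    history and supplier lead times."""
    safety_stock = z * demand_std * math.sqrt(lead_time_days)
    rop = avg_daily_demand * lead_time_days + safety_stock
    return round(safety_stock, 1), round(rop, 1)

# 100 units/day average demand, std dev 20, 9-day lead time
print(reorder_point(avg_daily_demand=100, demand_std=20, lead_time_days=9))
```

In volatile categories the agent re-runs this daily as demand signals shift, rather than quarterly as most planning cycles do.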

What is the difference between SAP Joule and a custom SAP AI agent?

SAP Joule is SAP’s native AI copilot — it works within SAP’s defined interaction patterns and covers general tasks across S/4HANA, SAP SuccessFactors, and other SAP applications. A custom SAP AI agent is built to solve a specific workflow problem in your environment, using SAP AI Core or SAP BTP as the infrastructure. Custom agents handle tasks Joule does not cover natively and can integrate with non-SAP data sources inside the same workflow.

Do SAP AI agents require a full BTP implementation to deploy?

Not necessarily. Agents that work purely within S/4HANA APIs can be deployed with targeted BTP services rather than a full BTP platform rollout. The right architecture depends on where your data lives, what your agent needs to access, and your existing SAP landscape. A scoping conversation typically takes 30 minutes to map this out.

What Makes SAP AI Agent Deployments Fail?

Most SAP AI agent projects that stall do so for one of three reasons:

  • The agent was built without a clean data feed. Agents that read SAP master data often encounter inconsistent coding, missing fields, or legacy data structures that were never cleaned because no one needed them to be. The agent surfaces the problem immediately.
  • The workflow boundary was too broad at the start. ‘Automate procurement’ is not an agent design. ‘Monitor supplier on-time delivery for the top 50 SKUs and flag variance above 10%’ is. Scoping matters more here than in almost any other AI project type.
  • The team building it did not have SAP AI Core experience. Standard ML engineering skills do not transfer cleanly to SAP’s AI infrastructure. SAP AI Core has its own API patterns, lifecycle management approach, and governance requirements. Engineers who have not worked inside it add 4-8 weeks of ramp time to every deployment.
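One way to keep the workflow boundary tight is to make the scope an explicit, reviewable artifact before any build starts. A hypothetical sketch (field names are illustrative, not an SAP schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentScope:
    """A workflow boundary narrow enough to build against.
    Field names are illustrative, not an SAP schema."""
    process: str
    sku_limit: int
    variance_threshold_pct: float
    action: str

    def exceeds(self, variance_pct: float) -> bool:
        return variance_pct > self.variance_threshold_pct

scope = AgentScope(
    process="supplier_on_time_delivery",
    sku_limit=50,
    variance_threshold_pct=10.0,
    action="flag_for_review",
)
print(scope.exceeds(12.5))  # above the 10% boundary
```

If a stakeholder cannot fill in a structure like this, the scoping conversation is not finished.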

What a SAP AI Agent Deployment Actually Looks Like

A typical first agent deployment for a mid-to-large SAP environment follows this sequence:

  • Week 1-2: Workflow scoping. Identify the specific process, the SAP modules involved, the data fields the agent needs to read, and the action it will take on completion.
  • Week 3-4: Data readiness assessment. Confirm that the relevant SAP master data and transactional data are clean enough for the agent to reason accurately. Identify gaps.
  • Week 5-8: Build and test in SAP AI Core. Deploy the agent model, connect to SAP APIs, build the agentic loop, run on historical data.
  • Week 9-10: Controlled live run. Agent runs in parallel with the existing manual process. Outputs are compared. Confidence thresholds are tuned.
  • Week 11-12: Production deployment with monitoring. Agent goes live. A dashboard tracks decision volume, exception rate, and accuracy. A human review loop handles edge cases.

Why USM Business Systems?

USM Business Systems is a CMMI Level 3, Oracle Gold Partner AI and IT services firm headquartered in Ashburn, VA. With 1,000+ engineers, 2,000+ delivered applications, and 27 years of enterprise delivery experience, USM specializes in AI implementation for supply chain, pharma, manufacturing, and SAP environments. Our SAP AI practice places specialized engineers inside enterprise programs within days — on contract, as dedicated delivery pods, or on a project basis.

Ready to put SAP AI into production? Book a 30-minute scoping call with our SAP AI team at usmsystems.com.

FAQ

What SAP modules are most commonly used with AI agents?

SAP S/4HANA, SAP Ariba, SAP IBP, SAP PP, SAP Finance, and SAP Datasphere are the most active areas. The agent infrastructure runs on SAP AI Core and BTP regardless of which module the agent is reading or acting on.

How long does a first SAP AI agent deployment take?

A well-scoped first agent typically reaches production in 10-14 weeks. Projects that try to automate too broad a workflow or that start with messy master data take longer.

Do we need to train a model from scratch?

Most SAP AI agent deployments use pre-trained LLMs or SAP’s foundation models as the reasoning layer, fine-tuned or prompted for the specific workflow. Training from scratch is rarely necessary and significantly extends timelines.

Can SAP AI agents work with non-SAP systems in the same workflow?

Yes. SAP AI Core supports external API connections, so an agent can read a SAP data source, call a third-party logistics API, and write a result back to SAP in the same workflow loop.
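A minimal sketch of that loop, with stub functions standing in for SAP OData services and the carrier's REST API (all names are hypothetical):

```python
# Hypothetical stubs: a real agent would call SAP OData services and the
# carrier's REST API; in-memory data keeps the sketch runnable.
def read_sap_outbound_deliveries():
    return [{"delivery_id": "D1", "tracking": "TRK-9"}]

def query_3pl_status(tracking):
    return "DELAYED" if tracking == "TRK-9" else "ON_TIME"

def write_back_to_sap(delivery_id, status):
    return {"delivery_id": delivery_id, "sap_status": status}

def run_workflow_loop():
    """Read SAP, enrich with third-party data, write the result back."""
    results = []
    for d in read_sap_outbound_deliveries():
        status = query_3pl_status(d["tracking"])
        results.append(write_back_to_sap(d["delivery_id"], status))
    return results

print(run_workflow_loop())
```

The point is structural: SAP is both the source and the destination, with the external system queried in between, all inside one agent pass.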

What governance controls exist for SAP AI agents?

SAP AI Core includes lifecycle management, model versioning, audit logging, and role-based access. Agents deployed in regulated industries like pharma can be configured to require human approval above defined thresholds before taking action.


Sorry, No Fleshbags

Social Network for AI Agents Only Snapped Up by Mark Zuckerberg

Meta CEO Mark Zuckerberg has acquired a social network designed for AI agents only – no humans allowed.

Essentially, AI agents interact, talk and commiserate with one another on the text-based network – dubbed Moltbook – much like humans do on other social networks.

As for Moltbook’s human inventors: They got a lucky break with the sale.

Observes Reuters: “The deal will bring Moltbook co-founders Matt Schlicht and Ben Parr into Meta Superintelligence Labs.”

In other news and analysis on AI writing:

*ChatGPT Promising to Add AI Sora Video Maker: Long considered one of the most advanced video makers on the planet, Sora is slated to show up as a new feature in ChatGPT soon.

Observes writer Viktor Eriksson: “Sora is impressive. Not only is it more realistic with advanced movements and physics, but last October it gained the ability to ‘insert people’ into its videos.”

*AI Filmmaking: With the Latest Tools, You’re Writer, Director and Cinematographer: Hollywood’s fears that AI will someday render movie studios irrelevant seem more urgent than ever.

These days, the latest tools enable someone with a fresh imagination to become writer, director and cinematographer — and do it on the cheap.

TV producer Matt Zien, for example, says he recently cranked out a 12-minute short film using AI tools. It cost in the low thousands of dollars to create – rather than the millions that a Hollywood studio would have charged.

*Photoshop Gets an AI Assistant: Photoshop novices just got a leg-up with the roll-out of the tool’s new AI assistant: You can now use natural language in Photoshop to add special effects, make an easy crop, punch up shadows and more.

Observes writer Ivan Mehta: “Adobe said that paid users of Photoshop will be able to create unlimited generations with the AI assistant through April 9 — and free users will get 20 generations to start with.”

Looks like creating supplemental images for your blog or other digital property just got a whole lot easier.

*Zoom’s Answer to Boring Meetings: Send Your AI Avatar Instead: Video meeting service provider Zoom is promising to add AI avatars to its solution, which you’ll be able to send to all those insufferable online meetings in your place.

Observes writer Ivan Mehta: “The AI avatars, announced last year, are the long-anticipated photorealistic avatars that can mimic your appearance, expressions, and lip and eye movements.

“Designed to mime your actions when you’re not ‘camera-ready,’ Zoom says the avatars will work in online meetings as well as in its asynchronous video messaging product.”

*LegalZoom Legal Advice Now Available in ChatGPT: Long-time legal advisor LegalZoom is now available within ChatGPT for users looking for business advice backed by a deep understanding of the law.

Observes Jeff Stibel, CEO, LegalZoom: “LegalZoom provides the expertise and clarity to help small business owners go from idea to action.

“Backed by attorney expertise, we’re making legal guidance and accountability even more accessible, when and where they need it.”

*Gemini Gets Tighter Integration with Google Workspace Suite: Google is out with a new upgrade to Gemini designed to ensure the ChatGPT competitor is more tightly integrated with Google Docs, Sheets, Slides and Drive.

Observes Yulie Kwon Kim, VP product/workspace: “Today we are re-imagining how people create content.”

Click here for the blow-by-blow that backs up Kim’s statement.

*Microsoft Copilot Adds New AI Agent Module, Cowork: Seems like every time you turn around, Microsoft is giving its Copilot chatbot an agentic upgrade.

This time, it’s adding ‘Copilot Cowork’ to its bag of tricks, a feature that promises to make AI agent work in Copilot more proactive and independent.

The key benefit with the upgrade: The ability of ‘Copilot Cowork’ to work with many Microsoft apps simultaneously – rather than being tied to just one app at a time.

*Oops: Grammarly Deep-Sixes ‘Expert Review’ After Fierce Backlash: Turns out, more than a few authors and writers were livid after discovering that Grammarly was poaching their thinking and writing styles to offer ‘expert reviews’ of writing put together by Grammarly users.

Observes Analytics Insight: “The feature provided users with writing advice as if it were coming from well-known experts, quickly raising concerns about misrepresentation and identity misuse.

“Grammarly said it is reviewing the feature’s design and considering changes.”

*AI Big Picture: Get AI to Do Your Taxes? Maybe Not: While AI may indeed cure cancer one day, for now, better not unleash it on your taxes.

A recent test of the top AI chatbots on the planet by The New York Times found that the AIs were simply no good at doing taxes.

Gemini, ChatGPT, Claude and Grok all turned in disappointing results.


Joe Dysart is editor of RobotWritersAI.com and a tech journalist with 20+ years experience. His work has appeared in 150+ publications, including The New York Times and the Financial Times of London.


The post Sorry, No Fleshbags appeared first on Robot Writers AI.

Scientists discover AI can make humans more creative

Artificial intelligence is often portrayed as a tool that replaces human work, but new research from Swansea University suggests a far more exciting role: creative collaborator. In a large study with more than 800 participants designing virtual cars, researchers found that AI-generated design galleries sparked deeper engagement, longer exploration, and better results.

New chip lets robots see in 4D by tracking distance and speed simultaneously

Current vision systems for robots and drones rely on 3D sensors that, although powerful, do not always keep up with the fast-paced, unpredictable movement of the real world. These systems often struggle to measure speed instantly or are too bulky and expensive for everyday use. Now, in a paper published in the journal Nature, scientists report how they have developed a 4D imaging sensor on a chip that creates 3D maps of an environment while simultaneously tracking the speed of moving objects.

Canine companion insights help robots locate objects with an 89% success rate

Whether in the kitchen or on a workshop floor, robot assistants that can fetch items for people could be extremely useful. Now, a team of Brown University researchers has developed a way of making robots better at figuring out exactly which items a user might want them to retrieve.

Robot Talk Episode 148 – Ethical robot behaviour, with Alan Winfield

Claire chatted to Alan Winfield from the University of the West of England about developing new standards for ethics and transparency in robotics.

Alan Winfield is Professor of Robot Ethics at the University of the West of England (UWE), Visiting Professor at the University of York, and Associate Fellow of the Cambridge Centre for the Future of Intelligence. Alan co-founded the Bristol Robotics Laboratory, where his research is focussed on the science, engineering and ethics of cognitive robotics. Alan is an advocate for robot ethics; he chairs the advisory board of the Responsible Technology Institute at the University of Oxford and has co-drafted new standards on ethical risk assessment and transparency.

Mosrac – Efficient Motion Control Products Direct Drive PMSM & Encoders

Mosrac is an ISO 9001 company that provides a full range of motion-control products from a single source. We design and manufacture both customer-specific (OEM) and own-branded products based on our product and service offerings. With nearly 15 years of experience, our products are built to last and deliver superior precision, accuracy, consistency, and efficiency. Whether you need a standard or custom component or motion solution, we have exactly what you're looking for.

Identifying Interactions at Scale for LLMs


Understanding the behavior of complex machine learning systems, particularly Large Language Models (LLMs), is a critical challenge in modern artificial intelligence. Interpretability research aims to make the decision-making process more transparent to model builders and impacted humans, a step toward safer and more trustworthy AI. To gain a comprehensive understanding, we can analyze these systems through different lenses: feature attribution, which isolates the specific input features driving a prediction (Lundberg & Lee, 2017; Ribeiro et al., 2022); data attribution, which links model behaviors to influential training examples (Koh & Liang, 2017; Ilyas et al., 2022); and mechanistic interpretability, which dissects the functions of internal components (Conmy et al., 2023; Sharkey et al., 2025).


Scientists built the hardest AI test ever and the results are surprising

As AI systems began acing traditional tests, researchers realized those benchmarks were no longer tough enough. In response, nearly 1,000 experts created Humanity’s Last Exam, a massive 2,500-question challenge covering highly specialized topics across many fields. The exam was engineered so that any question solvable by current AI models was removed. Early results show even the most advanced systems still struggle — revealing a surprisingly large gap between AI performance and true expert-level knowledge.

AI search robot uses 3D maps and internet knowledge to find lost items

A robot that can locate lost items on command, the latest development at the Technical University of Munich (TUM), combines knowledge from the internet with a spatial map of its surroundings to efficiently find the objects being sought. The new robot from Prof. Angela Schoellig's TUM Learning Systems and Robotics Lab looks like a broomstick on wheels with a camera mounted at the top. It is one of the first robots that not only integrates image understanding but also applies it to a clearly defined task.

Build enterprise-ready Agentic AI with DataRobot using NVIDIA Nemotron 3 Super 

With the arrival of NVIDIA Nemotron 3 Super, organizations now have access to a high-accuracy reasoning model purpose-built for collaborative, multi-agent enterprise workloads. Being fully open, Nemotron 3 Super can be customized and deployed securely anywhere. However, having a powerful large language model (LLM) like Nemotron 3 Super is just the starting line. The real challenge is quickly turning that powerful reasoning engine into a production-grade system your enterprise can trust for building AI agents and applications.

That is where DataRobot comes in. In this post, we will walk through how DataRobot’s Agent Workforce Platform, co-engineered with NVIDIA, makes it straightforward to take Nemotron 3 Super from a standalone LLM to a fully deployed, evaluated, monitored, and governed production system that enterprises can trust. We will also explore why mastering each of these steps is critical to successfully deploying specialized agentic AI systems.

A great LLM alone isn’t enough

Nemotron 3 Super is a highly capable 120-billion-parameter hybrid Mamba-Transformer MoE model, optimized for enterprise multi-agent tasks like IT automation and supply chain orchestration, boasting a 1-million-token context window. However, the move from pilot to reliable production is challenging; MIT research shows 95% of GenAI pilots fail, not due to the model’s capabilities, but due to issues in the surrounding deployment infrastructure.

Before deploying any LLM for enterprise applications and agents, organizations must address five critical areas:

  1. Evaluation and Comparison: Thoroughly assess models on behavioral metrics (accuracy, hallucination) and operational metrics (cost, latency). Use LLM-as-a-judge approaches; proprietary, standard, or synthetic datasets; and comparative evaluations, often augmented with human input.
  2. Efficient Hosting/Inferencing: Implement scalable, reliable, and elastic hosting infrastructure to ensure continuity for the LLM at the core of Generative and Agentic AI systems.
  3. Observability: Continuously monitor the deployed model’s behavior, both standalone and within agents, with instrumentation to detect and alert on drifts from desired performance.
  4. Real-Time Intervention and Moderation: Establish strong guardrails for real-time intervention to prevent undesirable or toxic behavior, such as PII leakage, which could compound quickly across interactions.
  5. Governance, Security, and Compliance: Enforce rigorous governance via authentication, authorization, approval workflows for updates, and comprehensive testing and reporting against enterprise, industry, and regulatory compliance standards.
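The first area, evaluation with an LLM as judge, follows a simple pattern: score each answer, aggregate, and gate on a threshold. The sketch below substitutes a token-overlap stub for the judge model so it runs anywhere; a real pipeline would prompt an LLM instead, and the example pairs are hypothetical.

```python
def judge(answer, reference):
    """Stub judge: real pipelines prompt an LLM to score faithfulness;
    a token-overlap ratio stands in here so the sketch runs anywhere."""
    a, r = set(answer.lower().split()), set(reference.lower().split())
    return len(a & r) / len(r) if r else 0.0

def evaluate(pairs, threshold=0.6):
    scores = [judge(ans, ref) for ans, ref in pairs]
    return {"mean_score": round(sum(scores) / len(scores), 2),
            "passed": all(s >= threshold for s in scores)}

# Hypothetical (answer, reference) pairs from an enterprise workflow.
report = evaluate([
    ("the invoice matched the purchase order", "invoice matched the purchase order"),
    ("payment was rejected", "the payment was approved"),
])
print(report)  # one answer scores below threshold, so the run fails
```

A mean score can look healthy while a single bad answer hides inside it, which is why the gate checks every score, not just the average.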

DataRobot’s Agent Workforce Platform, co-engineered with NVIDIA, provides a unified solution for all these challenges with NVIDIA Nemotron 3 Super.

Launch Nemotron 3 Super NIM on your infrastructure with a few clicks

Your AI team wants Nemotron 3 Super in production. Your security team wants hardened containers with signed images. Your compliance team wants an audit trail from day one. And you want all of this to run without a month of configuration and a stack of support tickets.

NVIDIA NIM microservices are available directly within the DataRobot platform, pre-configured and optimized for NVIDIA AI Infrastructure. For Nemotron 3 Super — which uses NVFP4 quantization to deliver high performance while keeping compute costs predictable — this means your deployment comes production-ready out of the box. No inference engine tuning. No GPU parameter research. No guesswork.


Here’s what the workflow looks like:

  • Browse and select. Open the NVIDIA NIM model gallery inside DataRobot. Each model comes with a clear description of its capabilities, supported GPU configurations, and resource requirements. Select Nemotron 3 Super and import it into your registry. DataRobot automatically tracks the version, tags it, and begins a full lineage record — so when your compliance team asks “which exact model version is running in production?”, the answer is already documented. 
  • Let the platform handle GPU sizing. DataRobot recommends the optimal GPU configuration for your deployment — whether you’re running on NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs or other supported hardware — so you can focus on testing rather than troubleshooting infrastructure. You don’t need to understand the model’s internal architecture to get this right. The platform matches the model to your hardware and tells you what to provision. If your AI team later asks why you chose a particular configuration, the recommendation is logged and auditable.
  • Deploy with one click. Select your configuration and deploy. Here’s what makes this different from downloading a model container and figuring out the rest yourself: DataRobot deploys the model with monitoring and access controls already wired in. There’s no separate step to “add observability later.” The moment your Nemotron 3 Super endpoint goes live, it’s already reporting health metrics, latency, throughput, and token consumption to your monitoring dashboard — giving you immediate visibility into how the deployment is performing.

Your AI team gets a live API endpoint they can start building immediately. You get a deployment that’s observable and auditable from minute one. 

Multiple teams, one endpoint — without the free-for-all

Once Nemotron 3 Super is live, the next problem lands fast: multiple teams and applications all hitting the same deployment, with no way to prevent one team’s spike from degrading everyone else’s experience. Without controls, you’re back to fielding “why is the model so slow?” tickets.


DataRobot’s built-in quota management lets you set default access limits for each endpoint, then apply overrides for specific users, groups, or agents that need more (or less) capacity. Your production agent gets priority allocation; the experimentation team gets enough to stay productive without impacting production traffic. The platform enforces limits automatically — no more arbitrating access over email or diagnosing mystery slowdowns caused by a runaway agent on another team.
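Conceptually, quota management is a per-consumer counter with a default limit and overrides. The sketch below is a conceptual stand-in, not the platform's API, and the team names are hypothetical:

```python
from collections import defaultdict

class QuotaManager:
    """Minimal per-team request counter with overrides -- a conceptual
    stand-in for the platform's quota management, not its API.
    Team names are hypothetical."""
    def __init__(self, default_limit, overrides=None):
        self.default_limit = default_limit
        self.overrides = overrides or {}
        self.used = defaultdict(int)

    def allow(self, team):
        limit = self.overrides.get(team, self.default_limit)
        if self.used[team] < limit:
            self.used[team] += 1
            return True
        return False

q = QuotaManager(default_limit=2, overrides={"prod-agent": 100})
print([q.allow("experiments") for _ in range(3)])  # third request denied
print(q.allow("prod-agent"))                       # priority allocation intact
```

The override mechanism is the important part: production traffic is never competing on equal terms with experimentation.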

Built-in cost visibility

Not every task needs the same level of reasoning — and Nemotron 3 Super is equipped with a configurable thinking budget that lets you match inference cost to task complexity. The difference is dramatic: on the Finance Reasoning Hard benchmark, Nemotron 3 Super at its highest thinking budget reaches ~86% accuracy but consumes over 1.4 million output tokens, while the lowest thinking setting still delivers ~74% accuracy on roughly 100,000 tokens — a 14x reduction in token spend, according to benchmark runs conducted by DataRobot. For straightforward classification or routing tasks, the low setting is more than enough. For complex financial analysis or multi-step reasoning, you dial it up.
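The quoted figures imply the tradeoff directly:

```python
# Figures quoted above: accuracy and output-token counts at the highest
# and lowest thinking budgets on the Finance Reasoning Hard benchmark.
high = {"accuracy": 0.86, "tokens": 1_400_000}
low = {"accuracy": 0.74, "tokens": 100_000}

token_reduction = high["tokens"] / low["tokens"]
accuracy_gap = high["accuracy"] - low["accuracy"]
print(f"{token_reduction:.0f}x fewer tokens for {accuracy_gap:.0%} less accuracy")
```

Whether 12 points of accuracy are worth 14x the token spend depends entirely on the task, which is the argument for tuning the budget per use case.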


This means you can run a single model across multiple use cases and tune the cost-accuracy tradeoff per task, rather than deploying separate models for simple versus complex workloads. DataRobot surfaces this through its monitoring dashboard — giving you and your leadership clear visibility into token consumption per team, and per deployment. When your CFO asks “what are we spending on AI inference?”, you’ll have the numbers ready.

Rigorous evaluation before production

Deployment without evaluation is a recipe for failure. DataRobot provides comprehensive evaluation capabilities that let you rigorously test Nemotron 3 Super before it reaches production.

LLM-as-a-Judge and out-of-the-box metrics

DataRobot’s evaluation framework spans the full range of metrics that matter:

  • Functional metrics and automated compliance tests measure correctness, faithfulness, relevance, bias, toxicity, etc., giving teams a rigorous, multi-dimensional view of model quality. 
  • Security and safety metrics provide real-time guards evaluating whether outputs comply with safety expectations — including detection of toxic language, PII exposure prevention, prompt-injection resistance, topic boundary adherence, and emotional tone classification.
  • Economic metrics track token usage and cost, ensuring that your Nemotron 3 Super deployment remains economically sustainable at scale.

Playground comparison and the Evaluation API

DataRobot’s LLM Playground lets you set up side-by-side comparisons — running Nemotron 3 Super against other models, different prompt strategies, or alternative vector database configurations. You can configure up to three workflows at a time, run queries, and analyze results using LLM-as-a-judge alongside human-in-the-loop reviews with custom or synthetic test data.

For teams that want programmatic control, the Evaluation API supports the same full set of metrics, enabling automated evaluation pipelines that integrate with your existing CI/CD workflows.
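A CI gate over evaluation metrics can be as small as a threshold table. In this sketch, `evaluate_deployment` is a hypothetical stub returning canned numbers; a real pipeline would fetch them from the Evaluation API, and the gate thresholds are illustrative:

```python
import sys

# Hypothetical stub: a real pipeline would fetch these numbers from the
# Evaluation API rather than return canned values.
def evaluate_deployment(deployment_id):
    return {"faithfulness": 0.91, "toxicity": 0.01, "p95_latency_s": 1.8}

# Gate thresholds are illustrative; tune them to your workload.
GATES = {
    "faithfulness": (">=", 0.85),
    "toxicity": ("<=", 0.05),
    "p95_latency_s": ("<=", 2.0),
}

def gate(metrics):
    ok = True
    for name, (op, bound) in GATES.items():
        value = metrics[name]
        passed = value >= bound if op == ">=" else value <= bound
        print(f"{name}: {value} {op} {bound} -> {'PASS' if passed else 'FAIL'}")
        ok = ok and passed
    return ok

if not gate(evaluate_deployment("nemotron-3-super-staging")):
    sys.exit(1)  # fail the CI job on any regression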

Execution tracing for deep debugging

Evaluation without explainability is incomplete. DataRobot’s tracing capabilities expose the full execution path of every interaction: the sequence and latency of each step, the tools or functions invoked, and the inputs and outputs at each stage. This is especially important for Nemotron 3 Super-powered agents because the model’s reasoning capabilities — including its configurable reasoning trace — mean that understanding how the agent arrived at a result is as important as whether the result was correct.

Tracing extends relevant metrics like accuracy and latency to both the input and output of each step, enabling you to pinpoint exactly where an issue originated in a multi-step workflow. This visibility makes debugging faster, iteration safer, and refinement more confident.


Scalable deployment and production monitoring

Once evaluation confirms Nemotron 3 Super is performing as expected, DataRobot ensures it stays that way in production.

Scalable infrastructure management

The Agent Workforce Platform handles the operational complexity of running Nemotron 3 Super at enterprise scale. With NVIDIA AI Enterprise natively embedded, the platform manages containerization, resource allocation, and scaling automatically. Whether you’re handling hundreds or thousands of concurrent requests, the infrastructure adapts — scaling GPU resources up and down based on demand without requiring manual intervention.

For organizations with strict data sovereignty requirements, this extends to on-premises and air-gapped deployments using the NVIDIA AI Factory for Government reference architecture.

Continuous monitoring with out-of-the-box metrics

DataRobot’s observability framework delivers comprehensive visibility across health, quality, usage, and resource dimensions through a unified console:

  • Real-time performance & resource tracking monitors latency, throughput, token consumption, CPU utilization, memory, and concurrency across every deployment — with quota rates and alerts to catch degradation and enforce cost governance before either impacts users.
  • OTel tracing captures the full execution path of every system interaction — from initial prompt through each tool call, retrieval step, and model invocation — with timing and payload visibility at each node. Trace correlation links a quality degradation signal directly to the offending step, so root cause analysis takes minutes rather than hours.
  • Custom alerting lets you define thresholds across any metric and route notifications to your preferred channels, enabling proactive intervention rather than reactive firefighting.

The monitoring system works seamlessly across all deployment environments, providing a single pane of glass whether your NVIDIA Nemotron 3 Super NIM microservices are running in the cloud, on-premises, or in a hybrid configuration.

Enterprise governance and real-time intervention

Governance isn’t a checkbox at the end of a deployment — it’s an operational discipline that spans the entire model lifecycle. DataRobot provides governance capabilities across three critical dimensions for NVIDIA Nemotron 3 Super deployments.

Security risk governance

DataRobot enforces role-based access controls (RBAC) aligned with your organizational policies for all tools and enterprise systems that agents can access. This means your Nemotron 3 Super-powered agents interact only with the data and systems they’re explicitly authorized to use.

Robust, auditable approval workflows prevent unauthorized or unintended deployments and updates. Every change to the system — from prompt modifications to configuration updates — is tracked and requires appropriate authorization.

Operational risk governance with real-time intervention

This is where DataRobot’s capabilities become particularly critical. Beyond monitoring and alerting, the platform provides real-time moderation and intervention capabilities that can catch and address undesired inputs or outputs as they happen.

Multi-layer safety guardrails — including NVIDIA NeMo Guardrails for topic control, content safety, and jailbreak detection — operate in real time during model execution. You can configure these guardrails directly within the DataRobot Model Workshop, customizing thresholds and adding additional protections specific to NVIDIA Nemotron 3 Super deployment.

Lineage and versioning

Lineage and versioning capabilities track every version of your NVIDIA Nemotron 3 – powered AI system (models, prompts, vector databases, and datasets), creating an auditable record of how decisions were made and preventing behavioral drift across deployments.
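
One way to picture such a lineage record, assuming hypothetical version tags, is to fingerprint every component of a deployment together:

```python
import hashlib
import json

def lineage_id(components):
    """Fingerprint every versioned component into one auditable lineage ID."""
    canonical = json.dumps(components, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

# Hypothetical version tags for one deployment.
deployment = {
    "model": "nemotron-3-super@1.0",
    "prompt": "approval-routing@v4",
    "vdb": "orders-index@2026-01-05",
    "dataset": "eval-set@7",
}
fingerprint = lineage_id(deployment)
# Changing any single component (say, the prompt) yields a new fingerprint,
# which is how drift between deployments becomes detectable.
```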

Regulatory risk governance

DataRobot supports validation against applicable regulatory frameworks — including the EU AI Act, NIST RMF, and country- or state-level guidelines — identifying risks including bias, hallucinations, toxicity, prompt injection, and PII leakage.

Automated compliance documentation is generated as part of the deployment process, reducing audit effort and manual work while ensuring your NVIDIA Nemotron 3 Super deployment maintains ongoing compliance as regulations evolve.

From model to impact

The NVIDIA Nemotron 3 family of open models represents a significant step forward for enterprise agentic AI. Nemotron 3 Super, with its high-accuracy reasoning optimized for collaborative multi-agent workloads, is purpose-built for the kind of enterprise applications that drive real business outcomes.

But the organizations that will succeed with Nemotron 3 Super are not the ones with the most impressive demos. They’re the ones that rigorously evaluate behavior, monitor systems continuously in production, and embed governance across the entire agent lifecycle. Reliability, safety, and scale are not accidental outcomes — they are engineered through disciplined metrics, observability, and control.

DataRobot’s Agent Workforce Platform, co-engineered with NVIDIA, provides the complete foundation to make that happen. From one-click deployment to comprehensive evaluation, from continuous monitoring to real-time governance — we make the hard part of enterprise AI manageable.

Ready to build with NVIDIA Nemotron 3 Super on DataRobot? Request a demo and see how quickly you can move from model to production.

The post Build enterprise-ready Agentic AI with DataRobot using NVIDIA Nemotron 3 Super  appeared first on DataRobot.

Robots that learn everyday tasks can free humans from repetitive work

A robot task AI capable of learning and performing everyday repetitive tasks in a human-like manner has been developed. The AI learns tasks through human demonstrations and executes complex tasks step by step based on a hierarchical task execution framework. The technology is expected to contribute to the automation of labor-intensive repetitive work and reduce human workload in homes and offices, as well as in retail and logistics environments.

A bicycle robot that can drive fast and jump over obstacles

Experienced human cyclists can perform a wide range of maneuvers and acrobatics while riding their bicycle, from balancing in place to riding on a single wheel or hopping over obstacles. Reproducing these agile maneuvers in two-wheeled robots could open new opportunities both for entertainment or robot sports and for the completion of complex missions in rough terrain.

Coding for underwater robotics

Screenshot from video showing underwater robotic vehicle. Credit: Tim Briggs/MIT Lincoln Laboratory.

During a summer internship at MIT Lincoln Laboratory, Ivy Mahncke, an undergraduate student of robotics engineering at Olin College of Engineering, took a hands-on approach to testing algorithms for underwater navigation. She first discovered her love for working with underwater robotics as an intern at the Woods Hole Oceanographic Institution in 2024. Drawn by the chance to tackle new problems and cutting-edge algorithm development, Mahncke began an internship with Lincoln Laboratory’s Advanced Undersea Systems and Technology Group in 2025. 

Mahncke spent the summer developing and troubleshooting an algorithm that would help a human diver and robotic vehicle collaboratively navigate underwater. The lack of traditional localization aids — such as the Global Positioning System, or GPS — in an underwater environment posed challenges for navigation that Mahncke and her mentors sought to overcome. Her work in the laboratory culminated in field tests of the algorithm on an operational underwater vehicle. Accompanying group staff to field test sites in the Atlantic Ocean, Charles River, and Lake Superior, Mahncke had the opportunity to see her software in action in the real world.

“One of the lead engineers on the project had split off to go do other work. And she said, ‘Here’s my laptop. Here are the things that you need to do. I trust you to go do them.’ And so I got to be out on the water as not just an extra pair of hands, but as one of the lead field testers,” Mahncke says. “I really felt that my supervisors saw me as the future generation of engineers, either at Lincoln Lab or just in the broader industry.”

Says Madeline Miller, Mahncke’s internship supervisor: “Ivy’s internship coincided with a rigorous series of field tests at the end of an ambitious program. We figuratively threw her right in the water, and she not only floated, but played an integral part in our program’s ability to hit several reach goals.”

Lincoln Laboratory’s summer research program runs from mid-May to August. Applications are now open. 

Video by Tim Briggs/MIT Lincoln Laboratory | 2 minutes, 59 seconds
