A history of RoboCup with Manuela Veloso
RoboCup is an international competition that promotes and advances robotics and AI through the challenges presented by its various leagues. We got the chance to sit down with Professor Manuela Veloso, one of RoboCup’s founders, to find out more about how it all started, how the community has grown over the years, and the vision for the future.
I think it would be very interesting to go right back to the beginning and hear how RoboCup got started. What was the initial idea, and how did it get set up?
So we are talking about the mid-90s. In terms of the research in those days, it was the beginning of the internet, and many AI and computer science researchers were focused on it: first on sophisticated search algorithms, natural language understanding, and information retrieval, and then on software agents and machine learning applied to digital information. From what I recall, there was a smaller group of researchers who were interested in actual, physical robots, and in particular in AI and robotics. I myself was specifically interested in the problem of creating autonomous robots with perception (get information from the world), cognition (select actions to achieve goals), and action (execute the planned actions). This combination of perception, cognition, and action is a very good framework for autonomous robots, because they have to get their information from their sensors, they have to reason about actions to achieve their goals, and then execute them. So, during the 90s, I was at Carnegie Mellon with this AI research goal of integrating perception, cognition, and action in autonomous robots.
Over in Canada there was Alan Mackworth, who, jointly with his wonderful student Michael Sahota, built a one-on-one little autonomous robot soccer set-up. Two robots ran on a small field with a camera overhead, each aiming to score in one of the two little goals. This work showed that this task of kicking a ball, defending, and aiming at a goal could be done autonomously. So it was a tremendous demonstration that a robot soccer world could exist. At around the same time, in Japan, Minoru Asada was showing that a big robot could learn with reinforcement learning how to push a ball into a goal. So you have these one-on-one, fully autonomous little robot cars that were pushing balls around in Canada, and then there was this effort of learning to score with a larger robot in Japan. The learning robot didn’t have a team, it was not a real game, but it was showing that reinforcement learning could learn the skill of aiming into a goal. And then there was also Hiroaki Kitano at Sony who was very interested in little humanoids.
So this is very beautiful because all these things came into play – all of us had different interests.
Alan Mackworth did not get involved with RoboCup, but he gave a demonstration of these one-on-one robots at AAAI in 1994. And in those days, I had a PhD student who had just joined – Peter Stone. And Peter was a serious and passionate soccer player. He saw this little game and he came to me and said, “this is what I want to do for my thesis research, robot soccer!” And for me, I was trying to find a research environment where autonomy was needed in the robot world. I already had a student, Karen Haigh, who was working with autonomous office robots, and learning to plan and execute. But with these soccer robots and Peter Stone’s interest everything came into play, and we started robot soccer research in my lab.
In 1996, there was also a robot soccer effort in South Korea, called MIROSOT, and that’s the first competition we participated in. So Peter Stone, myself, and the team we built at Carnegie Mellon – Sorin Achim and Kwun Han – went to South Korea to participate. From South Korea, we flew to a robot soccer workshop in Japan organized by Minoru Asada and Hiroaki Kitano. Also in attendance were Dominique Duhaut, Itsuki Noda, Silvia Coradeschi, and Enrico Pagello. And that’s where RoboCup really started – we decided to do a competition. And the good thing was that Kitano was the chair of IJCAI, which was going to happen in Osaka in the summer of 1997. So we are there in Osaka and literally we came up with this idea of having a robot soccer competition, RoboCup. It was a big moment for us as researchers. We had to come up with the rules of this competition so that people would be able to participate seven months later. We came up with the three leagues that we were interested in and had expertise in.
The small size league, building upon our Carnegie Mellon interests, would have a field with a camera overhead connected to a computer and then the computer would remotely control the robots through radio.
Then Minoru Asada had these bigger robots with wheels, and we created a league that we call the middle-size league to include the robotics research of Minoru Asada and others. And then Itsuki Noda was interested in creating a simulation environment. We thought that this would help get more people participating in this task of robot soccer.
So that’s how the three leagues started: the small size, the middle size, and the simulation. Hiroaki Kitano, Dominique Duhaut and I were in charge of the small-size league, Minoru Asada was in charge of the middle-size league, and Itsuki Noda ran the simulation league.
One of the challenges was to come up with the rules and define the robots and playing fields. I remember my own pragmatism in suggesting that we play on a ping-pong table for the small-size league, as ping-pong tables exist all over the world. That meant that we would have a playing surface, with defined size and texture, anywhere in the world. We decided on one ping-pong table for the small-size league, and nine ping-pong tables for the middle-size league.
In the summer of 1997, at IJCAI, when we all went to the actual first RoboCup competition, the space was gorgeous. Hiroaki Kitano had made these beautiful fields and white bleachers around the fields. It was a very beautiful space, with an area with computers for the simulation league. Eighty teams had joined the simulation online. There were, I believe, five teams in the small-size league and about eight to ten in the middle-size league.
That’s how it all started. In 1997 it was in Osaka; in 1998 Dominique Duhaut organized RoboCup in Paris, at a modern science center, La Villette; and in 1999 RoboCup was organized by Silvia Coradeschi in Stockholm, again co-located with the IJCAI conference. In these three years, 1997, ’98, and ’99, there were only these three leagues: small size, middle size, and simulation. It was the foundation of everything. RoboCup grew consistently every year in terms of the number of teams, the number of participants, and the number of participating countries.
Some of the pitches in the main soccer arena at RoboCup 2024, held in Eindhoven, The Netherlands.
So how did RoboCup expand and could you talk about the decisions to add extra leagues?
Well, nothing was necessarily planned, it was more related to the research interests that people had. So when we were in Melbourne in 2000, there were two things that were added. One was RoboCupJunior. There was a professor, Elizabeth Sklar, who had a tremendous interest in robotics education for children. She proposed the RoboCupJunior competitions for children K-12. The goal of RoboCupJunior was to train all these young people to do research in robotics. It was and it still is extremely successful. In Melbourne there were probably the same number of children as there were people competing in what we now call the major leagues. As well as the soccer leagues, Elizabeth also ran a dance competition. This was very impressive – the children would dance on stage with their designed, built, and programmed mobile robots.
I believe that the rescue league was also introduced in 2000. The reason we came up with rescue was that, in those days, there was a lot of research on robots for disaster environments. Many people in our groups had an interest in developing robots that could handle disaster environments, so we included that interest. And later on, @home was also introduced because people had an interest in service robots.
The logistics league came later to cover robots inside factories. The reason for introducing new leagues was always to include the community. We didn’t want anyone to feel they could not come to RoboCup because they did different research with their robots. And so it was a very intellectually inclusive environment for AI and robotics researchers whose robots could perform tasks autonomously – with perception, cognition, and action, mostly in teams, for particular tasks.
Rescue league arena at RoboCup 2024.
Could you talk about the changes that were announced last year, and the decision behind those?
Back in 1997, Hiroaki Kitano came up with this goal of a robot soccer team being able to beat the human World Cup winners by 2050. (I later tried, in the early 2000s, to rephrase that goal so that it was not about the robots beating the humans, but instead having robots playing alongside human players by 2050.) For this to happen, we had started the humanoid league so that the robots would have legs, not wheels. Our roadmap hence moves towards humanoid robots. It also happens that several robot companies now produce humanoid robots that are easy to acquire, so researchers do not have to design and build their own humanoid robots, which was a difficult task pursued by only a limited group of researchers. In parallel, other events started to include humanoid robot soccer. We believed that RoboCup should be visible and a reference for humanoid robot soccer. We therefore proposed the change that, at the international RoboCup competitions, robot soccer will focus solely on humanoid robots. That doesn’t mean that the other RoboCup leagues aren’t valuable. But if they continue at a regional level, there will still be venues to foster these other types of interests. The RoboCup international competitions will be solely focused on humanoid robot soccer.
Action from the humanoid soccer league at RoboCup 2025. Image courtesy of Alessandra Rossi.
RoboCup was tremendously successful in terms of including a variety of different autonomous robot research interests from the community. But I also believe that now we should move to just humanoid soccer at the international level. This will help with the visibility of the competition and also with consolidation. We can now buy humanoid robots that we didn’t have before. Previously, you had to build them in your universities, and that was not a research direction for many people. But there is not that excuse anymore. You can buy humanoid robots, and increasingly, you can talk with these humanoid robots using GenAI. So it became a different challenge, a much more accessible challenge than before. The challenge will be: can these platforms play soccer in the presence of another team? I think it’s a fascinating new direction to try to focus on this challenge of the game of soccer – multiple players, two teams in a large space, eventually coordinating, collaborating, or playing with and against humans. So that’s the rationale.
When people don’t know the story, they tend to wonder why we have all these other leagues, such as rescue and logistics. It was always driven by the interests of the community and a big-hearted approach to research. And we always thought that, because we were organizing an event with a venue, we could include these people with interests outside of soccer. That has always been our drive. However, it’s true that it diluted a little bit the 2050 goal – the humanoid soccer robots. Now I think it’s time to go back to soccer as a main focus, and still keep supporting the other interests that we have fostered, more at a regional level.
There are still a lot of details to be sorted out. And, of course, people may be upset by leagues terminating at the international level. I have always participated in the small-size league myself, 20 years of participating, and I am very attached to it. I was always the trustee representing the small-size league, and I am not sure I am happy with the small-size not being in the international event. But I need to think beyond what my personal interests are and try to understand what would have a bigger impact for AI, robotics, and RoboCup. I think that that’s demonstrating a group of humanoids playing soccer. And if we don’t do it at RoboCup, maybe someone else will.
The way I think about it is that we have made this decision and we can reevaluate it in five or ten years. But not making a change and never having the courage to deviate from what we have been pursuing is less exciting. Things have stalled a bit, with the same number of teams, the same people, the same rules, the same type of intellectual and research accomplishments. We need to excite people about something major again. So I think that the community will greatly embrace this new RoboCup international. And we are so well organized locally, that we can support other leagues at the local events.
I do think that, from a scientific and research point of view, it’s the right moment to target the soccer humanoid robots, because of their availability and our RoboCup ultimate goal. I think it would be amazing if people knew RoboCup international as the humanoid robot soccer competition. Can you imagine in 2027 having 100 teams all playing soccer with humanoids? We’ll see.
About Manuela Veloso
For the last eight years, Manuela Veloso has been the founder and Head of JPMorganChase AI Research. She is also the Herbert A. Simon University Professor Emerita at Carnegie Mellon University, where she was faculty in the Computer Science Department and then Head of the Machine Learning Department.
At JPMorganChase, she built a team of 100 highly talented members with graduate degrees (PhD and Masters) in AI and related disciplines. The team focused on pillar areas of AI in finance, including data-driven optimization, planning and search, document analysis, trustworthy AI, AI and mathematical reasoning, continual learning, and multiagent systems. The team published its research in academic venues and contributed to business needs and vision.
Veloso has a licenciatura degree in Electrical Engineering and an M.Sc. in Electrical and Computer Engineering from Instituto Superior Técnico, Lisbon, an M.A. in Computer Science from Boston University, and a Ph.D. in Computer Science from Carnegie Mellon University. Veloso holds Doctorate Honoris Causa degrees from Örebro University, Sweden, the Instituto Universitário de Lisboa (ISCTE), Portugal, the Université de Bordeaux, France, and the Universidade Católica of Portugal.
She served as president of the Association for the Advancement of Artificial Intelligence (AAAI), and she is co-founder and a Past President of the RoboCup Federation. She is a fellow of the main professional organizations in her area, namely AAAI, IEEE, AAAS, and ACM. She is the recipient of the ACM/SIGART Autonomous Agents Research Award, the Einstein Chair of the Chinese Academy of Sciences, an NSF CAREER Award, and the Allen Newell Medal for Excellence in Research. Veloso is a member of the National Academy of Engineering with a citation “for contributions to artificial intelligence and its applications in robotics and the financial services industry.” She is also a member of the Academy of Sciences of Portugal.
Her research interests are in AI, including Multiagent Systems, Autonomous Robots, Continual Learning Agents, and AI in Finance. For further details, see Manuela’s webpage.
SAP Generative AI: Enterprise Use Cases, Deployment Realities, and What to Expect in 2026
The Conversation Happening in Every SAP Shop Right Now
Every major enterprise running SAP has had a version of the same leadership conversation in the past 18 months: we have invested heavily in SAP, our data lives there, generative AI is real — so what does GenAI on SAP actually look like for us?
The honest answer is more nuanced than most vendor pitches suggest. Generative AI on SAP is working well in specific use cases, producing real productivity gains, and expanding fast. It is also being deployed carelessly in others, producing outputs that undermine trust and slow adoption.
This article maps both sides: where SAP generative AI is producing verifiable business results, and what it takes to deploy it in a way that holds up inside a governed enterprise environment.
USM Business Systems is a CMMI Level 3, Oracle Gold Partner AI and IT services firm based in Ashburn, VA, with 1,000+ engineers and 2,000+ delivered enterprise applications. Our SAP AI practice integrates generative AI capabilities into live SAP environments across manufacturing, supply chain, pharma, and logistics.
What SAP Has Built — The Native GenAI Layer
SAP’s generative AI strategy centers on three interconnected components:
- SAP Joule
Joule is SAP’s AI copilot — a generative AI assistant embedded across S/4HANA, SAP SuccessFactors, SAP Ariba, SAP Customer Experience, and SAP Analytics Cloud. It interprets natural language requests, retrieves relevant SAP data, and executes tasks or surfaces insights without the user navigating transaction codes.
Joule launched to general availability in late 2023 and has been expanding its coverage across SAP applications steadily. By mid-2025, SAP reported Joule embedded in over 80% of its cloud revenue-generating applications. For enterprises on SAP’s cloud products, Joule is the fastest path to generative AI adoption because it requires no custom development — it is configured, not built.
- SAP AI Core
AI Core is the managed runtime where custom generative AI models are deployed, governed, and operated inside the SAP ecosystem. An enterprise that wants to deploy a proprietary LLM, a fine-tuned model trained on its SAP data, or an agentic system with generative AI as its reasoning layer runs it on AI Core. AI Core integrates with major model providers — Azure OpenAI, Anthropic, AWS Bedrock — through SAP’s generative AI hub.
- SAP AI Foundation (BTP)
AI Foundation on BTP provides the developer tooling, APIs, and pre-built AI services that allow enterprise developers to build generative AI applications connected to SAP data and workflows. It includes vector database services for retrieval-augmented generation (RAG), embedding models, and the API gateway that connects external LLMs to SAP data in a governed way.
Where Generative AI on SAP Is Producing Real Results
- Supply Chain Exception Handling
Operations teams receive hundreds of exceptions daily from SAP IBP and S/4HANA — demand deviations, supplier alerts, inventory flags. Generative AI systems trained on historical exception data and resolution patterns can classify incoming exceptions, retrieve the relevant context from SAP, draft a recommended resolution, and route it to the right team.
Enterprises using this pattern report 40-60% reductions in time-to-resolution for standard exceptions, with planners focusing attention on the complex cases the AI flags as requiring judgment [Gartner Supply Chain Technology Report, 2025].
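To make the pattern concrete, here is a minimal Python sketch of the classify, retrieve, draft, route loop. Every name in it (`llm_complete`, `fetch_context`, the route table) is a hypothetical placeholder for your own governed LLM endpoint and SAP data reads, not an SAP or vendor API:

```python
from dataclasses import dataclass

@dataclass
class SupplyException:
    exception_id: str
    payload: dict        # raw alert fields extracted from SAP IBP / S/4HANA
    category: str = ""   # filled in by the classifier

ROUTES = {
    "demand_deviation": "demand-planning",
    "supplier_alert": "procurement",
    "inventory_flag": "inventory-control",
}

def llm_complete(prompt: str) -> str:
    """Placeholder for a call to your governed LLM endpoint."""
    raise NotImplementedError

def fetch_context(exc: SupplyException) -> str:
    """Placeholder for the SAP read that pulls related order,
    supplier, or inventory history for this exception."""
    raise NotImplementedError

def triage(exc: SupplyException) -> dict:
    # 1. Classify the exception from its raw fields.
    exc.category = llm_complete(
        f"Classify this SAP exception into one of {sorted(ROUTES)}; "
        f"answer with the label only.\n{exc.payload}"
    ).strip()
    # 2. Ground the draft resolution in retrieved SAP context.
    context = fetch_context(exc)
    draft = llm_complete(
        "Draft a recommended resolution for this exception, citing the "
        f"context below.\nException: {exc.payload}\nContext: {context}"
    )
    # 3. Route by category; anything unrecognized goes to human review.
    return {
        "assignee": ROUTES.get(exc.category, "planner-review-queue"),
        "draft_resolution": draft,
    }
```

The fallback route is the important design choice: unrecognized categories go to a human queue rather than being auto-resolved, which is what lets planners focus on the cases the AI flags as requiring judgment.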
- Procurement Content and Contract Intelligence
Generative AI connected to SAP Ariba contract data can answer natural language questions about contract terms, flag compliance deviations, summarize vendor performance, and draft procurement communications. A procurement manager who previously spent two hours pulling contract data before a supplier review now gets a briefing document generated in minutes from the SAP source data.
- Maintenance and Operations Narrative Generation
In manufacturing environments, SAP PM (Plant Maintenance) accumulates years of work order history, failure codes, and technician notes — mostly unstructured. Generative AI can synthesize this data to produce maintenance history summaries, predict recurring failure patterns, and draft work order instructions that incorporate historical repair context. Plants using this capability report meaningful reductions in repeat failures and faster technician onboarding.
- Financial Narrative and Close Support
Finance teams using SAP S/4HANA Finance are deploying generative AI to draft variance explanations, generate management commentary on financial results, and produce first drafts of board reporting. These are tasks that previously consumed analyst time at month-end. The model reads the SAP financial data, interprets the variance against prior period, and drafts an explanation in the organization’s reporting format.
- What is the difference between using Joule and building a custom generative AI capability on SAP?
Joule addresses tasks that SAP has designed it for — navigating S/4HANA, retrieving standard data, executing defined SAP workflows in natural language. Custom generative AI addresses problems specific to your environment, your data, and your workflows that SAP has not pre-built. Most enterprises will use both: Joule for general SAP productivity, and custom capabilities for the high-value, organization-specific problems.
- How do you keep sensitive SAP data out of public LLM training data?
Enterprise generative AI deployments on SAP use private API connections to model providers — Azure OpenAI, Anthropic, AWS Bedrock — where data sent through the API is not used for model training. SAP AI Core manages these connections with enterprise-grade credential management and logging. For the most sensitive environments, models can be deployed entirely within the enterprise’s cloud tenant.
What 2026 Looks Like for SAP GenAI Adoption
Based on current deployment velocity and SAP’s product roadmap, three shifts are materializing in 2026:
- Joule coverage expanding to SAP Extended Warehouse Management and SAP TM, making generative AI accessible to logistics and distribution operations teams without custom development.
- SAP AI Core adding support for multi-agent orchestration natively, reducing the custom engineering required to build agentic workflows on SAP.
- Enterprises moving from pilot to production at scale. IDC projects that 65% of large enterprises running SAP will have at least one generative AI capability in production by end of 2026, up from roughly 28% at end of 2024.
Why USM Business Systems?
USM Business Systems is a CMMI Level 3, Oracle Gold Partner AI and IT services firm headquartered in Ashburn, VA. With 1,000+ engineers, 2,000+ delivered applications, and 27 years of enterprise delivery experience, USM specializes in AI implementation for supply chain, pharma, manufacturing, and SAP environments. Our SAP AI practice places specialized engineers inside enterprise programs within days — on contract, as dedicated delivery pods, or on a project basis.
Ready to put SAP AI into production? Book a 30-minute scoping call with our SAP AI team.
FAQ
Does generative AI on SAP require moving to SAP’s cloud products?
No. SAP AI Core and BTP services can connect to on-premise S/4HANA environments through SAP Integration Suite. The generative AI runtime and the SAP data source do not need to be in the same deployment model.
What is retrieval-augmented generation (RAG) and why is it important for SAP?
RAG is an architecture where the AI model retrieves relevant data from a source — in this case SAP Datasphere or HANA views — and uses it as context when generating a response, rather than relying solely on its training data. For SAP use cases, RAG is important because it grounds the model’s outputs in your actual enterprise data rather than general knowledge.
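As a rough illustration, the RAG flow reduces to retrieve-then-generate. The `embed`, `search`, and `llm_complete` helpers below are hypothetical stand-ins for your embedding model, vector store (for example the vector service mentioned above), and governed LLM endpoint:

```python
def embed(text: str) -> list[float]:
    """Placeholder: call your embedding model."""
    raise NotImplementedError

def search(query_vector: list[float], k: int = 5) -> list[str]:
    """Placeholder: similarity search over indexed SAP data chunks."""
    raise NotImplementedError

def llm_complete(prompt: str) -> str:
    """Placeholder: call your governed LLM endpoint."""
    raise NotImplementedError

def answer(question: str) -> str:
    # 1. Retrieve the enterprise records most similar to the question.
    chunks = search(embed(question), k=5)
    # 2. Generate an answer grounded only in the retrieved context.
    context = "\n".join(chunks)
    prompt = (
        "Answer using only the context below. If the context does not "
        f"contain the answer, say so.\n\nContext:\n{context}\n\n"
        f"Question: {question}"
    )
    return llm_complete(prompt)
```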
How do you measure ROI on SAP generative AI deployments?
The most reliable metrics are time reduction on specific tasks (exception handling time, reporting preparation time, document review time), error rate reduction on processes the AI is involved in, and throughput increase for teams using AI assistance. Tie each metric to a baseline measurement taken before deployment.
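A minimal sketch of that baseline comparison, with invented numbers purely for illustration:

```python
# Hypothetical before/after measurements for one task, in consistent units.
baseline = {"minutes_per_item": 45, "errors_per_100": 6, "items_per_week": 120}
with_ai  = {"minutes_per_item": 18, "errors_per_100": 2, "items_per_week": 210}

def pct_change(before: float, after: float) -> float:
    return round(100 * (after - before) / before, 1)

for metric in baseline:
    print(f"{metric}: {pct_change(baseline[metric], with_ai[metric]):+}%")
# minutes_per_item: -60.0%   (time reduction)
# errors_per_100: -66.7%     (error-rate reduction)
# items_per_week: +75.0%     (throughput increase)
```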
What SAP license or subscription is required for generative AI features?
Joule is included in SAP’s Business AI subscription, which is bundled with most SAP cloud products. SAP AI Core pricing is consumption-based. For custom deployments using external LLM providers, costs include the BTP services and the model API costs from the LLM provider.
Can generative AI work with SAP on-premise systems that are not on S/4HANA?
Yes, though the integration path is more complex. Older SAP systems — ECC, BW — can be connected through SAP Integration Suite and data extraction pipelines. The generative AI capability sits outside the legacy system and reads from a structured data extract.
The agentic AI cost problem no one talks about: slow iteration cycles
Imagine a factory floor where every machine is running at full capacity. The lights are on, the equipment is humming, the engineers are busy. Nothing is shipping.
The bottleneck isn’t production capacity. It’s the quality control loop that takes three weeks every cycle, holds everything up, and costs the same whether the line is moving or standing still. You can buy faster machines. You can hire more engineers. Until the loop speeds up, costs keep rising and output stays stuck.
That’s exactly where most enterprise agentic AI programs are right now. The models are good enough. Compute is provisioned. Teams are building. But the path from development to evaluation to approval to deployment is too slow, and every extra cycle burns budget before business value appears.
This is what makes agentic AI expensive in ways many teams underestimate. These systems don’t just generate outputs. They make decisions, call tools, and act with enough autonomy to cause real damage in production if they aren’t continuously refined. The complexity that makes them powerful is the same complexity that makes each cycle expensive when the process isn’t built for speed.
The fix isn’t more budget. It’s a faster loop, one where evaluation, governance, and deployment are built into how you iterate, not bolted on at the end.
Key takeaways
- Slow iteration is a hidden cost multiplier. GPU waste, rework, and opportunity cost compound faster than most teams realize.
- Evaluation and debugging, not model training, are the real budget drains. Multi-step agent testing, tracing, and governance validation consume far more time and compute than most enterprises anticipate.
- Governance embedded early accelerates delivery. Treating compliance as continuous validation prevents expensive late-stage rebuilds that stall production.
- When provisioning, scaling, and orchestration run automatically, teams can focus on improving agents instead of managing plumbing.
- The right metric is success-per-dollar. Measuring task success rate relative to compute cost reveals whether iteration cycles are truly improving ROI.
Why agentic AI iteration is harder than you think
The old playbook — develop, test, refine — doesn’t hold up for agentic AI. The reason is simple: once agents can take actions, not just return answers, development stops being a linear build-test cycle and becomes a continuous loop of evaluation, debugging, governance, and observation.
The modern cycle has six stages:
- Build
- Evaluate
- Debug
- Deploy
- Observe
- Govern
Each step feeds into the next, and the loop never stops. A broken handoff anywhere can add weeks to your timeline.
The complexity is structural. Agentic systems don’t just respond to input. They act with enough autonomy to create real failures in production. More autonomy means more failure modes. More failure modes mean more testing, more debugging, and more governance. And while governance appears last in the cycle, it can’t be treated as a final checkpoint. Teams that treat it that way pay for the decision twice: once to build, and again to rebuild.
Three barriers consistently slow this cycle down in enterprise environments:
- Tool sprawl: Evaluation, orchestration, monitoring, and governance tools stitched together from different vendors create fragile integrations that break at the worst moments.
- Infrastructure overhead: Engineers spend more time provisioning compute, managing containers, or scaling GPUs than improving agents.
- Governance bottlenecks: Compliance treated as a final step forces teams into the same expensive cycle. Build, hit the wall, rework, repeat.
Model training isn’t where your budget disappears. That’s increasingly commodity territory. The real cost is evaluation and debugging: GPU hours consumed while teams run complex multi-step tests and trace agent behavior across distributed systems they’re still learning to operate.
Why slow iteration drives up AI costs
Slow iteration isn’t just inefficient. It’s a compounding tax on budget, momentum, and time-to-value, and the costs accumulate faster than most teams track.
- GPU waste from long-running evaluation cycles: When evaluation pipelines take hours or days, expensive GPU instances burn budget while your team waits for results. Without confidence in rapid scale-up and scale-down, IT defaults to keeping resources running continuously. You pay full price for idle compute.
- Late governance flags force full rebuilds: When compliance catches issues after architecture, integrations, and custom logic are already in place, you don’t patch the problem. You rebuild. That means paying the full development cost twice.
- Orchestration work crowds out agent work: Every new agent means container setup, infrastructure configuration, and integration overhead. Engineers hired to build AI spend their time maintaining pipelines instead.
- Time-to-production delays are the highest cost of all: Every additional iteration cycle is another week a real business problem goes unsolved. Markets shift. Priorities change. The use case your team is perfecting may matter far less by the time it ships.
Technical debt compounds each of these costs. Slow cycles make architectural decisions harder to reverse and push teams toward shortcuts that create larger problems downstream.
Faster iteration compounds. Here’s what that means for ROI.
Most enterprises think faster iteration means shipping sooner. That’s true, but it’s the least interesting part.
The real advantage is compounding. Each cycle improves the AI agent you’re building and sharpens your team’s ability to build the next one. When you can validate quickly, you stop making theoretical bets about agent design and start running real experiments. Decisions get made on evidence, not assumptions, and course corrections happen while they’re still inexpensive.
Four factors determine how much ROI you actually capture:
- Governance built in from day zero: Compliance treated as a final hurdle forces expensive rebuilds just as teams approach launch. When governance, auditability, and risk controls are part of how you iterate from the start, you eliminate the rework cycles that drain budgets and kill momentum.
- Automated infrastructure: When provisioning, scaling, and orchestration run automatically, engineers focus on agent logic instead of managing compute. The overhead disappears. Iteration accelerates.
- Evaluation that runs without manual intervention: Automated pipelines run scenarios in parallel, return faster feedback, and cover more ground than manual testing. The historically slowest part of the cycle stops being a bottleneck.
- Debugging with real visibility: Multi-step agent failures are notoriously hard to diagnose without tooling. Trace logs, state inspection, and scenario replays compress debugging from days to hours.
Together, these factors don’t just speed up a single deployment. They build the operational foundation that makes every subsequent agent faster and cheaper to deliver.
Practical ways to accelerate iterations without overspending
The following tactics address the points where agentic AI cycles break down most often: evaluation, model selection, parallelization, and tooling.
Stop treating evaluation as an afterthought
Evaluation is where agentic AI projects slow to a crawl and budgets spiral. The problem sits at the intersection of governance requirements, infrastructure complexity, and the reality that multi-agent systems are simply harder to test than traditional ML.
Multi-agent evaluation requires orchestrating scenarios where agents communicate with each other, call external APIs, and interact with other production systems. Traditional frameworks weren’t built for this. Teams end up building custom solutions that work initially but become unmaintainable fast.
Safety checks and compliance validation need to run with every iteration, not just at major milestones. When those checks are manual or scattered across tools, evaluation timelines bloat unnecessarily. Being thorough and being slow are not the same thing. The answer is unified evaluation pipelines, where infrastructure, safety validation, and performance testing are integrated capabilities. Automate governance checks. Give engineers the time to improve agents instead of managing test environments.
Match model size to task complexity
Stop throwing frontier models at every problem. It’s expensive, and it’s a choice, not a default.
Agentic workflows aren’t monolithic. A simple data extraction task doesn’t require the same model as complex multi-step reasoning. Matching model capability to task complexity reduces compute costs substantially while maintaining performance where it actually matters. Smaller models don’t always produce equivalent results, but for the right tasks, they don’t need to.
Dynamic model selection, where simpler tasks route to smaller models and complex reasoning routes to larger ones, can significantly cut token and compute costs without degrading output quality. The catch is that your infrastructure needs to switch between models without adding latency or operational complexity. Most enterprises aren’t there yet, which is why they default to overpaying.
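A sketch of what that routing layer can look like. The task types, model names, and `call_model` gateway are assumptions for illustration, not any specific vendor's API:

```python
# Route each workflow step to the cheapest model tier that can handle it.
ROUTING_TABLE = {
    "extract":   "small-8b-model",   # structured extraction: cheap tier
    "summarize": "small-8b-model",
    "plan":      "frontier-model",   # multi-step reasoning: expensive tier
    "act":       "frontier-model",
}

def call_model(model: str, prompt: str) -> str:
    """Placeholder for your inference gateway."""
    raise NotImplementedError

def run_step(task_type: str, prompt: str) -> str:
    # Unknown task types fail safe to the most capable (most expensive) tier.
    model = ROUTING_TABLE.get(task_type, "frontier-model")
    return call_model(model, prompt)
```

Defaulting unknown task types to the expensive tier trades cost for safety; the routing table is where the cost/quality trade-off becomes explicit, tunable, and auditable.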
Use parallelization for faster feedback
Running multiple evaluations simultaneously is the obvious way to compress iteration cycles. The catch is that it only works when the underlying infrastructure is built for it.
When evaluation workloads are properly containerized and orchestrated, you can test multiple agent variants, run diverse scenarios, and validate configurations at the same time. Throughput increases without a proportional rise in costs. Feedback arrives faster.
Most enterprise teams aren’t there yet. They attempt parallel testing, hit resource contention, watch costs spike, and end up managing infrastructure problems instead of improving agents. The speed-up becomes a slowdown with a higher bill.
The prerequisite isn’t parallelization itself. It’s elastic, containerized infrastructure that can scale workloads on demand without manual intervention.
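Assuming the backend can absorb the load, the fan-out itself is straightforward. The sketch below uses Python's standard `concurrent.futures`, with `evaluate` as a placeholder for whatever runs one agent variant against one scenario:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import product

def evaluate(variant: str, scenario: str) -> dict:
    """Placeholder: run one agent variant against one scenario
    and return its metrics."""
    raise NotImplementedError

variants = ["agent-v1", "agent-v2", "agent-v3"]
scenarios = ["happy-path", "tool-timeout", "bad-input", "escalation"]

results = []
with ThreadPoolExecutor(max_workers=8) as pool:
    # Fan out every (variant, scenario) pair instead of running serially.
    futures = {
        pool.submit(evaluate, v, s): (v, s)
        for v, s in product(variants, scenarios)
    }
    for future in as_completed(futures):
        variant, scenario = futures[future]
        results.append((variant, scenario, future.result()))
```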
Fragmented tooling is a hidden iteration tax
The real tooling gaps that slow enterprise teams aren’t about individual tool quality. They’re about integration, lifecycle management, and the manual work that accumulates at every seam.
Map your workflow from development through monitoring and eliminate every manual handoff. Every point where a human moves data, triggers a process, or translates formats is a breakpoint that slows iteration. Consolidate tools where possible. Automate handoffs where you can’t.
Consolidate governance into one layer. Disconnected compliance tools create fragmented audit trails, and permissions have to be rebuilt for every agent. When you’re scaling an agent workforce, that overhead compounds fast. A single source for audit logs, permissions, and compliance validation isn’t a nice-to-have.
Standardize infrastructure setup. Custom environment configuration for every iteration is a recurring cost that scales with your team’s output. Templates and infrastructure-as-code make setup a non-event instead of a recurring tax.
Choose platforms where development, evaluation, deployment, monitoring, and governance are integrated capabilities. The overhead of maintaining disconnected tools will cost more over time than any marginal feature difference between them is worth.
Governance built in moves faster than governance bolted on
Speed doesn’t undermine compliance. Frequent validation creates stronger governance than sporadic audits at major milestones. Continuous checks catch issues early, when fixing them is cheap. Sporadic audits catch them late, when fixing them means rebuilding.
Most enterprises still treat governance as a final checkpoint, a gate at the end of development. Compliance issues surface after weeks of building, forcing rework cycles that wreck timelines and budgets. The cost isn’t just the rebuild. It’s everything that didn’t ship while the team was rebuilding.
The alternative is governance embedded from day zero: reproducibility, versioning, lineage tracking, and auditability built into how you develop, not appended at the end.
Automated checks replace manual reviews that create bottlenecks. Audit trails captured continuously during development become assets during compliance reviews, not reconstructions of work no one documented properly. Systems that validate agent behavior in real time prevent the late-stage discoveries that derail projects entirely.
When compliance is part of how you iterate, it stops being a gate and starts being an accelerator.
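One way to picture governance as part of the loop is a gate that runs automatically against every iteration's agent trace and blocks promotion on failure. The checks and trace shape below are invented for illustration; real checks would run against your own audit tooling:

```python
def check_pii_redacted(trace: dict) -> bool:
    # Illustrative rule: a redaction step must have run before tool calls.
    return trace.get("pii_redacted", False)

def check_tool_allowlist(trace: dict) -> bool:
    # Illustrative rule: agents may only call approved tools.
    allowed = {"search_kb", "create_ticket"}
    return all(call["tool"] in allowed for call in trace.get("tool_calls", []))

def check_audit_complete(trace: dict) -> bool:
    # Illustrative rule: every tool call carries an audit record.
    return all("audit_id" in call for call in trace.get("tool_calls", []))

GOVERNANCE_CHECKS = [
    check_pii_redacted,
    check_tool_allowlist,
    check_audit_complete,
]

def governance_gate(trace: dict) -> None:
    """Run on every iteration; fail the build while the fix is still a
    small diff, not a launch-week rebuild."""
    failures = [c.__name__ for c in GOVERNANCE_CHECKS if not c(trace)]
    if failures:
        raise RuntimeError(f"Governance gate failed: {failures}")
```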
The metrics that actually measure iteration performance
Most enterprises are measuring iteration performance with metrics that don’t matter anymore.
Your metrics should directly address why iteration is slower than expected, whether it’s due to infrastructure setup delays, evaluation complexity, governance slowdowns, or tool fragmentation. Generic software development KPIs miss the specific challenges of agentic AI development.
Cost per iteration
Total resource consumption needs to include compute and GPU costs as well as engineering time. The most expensive part of slow iteration is often the hours spent on infrastructure setup, tool integration, and manual processes: work that doesn’t improve the agent.
Costs balloon when teams reinvent infrastructure for every new agent, building ad hoc runtimes and duplicating orchestration work across projects.
Cost per iteration drops significantly when governance, evaluation, and infrastructure provisioning are standardized and reusable across the lifecycle rather than rebuilt each cycle.
Time-to-deployment
Code completion to staging is not time-to-deployment. It’s one step in the middle.
Real time-to-deployment starts at business requirement and ends at production impact. The stages in between (evaluation cycles, approval workflows, environment provisioning, and integration testing) are where agentic AI projects lose weeks and months. Measure the full span, or the metric is meaningless.
Faster iteration also reduces risk. Quick cycles surface architectural mistakes early, when course corrections are still inexpensive. Slow cycles surface them late, when the only path forward is reconstruction. Speed and risk management aren’t in tension here. They move together.
Task success rate vs. budget
Traditional performance metrics are meaningless for agentic AI. What finance actually cares about is task success rate. Does your agent complete real workflows end-to-end, and what does that cost?
Tier accuracy by business stakes. Not every workflow deserves your most powerful models. Classify tasks by criticality, and set success thresholds based on actual business impact. That gives you a defensible framework when finance questions GPU spend, and a clear rationale for routing routine tasks to smaller, cheaper models.
Model selection, scaling policies, and intelligent routing determine your unit economics. Leaner inference for standard tasks, flexible scaling that adjusts to demand rather than running at maximum, and routing logic that reserves frontier compute for high-stakes workflows — these are the levers that control cost without degrading performance where it matters. Make them tunable and measurable.
Track success-per-dollar weekly and break it down by workflow. Task success rate divided by compute cost is how you demonstrate that iteration cycles are generating returns, not just consuming resources.
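A sketch of that weekly rollup. The run records are invented; in practice they would come from your task logs and billing data:

```python
from collections import defaultdict

# (workflow, task_succeeded, compute_cost_usd), one record per run.
runs = [
    ("invoice-triage", True, 0.04), ("invoice-triage", True, 0.05),
    ("invoice-triage", False, 0.06),
    ("contract-review", True, 0.40), ("contract-review", True, 0.35),
]

totals = defaultdict(lambda: {"ok": 0, "n": 0, "cost": 0.0})
for workflow, succeeded, cost in runs:
    totals[workflow]["n"] += 1
    totals[workflow]["ok"] += succeeded
    totals[workflow]["cost"] += cost

for workflow, t in totals.items():
    success_rate = t["ok"] / t["n"]
    # Success-per-dollar: task success rate divided by compute cost.
    print(f"{workflow}: {success_rate / t['cost']:.2f} success/$")
```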
Resource utilization rate
Underused compute and storage are a steady drain that most teams don’t measure until the bill arrives. Track resource utilization as a continuous operational metric, not a one-time assessment during project planning.
Faster iteration improves utilization naturally. Workflows spend less time waiting on manual steps, approval processes, and infrastructure provisioning. That idle time costs the same as active compute. Eliminating it compounds the cost savings of every other improvement in this list.
Why enterprise agentic AI programs stall, and how to unblock them
Large enterprises face systemic blockers: governance debt, infrastructure provisioning delays, security review processes, and siloed responsibilities across IT, AI, and DevOps. These blockers get worse when teams build agentic systems on DIY technology stacks, where orchestrating multiple tools and maintaining governance across separate systems adds complexity at every layer.
Sandboxed pilots don’t build organizational confidence
Experiments that don’t face real-world constraints don’t prove anything to stakeholders. Governed pilots do. Visible evaluation results, auditable agent behavior, and documented governance lineage give stakeholders something concrete to evaluate rather than a demo to applaud.
Stakeholders shouldn’t have to take your word that risk is managed. Give them access to evaluation results, agent decision traces, and compliance validation logs. Visibility should be continuous and automatic, not a report you scramble to generate when someone asks.
Clarify roles and responsibilities
Agentic AI creates accountability gaps that traditional software development doesn’t. Who owns the agent logic? The workflow orchestration? The model performance? The runtime infrastructure? When those questions don’t have clear answers, approval cycles slow, and problems become expensive.
Define ownership before it becomes a question. Assign individual points of contact to every component of your agentic AI system, not just team names. Someone specific needs to be accountable for each layer.
Document escalation paths for cross-functional issues. When problems cross boundaries, it needs to be clear who has the authority to act.
Improve tool integration
Disconnected toolchains often cost more than the tools themselves. Rebuilding infrastructure per agent, managing multiple runtimes, manually orchestrating evaluations, and stitching logs across systems creates integration overhead that compounds with every new agent. Most teams don’t measure it systematically, which is why it keeps growing.
The fix isn’t better connectors between broken pieces. It’s unified compute layers, standardized evaluation pipelines, and governance built into the workflow instead of wrapped around it. That’s how you turn integration hours into iteration hours.
Fill in skill gaps
Demoing agentic AI is the easy part. Operationalizing it is where most organizations fall short, and the gap is as much operational as it is technical.
Infrastructure teams need GPU orchestration and model serving expertise that traditional IT backgrounds don’t include. AI practitioners need multi-step workflow evaluation and agent debugging skills that are still emerging across the industry. Governance teams need frameworks for validating autonomous systems, not just reviewing model cards.
Cross-train across functions before the skills gap stalls your roadmap. Pair teams on agentic-specific challenges. The organizations that scale agents successfully aren’t the ones that hired the most — they’re the ones that built operational muscle across existing teams.
You can’t hire your way out of a skills gap this broad or this fast-moving. Tooling that abstracts infrastructure complexity lets current teams operate above their current skill level while capabilities mature on both sides.
Turn faster feedback into lasting ROI
Iteration speed is a structural advantage, not a one-time gain. Enterprises that build rapid iteration into their operating model don’t just ship faster — they build capabilities that compound across every future project. Automated evaluation transfers across initiatives. Embedded governance reduces compliance overhead. Integrated lifecycle tooling becomes reusable infrastructure instead of single-use scaffolding.
The result is a flywheel: faster cycles improve predictability, reduce operational drag, and lower costs while increasing delivery pace. Your competitors wrestling with the same bottlenecks project after project aren’t your benchmark. The benchmark is what becomes possible when the loop actually works.
Ready to move from prototype to production? Download “Scaling AI agents beyond PoC” to see how leading enterprises are doing it.
FAQs
Why does iteration speed matter more for agentic AI than traditional ML? Agentic systems are autonomous, multi-step, and action-taking. Failures don’t just result in bad predictions. They can trigger cascading tool calls, cost overruns, or compliance risks. Faster iteration cycles catch architectural, governance, and cost issues before they compound in production.
What is the biggest hidden cost in agentic AI development? It’s not model training. It’s evaluation and debugging. Multi-agent workflows require scenario testing, tracing across systems, and repeated governance checks, which can consume significant GPU hours and engineering time if not automated and streamlined.
Doesn’t faster iteration increase compliance risk? Not if governance is embedded from the start. Continuous validation, automated compliance checks, versioning, and audit trails strengthen governance by catching issues earlier instead of surfacing them at the end of development.
How do you measure whether faster iteration is actually saving money? Track cost per iteration, time-to-deployment (from business requirement to production impact), resource utilization rate, and task success rate divided by compute spend. Those metrics reveal whether each cycle is becoming more efficient and more valuable.