#IJCAI2025 distinguished paper: Combining MORL with restraining bolts to learn normative behaviour
Image provided by the authors – generated using Gemini.
For many of us, artificial intelligence (AI) has become part of everyday life, and the rate at which we assign previously human roles to AI systems shows no signs of slowing down. AI systems are the crucial ingredients of many technologies — e.g., self-driving cars, smart urban planning, digital assistants — across a growing number of domains. At the core of many of these technologies are autonomous agents — systems designed to act on behalf of humans and make decisions without direct supervision. In order to act effectively in the real world, these agents must be capable of carrying out a wide range of tasks despite possibly unpredictable environmental conditions, which often requires some form of machine learning (ML) for achieving adaptive behaviour.
Reinforcement learning (RL) [6] stands out as a powerful ML technique for training agents to achieve optimal behaviour in stochastic environments. RL agents learn by interacting with their environment: for every action they take, they receive context-specific rewards or penalties. Over time, they learn behaviour that maximizes the expected rewards throughout their runtime.
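To make the learning loop concrete, here is a minimal sketch of tabular Q-learning, one standard RL algorithm, on a toy five-cell corridor. The environment, the reward values, and the hyperparameters are all illustrative assumptions and are not taken from the paper.

```python
import random

N_STATES = 5          # cells 0..4 on a line; cell 4 is the goal (hypothetical toy task)
ACTIONS = (-1, +1)    # move left or move right

def step(state, action):
    """Toy environment: small cost per move, reward 1.0 on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else -0.01), done

def q_learning(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap on episode length
            # Epsilon-greedy: explore occasionally, otherwise act on current estimates.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q

q = q_learning()
# Greedy action in each non-goal cell after training.
policy = [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:-1]]
print(policy)
```

The agent is never told that moving right is correct; it only sees rewards and penalties, and the preference for moving right emerges from the updates.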
RL agents can master a wide variety of complex tasks, from winning video games to controlling cyber-physical systems such as self-driving cars, often surpassing what expert humans are capable of. This optimal, efficient behaviour, however, if left entirely unconstrained, may turn out to be off-putting or even dangerous to the humans it impacts. This motivates the substantial research effort in safe RL, where specialized techniques are developed to ensure that RL agents meet specific safety requirements. These requirements are often expressed in formal languages like linear temporal logic (LTL), which extends classical (true/false) logic with temporal operators, allowing us to specify conditions like “something that must always hold”, or “something that must eventually occur”. By combining the adaptability of ML with the precision of logic, researchers have developed powerful methods for training agents to act both effectively and safely.
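As a small illustration in standard LTL notation, the two patterns just mentioned might be written as follows. The atomic propositions collision and goal are hypothetical names chosen for this example.

```latex
\[
\mathbf{G}\,\lnot\mathit{collision}
\qquad\text{and}\qquad
\mathbf{F}\,\mathit{goal}
\]
```

Here G ("globally") requires its argument to hold at every step of an execution, while F ("finally") requires it to hold at some step.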
However, safety isn’t everything. Indeed, as RL-based agents are increasingly given roles that either replace or closely interact with humans, a new challenge arises: ensuring their behavior is also compliant with the social, legal and ethical norms that structure human society, which often go beyond simple constraints guaranteeing safety. For example, a self-driving car might perfectly follow safety constraints (e.g. avoiding collisions), yet still adopt behaviors that, while technically safe, violate social norms, appearing bizarre or rude on the road, which might cause other (human) drivers to react in unsafe ways.

Norms are typically expressed as obligations (“you must do it”), permissions (“you are permitted to do it”) and prohibitions (“you are forbidden from doing it”), which are not statements that can be true or false, like classical logic formulas. Instead, they are deontic concepts: they describe what is right, wrong, or permissible (ideal or acceptable behaviour) rather than what is actually the case. This nuance introduces several difficult dynamics to reasoning about norms, which many logics (such as LTL) struggle to handle. Even everyday normative systems like driving regulations can feature such complications; while some norms can be very simple (e.g., never exceed 50 kph within city limits), others can be more complex, as in:
(1) Always maintain 10 meters between your vehicle and the vehicles in front of and behind you.
(2) If there are less than 10 meters between you and the vehicle behind you, you should slow down to put more space between yourself and the vehicle in front of you.
(2) is an example of a contrary-to-duty obligation (CTD): an obligation that applies specifically when another, primary obligation (here, (1)) has already been violated, e.g., to compensate for or reduce the damage. Although studied extensively in the fields of normative reasoning and deontic logic, such norms can be problematic for many basic safe RL methods based on enforcing LTL constraints, as was discussed in [4].
However, there are approaches for safe RL that show more potential. One notable example is the Restraining Bolt technique, introduced by De Giacomo et al. [2]. Named after a device used in the Star Wars universe to curb the behavior of droids, this method influences an agent’s actions to align with specified rules while still allowing it to pursue its goals. That is, the restraining bolt modifies the behavior an RL agent learns so that it also respects a set of specifications. These specifications, expressed in a variant of LTL (LTLf [3]), are each paired with its own reward. The central idea is simple but powerful: along with the rewards the agent receives while exploring the environment, we add an additional reward whenever its actions satisfy the corresponding specification, nudging it to behave in ways that align with individual safety requirements. The assignment of specific rewards to individual specifications allows us to model more complicated dynamics like, e.g., CTD obligations, by assigning one reward for obeying the primary obligation, and a different reward for obeying the CTD obligation.
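The reward-augmentation idea can be sketched roughly as follows. This is an assumption-laden toy: the automaton is written by hand instead of being compiled from an LTLf formula, and the class name BoltSpec, the labels, and the reward values are hypothetical.

```python
class BoltSpec:
    """One specification: a hand-written automaton plus the reward paid on acceptance."""

    def __init__(self, transitions, start, accepting, reward):
        self.transitions = transitions  # {(automaton_state, label): next_state}
        self.state = start
        self.accepting = accepting
        self.reward = reward
        self.paid = False

    def advance(self, label):
        # Unlisted (state, label) pairs leave the automaton state unchanged.
        self.state = self.transitions.get((self.state, label), self.state)
        if self.state in self.accepting and not self.paid:
            self.paid = True  # pay the bonus once, when the spec is first satisfied
            return self.reward
        return 0.0

def shaped_reward(env_reward, label, specs):
    """Environment reward plus the bonus of every specification satisfied at this step."""
    return env_reward + sum(spec.advance(label) for spec in specs)

# Hypothetical spec: "pick up a resource and, afterwards, deliver it".
spec = BoltSpec(
    transitions={(0, "pickup"): 1, (1, "deliver"): 2},
    start=0, accepting={2}, reward=5.0,
)
total = 0.0
for label in ["move", "pickup", "move", "deliver"]:
    total += shaped_reward(-1.0, label, [spec])  # -1.0 per step from the environment
print(total)
```

In a full setup the learner would also need to observe the automaton state alongside the environment state, so that it can tell how far each specification has progressed.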
Still, issues with modeling norms persist; for example, many (if not most) norms are conditional. Consider the obligation stating “if pedestrians are present at a pedestrian crossing, THEN the nearby vehicles must stop”. If an agent were rewarded every time this rule was satisfied, it would also receive rewards in situations where the norm is not actually in force. This is because, in logic, an implication holds also when the antecedent (“pedestrians are present”) is false. As a result, the agent is rewarded whenever pedestrians are not around, and might learn to prolong its runtime in order to accumulate these rewards for effectively doing nothing, instead of efficiently pursuing its intended task (e.g., reaching a destination). In [5] we showed that there are scenarios where an agent will either ignore the norms, or learn this “procrastination” behavior, no matter which rewards we choose. As a result, we introduced Normative Restraining Bolts (NRBs), a step forward toward enforcing norms in RL agents. Unlike the original Restraining Bolt, which encouraged compliance by providing additional rewards, the normative version instead punishes norm violations. This design is inspired by the Andersonian view of deontic logic [1], which treats obligations as rules whose violation necessarily triggers a sanction. Thus, the framework no longer relies on reinforcing acceptable behavior, but instead enforces norms by guaranteeing that violations carry tangible penalties. While effective for managing intricate normative dynamics like conditional obligations, contrary-to-duties, and exceptions to norms, NRBs rely on trial-and-error reward tuning to implement norm adherence, and therefore can be unwieldy, especially when trying to resolve conflicts between norms. Moreover, they require retraining to accommodate norm updates, and do not lend themselves to guarantees that optimal policies minimize norm violations.
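The "procrastination" incentive can be seen in a few lines. The sketch below scores two hypothetical traces of the pedestrian-crossing norm, once by rewarding every step on which the implication holds and once by penalizing only actual violations. The trace encoding and the reward magnitudes are assumptions made for illustration.

```python
def implies(antecedent, consequent):
    """Material implication: true whenever the antecedent is false."""
    return (not antecedent) or consequent

def reward_for_satisfaction(trace, bonus=1.0):
    # Pays on every step where "pedestrians present -> vehicle stopped" holds.
    return sum(bonus for pedestrians, stopped in trace if implies(pedestrians, stopped))

def penalty_for_violation(trace, penalty=1.0):
    # Charges only on steps where the norm is in force and is broken.
    return sum(-penalty for pedestrians, stopped in trace if pedestrians and not stopped)

# Each step is (pedestrians_present, vehicle_stopped); no pedestrians ever appear.
direct = [(False, False)] * 3      # drives straight to the destination
dawdling = [(False, False)] * 10   # takes the long way round
violating = [(True, False)]        # drives through an occupied crossing

print(reward_for_satisfaction(direct), reward_for_satisfaction(dawdling))
print(penalty_for_violation(direct), penalty_for_violation(dawdling))
print(penalty_for_violation(violating))
```

Under the reward scheme the longer trace earns more even though the norm was never in force, whereas the penalty scheme treats both compliant traces identically and only the violating one loses reward.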
Our contribution
Building on NRBs, we introduce Ordered Normative Restraining Bolts (ONRBs), a framework for guiding reinforcement learning agents to comply with social, legal, and ethical norms while addressing the limitations of NRBs. In this approach, each norm is treated as an objective in a multi-objective reinforcement learning (MORL) problem. Reformulating the problem in this way allows us to:
- Prove that when norms do not conflict, an agent who learns optimal behaviour will minimize norm violations over time.
- Express relationships between norms in terms of a ranking system describing which norm should be prioritized when a conflict occurs.
- Use MORL techniques to algorithmically determine the necessary magnitude of the punishments we assign, such that it is guaranteed that, so long as an agent learns optimal behaviour, norms will be violated as little as possible, prioritizing the norms with the highest rank.
- Accommodate changes in our normative systems by “deactivating” or “reactivating” specific norms.
We tested our framework in a grid-world environment inspired by strategy games, where an agent learns to collect resources and deliver them to designated areas. This setup allows us to demonstrate the framework’s ability to handle the complex normative scenarios we noted above, along with direct prioritization of conflicting norms and norm updates. For instance, the accompanying figure displays how the agent handles norm conflicts when it is both obligated to (1) avoid the dangerous (pink) areas, and (2) reach the market (blue) area by a certain deadline, supposing that the second norm takes priority. We can see that it chooses to violate (1) once, because otherwise it will be stuck at the beginning of the map, unable to fulfill (2). Nevertheless, when given the possibility to violate (1) once more, it chooses the compliant path, even though the violating path would allow it to collect more resources, and therefore more rewards from the environment.
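One way to picture ranked norms is as a lexicographic comparison of violation counts. The sketch below is not the algorithm from the paper: it simply counts violations for three hypothetical behaviours in the scenario above, and derives naive penalty weights under the assumption that no norm is violated more than max_violations times, so that one violation of a higher-ranked norm always outweighs all lower-ranked violations combined.

```python
def lexicographic_key(violations, ranking):
    """Violation counts ordered from the highest-ranked norm to the lowest."""
    return tuple(violations[norm] for norm in ranking)

def dominance_weights(ranking, max_violations):
    """Naive penalty weights (an assumption, not the paper's method)."""
    base = max_violations + 1
    top = len(ranking) - 1
    return {norm: base ** (top - i) for i, norm in enumerate(ranking)}

def weighted_cost(violations, weights):
    return sum(weights[norm] * count for norm, count in violations.items())

# Highest priority first; the norm and behaviour names are hypothetical labels.
ranking = ["reach_market_by_deadline", "avoid_danger"]
candidates = {
    "cross_danger_once": {"reach_market_by_deadline": 0, "avoid_danger": 1},
    "cross_danger_twice": {"reach_market_by_deadline": 0, "avoid_danger": 2},
    "stay_at_start": {"reach_market_by_deadline": 1, "avoid_danger": 0},
}

weights = dominance_weights(ranking, max_violations=3)
best_lex = min(candidates, key=lambda name: lexicographic_key(candidates[name], ranking))
best_weighted = min(candidates, key=lambda name: weighted_cost(candidates[name], weights))
print(best_lex, best_weighted)
```

Both comparisons pick the behaviour that crosses the dangerous area exactly once, which matches the choice described above: the lower-ranked norm is violated only as much as is needed to satisfy the higher-ranked one.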
In summary, by combining RL with logic, we can build AI agents that do not just work, they work right.
This work won a distinguished paper award at IJCAI 2025. Read the paper in full: Combining MORL with restraining bolts to learn normative behaviour, Emery A. Neufeld, Agata Ciabattoni and Radu Florin Tulcan.
Acknowledgements
This research was funded by the Vienna Science and Technology Fund (WWTF) project ICT22-023 and the Austrian Science Fund (FWF) 10.55776/COE12 Cluster of Excellence Bilateral AI.
References
[1] Alan Ross Anderson. A reduction of deontic logic to alethic modal logic. Mind, 67(265):100–103, 1958.
[2] Giuseppe De Giacomo, Luca Iocchi, Marco Favorito, and Fabio Patrizi. Foundations for restraining bolts: Reinforcement learning with LTLf/LDLf restraining specifications. In Proceedings of the international conference on automated planning and scheduling, volume 29, pages 128–136, 2019.
[3] Giuseppe De Giacomo and Moshe Y Vardi. Linear temporal logic and linear dynamic logic on finite traces. In IJCAI, volume 13, pages 854–860, 2013.
[4] Emery Neufeld, Ezio Bartocci, and Agata Ciabattoni. On normative reinforcement learning via safe reinforcement learning. In PRIMA 2022, 2022.
[5] Emery A Neufeld, Agata Ciabattoni, and Radu Florin Tulcan. Norm compliance in reinforcement learning agents via restraining bolts. In Legal Knowledge and Information Systems JURIX 2024, pages 119–130. IOS Press, 2024.
[6] Richard S. Sutton and Andrew G. Barto. Reinforcement learning – an introduction. Adaptive computation and machine learning. MIT Press, 1998.
A robot learns to handle bulky objects like humans do after just one lesson
Beyond Industrial Arms: How Service Robots Will Become Everyday Infrastructure by 2034
Physical AI uses both sight and touch to manipulate objects like a human
Not all AI gateways are built for agentic AI. Here’s how to tell.
Agentic AI is here, and the pace is picking up. Like elite cycling teams, the enterprises pulling ahead are the ones that move fast together, without losing balance, visibility, or control.
That kind of coordinated speed doesn’t happen by accident.
In our last post, we introduced the concept of an AI gateway: a lightweight, centralized system that sits between your agentic AI applications and the ecosystem of tools they rely on — APIs, infrastructure, policies, and platforms. It keeps those components decoupled and easier to secure, manage, and evolve as complexity grows.
In this post, we’ll show you how to spot the difference between a true AI gateway and just another connector — and how to evaluate whether your architecture can scale agentic AI without introducing risk.
Self-assess your AI maturity
In elite cycling, like the Tour de France, no one wins alone. Success depends on coordination: specialized riders, support staff, strategy teams, and more, all working together with precision and speed.
The same applies to agentic AI.
The enterprises pulling ahead are the ones that move fast together. Not just experimenting, but scaling with control.
So where do you stand?
Think of this as a quick checkup. A way to assess your current AI maturity and spot the gaps that could slow you down:
- Solo riders: You’re experimenting with generative AI tools, but efforts are isolated and disconnected.
- Race teams: You’ve started coordinating tools and workflows, but orchestration is still patchy.
- Tour-level teams: You’re building scalable, adaptive systems that operate in sync across the organization.
If you are aiming for that top tier, not just running proofs of concept but deploying agentic AI at scale, your AI gateway becomes mission-critical.
Because at that level, chaos doesn’t scale. Coordination does.
And that coordination depends on three core capabilities: abstraction, control and agility.
Let’s take a closer look at each.
Abstraction: coordination without constraint
In elite cycling, every rider has a specialized role. There are sprinters, climbers, and support riders, each with a distinct job. But they all train and race within a shared system that synchronizes nutrition plans, coaching strategies, recovery protocols, and race-day tactics.
The system doesn’t constrain performance. It amplifies it. It allows each athlete to adapt to the race without losing cohesion across the team.
That’s the role abstraction plays in an AI gateway.
It creates a shared structure for your agents to operate in without tethering them to specific tools, vendors, or workflows. The abstraction layer decouples brittle dependencies, allowing agents to coordinate dynamically as conditions change.
What abstraction looks like in an AI gateway
LLMs, vector databases, orchestrators, APIs, and legacy tools are unified under a shared interface, without forcing premature standardization. Your system stays tool-agnostic — not locked into any one vendor, version, or deployment model.
Agents adapt task flow based on real-time inputs like cost, policy, or performance, instead of brittle routes hard-coded to a specific tool. This flexibility enables smarter routing and more responsive decisions, without bloating your architecture.
The result is architectural flexibility without operational fragility. You can test new tools, upgrade components, or replace systems entirely without rewriting everything from scratch. And because coordination happens within a shared abstraction layer, experimentation at the edge doesn’t compromise core system stability.
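As a rough sketch of what such an abstraction layer might look like in code, the example below hides interchangeable backends behind one interface and routes each request on cost and latency. Everything here (the Backend and Gateway classes, the backend names, the numbers) is a hypothetical illustration and not any vendor's API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Backend:
    name: str
    cost_per_call: float
    latency_ms: float
    handler: Callable[[str], str]  # stand-in for a real model or tool call

class Gateway:
    """Single entry point; callers never reference a concrete backend."""

    def __init__(self):
        self._backends = {}

    def register(self, backend):
        self._backends[backend.name] = backend

    def unregister(self, name):
        self._backends.pop(name, None)

    def route(self, prompt, max_latency_ms):
        # Keep only backends that meet the latency budget, then take the cheapest.
        candidates = [b for b in self._backends.values() if b.latency_ms <= max_latency_ms]
        if not candidates:
            raise LookupError("no backend satisfies the routing constraints")
        chosen = min(candidates, key=lambda b: b.cost_per_call)
        return chosen.name, chosen.handler(prompt)

gateway = Gateway()
gateway.register(Backend("small_model", 0.001, 300.0, lambda p: f"small:{p}"))
gateway.register(Backend("large_model", 0.02, 900.0, lambda p: f"large:{p}"))

first = gateway.route("hi", max_latency_ms=1000)
gateway.unregister("small_model")  # swap a component out at runtime
second = gateway.route("hi", max_latency_ms=1000)
print(first, second)
```

The calling code is identical before and after the backend is removed, which is the property the abstraction layer is meant to provide.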
Why it matters for AI leaders
Tool-agnostic design reduces vendor lock-in and unnecessary duplication. Workflows stay resilient even as teams test new agents, infrastructure evolves, or business priorities shift.
Abstraction lowers the cost of change — enabling faster experimentation and innovation without rework.
It’s what lets your AI footprint grow without your architecture becoming rigid or fragile.
Abstraction gives you flexibility without chaos; cohesion without constraint.
Control: manage agentic AI without touching every tool
In the Tour de France, the team director isn’t on the bike, but they’re calling the shots. From the car, they monitor rider stats, weather updates, mechanical issues, and competitor moves in real time.
They adjust strategy, issue commands, and keep the entire team moving as one.
That’s the role of the control layer in an AI gateway.
It gives you centralized oversight across your agentic AI system — letting you respond fast, enforce policies consistently, and keep risk in check without managing every agent or integration directly.
What control looks like in an AI gateway
From one place, you define and enforce policies across tools, teams, and environments.
Role-based access controls (RBAC) are consistent, and approvals follow structured workflows that support scale.
Compliance with standards like GDPR, HIPAA, NIST, and the EU AI Act is built in.
Audit trails and explainability are embedded from the start, versus being bolted on later.
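A minimal sketch of centralized policy enforcement with a built-in audit trail might look like this. The role names, tool names, and the PolicyGate class are assumptions for illustration only, and a production system would also need authentication and durable log storage.

```python
from datetime import datetime, timezone

# Hypothetical role-to-tool permissions, defined in one place.
ROLE_PERMISSIONS = {
    "analyst": {"search"},
    "admin": {"search", "deploy_agent"},
}

class PolicyGate:
    """Every tool call passes through authorize(), which also records the decision."""

    def __init__(self, role_permissions):
        self.role_permissions = role_permissions
        self.audit_log = []

    def authorize(self, user, role, tool):
        allowed = tool in self.role_permissions.get(role, set())
        # Log both grants and denials so the trail is complete.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "tool": tool,
            "allowed": allowed,
        })
        return allowed

gate = PolicyGate(ROLE_PERMISSIONS)
print(gate.authorize("ana", "analyst", "search"))
print(gate.authorize("ana", "analyst", "deploy_agent"))
print(len(gate.audit_log))
```

Because the check and the log entry happen in the same call, a decision cannot be made without leaving a record.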
Observability that does more than watch
With observability built into your agentic system, you’re not guessing. You’re seeing agent behavior, task execution, and system performance in real time. Drift, failure, or misuse is detected immediately, not days later.
Alerts and automated diagnostics reduce downtime and eliminate the need for manual root-cause hunts. Patterns across tools and agents become visible, enabling faster decisions and continuous improvement.
Security that scales with complexity
As agentic systems grow, so do the attack surfaces. A robust control layer lets you secure the system at every level, not just at the edge, applying layered defenses like red teaming, prompt injection protection, and content moderation. Access is tightly governed, with controls enforced at both the model and tool level.
These safeguards are proactive, built to detect and contain risky or unreliable agent behavior before it spreads.
Because the more agents you run, the more important it is to know they’re operating safely without slowing you down.
Cost control that scales with you
With full visibility into compute, API usage, and LLM consumption across your stack, you can catch inefficiencies early and act before costs spiral.
Usage thresholds and metering help prevent runaway spend before it starts. You can set limits, monitor consumption in real time, and track how usage maps to specific teams, tools, and workflows.
Built-in optimization tools help manage cost-to-serve without compromising on performance. It’s not just about cutting costs — it’s about making sure every dollar spent delivers value.
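Usage thresholds and metering could be sketched along these lines. The team names, token limits, and the UsageMeter class are hypothetical, and a real gateway would persist the counters instead of holding them in memory.

```python
from collections import defaultdict

class BudgetExceeded(Exception):
    """Raised when a request would push a team past its limit."""

class UsageMeter:
    def __init__(self, limits):
        self.limits = limits            # team -> maximum tokens allowed
        self.used = defaultdict(int)    # team -> tokens consumed so far

    def record(self, team, tokens):
        limit = self.limits.get(team)
        # Reject the request before it runs, so spend never exceeds the limit.
        if limit is not None and self.used[team] + tokens > limit:
            raise BudgetExceeded(f"{team} would exceed {limit} tokens")
        self.used[team] += tokens
        return self.used[team]

meter = UsageMeter({"marketing": 1000})
meter.record("marketing", 600)
try:
    meter.record("marketing", 500)
    blocked = False
except BudgetExceeded:
    blocked = True
print(meter.used["marketing"], blocked)
```

The rejected request leaves the counter unchanged, so the team can still make smaller calls within its remaining budget.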
Why it matters for AI leaders
Centralized governance reduces the risk of policy gaps and inconsistent enforcement.
Built-in metering and usage tracking prevent overspending before it starts, turning control into measurable savings.
Visibility across all agentic tools supports enterprise-grade observability and accountability.
Shadow AI, fragmented oversight, and misconfigured agents are surfaced and addressed before they become liabilities.
Audit readiness is strengthened, and stakeholder trust is easier to earn and maintain.
And when governance, observability, security, and cost control are unified, scale becomes sustainable. You can extend agentic AI across teams, geographies, and clouds — fast, without losing control.
Agility: adapt without losing momentum
When the unexpected happens in the Tour de France (a crash in the peloton, a sudden downpour, a mechanical failure), teams don’t pause to replan. They adjust in motion. Bikes are swapped. Strategies shift. Riders surge or fall back in seconds.
That kind of responsiveness is what agility looks like. And it’s just as critical in agentic AI systems.
What agility looks like in an AI gateway
Agile agentic systems aren’t brittle. You can swap an LLM, upgrade an orchestrator, or re-route a workflow without causing downtime or requiring a full rebuild.
Policies update across tools instantly. Components can be added or removed with zero disruption to the agents still operating. Workflows continue executing smoothly, because they’re not hardwired to any one tool or vendor.
And when something breaks or shifts unexpectedly, your system doesn’t stall. It adjusts, just like the best teams do.
Why it matters for AI leaders
Rigid systems come at a high price. They delay time-to-value, inflate rework, and force teams to pause when they should be shipping.
Agility changes the equation. It gives your teams the freedom to adjust course — whether that means pivoting to a new LLM, responding to policy changes, or swapping tools midstream — without rewriting pipelines or breaking stability.
It’s not just about keeping pace. Agility future-proofs your AI infrastructure, helping you respond to the moment and prepare for what’s next.
Because the moment the environment shifts — and it will — your ability to adapt becomes your competitive edge.
The AI gateway benchmark
A true AI gateway isn’t just a pass-through or a connector. It’s a critical layer that lets enterprises build, operate, and govern agentic systems with clarity and control.
Use this checklist to evaluate whether a platform meets the standard of a true AI gateway.
Abstraction
Can it decouple workflows from tooling? Can your system stay modular and adaptable as tools evolve?
Control
Does it provide centralized visibility and governance across all agentic components?
Agility
Can you adjust quickly — swapping tools, applying policies, or scaling — without triggering risk or rework?
This isn’t about checking boxes. It’s about whether your AI foundation is built to last.
Without all three, your stack becomes brittle, risky, and unsustainable at scale. And that puts speed, safety, and strategy in jeopardy.
Want to build scalable agentic AI systems without spiraling cost or risk? Download the Enterprise guide to agentic AI.
The post Not all AI gateways are built for agentic AI. Here’s how to tell. appeared first on DataRobot.
AI-equipped aerial robots help track and model wildfire smoke
Bumblebee X Powers Taiga Robotics’ Mining Automation with AI-Driven Vision
Researchers are teaching robots to walk on Mars from the sand of New Mexico
Scientists and robot at White Sands National Park.
By Sean Nealon
Researchers are closer to equipping a dog-like robot to conduct science on the surface of Mars after five days of experiments this month at White Sands National Park in New Mexico.
The national park is serving as a Mars analog environment and the scientists are conducting field test scenarios to inform future Mars operations with astronauts, dog-like robots known as quadruped robots, rovers and scientists at Mission Control on Earth. The work builds on similar experiments by the team with the same robot on the slopes of Mount Hood in Oregon, which simulated the landscape on the Moon.
“Our group is very committed to putting quadrupeds on the Moon and on Mars,” said Cristina Wilson, a robotics researcher in the College of Engineering at Oregon State University. “It’s the next frontier and takes advantage of the unique capabilities of legged robots.”
The NASA-funded project supports the agency’s Moon to Mars program, which is developing the tools for long-term lunar exploration and future crewed missions to Mars. It builds on research that has enabled NASA to send rovers and a helicopter to Mars.
The LASSIE Project: Legged Autonomous Surface Science in Analog Environments includes engineers, cognitive scientists, geoscientists and planetary scientists from Oregon State, the University of Southern California, Texas A&M University, the Georgia Institute of Technology, the University of Pennsylvania, Temple University and NASA Johnson Space Center.
The field work this month at White Sands was the second time the research team visited the national park. They made the initial trip in 2023 and also made trips in 2023 and 2024 to Mount Hood. During these field sessions, the scientists gather data from the feet of the quadruped robots, which can measure mechanical responses to foot-surface interactions.
“In the same way that the human foot standing on ground can sense the stability of the surface as things shift, legged robots are capable of potentially feeling the exact same thing,” Wilson said. “So each step the robot takes provides us information that will help its future performance in places like the Moon or Mars.”
Quadruped robot.
The conditions at White Sands this month were challenging. Triple-digit high temperatures meant the team started field work at sunrise and wrapped by late morning because of the rising heat index and its impact on the researchers and the power supply to the robots.
But the team made important progress. Improvements to the algorithms they have refined in recent years led for the first time to the robot acting autonomously and making its own decisions.
This is important, Wilson noted, because in a scenario where the quadruped would be on the surface of Mars with an astronaut, it would allow both the robot and the astronaut to act independently, increasing the amount of scientific work that could be accomplished.
They also tested advances they have made in developing different ways for the robot to move depending on surface conditions, which could lead to increased energy efficiency, Wilson said.
“There is certainly a lot more research to do, but these are important steps in realizing the goal of sending quadrupeds to the Moon and Mars,” Wilson said.
Other leaders of the project include Feifei Qian, USC; Ryan Ewing and Kenton Fisher, NASA Johnson Space Center; Marion Nachon, Texas A&M; Frances Rivera-Hernández, Georgia Tech; Douglas Jerolmack and Daniel Koditschek, University of Pennsylvania; and Thomas Shipley, Temple University.
The research is funded by the NASA Planetary Science and Technology through Analog Research (PSTAR) program and the Mars Exploration Program.
Snap-through effect helps engineers solve soft material motion trade-off
Humanoid robots showcase skills at Ancient Olympia. But they’re on a long road to catch up to AI
Grammarly Gets Serious Chops As Writing Tool
Best known as a proofreading and editing solution, Grammarly has repositioned itself as a full-fledged AI writer.
Essentially, the tool has been significantly expanded with a new document editor designed to nurture an idea into a full-blown article, blog post, report and similar – with the help of a number of AI agents.
Dubbed Grammarly ‘Docs,’ the AI writer promises to amplify your idea every step of the way – without stepping on your unique voice.
In other news and analysis on AI writing:
*Now You Can Auto-Write Your Gmails Inside ChatGPT: AI expert Matt Paiva has figured out a way to use ChatGPT to auto-write emails for Gmail – without ever leaving the ChatGPT interface.
An incredible time-saver, Paiva’s method is detailed step-by-step in this YouTube video, which capitalizes on ChatGPT’s new ability to make direct connections with a number of outside apps now.
One caveat: If you’re a novice, you may want to play this fast-paced tutorial a few times to get what’s going on – but even so, the juice is worth the squeeze.
*AI Agent-Driven Email Arrives: 6sense has released a new email marketing suite that uses AI agents to drive the email marketing process.
The idea: Use AI agents to write all the marketing emails, send and follow-up, read/analyze replies, respond accordingly – and then route hot leads to sales reps as soon as those manifest.
While such automation has been around for a while, it will be interesting to see if 6sense’s decision to ‘agentify’ the process brings significant new gains.
*Discount Version of ChatGPT Released in India: Fans of ChatGPT in India now have a tier level they can call their own – dubbed ChatGPT Go – that costs less than US$5 per month.
Essentially, subscribers get 10 times more message and image generating capability with Go as compared to ChatGPT Free.
ChatGPT’s maker is experimenting with the discount version in India only, with an eye towards offering the new tier in other countries if it makes sense.
*AI Writing Comes to WhatsApp: Users of the wildly popular WhatsApp now have a new AI writer.
Dubbed ‘Writing Help,’ the new tool is designed to help users draft error-free messages so they can respond even more quickly to family, friends and colleagues.
Writing Help also offers users the ability to send messages in various styles, including professional, funny or supportive.
*Top Ten AI Reworders: Technically, AI chatbots/writers like ChatGPT already have the ability to reword your text in all sorts of ways.
You simply need to describe the kind of writing you’re looking for (such as witty, buttoned-down, ‘out there,’ etc.), ask ChatGPT to rewrite in that style, and you’re done.
Even so, there are tools specially designed to reword your text — and writer Alicia Keller offers an excellent rundown on what’s available.
*Google’s Upgraded AI Image Generator Turning Heads: Google is out with a new version of its image generator with an exceedingly powerful new feature: The ability to faithfully replicate a person’s face/body, no matter how many times you edit that image.
The capability is perfect for someone who is trying to touch up their headshot, for example, and wants to experiment with all sorts of effects while ensuring that their image remains an exact replica of who they are.
Until now, AI image generators were never able to stay true to the image of a person and instead churned out images that only “sorta, kinda” looked like the person in the original image the generator was working with.
*Time Magazine Releases Its Top 100 People in AI: Time has released its own take on the top movers and shakers in AI, dubbed “TIME100 AI.”
Many of the names AI insiders would expect are on there.
But there are a few surprises, including Pope Leo XIV.
*ChatGPT Voice Tech Gets a Polish: Users who prefer interacting with AI via voice should ultimately be more pleased with that mode in months to come.
The reason: ChatGPT’s maker has introduced an upgrade to the underlying technology and released it to software developers.
In a perfect world, that will mean more AI apps coming down the pipeline that work with voice even better than they do now.
*AI BIG PICTURE: Stanford University Study: AI Making It Tougher for Young People to Find Jobs: Turns out all those dire warnings about AI vacuuming up jobs are becoming reality.
A new study from Stanford finds AI is taking entry-level jobs from young people aged 22-25 – especially those looking to work in software engineering or customer service.
Observes writer Nick Lichtenberg: “The analysis revealed a 13% relative decline in employment for early-career workers in the most AI-exposed jobs since the widespread adoption of generative-AI tools.”

–Joe Dysart is editor of RobotWritersAI.com and a tech journalist with 20+ years of experience. His work has appeared in 150+ publications, including The New York Times and the Financial Times of London.
The post Grammarly Gets Serious Chops As Writing Tool appeared first on Robot Writers AI.