
#ICML2025 outstanding position paper: Interview with Jaeho Kim on addressing the problems with conference reviewing

At this year’s International Conference on Machine Learning (ICML2025), Jaeho Kim, Yunseok Lee and Seulki Lee won an outstanding position paper award for their work Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards. We hear from Jaeho about the problems they were trying to address, and their proposed author feedback mechanism and reviewer reward system.

Could you say something about the problem that you address in your position paper?

Our position paper addresses the problems plaguing current AI conference peer review systems, while also raising questions about the future direction of peer review.

The most pressing problem with the current peer review system at AI conferences is the exponential growth in paper submissions, driven by increasing interest in AI. To put this in numbers, NeurIPS received over 30,000 submissions this year, while ICLR saw a 59.8% increase in submissions in just one year. This surge has created a fundamental mismatch: while paper submissions grow exponentially, the pool of qualified reviewers has not kept pace.

Submissions to some of the major AI conferences over the past few years.

This imbalance has severe consequences. The majority of papers no longer receive adequately thorough reviews, undermining peer review’s essential function as a gatekeeper of scientific knowledge. When the review process fails, inappropriate papers and flawed research can slip through, potentially polluting the scientific record.

Considering AI’s profound societal impact, this breakdown in quality control poses risks that extend far beyond academia. Poor research that enters the scientific discourse can mislead future work, influence policy decisions, and ultimately hinder genuine knowledge advancement. Our position paper focuses on this critical question and proposes methods on how we can enhance the quality of review, thus leading to better dissemination of knowledge.

What do you argue for in the position paper?

Our position paper proposes two major changes to tackle the current peer review crisis: an author feedback mechanism and a reviewer reward system.

First, the author feedback system enables authors to formally evaluate the quality of reviews they receive. This system allows authors to assess reviewers’ comprehension of their work, identify potential signs of LLM-generated content, and establish basic safeguards against unfair, biased, or superficial reviews. Importantly, this isn’t about penalizing reviewers, but rather creating minimal accountability to protect authors from the small minority of reviewers who may not meet professional standards.

Second, our reviewer incentive system provides both immediate and long-term professional value for quality reviewing. For short-term motivation, author evaluation scores determine eligibility for digital badges (such as “Top 10% Reviewer” recognition) that can be displayed on academic profiles like OpenReview and Google Scholar. For long-term career impact, we propose novel metrics like a “reviewer impact score” – essentially an h-index calculated from the subsequent citations of papers a reviewer has evaluated. This treats reviewers as contributors to the papers they help improve and validates their role in advancing scientific knowledge.
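
As a rough illustration of how such a reviewer impact score could be computed, the sketch below applies the usual h-index rule to the citation counts of papers a reviewer has evaluated. The function name and example numbers are ours; the paper may define the metric differently.

```python
def reviewer_h_index(citation_counts):
    """h-index over reviewed papers: the largest h such that at least h of the
    papers a reviewer evaluated have each received at least h citations."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(counts, start=1):
        if citations >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for six papers one reviewer evaluated.
reviewed_paper_citations = [52, 18, 9, 4, 3, 0]
print(reviewer_h_index(reviewed_paper_citations))  # -> 4
```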

Could you tell us more about your proposal for this new two-way peer review method?

Our proposed two-way peer review system makes one key change to the current process: we split review release into two phases.

The authors’ proposed modification to the peer-review system.

Currently, authors submit papers, reviewers write complete reviews, and all reviews are released at once. In our system, authors first receive only the neutral sections – the summary, strengths, and questions about their paper. Authors then provide feedback on whether reviewers properly understood their work. Only after this feedback do we release the second part containing weaknesses and ratings.
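
As a minimal sketch of this split, a review record could be partitioned into a neutral part released first and an evaluative part withheld until author feedback arrives. The field and function names below are our own illustration, not the paper’s specification.

```python
from dataclasses import dataclass

@dataclass
class Review:
    summary: str
    strengths: str
    questions: str
    weaknesses: str
    rating: int

def phase_one(review: Review) -> dict:
    # Released first: the neutral sections that authors rate for comprehension.
    return {"summary": review.summary,
            "strengths": review.strengths,
            "questions": review.questions}

def phase_two(review: Review, author_feedback_received: bool) -> dict:
    # Released only after the author has submitted feedback on phase one.
    if not author_feedback_received:
        raise ValueError("Collect author feedback before releasing this part.")
    return {"weaknesses": review.weaknesses, "rating": review.rating}
```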

This approach offers three main benefits. First, it’s practical – we don’t need to change existing timelines or review templates. The second phase can be released immediately after the authors give feedback. Second, it protects authors from irresponsible reviews since reviewers know their work will be evaluated. Third, since reviewers typically review multiple papers, we can track their feedback scores to help area chairs identify (ir)responsible reviewers.

The key insight is that authors know their own work best and can quickly spot when a reviewer hasn’t properly engaged with their paper.

Could you talk about the concrete reward system that you suggest in the paper?

We propose both short-term and long-term rewards to address reviewer motivation, which typically starts out high but naturally declines over time.

Short-term: Digital badges displayed on reviewers’ academic profiles, awarded based on author feedback scores. The goal is making reviewer contributions more visible. While some conferences list top reviewers on their websites, these lists are hard to find. Our badges would be prominently displayed on profiles and could even be printed on conference name tags.

Example of a badge that could appear on profiles.

Long-term: Numerical metrics to quantify reviewer impact at AI conferences. We suggest tracking measures like an h-index for reviewed papers. These metrics could be included in academic portfolios, similar to how we currently track publication impact.

The core idea is creating tangible career benefits for reviewers while establishing peer review as a professional academic service that rewards both authors and reviewers.

What do you think could be some of the pros and cons of implementing this system?

The benefits of our system are threefold. First, it is a very practical solution. Our approach doesn’t change current review schedules or review burdens, making it easy to incorporate into existing systems. Second, it encourages reviewers to act more responsibly, knowing their work will be evaluated. We emphasize that most reviewers already act professionally – however, even a small number of irresponsible reviewers can seriously damage the peer review system. Third, with sufficient scale, author feedback scores will make conferences more sustainable. Area chairs will have better information about reviewer quality, enabling them to make more informed decisions about paper acceptance.

However, there is strong potential for gaming by reviewers. Reviewers might optimize for rewards by giving overly positive reviews. Measures to counteract these problems are definitely needed. We are currently exploring solutions to address this issue.

Are there any concluding thoughts you’d like to add about the potential future of conferences and peer-review?

One emerging trend we’ve observed is the increasing discussion of LLMs in peer review. While we believe current LLMs have several weaknesses (e.g., prompt injection, shallow reviews), we also think they will eventually surpass humans. When that happens, we will face a fundamental dilemma: if LLMs provide better reviews, why should humans be reviewing? Just as the rapid rise of LLMs caught us unprepared and created chaos, we cannot afford a repeat. We should start preparing for this question as soon as possible.

About Jaeho

Jaeho Kim is a Postdoctoral Researcher at Korea University with Professor Changhee Lee. He received his Ph.D. from UNIST under the supervision of Professor Seulki Lee. His main research focuses on time series learning, particularly developing foundation models that generate synthetic and human-guided time series data to reduce computational and data costs. He also contributes to improving the peer review process at major AI conferences, with his work recognized by the ICML 2025 Outstanding Position Paper Award.

Read the work in full

Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards, Jaeho Kim, Yunseok Lee, Seulki Lee.

RoboCup@Work League: Interview with Christoph Steup

RoboCup@Work League teams at the event in Brazil.

RoboCup is an international scientific initiative with the goal of advancing the state of the art of intelligent robots, AI and automation. The annual RoboCup event, where teams gather from across the globe to take part in competitions across a number of leagues, this year took place in Salvador, Brazil from 15-21 July. In a series of interviews, we’ve been meeting some of the RoboCup trustees, committee members, and participants, to find out more about their respective leagues. Christoph Steup is an Executive Committee member and oversees the @Work League. Ahead of the event in Brazil, we spoke to Christoph to find out more about the @Work League, the tasks that teams need to complete, and future plans for the League.

Could you start by giving us an introduction to the @Work league?

The @Work League, along with the Logistics League, forms the Industrial League. Our goal is to mimic some of the aspects of industrial production systems. An important aspect of this is factory automation and trying to mimic the factory of the future, where you have autonomous robots building products according to customer design. In these factories of the future, a single piece would be produced individually for each customer. Factories nowadays have big conveyor belts and a lot of automation, with the tasks mostly done in the same way, and you can only build things efficiently if you build millions of items. We are working on building individual pieces, where automation is still possible, and even a single piece can be built effectively. But obviously, in our RoboCup competitions, we are not interested in building on a factory scale – we are doing it on a very small scale. That means our robots are typically 80 centimeters long, the largest are around 70 centimeters wide, and some of them are also 80 centimeters high. So let’s say they fit in a one metre cubed box. Also, all our operations are done on the ground. This is just for simplification, because building big tables to make it more realistic would also increase the cost for RoboCup and wouldn’t give much additional value.

What our robots need to do is transport objects from different workstations. So we have a default configuration where the arena starts, and there are workstations with objects lying on them, and some of these objects need to be transported to other workstations. The robot needs to do that completely autonomously. So this is one of the special things about the @Work League, that it’s completely autonomous and there is only a single restart allowed per team. So that means the robot really needs to be reliable. One of the big differences between medium teams and very good teams is that the very good teams perform well all the time whereas the medium teams have some good runs and some bad runs.

As well as the object transportation that you mentioned, are there other tasks that the teams need to carry out?

There are some special tasks in our league, like the precision placement task where the robot needs to fit an object into a cavity that is essentially the same shape and size as the object. It’s a little bit like the game that babies play to train their dexterity.

We also have a task that is inspired by a conveyor belt, but we are using a table that is constantly turning. The robots need to grasp stuff while the table is turning. This looks a little bit silly because no one would actually put a rotating table in a factory, however this is our way of actually mimicking a conveyor belt. The conveyor belt itself would be really, really difficult to integrate into the competition, so we just abstracted that and use this rotating table to actually have the same challenge but in a more manageable way. And it’s still a very, very hard challenge.

Then there are some special challenges that we integrate. For example, the robots need to report their state back so that we can observe what they are doing. We also have a challenge where humans are in the loop. For example, the robot brings pieces to a certain workstation where a human is present, the human assembles the pieces, and then the human needs to give a sign to the robot, and then the robot will take the piece away and put it somewhere else. This is designed to really mimic the automated factory flow.

In the past we also had a challenge where the robots needed to open a drawer, take something out and then close the drawer again. We’ve also had tasks where the robot has to handle fragile objects, like sweets, where the robot really needed to be careful in manipulating them. So in general, what differentiates us most from the Logistics League is that we are focusing a lot on manipulation and all the difficulties that come with manipulation and unknown objects, whereas Logistics is more tailored towards large-scale logistics processes with all their optimization and planning.

I was lucky enough to attend RoboCup last year in Eindhoven and what the teams were doing was really impressive. It was also interesting to see how varied the robots were, and how teams were approaching the tasks in distinct ways, with different grabbers and so on.

Yes, this difference in approaches is related to the history of our League, which is a little bit similar to the Logistics League. The Logistics League was originally a sponsored demonstration by Festo, which is a large company from Germany that creates tools, but they also have a didactics area where they provide tools to help people understand factory optimization. The @Work League was sponsored by Kuka, the robotics company, and, in the beginning, they required every team to compete with the Kuka youBot. So this was pretty much the default platform for our league, but at some point Kuka dropped from a sponsor to just an advisor to the league, and nowadays they are not part of the league at all. So when the Kuka youBot was going out of commission, the teams searched for alternatives and now we are presented with a wide variety of robots that are competing in the league, which I personally find really cool. Now we have all these different robots, all these different approaches, and some work better in some scenarios and worse in others. So we really have a scientific approach to the problem and we are really getting some insights into how you can tackle this problem on multiple levels.

Have you noticed whether some of the challenges are particularly difficult for all the teams?

In 2018 we introduced a challenge of so-called arbitrary surfaces, which are surfaces unknown to the teams that are put on top of the workstations. The teams need to be able to deal with these surfaces. There are two surfaces that are really, really awful for the teams – one is grass that we pretty much stole from the soccer competitions! We just thought it would be funny to try it, and it was a really interesting problem, especially for some of the grippers of the different teams. For example, the current world champions have a rigid gripper so they can have force feedback when they grasp. However, the grass is really difficult for them – because of their rigidity, they always grasp the grass itself and then they pull up the grass with the object. And this led to some interesting problems, like transporting the actual surface around and not only the object. This isn’t a problem for teams using a flexible gripper. On the other hand, the flexible gripper makes it really difficult to assess whether you have grasped the object because you have very bad force feedback. So there are two different approaches that have their pros and cons in different scenarios.

Three robots from the @Work competition in Brazil.

Are you introducing any new tasks for this year?

Yes and no. So actually we are introducing a completely new challenge, which is different from what we’ve done before. The new challenge is the so-called smart farming challenge, which is opening our league to a whole new field of applications, because we are now looking at agriculture. We are working on this with Studica, a robotics company from Canada, whose hardware we are using. We’ve already given it a try at the German Open. This challenge comes with some new specifics, and one of these is that the teams only get the robot shortly before the competition. So they don’t know the robot long in advance, and they need to assemble, program and design the robot in a very short time. To compensate for this, we reduce the amount of optimization and robustness that is necessary. Because it’s an agriculture setting, we have different objects, like fruits, that the teams need to handle. This makes it a little bit more complicated because fruits have a more arbitrary shape and different levels of ripeness that need to be detected. We also have some grapes that are hanging on a wall, which is a completely different kind of manipulation task than before, because before the teams just needed to grasp things from surfaces, but now they really need to pluck stuff from a wall in a reliable way. So this is the new challenge. It also comes with a lot of software challenges because the computational power of this robot is very limited – it only has a Raspberry Pi. For example, there is not a lot of image processing possible, especially compared to the current robots, some of which even have GPUs embedded.

Is there a particular part of the hardware or software that you’ve seen some of the biggest developments in over the last year or so?

Yeah, I think one big change I observed over the years was a switch from custom neural networks for object detection to off-the-shelf components. So pretty much all teams nowadays use YOLO networks, which you can get pre-trained, and they just deploy them on GPUs that they’ve embedded into their robots. This is also one of the reasons why the robots have really grown in size over the last few years – they needed space for the larger computational power. This actually made it possible for a lot of teams to reliably detect the objects. Object detection was a big problem in the beginning of the league and nowadays it’s not really a big issue – most teams are really good at that. Sometimes they are a little bit startled by decoy objects – these are objects in the arena that are not really part of the task, and they are unknown to the teams beforehand. Sometimes there are, let’s say, evil decoys that look like a task object and the teams make some mismatches, but this is becoming very rare.
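
As an illustration of this off-the-shelf approach, a pre-trained YOLO detector can be loaded and run in a few lines using the ultralytics Python package. This is a minimal sketch: the weights file and image path are placeholders, and teams would typically fine-tune on their own competition objects.

```python
# pip install ultralytics
from ultralytics import YOLO

# Load a small pre-trained detector (placeholder weights file).
model = YOLO("yolov8n.pt")

# Run detection on a workstation image (placeholder path).
results = model("workstation.jpg")
for box in results[0].boxes:
    print(box.cls, box.conf, box.xyxy)  # class id, confidence, bounding box
```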

I think the second big change is a switch to larger manipulators with more degrees of freedom. So in the beginning, everyone had a very small manipulator with only five degrees of freedom, which limited the operating range, and nowadays pretty much all teams have a six degree of freedom manipulator with a large range. This means that they don’t need to move their robot when they are in front of a workstation, which makes them much faster and also much more precise.

Could you talk about the future plans for the League?

There are a few things we are thinking about.

With regards to the competition itself, we had a discussion with the teams about what they are interested in doing in the future. Two things came up that they really want to have. One is mobile obstacles, so they want other objects to move autonomously through the arena. We’re in the process of creating that, in cooperation with EduArt, which is a company from Germany that also provides small educational robots. And the second thing we want to introduce is a kind of humanoid robot that the teams can use to actually handle special manipulation tasks that cannot be done simply with a robot manipulator.

In terms of creating an entry-level League, we have been working on this and one potential idea is to use the smart-farming challenge as the entry point. Through the collaboration with Studica, we can provide teams with the robot and they get to keep it after the competition. At the German Open I spoke to the Rapidly-Manufactured Rescue League about this crossover between the Rescue and the Junior Leagues, and they are very keen to collaborate.

We’re also talking with the Logistics League. Their sponsor, Festo, has dropped out of their league and now they need to reorganize. We are wondering if it would be worthwhile to bring our leagues closer together, or even fuse them into a single RoboCup Industrial league. The Logistics League wants to do more manipulation, and the @Work League wants to do more planning, so we are naturally closing the gap between the two. However, this is just a thought at the moment – we need to see how the teams react to that.

About Christoph

Christoph Steup is an active researcher specializing in various fields of robotics, including swarm robotics, precision farming, and weather-resilient autonomous driving. He currently works at the Fraunhofer Institute for Transportation and Infrastructure Systems (IVI), where he leads the Swarm Technology Group. Prior to this role, he headed the Computational Intelligence in Robotics group at the Otto von Guericke University Magdeburg. Christoph’s involvement with RoboCup began in 2015 when he joined the robOTTO team of Otto von Guericke University as team leader. His contributions to the RoboCup community expanded as he became a member of the Technical Committee for the @Work League in 2017. In 2019, he further advanced his participation by joining the Executive Committee of the league.

Interview with Haimin Hu: Game-theoretic integration of safety, interaction and learning for human-centered autonomy

In this interview series, we’re meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. In this latest interview, Haimin Hu tells us about his research on the algorithmic foundations of human-centered autonomy and his plans for future projects, and gives some advice for PhD students looking to take the next step in their career.

Could you give us an overview of the research you carried out during your PhD?

My PhD research, conducted under the supervision of Professor Jaime Fernández Fisac in the Princeton Safe Robotics Lab, focuses on the algorithmic foundations of human-centered autonomy. By integrating dynamic game theory with machine learning and safety-critical control, my work aims to ensure autonomous systems, from self-driving vehicles to drones and quadrupedal robots, are performant, verifiable, and trustworthy when deployed in human-populated space. The core principle of my PhD research is to plan robots’ motion in the joint space of both physical and information states, actively ensuring safety as they navigate uncertain, changing environments and interact with humans. Its key contribution is a unified algorithmic framework—backed by game theory—that allows robots to safely interact with their human peers, adapt to human preferences and goals, and even help humans refine their skills. Specifically, my PhD work contributes to the following areas in human-centered autonomy and multi-agent systems:

  • Trustworthy human–robot interaction: Planning safe and efficient robot trajectories by closing the computation loop between physical human-robot interaction and runtime learning that reduces the robot’s uncertainty about the human.
  • Verifiable neural safety analysis for complex robotic systems: Learning robust neural controllers for robots with high-dimensional dynamics; guaranteeing their training-time convergence and deployment-time safety.
  • Scalable interactive planning under uncertainty: Synthesizing game-theoretic control policies for complex and uncertain human–robot systems at scale.

Was there a project (or aspect of your research) that was particularly interesting?

Safety in human-robot interaction is especially difficult to define, because it hinges on an, I’d say, almost unanswerable question: How safe is safe enough when humans might behave in arbitrary ways? To give a concrete example: Is it sufficient if an autonomous vehicle can avoid hitting a fallen cyclist 99.9% of the time? What if this rate can only be achieved by the vehicle always stopping and waiting for the human to move out of the way?

I would argue that, for trustworthy deployment of robots in human-populated space, we need to complement standard statistical methods with clear-cut robust safety assurances under a vetted set of operating conditions that are as well established as those of bridges, power plants, and elevators. We need runtime learning to minimize the robot’s performance loss caused by safety-enforcing maneuvers; this calls for algorithms that can reduce the robot’s inherent uncertainty induced by its human peers, for example, their intent (does a human driver want to merge, cut behind, or stay in the lane?) or response (if the robot comes closer, how will the human react?). We need to close the loop between the robot’s learning and decision-making so that it can optimize efficiency by anticipating how its ongoing interaction with the human may affect the evolving uncertainty, and ultimately, its long-term performance.
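
One standard way to formalize this reduction of uncertainty about human intent is a Bayesian belief update over a discrete set of candidate intents. The sketch below is a generic illustration under that assumption, not Haimin’s specific algorithm; the intent labels and likelihood numbers are made up.

```python
import numpy as np

intents = ["merge", "cut_behind", "stay_in_lane"]
belief = np.array([1 / 3, 1 / 3, 1 / 3])  # uniform prior over driver intents

def update_belief(belief, likelihoods):
    """Bayes rule: posterior is proportional to prior times observation likelihood."""
    posterior = belief * likelihoods
    return posterior / posterior.sum()

# Hypothetical observation: the human car drifts toward the robot's lane,
# which is most likely if the driver intends to merge.
likelihoods = np.array([0.7, 0.2, 0.1])
belief = update_belief(belief, likelihoods)
print(dict(zip(intents, belief.round(2))))  # belief mass shifts toward "merge"
```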

What made you want to study AI, and the area of human-centered robotic systems in particular?

I’ve been fascinated by robotics and intelligent systems since childhood, when I’d spend entire days watching sci-fi anime like Mobile Suit Gundam, Neon Genesis Evangelion, or Future GPX Cyber Formula. What captivated me wasn’t just the futuristic technology, but the vision of AI as a true partner—augmenting human abilities rather than replacing them. Cyber Formula in particular planted the idea of human-AI co-evolution in my mind: an AI co-pilot that not only helps a human driver navigate high-speed, high-stakes environments, but also adapts to the driver’s style over time, ultimately making the human a better racer and deepening mutual trust along the way. Today, during my collaboration with Toyota Research Institute (TRI), I work on human-centered robotics systems that embody this principle: designing AI systems that collaborate with people in dynamic, safety-critical settings by rapidly aligning with human intent through multimodal inputs, from physical assistance to visual cues and language feedback, bringing to life the very ideas that once lived in my childhood imagination.

You’ve landed a faculty position at Johns Hopkins University (JHU) – congratulations! Could you talk a bit about the process of job searching, and perhaps share some advice and insights for PhD students who may be at a similar stage in their career?

The job search was definitely intense but also deeply rewarding. My advice to PhD students: start thinking early about the kind of long-term impact you want to make, and act early on your application package and job talk. Also, make sure you talk to people, especially your senior colleagues and peers on the job market. I personally benefited a lot from the following resources:

Do you have an idea of the research projects you’ll be working on at JHU?

I wish to help create a future where humans can unquestionably embrace the presence of robots around them. Towards this vision, my lab at JHU will investigate the following topics:

  • Uncertainty-aware interactive motion planning: How can robots plan safe and efficient motion by accounting for their evolving uncertainty, as well as their ability to reduce it through future interaction, sensing, communication, and learning?
  • Human–AI co-evolution and co-adaptation: How can embodied AI systems learn from human teammates while helping them refine existing skills and acquire new ones in a safe, personalized manner?
  • Safe human-compatible autonomy: How can autonomous systems ensure prescribed safety while remaining aligned with human values and attuned to human cognitive limitations?
  • Scalable and generalizable strategic decision-making: How can multi-robot systems make safe, coordinated decisions in dynamic, human-populated environments?

How was the experience attending the AAAI Doctoral Consortium?

I had the privilege of attending the 2025 AAAI Doctoral Consortium, and it was an incredibly valuable experience. I’m especially grateful to the organizers for curating such a thoughtful and supportive environment for early-career researchers. The highlight for me was the mentoring session with Dr Ming Yin (postdoc at Princeton, now faculty at Georgia Tech CSE), whose insights on navigating the uncertain and competitive job market were both encouraging and eye-opening.

Could you tell us an interesting (non-AI related) fact about you?

I am passionate about skiing. I learned to ski primarily by vision-based imitation learning from a chairlift, though I’m definitely paying the price now for poor generalization! One day, I hope to build an exoskeleton that teaches me to ski better while keeping me safe on the double black diamonds.

About Haimin

Haimin Hu is an incoming Assistant Professor of Computer Science at Johns Hopkins University, where he is also a member of the Data Science and AI Institute, the Institute for Assured Autonomy, and the Laboratory for Computational Sensing and Robotics. His research focuses on the algorithmic foundations of human-centered autonomy. He has received several awards and recognitions, including a 2025 Robotics: Science and Systems Pioneer, a 2025 Cyber-Physical Systems Rising Star, and a 2024 Human-Robot Interaction Pioneer. Additionally, he has served as an Associate Editor for IEEE Robotics and Automation Letters since his fourth year as a PhD student. He obtained a PhD in Electrical and Computer Engineering from Princeton University in 2025, an MSE in Electrical Engineering from the University of Pennsylvania in 2020, and a BE in Electronic and Information Engineering from ShanghaiTech University in 2018.

AIhub coffee corner: Agentic AI


The AIhub coffee corner captures the musings of AI experts over a short conversation. This month we tackle the topic of agentic AI. Joining the conversation this time are: Sanmay Das (Virginia Tech), Tom Dietterich (Oregon State University), Sabine Hauert (University of Bristol), Sarit Kraus (Bar-Ilan University), and Michael Littman (Brown University).

Sabine Hauert: Today’s topic is agentic AI. What is it? Why is it taking off? Sanmay, perhaps you could kick off with what you noticed at AAMAS [the Autonomous Agents and Multiagent Systems conference]?

Sanmay Das: It was very interesting because obviously there’s suddenly been an enormous interest in what an agent is and in the development of agentic AI. People in the AAMAS community have been thinking about what an agent is for at least three decades. Well, longer actually, but the community itself dates back about three decades in the form of these conferences. One of the very interesting questions was about why everybody is rediscovering the wheel and rewriting these papers about what it means to be an agent, and how we should think about these agents. The way in which AI has progressed, in the sense that large language models (LLMs) are now the dominant paradigm, is almost entirely different from the way in which people have thought about agents in the AAMAS community. Obviously, there’s been a lot of machine learning and reinforcement learning work, but there’s this historical tradition of thinking about reasoning and logic where you can actually have explicit world models. Even when you’re doing game theory, or MDPs, or their variants, you have an explicit world model that allows you to specify the notion of how to encode agency. Whereas I think that’s part of the disconnect now – everything is a little bit black boxy and statistical. How do you then think about what it means to be an agent? I think in terms of the underlying notion of what it means to be an agent, there’s a lot that can be learnt from what’s been done in the agents community and in philosophy.

I also think that there are some interesting ties to thinking about emergent behaviors, and multi-agent simulation. But it’s a little bit of a Wild West out there and there are all of these papers saying we need to first define what an agent is, which is definitely rediscovering the wheel. So, at AAMAS, there was a lot of discussion of stuff like that, but also questions about what this means in this particular era, because now we suddenly have these really powerful creatures that I think nobody in the AAMAS community saw coming. Fundamentally we need to adapt what we’ve been doing in the community to take into account that these are different from how we thought intelligent agents would emerge into this more general space where they can play. We need to work out how we adapt the kinds of things that we’ve learned about negotiation, agent interaction, and agent intention, to this world. Rada Mihalcea gave a really interesting keynote talk thinking about the natural language processing (NLP) side of things and the questions there.

Sabine: Do you feel like it was a new community joining the AAMAS community, or the AAMAS community that was converting?

Sanmay: Well, there were people who were coming to AAMAS and seeing that the community has been working on this for a long time. So learning something from that was definitely the vibe that I got. But my guess is, if you go to ICML or NeurIPS, that’s very much not the vibe.

Sarit Kraus: I think they’re wasting some time. I mean, forget the “what is an agent?” question – there have been many works from the agents community over many years about coordination, collaboration, etc. I heard about one recent paper where they reinvented Contract Nets. Contract Nets were introduced in 1980, and now there is a paper about it. OK, it’s LLMs that are transferring tasks to one another and signing contracts, but if they just read the past papers, it would save their time and then they could move on to more interesting research questions. Currently, with LLM agents, they say you need to divide the task among sub-agents. My PhD was about building a Diplomacy player, and in my design of the player there were agents that each played a different role in the game – one was a strategic agent, one was a Foreign Minister, etc. And now they are talking about it again.

Michael Littman: I totally agree with Sanmay and Sarit. The way I think about it is this: this notion of “let’s build agents now that we have LLMs” to me feels a little bit like we have a new programming language like Rust++, or whatever, and we can use it to write programs that we were struggling with before. It’s true that new programming languages can make some things easier, which is great, and LLMs give us a new, powerful way to create AI systems, and that’s also great. But it’s not clear that they solve the challenges that the agents community has been grappling with for so long. So, here’s a concrete example from an article that I read yesterday. Claudius is a version of Claude and it was agentified to run a small online shop. They gave it the ability to communicate with people, post Slack messages, order products, set prices on things, and people were actually doing economic exchanges with the system. At the end of the day, it was terrible. Somebody talked it into buying tungsten cubes and selling them in the store. It was just nonsense. The Anthropic people viewed the experiment as a win. They said “ohh yeah, there were definitely problems, but they’re totally fixable”. And the fixes, to me, sounded like all they’d have to do is solve the problems that the agents community has been trying to solve for the last couple of decades. That’s all, and then we’ve got it perfect. And it’s not clear to me at all that just making LLMs generically better, or smarter, or better reasoners suddenly makes all these kinds of agents questions trivial because I don’t think they are. I think they’re hard for a reason and I think you have to grapple with the hard questions to actually solve these problems. But it’s true that LLMs give us a new ability to create a system that can have a conversation. But then the system’s decision-making is just really, really bad. And so I thought that was super interesting. But we agents researchers still have jobs, that’s the good news from all this.

Sabine: My bread and butter is to design agents, in our case robots, that work together to arrive at desired emergent properties and collective behaviors. From this swarm perspective, I feel that over the past 20 years we have learned a lot of the mechanisms by which you reach consensus, the mechanisms by which you automatically design agent behaviours using machine learning to enable groups to achieve a desired collective task. We know how to make agent behaviours understandable, all that good stuff you want in an engineered system. But up until now, we’ve been profoundly lacking the individual agents’ ability to interact with the world in a way that gives you richness. So in my mind, there’s a really nice interface where the agents are more capable, so they can now do those local interactions that make them useful. But we have this whole overarching way to systematically engineer collectives that I think might make the best of both worlds. I don’t know at what point that interface happens. I guess it comes partly from every community going a little bit towards the other side. So from the swarm side, we’re trying vision-language models (VLMs), and we’re trying to have our robots use LLMs to understand their local world, to communicate with humans and with each other, and to get a collective awareness at a very local level of what’s happening. And then we use our swarm paradigms to be able to engineer what they do as a collective using our past research expertise. I imagine for those who are just entering this discipline they need to start from the LLMs and go up. I think it’s part of the process.

Tom Dietterich: I think a lot of it just doesn’t have anything to do with agents at all, you’re writing computer programs. People found that if you try to use a single LLM to do the whole thing, the context gets all messed up and the LLM starts having trouble interpreting it. In fact, these LLMs have a relatively small short-term memory that they can effectively use before they start getting interference among the different things in the buffer. So the engineers break the system into multiple LLM calls and chain them together, and it’s not an agent, it’s just a computer program. I don’t know how many of you have seen this system called DSPy (written by Omar Khattab)? It takes an explicit sort of software engineering perspective on things. Basically, you write a type signature for each LLM module that says “here’s what it’s going to take as input, here’s what it’s going to produce as output”, you build your system, and then DSPy automatically tunes all the prompts as a sort of compiler phase to get the system to do the right thing. I want to question whether building systems with LLMs as a software engineering exercise will branch off from the building of multi-agent systems. Because virtually all the “agentic systems” are not agents in the sense that we would call them that. They don’t have autonomy any more than a regular computer program does.
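
For readers unfamiliar with DSPy, the workflow Tom describes looks roughly like the sketch below: declare a typed signature for an LLM module, build a module from it, and let DSPy tune the prompts. The signature here is our own toy example, and the exact API may differ between DSPy versions.

```python
import dspy

# A type signature declaring what one LLM module takes in and produces.
class SummarizeReview(dspy.Signature):
    """Summarize a product review and classify its sentiment."""
    review_text = dspy.InputField()
    summary = dspy.OutputField()
    sentiment = dspy.OutputField(desc="positive, negative, or mixed")

# A module built from that signature. DSPy's optimizers can then tune the
# underlying prompts against a metric, roughly like a compiler pass.
summarize = dspy.Predict(SummarizeReview)
# result = summarize(review_text="...")  # requires a configured LM backend
```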

Sabine: I wonder about the anthropomorphization of this, because now that you have different agents, they’re all doing a task or a job, and all of a sudden you get articles talking about how you can replace a whole team by a set of agents. So we’re no longer replacing individual jobs, we’re now replacing teams and I wonder if this terminology also doesn’t help.

Sanmay: To be clear, this idea has existed at least since the early 90s, when there were these “soft bots” that were basically running Unix commands and they were figuring out what to do themselves. It’s really no different. What people mean when they’re talking about agents is giving a piece of code the opportunity to run its own stuff and to be able to do that in service of some kind of a goal.

I think about this in terms of economic agents, because that’s what I grew up (AKA, did my PhD) thinking about. And, do I want an agent? I could think about writing an agent that manages my (non-existent) stock portfolio. If I had enough money to have a stock portfolio, I might think about writing an agent that manages that portfolio, and that’s a reasonable notion of having autonomy, right? It has some goal, which I set, and then it goes about making decisions. If you think about the sensor-actuator framework, its actuator is that it can make trades and it can take money from my bank account in order to do so. So I think that there’s something in getting back to the basic question of “how does this agent act in the world?” and then what are the percepts that it is receiving?

I completely agree with what you were saying earlier about this question of whether the LLMs enable interactions to happen in different ways. If you look at pre-LLMs, with these agents that were doing pricing, there’s this hilarious story of how some old biology textbook ended up costing $17 million on Amazon because there were these two bots that were doing the pricing of those books at two different used book stores. One of them was a slightly higher-rated store than the other, so it would take whatever price the lower-rated store had and push it up by 10%. Then the lower-rated store was an undercutter and it would take the current highest price and go to 99% of that price. But this just led to this spiral where suddenly that book cost $17 million. This is exactly the kind of thing that’s going to happen in this world. But the thing that I’m actually somewhat worried about, and anthropomorphising, is how these agents are going to decide on their goals. There’s an opportunity for really bad errors to come out of programming that wouldn’t be as harmful in a more constrained situation.
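
The dynamic behind that spiral is simple to reproduce: if one bot always prices at 110% of its rival and the other undercuts at 99% of the highest price, each round multiplies the price by 1.089, so it grows exponentially. A toy sketch of the dynamic, with made-up starting prices:

```python
price_a, price_b = 30.0, 25.0  # hypothetical starting prices for the textbook

for _ in range(100):
    price_a = 1.10 * price_b   # higher-rated store: rival's price plus 10%
    price_b = 0.99 * price_a   # undercutter: 99% of the current highest price

# Each round multiplies the price by 1.10 * 0.99 = 1.089, so it grows
# exponentially: after 100 rounds the ~$30 book costs over $100,000, and
# roughly 60 more rounds would push it past $17 million.
print(f"price after 100 rounds: ${price_a:,.2f}")
```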

Tom: In the reinforcement learning literature, of course, there’s all this discussion about reward hacking and so on, but now we imagine two agents interacting with each other and hacking each other’s rewards effectively, so the whole dynamics blows up – people are just not prepared.

Sabine: Regarding the breakdown of the problem that Tom mentioned, I think there’s perhaps a real benefit to having these agents that are narrower and, as a result, perhaps more verifiable at the individual level; they maybe have clearer goals, and they might be greener because we might be able to constrain the area they operate in. And then in the robotics world, we’ve been looking at collaborative awareness, where narrow agents that are task-specific are aware of other agents and collectively they have some awareness of what they’re meant to be doing overall. And it’s quite anti-AGI in the sense that you have lots of narrow agents again. So part of me is wondering, are we going back to heterogeneous task-specific agents and the AGI is collective, perhaps? And so this new wave, maybe it’s anti-AGI – that would be interesting!

Tom: Well, it’s almost the only way we can hope to prove the correctness of the system, to have each component narrow enough that we can actually reason about it. That’s an interesting paradox that I was missing from Stuart Russell’s “What if we succeed?” chapter in his book, which is what if we succeed in building a broad-spectrum agent, how are we going to test it?

It does seem like it would be great to have some people from the agents community speak at the machine learning conferences and try to do some diplomatic outreach. Or maybe run some workshops at those conferences.

Sarit: I was always interested in human-agent interaction, and since LLMs have solved the language issue for me, I’m very excited. But the other problem that has been mentioned is still here – you need to integrate strategies and decision-making. So my model is that you have LLM agents that have tools, which are all sorts of algorithms that we developed and implemented, and there should be several of them. But the fact that somebody has solved natural language interaction for us – I think this is really, really great, and good for the agents community as well as for the computer science community generally.

Sabine: And good for the humans. It’s a good point, the humans are agents as well in those systems.

Interview with Kate Candon: Leveraging explicit and implicit feedback in human-robot interactions

In this interview series, we’re meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. Kate Candon is a PhD student at Yale University interested in understanding how we can create interactive agents that are more effectively able to help people. We spoke to Kate to find out more about how she is leveraging explicit and implicit feedback in human-robot interactions.

Could you start by giving us a quick introduction to the topic of your research?

I study human-robot interaction. Specifically I’m interested in how we can get robots to better learn from humans in the way that they naturally teach. Typically, a lot of work in robot learning is with a human teacher who is only tasked with giving explicit feedback to the robot, but they’re not necessarily engaged in the task. So, for example, you might have a button for “good job” and “bad job”. But we know that humans give a lot of other signals, things like facial expressions and reactions to what the robot’s doing, maybe gestures like scratching their head. It could even be something like moving an object to the side that a robot hands them – that’s implicitly saying that that was the wrong thing to hand them at that time, because they’re not using it right now. Those implicit cues are trickier, they need interpretation. However, they are a way to get additional information without adding any burden to the human user. In the past, I’ve looked at these two streams (implicit and explicit feedback) separately, but my current and future research is about combining them together. Right now, we have a framework, which we are working on improving, where we can combine the implicit and explicit feedback.

In terms of picking up on the implicit feedback, how are you doing that, what’s the mechanism? Because it sounds incredibly difficult.

It can be really hard to interpret implicit cues. People will respond differently, from person to person, culture to culture, etc. And so it’s hard to know exactly which facial reaction means good versus which facial reaction means bad.

So right now, the first version of our framework just uses human actions. Seeing what the human is doing in the task can give clues about what the robot should do. The human and the robot have different action spaces, but we can find an abstraction so that, when the human takes an action, we know which similar actions the robot could take. That’s the implicit feedback right now. And then, this summer, we want to extend that to using visual cues, looking at facial reactions and gestures.
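
A crude way to picture that abstraction is a shared set of abstract task actions that both human and robot actions map into, so an observed human action implicitly suggests related robot actions. The names below are hypothetical illustrations, not the lab’s actual framework.

```python
# Shared abstract actions for a collaborative cooking task (hypothetical names).
HUMAN_TO_ABSTRACT = {
    "human_grates_cheese": "prepare_topping",
    "human_washes_bowl": "clean_up",
}
ABSTRACT_TO_ROBOT = {
    "prepare_topping": ["robot_fetch_cheese", "robot_slice_veggies"],
    "clean_up": ["robot_load_dishwasher", "robot_wipe_counter"],
}

def implicit_hints(observed_human_action: str) -> list[str]:
    """Robot actions made more plausible by what the human is currently doing."""
    abstract_action = HUMAN_TO_ABSTRACT.get(observed_human_action)
    return ABSTRACT_TO_ROBOT.get(abstract_action, [])

print(implicit_hints("human_washes_bowl"))
# -> ['robot_load_dishwasher', 'robot_wipe_counter']
```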

So what kind of scenarios have you been testing it on?

For our current project, we use a pizza making setup. Personally I really like cooking as an example because it’s a setting where it’s easy to imagine why these things would matter. I also like that cooking has this element of recipes and there is a formula, but there’s also room for personal preferences. For example, somebody likes to put their cheese on top of the pizza, so it gets really crispy, whereas other people like to put it under the meat and veggies, so that maybe it is more melty instead of crispy. Or even, some people clean up as they go versus others who wait until the end to deal with all the dishes. Another thing that I’m really excited about is that cooking can be social. Right now, we’re just working in dyadic human-robot interactions where it’s one person and one robot, but another extension that we want to work on in the coming year is extending this to group interactions. So if we have multiple people, maybe the robot can learn not only from the person reacting to the robot, but also learn from a person reacting to another person and extrapolating what that might mean for them in the collaboration.

Could you say a bit about how the work that you did earlier in your PhD has led you to this point?

When I first started my PhD, I was really interested in implicit feedback. And I thought that I wanted to focus on learning only from implicit feedback. One of my current lab mates was focused on the EMPATHIC framework, and was looking into learning from implicit human feedback, and I really liked that work and thought it was the direction that I wanted to go into.

However, that first summer of my PhD was during COVID, and so we couldn’t really have people come into the lab to interact with robots. So instead I did an online study where I had people play a game with a robot. We recorded their face while they were playing the game, and then we tried to see if we could predict, based on just facial reactions, gaze, and head orientation, which behaviors they preferred for the agent that they were playing with in the game. We actually found that we could predict which of the behaviors they preferred decently well.

The thing that was really cool was we found how much context matters. And I think this is something that is really important for going from just a solely teacher-learner paradigm to a collaboration – context really matters. What we found is that sometimes people would have really big reactions but it wasn’t necessarily to what the agent was doing, it was to something that they had done in the game. For example, there’s this clip that I always use in talks about this. This person’s playing and she has this really noticeably confused, upset look. And so at first you might think that’s negative feedback, whatever the robot did, the robot shouldn’t have done that. But if you actually look at the context, we see that it was the first time that she lost a life in this game. For the game we made a multiplayer version of Space Invaders, and she got hit by one of the aliens and her spaceship disappeared. And so based on the context, when a human looks at that, we actually say she was just confused about what happened to her. We want to filter that out and not actually consider that when reasoning about the human’s behavior. I think that was really exciting. After that, we realized that using implicit feedback only was just so hard. That’s why I’ve taken this pivot, and now I’m more interested in combining the implicit and explicit feedback together.

You mentioned the explicit element would be more binary, like good feedback, bad feedback. Would the person-in-the-loop press a button or would the feedback be given through speech?

Right now we just have a button for good job, bad job. In an HRI paper we looked at explicit feedback only. We had the same Space Invaders game, but we had people come into the lab, and we had a Nao robot, a little humanoid robot, sitting on the table next to them playing the game. We made it so that the person could give the robot positive or negative feedback during the game, so that it would hopefully learn better helping behavior in the collaboration. But we found that people wouldn’t actually give that much feedback because they were focused on just trying to play the game.

And so in this work we looked at whether there are different ways we can remind the person to give feedback. You don’t want to be doing it all the time because it’ll annoy the person and maybe make them worse at the game if you’re distracting them. And also you don’t necessarily always want feedback, you just want it at useful points. The two conditions we looked at were: 1) should the robot remind someone to give feedback before or after they try a new behavior? 2) should they use an “I” versus “we” framing? For example, “remember to give feedback so I can be a better teammate” versus “remember to give feedback so we can be a better team”, things like that. And we found that the “we” framing didn’t actually make people give more feedback, but it made them feel better about the feedback they gave. They felt like it was more helpful, kind of a camaraderie building. And that was only explicit feedback, but we want to see now if we combine that with a reaction from someone, maybe that point would be a good time to ask for that explicit feedback.

You’ve already touched on this but could you tell us about the future steps you have planned for the project?

The big thing motivating a lot of my work is that I want to make it easier for robots to adapt to humans with these subjective preferences. I think in terms of objective things, like being able to pick something up and move it from here to here, we’ll get to a point where robots are pretty good. But it’s these subjective preferences that are exciting. For example, I love to cook, and so I want the robot to not do too much, just to maybe do my dishes whilst I’m cooking. But someone who hates to cook might want the robot to do all of the cooking. Those are things that, even if you have the perfect robot, it can’t necessarily know those things. And so it has to be able to adapt. And a lot of the current preference learning work is so data hungry that you have to interact with it tons and tons of times for it to be able to learn. And I just don’t think that that’s realistic for people to actually have a robot in the home. If after three days you’re still telling it “no, when you help me clean up the living room, the blankets go on the couch not the chair” or something, you’re going to stop using the robot. I’m hoping that this combination of explicit and implicit feedback will help it be more naturalistic. You don’t have to necessarily know exactly the right way to give explicit feedback to get the robot to do what you want it to do. Hopefully through all of these different signals, the robot will be able to hone in a little bit faster.

I think a big future step (that is not necessarily in the near future) is incorporating language. It’s very exciting with how large language models have gotten so much better, but also there’s a lot of interesting questions. Up until now, I haven’t really included natural language. Part of it is because I’m not fully sure where it fits in the implicit versus explicit delineation. On the one hand, you can say “good job robot”, but the way you say it can mean different things – the tone is very important. For example, if you say it with a sarcastic tone, it doesn’t necessarily mean that the robot actually did a good job. So, language doesn’t fit neatly into one of the buckets, and I’m interested in future work to think more about that. I think it’s a super rich space, and it’s a way for humans to be much more granular and specific in their feedback in a natural way.

What was it that inspired you to go into this area then?

Honestly, it was a little accidental. I studied math and computer science in undergrad. After that, I worked in consulting for a couple of years and then in the public healthcare sector, for the Massachusetts Medicaid office. I decided I wanted to go back to academia and to get into AI. At the time, I wanted to combine AI with healthcare, so I was initially thinking about clinical machine learning. I’m at Yale, and there was only one person at the time doing that, so I was looking at the rest of the department and then I found Scaz (Brian Scassellati) who does a lot of work with robots for people with autism and is now moving more into robots for people with behavioral health challenges, things like dementia or anxiety. I thought his work was super interesting. I didn’t even realize that that kind of work was an option. He was working with Marynel Vázquez, a professor at Yale who was also doing human-robot interaction. She didn’t have any healthcare projects, but I interviewed with her and the questions that she was thinking about were exactly what I wanted to work on. I also really wanted to work with her. So, I accidentally stumbled into it, but I feel very grateful because I think it’s a way better fit for me than the clinical machine learning would have necessarily been. It combines a lot of what I’m interested in, and I also feel it allows me to flex back and forth between the mathy, more technical work, but then there’s also the human element, which is also super interesting and exciting to me.

Have you got any advice you’d give to someone thinking of doing a PhD in the field? Your perspective will be particularly interesting because you’ve worked outside of academia and then come back to start your PhD.

One thing is that, I mean it’s kind of cliche, but it’s not too late to start. I was hesitant because I’d been out of the field for a while, but I think if you can find the right mentor, it can be a really good experience. I think the biggest thing is finding a good advisor who you think is working on interesting questions, but also someone that you want to learn from. I feel very lucky with Marynel, she’s been a fabulous advisor. I’ve worked pretty closely with Scaz as well and they both foster this excitement about the work, but also care about me as a person. I’m not just a cog in the research machine.

The other thing I’d say is to find a lab where you have flexibility if your interests change, because it is a long time to be working on a set of projects.

For our final question, have you got an interesting non-AI related fact about you?

My main summertime hobby is playing golf. My whole family is into it – for my grandma’s 100th birthday party we had a family golf outing where we had about 40 of us golfing. And actually, that summer, when my grandma was 99, she had a par on one of the par threes – she’s my golfing role model!

About Kate

Kate Candon is a PhD candidate at Yale University in the Computer Science Department, advised by Professor Marynel Vázquez. She studies human-robot interaction, and is particularly interested in enabling robots to better learn from natural human feedback so that they can become better collaborators. She was selected for the AAMAS Doctoral Consortium in 2023 and HRI Pioneers in 2024. Before starting in human-robot interaction, she received her B.S. in Mathematics with Computer Science from MIT and then worked in consulting and in government healthcare.

#RoboCup2025: social media round-up part 2

RoboCup2025 took place from 15-21 July in Salvador, Brazil. The event saw around 3000 participants competing in the various leagues. In our first social media round-up post we saw what the teams got up to during the first couple of days of the event. In this second post, we take a look at the action from the final days when the competitions reached their climax.

In the #RoboCup2025 @Home OPL Final, our robot performed very well. It opened two doors, removed trash, and closed a cabinet door. Overall, NimbRo came in second, next to team Tidyboy (Korea).
www.ais.uni-bonn.de/nimbro/@Home

— Sven Behnke (@sven-behnke.bsky.social) 20 July 2025 at 18:04

#RoboCup2025: social media round-up 1

RoboCup2025 took place in Salvador, Brazil. The event saw around 3000 participants competing in the various leagues. Find out what the teams got up to during the first couple of days:

Livestream of RoboCup2025

RoboCup2025 is currently taking place in Salvador, Brazil. With day one of the main competition complete, things are hotting up across the many different leagues. From soccer to rescue, from industrial to home scenarios, teams are putting their robots through their paces across a variety of tasks and matches.

If you would like to catch up on the action from the first day, you can watch the recording of the livestream below. This includes coverage of the teams competing, interviews with participants and organisers, and insights into RoboCup and the various leagues.

The livestreams for days two and three, which will feature the knockout stages of the competitions, can be found below:

The livestream for the award ceremony will be here:

You can also find the livestream on the RoboCup Twitch channel.


Read our series of interviews with RoboCup organisers, trustees, and committee members:

Tackling the 3D Simulation League: an interview with Klaus Dorer and Stefan Glaser

A screenshot from the new simulator that will be trialled for a special challenge at RoboCup2025.

The annual RoboCup event, where teams gather from across the globe to take part in competitions across a number of leagues, will this year take place in Brazil, from 15-21 July. In advance of kick-off, we spoke to two members of the RoboCup Soccer 3D Simulation League: Executive Committee Member Klaus Dorer, and Stefan Glaser, who is on the Maintenance Committee and who has been recently developing a new simulator for the League.

Could you start by just giving us a quick introduction to the Simulation League?
Klaus Dorer: There are two Simulation Leagues in Soccer: the 2D Simulation League and the 3D Simulation League. The 2D Simulation League, as the name suggests, is a flat league where the players and ball are simulated with simplified physics and the main focus is on team strategy. The 3D Simulation League is much closer to real robots; it simulates 11 versus 11 Nao robots. The level of control is like with real robots, where you move each motor of the legs and the arms and so on to achieve movement.

I understand that you have been working on a new simulator for the 3D League. What was the idea behind this new simulator?
Klaus: The aim is to bring us closer to the hardware leagues so that the simulator can be more useful. The current simulator that we use in the 3D Simulation League is called SimSpark. It was created in the early 2000s with the aim of making it possible to play 11 vs 11 players. With the hardware constraints of that time, there had to be some compromises on the physics to be able to simulate 22 players at the same time. So the simulation is physically somewhat realistic, but not in the sense that it’s easy to transpose it to a real Nao robot.

Stefan Glaser: The idea for developing a new simulator has been around for a few years. SimSpark is a very powerful simulation framework. The base framework is domain independent (not soccer specific) and specific simulations are realized via plugins. It supports multiple physics engines in the backend and provides a flexible scripting interface for configuration and adaptations of the simulation. However, all this flexibility comes at the price of complexity. In addition to that, SimSpark uses custom robot model specifications and communication protocols, limiting the number of available robot models and requiring teams to develop custom communication layers just for communicating with SimSpark. As a result of this, SimSpark has not been widely adopted in the RoboCup community.

With the new simulator, I would like to address these two major issues: complexity and standardization. In the ML community, the MuJoCo physics engine has become a very popular choice for learning environments after Google DeepMind acquired it and released it open source. Its standards for world and robot model specifications are widely adopted in the community and there exist a lot of ready-to-use robot model specifications for a wide variety of virtual as well as real-world robots. In the middle of last year, the MuJoCo team added a feature which allows you to manipulate the world representation during simulation (adding and removing objects to / from the simulation while preserving the simulation state). This is one essential requirement we have in the simulation league, where we start with an empty field and then the agents connect on demand and form the teams. When this feature was added, I decided to take the step and try to implement a new simulator for the 3D Simulation League based on MuJoCo. Initially, I wanted to start development in C/C++ to achieve maximum performance, but then decided to start in Python to reduce complexity and make it more accessible for other developers. I started development on Easter Monday so it’s not even three months old!
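For readers who have not used MuJoCo's Python bindings, the minimal sketch below shows the kind of load-and-step loop that a MuJoCo-based simulator is built around. The toy XML model, names and values are purely illustrative assumptions, not the League's actual robot or field models.

```python
import mujoco

# A deliberately tiny model (one ball falling onto a plane), just to show the
# basic MuJoCo workflow: build a model, create data, step the physics.
XML = """
<mujoco>
  <worldbody>
    <geom type="plane" size="5 5 0.1"/>
    <body name="ball" pos="0 0 1">
      <freejoint/>
      <geom type="sphere" size="0.11" mass="0.45"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
data = mujoco.MjData(model)

# Advance the simulation by one simulated second (default timestep is 2 ms).
while data.time < 1.0:
    mujoco.mj_step(model, data)

print("ball height after 1 s:", data.body("ball").xpos[2])
```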

I think it might be useful to explain a little bit more about the setup of our league and the requirements of the simulator. If we take the FIFA game (on your favorite gaming device) as an example, there is one simulation happening which simulates 22 players and the decision making is part of the simulation, with full access to the state of the world. In the 3D Simulation League we have two teams with 11 robots on the field, but we also have 22 individual agent programs connected to the simulation server, each controlling one single robot. Each connected agent only receives sensor information related to their robot in the simulation. They are also only allowed to communicate via the server – no direct communication between the agents is allowed in the Simulation League. So we have a general setup where the simulation server has to be able to accept up to 22 connections and manage the situation there. This functionality has been the major focus for me for the last couple of months and this part is already working well. Teams can connect their agents, which will receive sensor information and can actuate joints of the robot in the simulation and so on. They are also able to select different robot models if they like.

An illustration of the simulator set-up.
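As a very rough, illustrative sketch of the server side Stefan describes, the code below accepts up to 22 agent connections and exchanges newline-delimited JSON messages with each one. The port number and message format are assumptions invented for this example; they are not the protocol of the new simulator or of SimSpark.

```python
import asyncio
import json

MAX_AGENTS = 22   # two teams of 11 robots, as in the 3D Simulation League
connected = {}    # agent_id -> writer, for the agents currently on the field

async def handle_agent(reader, writer):
    # Refuse connections once the maximum number of players is reached.
    if len(connected) >= MAX_AGENTS:
        writer.close()
        return
    agent_id = max(connected, default=-1) + 1
    connected[agent_id] = writer
    try:
        while True:
            line = await reader.readline()
            if not line:
                break
            command = json.loads(line)  # e.g. {"joint": "left_knee", "torque": 0.3}
            # ... apply the command to this agent's robot in the next physics step ...
            # Each agent only ever receives observations about its own robot.
            observation = {"agent": agent_id, "time": 0.0, "joints": {}, "vision": []}
            writer.write((json.dumps(observation) + "\n").encode())
            await writer.drain()
    finally:
        connected.pop(agent_id, None)
        writer.close()

async def main():
    # Port 3100 is used here only as a placeholder agent port.
    server = await asyncio.start_server(handle_agent, "0.0.0.0", 3100)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```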

Presumably the new simulator has a better representation of the physics of a real robot.
Klaus: Exactly. For example, how the motors are controlled is now a bit different and much closer to real robots. So when I did my first experiments, I saw the robot collapse and I thought it was exactly how a real robot would collapse! In SimSpark we also had falling robots but the motor control in the new simulator is different. Now you can control the motors by speed, by force, by position, which is much more flexible – it’s closer to what we know from real robots.

I think that, at least initially, it will be more difficult for the Simulation League teams to get the robots to do what they want them to do, because it’s more realistic. For example, in SimSpark the ground contact was much more forgiving. So if you step hard on the ground, you don’t fall immediately with a SimSpark robot but with a MuJoCo robot this will be much more realistic. Indeed, in real robots ground contact is somewhat less forgiving.

I had a question about the vision aspect – how do the individual agents “see” the position of the other agents on the field?
Stefan: We simulate a virtual vision pipeline on the server side. You have a restricted field of view of ±60° horizontally and vertically. Within that field of view you will detect the head, the arms, the feet of other players, or the ball, for example, or different features of the field. Similar to common real-world vision pipelines, each detection consists of a label, a direction vector and the distance information. The information has some noise on it like real robots have, too, but teams don’t need to process camera images. They get the detections directly from the simulation server.

We’ve previously had a discussion about moving towards getting camera images of the simulation to integrate into the vision pipeline on the agent side. This was never really realistic in SimSpark with the implementation we had there. However, it should be possible with MuJoCo. For the first version, though, I kept the same approach to vision as the traditional simulator. This means that teams don’t need to train a vision model, and don’t need to handle camera images to get started. This reduces the load significantly and also shifts the focus of the problem towards motion and decision making.
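A toy version of that server-side vision pipeline might look like the sketch below: it checks whether a target falls within the restricted field of view and, if so, returns a label, a unit direction vector and a noisy distance. For simplicity it uses a single cone angle rather than separate horizontal and vertical limits, and the noise level is an arbitrary assumption rather than the League's calibrated model.

```python
import numpy as np

FOV_DEG = 60.0     # half-angle of the restricted field of view, as described above
NOISE_STD = 0.02   # relative distance noise; an illustrative value, not the League's

def virtual_detection(head_pos, view_dir, target_pos, label, rng):
    """Return (label, unit direction, noisy distance) if the target is within
    the field of view of the observing robot's head, otherwise None."""
    offset = np.asarray(target_pos, float) - np.asarray(head_pos, float)
    distance = np.linalg.norm(offset)
    direction = offset / distance
    forward = np.asarray(view_dir, float)
    forward = forward / np.linalg.norm(forward)
    angle = np.degrees(np.arccos(np.clip(direction @ forward, -1.0, 1.0)))
    if angle > FOV_DEG:
        return None  # outside the field of view: the agent does not see this object
    noisy_distance = distance + rng.normal(0.0, NOISE_STD * distance)
    return label, direction, noisy_distance

rng = np.random.default_rng(0)
print(virtual_detection([0, 0, 0.5], [1, 0, 0], [3.0, 1.0, 0.0], "ball", rng))
```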

Will the simulator be used at RoboCup 2025?
Stefan: We plan to have a challenge with the new simulator and I will try to provide some demo games. At the moment it’s not really in a state where you can play a whole competition.

Klaus: That’s usually how we proceed with new simulators. We would not move from one to the other without any intermediate step. We will have a challenge this year at RoboCup 2025 with the new MuJoCo simulator where each participating team will try to teach the robot to kick as far as possible. So, we will not be playing a whole game, we won’t have multiple robots, just a single robot stepping in front of the ball and kicking the ball. That’s the technical challenge for this year. Teams will get an idea of how the simulator works, and we’ll get an idea of what has to be changed in the simulator to proceed.

This new challenge will be voluntary, so we are not sure how many teams will participate. Our team (MagmaOffenburg) will certainly take part. It will be interesting to see how well the teams perform because no one knows how far a good kick is in this simulator. It’s a bit like in Formula One when the rules change and no one knows which team will be the leading team.

Do you have an idea of how much adaptation teams will have to make if and when you move to the new simulator for the full matches?
Stefan: As a long-term member of 3D Simulation League, I know the old simulator SimSpark pretty well, and know the protocols involved and how the processes work. So the first version of the new simulator is designed to use the same basic protocol, the same sensor information, and so on. The idea is that the teams can use the new simulator with minimal effort in adapting their current agent software. So they should be able to get started pretty fast.

Although, when designing a new platform, I would like to take the opportunity to take a step forward in terms of protocols, because I also want to integrate other Leagues in the long term. They usually have other control mechanisms, and they don’t use the same protocol that is prominent in 3D Simulation. Therefore there has to be some flexibility in the future. But for the first version, the idea was to get the Simulation League ready with minimal effort.

Klaus: The big idea is that this is not just used in the 3D Simulation League, but also serves as a useful simulator for the Humanoid League and the Standard Platform League (SPL). So if that turns out to be true, then it will be completely successful. For the Kick Challenge this year, for example, we use a T1 robot that is a Humanoid League robot.

Could you say something about this simulation to real world (Sim2Real) aspect?
Stefan: We’d like it to be possible for the motions and behaviors in the simulator to be ported to real robots. From my point of view, it would be useful the other way round too.

We, as a Simulation League, usually develop for the Simulation League and therefore would like to get the behaviors running on a real robot. But the hardware teams usually have a similar issue when they want to test high-level decision making. They might have two to five robots on the field, and if they want to play a high-level decision-making match and train in that regard, they always have to deploy a lot of robots. If they also want to have an opponent, they have to double the number of robots in order to play a game to see how the strategy would turn out. The Sim2Real aspect is also interesting for these teams, because they should be able to take what they deployed on the real robot and it should also work in the simulation. They can then use the simulation to train high-level skills like team play, player positioning and so on, which is a challenging aspect for the real robot leagues like SPL or the Humanoid Leagues.

Klaus: And the reason we know this is because we have a team in the Simulation League and we have a team in the Humanoid League. So that’s another reason why we are keen to bring these things closer together.

How does the refereeing work in the Simulation League?
Klaus: A nice thing about Simulation Leagues is that there is a program which knows the real state of the world, so we can build the referee into the simulator and it will not fail. For things like offside, or whether the ball passed the goal line, that’s fail-safe. All the referee decisions are taken by the system itself. We have a human referee but they never need to intervene. However, there are situations where we would like artificial intelligence to play a role. This is not currently the case in SimSpark because the rules are all hard-coded. We have a lot of fouls that are debatable. For example, there are many fouls that teams agree should not have been a foul, and other fouls that are not called that should have been. It would be a nice AI learning task to get some situations judged by human referees and then train an AI model to better determine the rules for what is a foul and what isn’t a foul. But this is currently not the case.

Stefan: With the new simulator, I am not yet far enough into development to have implemented the automatic referee. I have a basic set of rules which progress the game, but judging fouls and deciding on special situations is not yet implemented.

What are the next steps for developing the simulator?
Stefan: One of the next major steps will be to refine the physics simulation. For instance, even though there is a ball in the simulation, its behavior is not yet well refined. There are a lot of physics parameters which we have to decide on to reflect the real world as well as possible. This will likely require a series of experiments in order to get to the correct values for various aspects. Here I’m hoping for some engagement from the community, as it is a great research opportunity and I personally would prefer the community to decide on a commonly accepted parameter set based on a level of evidence that I can’t easily provide all by myself. So if anyone is interested in refining the physics of the simulation such that it best reflects the real world, you are welcome to join!

Another major next step will be the development of the automatic referee of the soccer simulation, deciding on fouls, handling misbehaving agents and so on. In the first version, foul conditions will likely be judged by an expert system specifically designed for this purpose. The simulation league has developed a set of foul condition specifications which I plan to adapt. In a second step, I would like to integrate and support the development of AI-based foul detection models. But yeah, one step after the other.

What are you particularly looking forward to at RoboCup2025?
Klaus: Well, with our team we have been vice world champion seven times in a row. This year we are really hoping to make it to world champion. We are very experienced in getting losses in finals and this year we are looking forward to changing that, from a team perspective.

Stefan: I’m going to Brazil in order to promote the simulator, not just for the Simulation League, but also across the boundaries for the Humanoid Leagues and the SPL Leagues. I think that this simulator is a great chance to bring people from all the leagues together. I’m particularly interested in the specific requirements of all the teams of the different leagues. This understanding will help me tailor the new simulator towards their needs. This is one of my major highlights for this year, I would say.


You can find out more about the new simulator at the project webpage, and from the documentation.


Klaus Dorer is a professor of artificial intelligence, autonomous systems and software engineering at Offenburg University, Germany. He is also a member of the Institute for Machine Learning and Analytics (IMLA). He has been team leader of the RoboCup simulation league teams magmaFreiburg (from 1999), living systems and magmaFurtwangen, and has led magmaOffenburg since 2009. Since 2014, he has also been part of the humanoid adult size league team Sweaty.

Stefan Glaser is a teaching assistant for artificial intelligence and intelligent autonomous systems at Offenburg University, Germany. He has been part of the RoboCup simulation league team magmaOffenburg since 2009 and the RoboCup humanoid adult size league team Sweaty since 2014.

An interview with Nicolai Ommer: the RoboCupSoccer Small Size League

Kick-off in a Small Size League match. Image credit: Nicolai Ommer.

RoboCup is an international scientific initiative with the goal of advancing the state of the art of intelligent robots, AI and automation. The annual RoboCup event is due to take place from 15-21 July in Salvador, Brazil. The Soccer component of RoboCup comprises a number of Leagues, with one of these being the Small Size League (SSL). We caught up with Executive Committee member Nicolai Ommer to find out more about the SSL, how the auto referees work, and how teams use AI.

Could you start by giving us a quick introduction to the Small Size League?

In the Small Size League (SSL) we have 11 robots per team – the only physical RoboCup soccer league to have the full number of players. The robots are small, cylindrical robots on wheels and they can move in any direction. They are self-built by the teams, so teams have to do both the hardware and the programming, and a lot of things have to work together to make a team work. The AI is centralized: we don’t have agents on the robots, so teams have a central computer at the field where they can do all the computation and then they send the commands to the robots at different levels of abstraction. Some teams will just send velocity commands, other teams send a target.

We have a central vision system – this is maintained by the League, and has been since 2010. There are cameras above the field to track all the robots and the ball, so everyone knows where the robots are.

The robots can move at up to 4 meters per second (m/s); beyond that it gets quite unstable for the robots. They can change direction very quickly, and the ball can be kicked at 6.5 m/s. It’s quite fast and we’ve already had to limit the kick speed. Previously we had a limit of 8 m/s and before that 10 m/s. However, no robot can catch a ball at this speed, so we decided to reduce it and put more focus on passing. This gives the keeper and the defenders a chance to actually intercept a kick.

It’s so fast that for humans it’s quite difficult to understand all the things that are going on. And that’s why, some years ago, we introduced auto refs, which help a lot in tracking, especially things like collisions and so on, where the human referee can’t watch everything at the same time.

How do the auto refs work then, and is there more than one operating at the same time?

When we developed the current system, to keep things fair, we decided to have multiple implementations of an auto ref system. These independent systems implement the same rules and then we do a majority vote on the decisions.

To do this we needed a middle component, so some years ago I started this project to have a new game controller. This is the user interface (UI) for the human referee who sits at a computer. In the UI you see the current game state, you can manipulate the game state, and this component coordinates the auto refs. The auto refs can connect and report fouls. If only one auto ref detects the foul, it won’t count it. But, if both auto refs report the foul within the time window, then it is counted. Part of the challenge was to make this all visual for the operator to understand. The human referee has the last word and makes the final decision.

We managed to establish two implementations. The aim was to have three implementations, which makes it easier to form a majority. However, it still works with just two implementations and we’ve had this for multiple years now. The implementations are from two different teams who are still active.
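The coordination logic Nicolai describes could be sketched roughly as follows: the game controller collects foul reports from the connected auto refs and only accepts a foul once a majority has reported it within a short time window. The window length, class and field names here are assumptions made for illustration, not the official game-controller implementation.

```python
from collections import defaultdict

VOTE_WINDOW = 1.0  # seconds; an illustrative window, not the official value

class FoulVoting:
    """Toy majority vote over auto-ref foul reports, in the spirit of the
    game controller described above."""
    def __init__(self, num_autorefs):
        self.majority = num_autorefs // 2 + 1
        self.reports = defaultdict(dict)  # (foul_type, robot_id) -> {autoref: time}

    def report(self, autoref, foul_type, robot_id, t):
        votes = self.reports[(foul_type, robot_id)]
        votes[autoref] = t
        # Only reports inside the time window count towards the majority.
        recent = {ref: ts for ref, ts in votes.items() if t - ts <= VOTE_WINDOW}
        self.reports[(foul_type, robot_id)] = recent
        return len(recent) >= self.majority  # True -> the foul is counted

gc = FoulVoting(num_autorefs=2)
print(gc.report("autoref_A", "collision", robot_id=7, t=12.3))  # False: one report only
print(gc.report("autoref_B", "collision", robot_id=7, t=12.6))  # True: both reported in time
```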

How do the auto refs deal with collisions?

We can detect collisions from the data. However, even for human referees it’s quite hard to determine who was at fault when two robots collide. So we had to just define a rule, and all the implementations of the auto ref implement the same rule. We wrote in the rulebook really specifically how you calculate if a collision happened and who was at fault. The first consideration is based on the velocity – below 1.5 m/s it’s not a collision, above 1.5 m/s it is. There is also another factor, relating to the angle calculation, that we take into account to determine which robot was at fault.
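A simplified sketch of that kind of rule is given below: no collision is called under a relative-speed threshold of 1.5 m/s, and above it the robot driving more directly towards the other is blamed. The real rulebook formula for the angle factor is more involved; this is only a rough stand-in.

```python
import numpy as np

COLLISION_SPEED = 1.5  # m/s threshold, as described above

def collision_fault(vel_a, vel_b, pos_a, pos_b):
    """Return None if the contact does not count as a collision, otherwise the
    identifier of the robot considered at fault (simplified angle handling)."""
    vel_a, vel_b = np.asarray(vel_a, float), np.asarray(vel_b, float)
    if np.linalg.norm(vel_a - vel_b) < COLLISION_SPEED:
        return None  # too slow to count as a collision under the rule
    # Blame the robot whose velocity points more directly at the other robot.
    towards_b = np.asarray(pos_b, float) - np.asarray(pos_a, float)
    towards_b = towards_b / np.linalg.norm(towards_b)
    drive_a = float(np.dot(vel_a, towards_b))
    drive_b = float(np.dot(vel_b, -towards_b))
    return "robot_a" if drive_a >= drive_b else "robot_b"

# A robot driving at 2 m/s into a stationary robot is at fault.
print(collision_fault([2.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.2, 0.0]))  # robot_a
```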

What else do the auto refs detect?

Other fouls include the kick speed, and then there are fouls relating to adherence to normal game procedure. For example, when the other team has a free kick, the opposing robots should maintain a certain distance from the ball.

The auto refs also observe non-fouls, in other words game events. For example, when the ball leaves the field. That’s the most common event. This one is actually not so easy to detect, particularly if there is a chip kick (where the ball leaves the playing surface). With the camera lens, the parabola of the ball can make it look like it’s outside the field of play when it isn’t. You need a robust filter to deal with this.

Also, when the auto refs detect a goal, we don’t trust them completely. When a goal is detected, we call it a “possible goal”. The match is halted immediately, all the robots stop, and the human referee can check all the available data before awarding the goal.

You’ve been involved in the League for a number of years. How has the League and the performance of the robots evolved over that time?

My first RoboCup was in 2012. The introduction of the auto refs has made the play a lot more fluent. Before this, we also introduced the concept of ball placement, so the robots would place the ball themselves for a free kick, or kick off, for example.

From the hardware side, the main improvement in recent years has been dribbling the ball in one-on-one situations. There has also been an improvement in the specialized skills performed by robots with a ball. For example, some years ago, one team (ZJUNlict) developed robots that could pull the ball backwards with them, move around defenders and then shoot at the goal. This was an unexpected movement, which we hadn’t seen before. Before this you had to do a pass to trick the defenders. Our team, TIGERs Mannheim, has also improved in this area now. But it’s really difficult to do this and requires a lot of tuning. It really depends on the field, the carpet, which is not standardized. So there’s a little bit of luck that your specifically built hardware is actually performing well on the competition carpet.

The Small Size League Grand Final at RoboCup 2024 in Eindhoven, Netherlands. TIGERs Mannheim vs. ZJUNlict. Video credit: TIGERs Mannheim. You can find the TIGERs’ YouTube channel here.

What are some of the challenges in the League?

One big challenge, and also maybe it’s a good thing for the League, is that we have a lot of undergraduate students in the teams. These students tend to leave the teams after their Bachelor’s or Master’s degree, the team members all change quite regularly, and that means that it’s difficult to retain knowledge in the teams. It’s a challenge to keep the performance of the team; it’s even hard to reproduce what previous members achieved. That’s why we don’t have large steps forward, because teams have to repeat the same things when new members join. However, it’s good for the students because they really learn a lot from the experience.

We are continuously working on identifying things which we can make available for everyone. In 2010 the vision system was established. It was a huge factor, meaning that teams didn’t have to do computer vision. And we are currently looking at establishing standards for wireless communication – at the moment every team does this on their own. We want to advance the League, but at the same time, we also want to preserve its nature of learning by doing, so that teams can do all the things themselves if they want to.

You really need to have a team of people from different areas – mechanical engineering, electronics, project management. You also have to get sponsors, and you have to promote your project, get interested students in your team.

Could you talk about some of the AI elements to the League?

Most of our software is script-based, but we apply machine learning for small, subtle problems.

In my team, for example, we do model calibration with quite simple algorithms. We have a specific model for the chip kick, and another for the robot. The wheel friction is quite complicated, so we come up with a model and then we collect the data and use machine learning to detect the parameters.
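As a generic illustration of that kind of model calibration (not the TIGERs' actual models), the sketch below assumes a simple linear deceleration model for a rolling ball and recovers the friction parameter from logged speed measurements by least squares.

```python
import numpy as np

# Assumed model: v(t) = v0 - k * t, where k is the friction parameter to identify.
rng = np.random.default_rng(1)
true_k, true_v0 = 0.8, 4.0
t = np.linspace(0.0, 2.0, 50)
v = true_v0 - true_k * t + rng.normal(0.0, 0.05, t.shape)  # noisy logged speeds

# Least-squares fit of [v0, k] from the samples.
A = np.column_stack([np.ones_like(t), -t])
(v0_hat, k_hat), *_ = np.linalg.lstsq(A, v, rcond=None)
print(f"estimated v0 = {v0_hat:.2f} m/s, friction parameter k = {k_hat:.2f}")
```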

For the actual match strategy, one nice example is from the team CMDragons. One year you could really observe that they had trained their model so that, once they scored a goal, they upvoted the strategy that they applied before that. You could really see that the opponent reacted the same way all the time. They were able to score multiple goals, using the same strategy again and again, because they learned that if one strategy worked, they could use it again.

For our team, the TIGERs, our software is very much based on calculating scores for how good a pass is, how easily a pass can be intercepted, and how we can improve the situation with a particular pass. This is hard-coded sometimes, with some geometry-based calculations, but there is also some fine-tuning. If we score a goal then we track back and see where the pass came from and we give bonuses on some of the score calculations. It’s more complicated than this, of course, but in general it’s what we try to do by learning during the game.
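In rough, illustrative Python (the real implementation is, as Nicolai says, more complicated), the idea of combining geometry-based pass scores with small in-game bonuses could look like this. The grid size, bonus size and score formula are all assumptions invented for the example.

```python
PASS_BONUS = 0.1  # size of the in-game bonus; an illustrative value

class PassScorer:
    """Toy pass scoring: a geometry-based score plus bonuses for pass origins
    that previously led to goals, in the spirit described above."""
    def __init__(self):
        self.bonus = {}  # coarse grid cell -> accumulated bonus

    def _cell(self, pos):
        return (round(pos[0]), round(pos[1]))  # 1 m grid, purely illustrative

    def score(self, origin, target, intercept_risk):
        # Stand-in for the geometry-based part: prefer short, hard-to-intercept passes.
        dist = ((target[0] - origin[0]) ** 2 + (target[1] - origin[1]) ** 2) ** 0.5
        geometric = max(0.0, 1.0 - intercept_risk - 0.05 * dist)
        return geometric + self.bonus.get(self._cell(origin), 0.0)

    def reward_goal(self, pass_origin):
        # After a goal, track back to the pass origin and boost similar passes.
        cell = self._cell(pass_origin)
        self.bonus[cell] = self.bonus.get(cell, 0.0) + PASS_BONUS

scorer = PassScorer()
print(scorer.score((2.0, 1.0), (4.0, 0.0), intercept_risk=0.3))
scorer.reward_goal((2.0, 1.0))
print(scorer.score((2.0, 1.0), (4.0, 0.0), intercept_risk=0.3))  # higher after the goal
```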

People often ask why we don’t do more with AI, and I think the main challenge is that, compared to other use cases, we don’t have that much data. It’s hard to get the data. In our case we have real hardware and we cannot just do matches all day long for days on end – the robots would break, and they need to be supervised. During a competition, we only have about five to seven matches in total. In 2016, we started to record all the games with a machine-readable format. All the positions are encoded, along with the referee decisions, and everything is in a log file which we publish centrally. I hope that with this growing amount of data we can actually apply some machine learning algorithms to see what previous matches and previous strategies did, and maybe get some insights.

What plans do you have for your team, the TIGERs?

We have actually won the competition for the last four years. We hope that there will be some other teams who can challenge us. Our defence has not really been challenged so we have a hard time finding weaknesses. We actually play against ourselves in simulation.

One thing that we want to improve on is precision because there is still some manual work to get everything calibrated and working as precisely as we want it. If some small detail is not working, for example the dribbling, then it risks the whole tournament. So we are working on making all these calibration processes easier, and to do more automatic data processing to determine the best parameters. In recent years we’ve worked a lot on dribbling in the 1 vs 1 situations. This has been a really big improvement for us and we are still working on that.

About Nicolai

Nicolai Ommer is a Software Engineer and Architect at QAware in Munich, specializing in designing and building robust software systems. He holds a B.Sc. in Applied Computer Science and an M.Sc. in Autonomous Systems. Nicolai began his journey in robotics with Team TIGERs Mannheim, participating in his first RoboCup in 2012. His dedication led him to join the RoboCup Small Size League Technical Committee and, in 2023, the Executive Committee. Passionate about innovation and collaboration, Nicolai combines academic insight with practical experience to push the boundaries of intelligent systems and contribute to the global robotics and software engineering communities.

RoboCupRescue: an interview with Adam Jacoff

The RoboCupRescue arena at RoboCup2024, Eindhoven.

RoboCup is an international scientific initiative with the goal of advancing the state of the science of intelligent robots, AI and automation. The annual RoboCup event will take place from 15-21 July in Salvador, Brazil. The RoboCupRescue League is an important element of the competition and focuses on the challenges involved in search and rescue applications. We caught up with Adam Jacoff, co-founder of the RoboCupRescue league, former RoboCup Trustee, and chair of the organising committee, to find out more.

Could you start by giving us an overview of the Rescue League?

The RoboCupRescue League is now in its 25th year hosting competitions and workshops all around the world. We’re focused on developing autonomous robots that can enable emergency responders to perform extremely hazardous tasks from safer stand-off distances. That includes search and rescue scenarios in compromised or collapsed structures, for example, but there are many tasks that firefighters and other public safety organizations do every day that would be safer with robots doing the dangerous part. We are really the only League at RoboCup that’s focused on emergency responders, so our arenas appear a little more chaotic than the other leagues, but in a controlled sort of way. We have twenty standard test methods that we use as challenge tasks that emergency responders and our league of teams around the world have helped develop and validate over many years. These competitions do three things at once.

First, they guide the research with tangible terrains, obstacles, and tasks that are representative of emergency response operations. The researchers likely do not know any emergency responders, but we do, and we have distilled their requirements into twenty or so standard test methods. So the teams know that if they can solve the challenges presented to them in the test lanes, maybe not all twenty, but ten of them, or five of them, that combination of capabilities is applicable to some public safety mission. And the more autonomy the researchers can implement into their robots, the more likely the robots will be effective and easy to use in the field. The main objective of our league is to quicken the development of effective and reliable autonomous behaviors for these complex environments.

RoboCup is an excellent incubator for developing and evaluating cutting edge research and at the same time for developing, validating, and disseminating new standard test methods. Our standard test methods are reproducible and approach the reality of very complex and hazardous environments in an incremental way. We have increasingly difficult settings for each test. So we start easy in the Preliminaries with all the terrain lanes flat so teams can optimize their approaches for each particular challenge. By Semi-Finals the terrain lanes are inclined into crossover 15° slopes with a complex terrain obstacle in the middle. The teams start performing sequences of multiple lanes, so they need to intelligently switch between different behaviors to succeed. Then for the Finals we inject more complexity with slippery features for the robots to lose traction, pinch points to force more steering, and higher step-over obstacles to make the sequences even harder. This is how we guide and challenge both autonomous and remotely operated robots beyond their comfort zone.

Second, the intensity of the experience and education for engineers and computer scientists leads to recruiting opportunities and careers. Everybody understands that teams who do well at RoboCupRescue have done something special. They can go on to more intense research or can help implement that technology in the commercial market. Our League has inspired and validated new robotic approaches to very complex environments and helped make the jump from research to commercial implementation.

The third piece of the puzzle is our connection to various communities of emergency responders. Our sequence of test lanes is exactly what emergency responders use to support their procurement and training. They use trial data captured with the robot manufacturers to help decide what robot to buy. They also use the test lanes to train remote operators and measure their proficiency. So RoboCupRescue helps emergency responders understand what the next wave of emerging technologies can do. For example, the nuclear waste cleanup industry wants to credential their remote robot operators to ensure robots don’t get stranded within extremely harsh radiation-filled environments. The robots need to keep working for 1,000 hours or more to be cost effective. They also want to ensure remote operators are proficient enough to safely perform the intended tasks without crashing the robot on the stairs and blocking the way for all other robots. Or worse yet, force humans in protective suits to go in and clear the way. Different organizations come to RoboCupRescue competitions and see various aspects of the league in action that are useful to them. In this case, as an obstacle course to get a robot driver’s license.

At Eindhoven last year (for RoboCup2024) we had the local bomb squads provide what we call a “reverse demonstration.” All these researchers give demonstrations of their robots at their schools to sponsors, family, friends, and maybe even to emergency responders. But they never get to see what the emergency responders do for a living – how dangerous their jobs are. So the RoboCupRescue league typically pauses for a while during the competition and comes together to watch a bomb technician get into their padded suit and go through the sequence of test lanes doing all the tasks that we’re asking the robots to perform. The researchers see how much of a burden it is just to wear the padded suits, which are very heavy and require forced air into the helmet. It’s not only physically difficult, they’re walking to and interacting with known explosives to render them safe – which is unconscionable given the capabilities of robots these days. They need to be working remotely from a safe stand-off distance. These “reverse demonstrations” are likely the first time the researchers ever get to meet people in bomb suits, and understand that it is these emergency responders we are working to help. Whether it’s firefighters or police, a natural or manmade disaster, we’re trying to help emergency responders stay out of harm’s way. They need capable, reliable, and easy to use robots to deal with extreme hazards more safely and effectively.

The RoboCupRescue arena at ICRA2025 (one of the qualifying events for RoboCup2025). Credit: Adam Jacoff.

So you mentioned trying to promote autonomy over teleoperation. How do you go about trying to encourage people to use more autonomy in the league?

Competitions need to have simple and clear rules. They need to challenge robots, lead them toward practical solutions, and highlight teams that are successful in a variety of challenges. So RoboCupRescue has developed scoring tasks for mobility, dexterity, and mapping within complex terrains. To score mobility points, for example, the robot needs to drive from one end of a terrain lane to the other end, or ascend and descend obstacles like stairs. They earn 1 point for each successful traverse up to 10 total repetitions to add some statistical significance. The operator is always out of sight of the robot, so working only through their interface as if the robot is inside a building. The terrain lanes and obstacles start easy and get harder, but all teams are being evaluated in the same difficulty setting as the competition progresses, so the points stay the same.

If the robot can successfully traverse the lane or perform the dexterity tasks without the operator touching the interface, we call that autonomy. At the beginning of each traverse the operator can manually set a goal point at the other end of the lane. Then they hit “go” and we all watch and learn together. If the robot gets to the other end without the operator touching the controls, the team gets 4 points instead of 1 point for that autonomous traverse. That 4:1 ratio appears to be enough incentive to make sure teams develop autonomous behaviors and try them at each start point. However, they can always revert to remote operation any time the robot stalls and score at the lesser rate, so the rules encourage trying autonomy first. Also, we give Best-In-Class Autonomy awards to recognize teams that earned the most points autonomously throughout the competition.

Why do we need autonomy in emergency response operations? First, because the onboard autonomy typically makes the robot easier to use for a remote operator, which can also mean it is more likely to succeed down range or maybe be more reliable. Also, inside buildings there are often radio communications drop-out zones. Think about your cell phone reception inside concrete buildings or in basements under buildings; these places still need to be searched even when radio communication from a remote stand-off location outside the structure doesn’t support real-time video or control. So each RoboCupRescue lane, and the entire Labyrinth or Maze used for mapping, is considered a “radio drop-out zone” with tasks that score 4 to 1 if you can perform them autonomously. That applies to mobility, dexterity, and mapping tasks.
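The scoring rule Adam describes is simple enough to write down directly. The sketch below is a minimal illustration of the 4:1 autonomy incentive over up to ten repetitions per lane; the data structure is made up for the example.

```python
MAX_REPETITIONS = 10      # repetitions per test lane, as described above
AUTONOMY_MULTIPLIER = 4   # an autonomous traverse scores 4 points instead of 1

def lane_score(traverses):
    """`traverses` is a list of (succeeded, autonomous) tuples, one per attempt."""
    total = 0
    for succeeded, autonomous in traverses[:MAX_REPETITIONS]:
        if succeeded:
            total += AUTONOMY_MULTIPLIER if autonomous else 1
    return total

# Example: three autonomous successes, two teleoperated successes, one failure.
print(lane_score([(True, True)] * 3 + [(True, False)] * 2 + [(False, False)]))  # 14
```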

An example of one of the lanes (challenges) that teams tackle during competition. Credit: Adam Jacoff.

Will there be any new innovations to look out for this year?

What you’re going to see new this year are four-legged robots, called quadrupeds, that now have wheels as feet. There was a commercial robot that came to RoboCup2024 in Eindhoven. I invited them to demonstrate in the test lanes while the teams were at lunch, and this new robot design literally dominated our Semi-Finals sequence. Now that was a demonstration, conducted with the operator in direct line-of-sight of the robot, so much easier than remotely controlling the robot in the same terrains. But it didn’t have a remote interface. That’s exactly what our league of teams do best by adding interfaces, sensors, mapping, manipulators, and the autonomy to make it all effective and easier to use remotely.

These legged robots with wheels are a literal step function improvement in mobility, the likes of which I’ve only seen twice in thirty plus years. They have a level of mobility relative to their size that is astonishing. So we’re turning that new robot into our league’s first “standard platform.”

One of the four-legged robots with wheels as feet. Image from video taken at ICRA 2025, of the robot tackling one of the test lanes. Credit: Adam Jacoff.

Yes, I wanted to ask about the League’s plans for introducing a standard platform. Could you talk more about that?

This robot will provide a common development platform across teams with inherently good mobility within our terrains and obstacles. The league is negotiating with the manufacturer so that teams can purchase it at a relatively low price, which will bring down the cost of entry for new teams into the league. That level of mobility within our complex terrains and obstacles makes the autonomy a lot easier to develop – teams can focus on higher-level planning because the lower-level autonomous gaits and behaviors are already relatively effective. Similarly equipped teams can collaborate more closely, compare performance more directly, and share software solutions.

As a league, we can also start introducing teaming tasks for multiple robots. What kind of teaming might we explore? Maybe a certain object of interest is too heavy to drag for one robot, but could three of them work together to get it done like a team of dogs pulling a sled? Or maybe to carry payloads and supplies down-range, or even a non-ambulatory victim in a so-called litter back up-range to the base of operations. There will be lots of possibilities.

This new class of robots will operate in all the same test lanes along with all the wheeled, tracked, and other robots developed by teams. We score every robot similarly of course, but then can separately compare the quantitative trial results for different classes of robots. It might turn out that it’s unfair to compare a new legged robot with wheels with a slower but stronger tracked robot that’s been winning RoboCupRescue for years. It’s not like tracked robots don’t have their use cases in emergency response operations. As soon as you get to tasks that involve exerting forces on the environment, like turning valves, the robots benefit from being heavier and stronger, which typically means tracks with independent front and rear flippers to negotiate the complex terrains and climb stairs. Not everyone needs to adopt the new robot to compete; teams should work on implementations that push the state of the science in whatever direction they can. Then our Best-In-Class Awards can recognize the most effective robots in a variety of like-kind implementations, wheeled, tracked, legged, and maybe now legged with wheels.

When humanoids come into the league, and they will at some point, they’ll be welcomed and will be compared directly against other humanoids at least initially. We can also aggregate performance across a class of robots to identify advantages and apparent weaknesses. Emergency responders can then look at all the trial results to see which systems might work in their environments.

Do you know how many teams will be using this new robot?

I think we have almost twenty teams involved this year, but I’m not sure how many will be using the new legged robot with wheels. I expect maybe three or four initially, which would make a reasonable bracket of teams that can collaborate and compare more closely than others. There is a team from Austria that was in the German Open that is planning to come to Brazil. I’m sure when more people see the robot in action they’ll want one. Especially any new team coming into the league starting from scratch. Meanwhile, the RoboCupRescue league is poised to evaluate that robot design fully, add essential operational capabilities, and push the state of the science a bit further.

An example of the legged, wheeled robot in action. Credit: Adam Jacoff.

Will there be any different challenges this year?

Yes, this year we’re adding a new version of our original Stepfield terrain. It was our first really complex terrain, and it inspired drastic changes in robot designs twenty years ago. The Stepfield terrain ensures the robots are always in contact with multiple different terrain elevations, and there is no respite throughout, even for legged robots with wheels. For the past couple of years, we got away from the Stepfield terrain because it was the hardest terrain to replicate at a temporary venue. But the year off for COVID was a growth time for us in terms of making the test methods as inexpensive and stowable as possible, so that small research organizations can replicate the tests to practice, refine approaches, and evaluate themselves during their development process. We first found a way to fabricate new Stepfield terrains from purchased, stackable crates. Now we have a fabricated version that accomplishes the same objectives less expensively. So RoboCupRescue teams will validate that the test apparatus as fabricated can stand up to the constant rigor of a very wide variety of robots across hundreds of trials in less than a week. That’s why RoboCupRescue is an excellent incubator for standard test methods too.

Is there anything else you’d like to highlight about the League?

One of the key features of RoboCupRescue is that the test lanes fabricated for the competition typically become a robot test facility for a local organization, like a nearby emergency responder facility or university. It can be set up perpetually, or it can move periodically to support regional competitions within the country. We usually try to connect the local organizers with their nearest emergency responders who already have robots or are seeking to purchase robots. The Olympic Village is the best analogy because it stays behind after the competition is over to be useful to the local region. So as we host our 25th year of RoboCupRescue in Brazil, we add to our world map of standard test facilities and new friends that we can continue to collaborate with professionally, and hopefully help their regional emergency responders use robots to stay out of harm’s way.

About Adam

Adam Jacoff is a robotics research engineer at the National Institute of Standards and Technology (NIST) which is part of the U.S. Department of Commerce. Over the past thirty years he has developed a variety of innovative robots and directed evaluations of more than a hundred others in a range of sizes, including the first technology readiness level assessment of autonomous mobility for the U.S. Army’s Experimental Unmanned Vehicle (XUV) (2002-2003).
 
His current efforts are focused on developing 50 standard test methods to objectively evaluate ground robots, aerial drones, and aquatic vehicles for public safety operations. He has conducted 50 international robot competitions using the test methods as challenge tasks to guide innovation and measure progress (2000-present), and more than a hundred robot exercises to refine and validate the test methods with emergency responders and robot manufacturers (2005-present). These include dozens of comprehensive robot evaluations using the test methods to quantify key capabilities guiding more than $200M of purchasing decisions for civilian and military organizations (2010-present). He is now validating use of the test methods as repeatable practice tasks to focus remote operator/pilot training and measure proficiency for credentialing (2015-present). He received a B.S. degree in Mechanical Engineering from the University of Maryland and an M.S. degree in Computer Science from Johns Hopkins University.

Gearing up for RoboCupJunior: Interview with Ana Patrícia Magalhães

Action from RoboCupJunior Rescue at RoboCup 2024. Photo: RoboCup/Bart van Overbeeke.

The annual RoboCup event, where teams gather from across the globe to take part in competitions across a number of leagues, will this year take place in Brazil, from 15-21 July. An important part of the week is RoboCupJunior, which is designed to introduce RoboCup to school children, and sees hundreds of kids taking part in a variety of challenges across different leagues. This year, the lead organizer for RoboCupJunior is Ana Patrícia Magalhães. We caught up with her to find out how the preparations are going, what to expect at this year’s competition, and how RoboCup inspires communities.

Could you tell us about RoboCupJunior and the plans you have for the competition this year?

RoboCup will take place from 15-21 July, in Salvador, Brazil. We expect to receive people from more than 40 countries, across the Junior and Major Leagues. We are preparing everything to accommodate all the students taking part in RoboCupJunior, who will participate in the Junior Leagues of Soccer, Rescue and OnStage. They are children and teenagers, so we have organized shuttles to take them from the hotels to the convention center. We’ve also prepared a handbook with recommendations about security, places they can visit, places to eat. The idea is to provide all the necessary support for them, because they are so young. We’re also organizing a welcome party for the Juniors so that they can experience a little bit of our culture. It will hopefully be a good experience for them.

The Juniors will be located on the first level of the mezzanine at the convention center. They will be separate from the Major Leagues, who will be on the ground floor. Of course, they’ll be able to visit the Major Leagues, and talk to the students and other competitors there, but it will be nice for them to have their own space. There will also be some parents and teachers with them, so we decided to use this special, dedicated space.

RoboCupJunior On Stage at RoboCup 2024. Photo: RoboCup/Bart van Overbeeke.

Do you have any idea of roughly how many teams will be taking part?

Yes, so we’ll have about 48 teams in the Soccer Leagues, 86 teams in the Rescue Leagues, and 27 in OnStage. That’s a lot of teams. Each team has about three or four students, and many of the parents, teachers and professors travel with them too. In total, we expect about 600 people to be associated with RoboCupJunior.

RoboCupJunior Soccer at RoboCup 2024. Photo: RoboCup/Bart van Overbeeke.

Have you got more RoboCupJunior participants from Brazil this year due to the location?

Yes, we have many teams from Brazil competing. I don’t know the exact number, but there are definitely more Brazilian teams this year, because it’s a lot cheaper and easier for them to travel here. When we have competitions in other countries, it’s expensive for them. For example, I have a team here in Salvador that qualified for the super regional event in the US and our team couldn’t go. They had qualified, but they couldn’t go because they didn’t have money to pay for the ticket. Now, it will be possible for all the Brazilian teams qualified to participate because it’s cheaper for them to come here. So it’s a big opportunity for development and to live the RoboCup experience. It’s very important for children and teenagers to share their research, meet people from other countries, and see what they are doing, and what research path they are following. They are very grateful for the opportunity to have their work tested against others. In a competition, it is possible to compare your research with others. So it’s different from conferences where you present a paper and show your work, but it’s not possible to compare and evaluate the results with other similar work. In a competition you have this opportunity. It’s a good way to get insights and improve your research.

RoboCupJunior Rescue at RoboCup 2024. Photo: RoboCup/Bart van Overbeeke.

Your role at this RoboCup will be organizing RoboCupJunior. Are you also involved in the Major Leagues?

Yes, so my main role is organizing RoboCupJunior and I am also one of the chairs of the RoboCup Symposium. In addition, some teams from my laboratory are competing in the Major Leagues. My team participates in the @Home League, but I haven’t had much time to help them recently, with all the preparations for RoboCup2025. Our laboratory also has teams in the 3D Simulation Soccer League, and the Flying Robots Demo. This will be the first time we’ll see a flying robot demo league at a RoboCup.

We’ll also have two junior teams from the Rescue Simulation League. They are very excited about taking part.

RoboCupJunior Rescue at RoboCup 2024. Photo: RoboCup/Bart van Overbeeke.

RoboCup was last held in Brazil in 2014, and I understand that there were quite a lot of new people that were inspired to join a team after that. Do you think the 2025 RoboCup will have the same effect and will inspire more people in Brazil to take part?

Yes, I hope so. The last one inspired many, many students. We could see the difference in school projects before and after that RoboCup. In 2014, RoboCup was held in João Pessoa, a city in the north east of Brazil, in a state that is not as developed or populous as many others. It really boosted research there, and interest in robotics especially. After the 2014 RoboCup, we’ve had many projects submitted to the Brazilian RoboCup competition from that state every year. We believe that it was because of RoboCup being held there.

We hope that RoboCup2025 next month will have the same effect. We think it might have an even bigger impact, because there is more social media now and the news can spread a lot further. We are expecting many visitors. We will have a form where schools that want to visit can enroll on a guided visit of RoboCup. This will go live on the website next week, but we are already receiving many messages from schools asking how they can participate with their group. They are interested in the events, so we have high expectations.

We have been working on organizing RoboCup2025 for over a year, and there is still much to do. We are excited to receive everybody here, both for the competition and to see the city. We have a beautiful city on the coast, and some beautiful places to visit, so I recommend that people come and stay for some days after the competition to get to know our city.

About Ana Patrícia

Ana Patrícia F. Magalhães Mascarenhas received her PhD in Computer Science (2016) and her Master’s in Mechatronics (2007), both from the Federal University of Bahia. She is currently an adjunct professor on the Information Systems course at the State University of Bahia (UNEB). She is a researcher and vice coordinator of the Center for Research in Computer Architecture, Intelligent Systems and Robotics (ACSO). Her current research focuses on service robotics and software engineering, especially related to the use of Artificial Intelligence (AI) in the software development process and in Model-Driven Development (MDD).

Preparing for kick-off at RoboCup2025: an interview with General Chair Marco Simões

The Salvador Convention Center, where RoboCup 2025 will take place.

RoboCup is an international scientific initiative with the goal of advancing the state of the art of intelligent robots, AI and automation. The annual RoboCup event, where teams gather from across the globe to take part in competitions across a number of leagues, will this year take place in Brazil, from 15-21 July. We spoke to Marco Simões, one of the General Chairs of RoboCup 2025 and President of RoboCup Brazil, to find out what plans they have for the event, some new initiatives, and how RoboCup has grown in Brazil over the past ten years.

Marco Simões

Could you give us a quick introduction to RoboCup 2025?

RoboCup will be held in Salvador, Brazil. When RoboCup was last held in Brazil 11 years ago, in 2014, we had a total of 100,000 visitors, so that was a great success. This year we expect even more, around 150,000, across all the events. Nowadays, AI and robotics are attracting more attention, and Salvador is a bigger city than the previous host location (João Pessoa). For these reasons, we estimate attendance of about 150,000 people.

Regarding the number of teams, registration has not closed yet, so we’re unsure about the final numbers. However, we expect to have about 300-400 teams and around 3000 competitors. We have been helping with visas, so we hope to see higher participation from teams who couldn’t attend in the previous two years due to visa issues. We are doing our best to ensure people can come and have fun at RoboCup!

This is also a great year for the RoboCup community: We have just agreed on new global league partners, including the Chinese companies Unitree, Fourier, and Booster Robotics. They will bring their humanoids and four-legged robots to RoboCup. These will not only be exhibited to the public but also used by some teams. They are amazing robots with very good skills. So, I think this will be an amazing edition of RoboCup this year.

Did the 2014 event in Brazil inspire more teams to participate in RoboCup?

Yes, we have seen a significant increase in our RoboCup community. In the last two years, Brazil has had the fourth-largest number of teams and participants at RoboCup in Bordeaux (2023) and Eindhoven (2024). This was a very big increase because ten years ago, we were not even in the top eight or nine.

We’ve made a significant effort with RoboCupJunior in the last ten years. Most people who’ve taken part in RoboCupJunior have carried on and joined a RoboCup Major League team. So, the number of teams in Brazil has been increasing year by year over the last ten years. This year, we have a great number of participants because of the lower travel costs, and we are expecting to be in the top three in terms of number of participants.

Photo of participants at RoboCup 2024, which took place in Eindhoven. Photo credit: RoboCup/Bart van Overbeeke

It’s impressive that so many RoboCupJunior participants go on to join a Major League team.

Yes, we have an initiative here in Brazil called the Brazilian Robotics Olympiad. For this event, we chose two Junior Leagues – OnStage and Rescue Line – and organized a competition based on them. We run regional competitions all over Brazil, across all 27 states. We organize at least one competition in each state during the year, and the best teams from each state come to the national competition, held together with the Major Leagues. We organize the Brazilian Olympiad to bring RoboCupJunior to more students, and that’s how we’ve managed to increase participation. Then, when students go to university, many of them continue to participate, but in the Major Leagues. It’s a strategy that has been very successful in Brazil over the last 10 years.

Could you tell us about some more of the highlights from the Brazilian RoboCup community in recent years?

Two or three years ago, one of the Brazilian teams was the champion of RoboCup @Home. We have seen a big increase in the number of teams in the @Home League. In the national competition in Brazil, we have more than 12 teams participating. Back in 2014, we only had one team participating. So we’ve had a great increase—this League is one of the highlights in Brazil.

More teams are also participating in the Small Size League (part of the Soccer League). Two years ago, one of the Brazilian teams was the champion of division B of the Small Size League. So, over the last five years, we’ve seen Brazilian teams in the top three positions in Major Leagues at the RoboCup world competition. This is a result of the increase in the number of teams and in the quality of what the teams are developing. We now have more publications and more teams participating in the competition with good results, which is very important.

Another excellent contribution for this year is a league we created five years ago – a flying robot league, where autonomous drones perform some missions and tasks. We’ve proposed this League as a demo for RoboCup2025, and we will have a Flying Robot Demo at the competition this year. This will be the first time we’ll have autonomous drones at the RoboCup competition, and the Brazilian community proposed it.

RoboCup @Home with Toyota HSR robots in the Domestic Standard Platform League, RoboCup 2024. Photo: RoboCup/Bart van Overbeeke.

Will you be taking part in the competition this year, or will you be focusing entirely on your role as General Chair?

This year, my laboratory (ACSO/UNEB) has qualified for the 3D Simulation League (soccer), the Flying Robot Demo, and RoboCup @Home, so we are participating in three Leagues. We also supervise RoboCupJunior teams in the Rescue Simulation League. This year, my students have had only a little supervision from me because I’ve been very engaged with the organization.

In our 3D simulation team, we have lots of developments with deep reinforcement learning and some novel strategies that allow our robots to gain new skills, and we are combining these new skills with our existing multi-agent coordination strategy. For this reason, I think we will have a robust team in the competition, because we are not only working on skills, we are also working on multi-agent strategies. When both aspects are combined, you can have a really good soccer team that plays very well. We have a good team and expect to achieve a better position this year. In recent years we were in the top four or five, but we hope to get into the top three this year.

In 3D, you not only work on multi-agent aspects but also need to work on skills such as walking, kicking, and running. Teams are now trying to develop new skills. For example, in recent years, our team has developed the sprint running movement, which was a result of deep reinforcement learning. It is not a natural running motion but a running movement that works according to the League’s rules. It makes the robots go very fast from one point to another, making the team very competitive.

Most teams are learning skills but don’t know how to exploit them strategically in the game. Our focus is not only on creating new skills but also on using them strategically. We are currently working on a very innovative approach.

This year, the simulation league will run a challenge using a new simulator based on MuJoCo. If the challenge goes well, we may move to this new simulator in the following years, which can more realistically simulate real humanoid robots.

Action from the semi-finals of RoboCup Soccer Humanoid League at RoboCup 2024. Photo: RoboCup/Bart van Overbeeke.

Finally, is there anything else you’d like to highlight about RoboCup2025?

We are working on partnerships with local companies. For example, we have sponsorship from Petrobras, one of the biggest oil companies in the world. They will discuss how they are using robotics and AI in their industry. They were also one of the first sponsors of the Flying Robots League. It’s important to have these links between industry and the RoboCup community.

We also have excellent support from local companies and the government. They will be showing the community their latest developments. In the Rescue League, for example, we’ll have a demonstration from the local force showing what they do to support people in disaster situations.

This event is also an excellent opportunity for RoboCuppers, especially those who have never been to Brazil, to spend some days after the event in Salvador, visiting some tourist spots. Salvador was the first Brazilian capital, so we have a rich history. There are a lot of historical sites to see and some great entertainment options, such as beaches or parties. People can have fun and enjoy the country!

About Marco

Marco Simões is an Associate Professor at Bahia State University, Salvador, Brazil. He is the General Chair of RoboCup2025, and President of RoboCup Brazil.

Interview with Amar Halilovic: Explainable AI for robotics

In this interview series, we’re meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. The Doctoral Consortium provides an opportunity for a group of PhD students to discuss and explore their research interests and career objectives in an interdisciplinary workshop together with a panel of established researchers. In this latest interview, we hear from Amar Halilovic, a PhD student at Ulm University.

Tell us a bit about your PhD – where are you studying, and what is the topic of your research?

I’m currently a PhD student at Ulm University in Germany, where I focus on explainable AI for robotics. My research investigates how robots can generate explanations of their actions in a way that aligns with human preferences and expectations, particularly in navigation tasks.

Could you give us an overview of the research you’ve carried out so far during your PhD?

So far, I’ve developed a framework for environmental explanations of robot actions and decisions, especially when things go wrong. I have explored black-box and generative approaches for generating textual and visual explanations. I have also been working on the planning of different explanation attributes, such as timing, representation, and duration. Lately, I’ve been working on methods for dynamically selecting the best explanation strategy depending on the context and user preferences.

Is there an aspect of your research that has been particularly interesting?

Yes, I find it fascinating how people interpret robot behavior differently depending on the urgency or failure context. It’s been especially rewarding to study how explanation expectations shift in different situations and how we can tailor explanation timing and content accordingly.

What are your plans for building on your research so far during the PhD – what aspects will you be investigating next?

Next, I’ll be extending the framework to incorporate real-time adaptation, enabling robots to learn from user feedback and adjust their explanations on the fly. I’m also planning more user studies to validate the effectiveness of these explanations in real-world human-robot interaction settings.

Amar with his poster at the AAAI/SIGAI Doctoral Consortium at AAAI 2025.

What made you want to study AI, and, in particular, explainable robot navigation?

I’ve always been interested in the intersection of humans and machines. During my studies, I realized that making AI systems understandable isn’t just a technical challenge—it’s key to trust and usability. Robot navigation struck me as a particularly compelling area because decisions are spatial and visual, making explanations both challenging and impactful.

What advice would you give to someone thinking of doing a PhD in the field?

Pick a topic that genuinely excites you—you’ll be living with it for several years! Also, build a support network of mentors and peers. It’s easy to get lost in the technical work, but collaboration and feedback are vital.

Could you tell us an interesting (non-AI related) fact about you?

I have lived and studied in four different countries.

About Amar

Amar is a PhD student at the Institute of Artificial Intelligence of Ulm University in Germany. His research focuses on Explainable Artificial Intelligence (XAI) in Human-Robot Interaction (HRI), particularly how robots can generate context-sensitive explanations for navigation decisions. He combines symbolic planning and machine learning to build explainable robot systems that adapt to human preferences and different contexts. Before starting his PhD, he studied Electrical Engineering at the University of Sarajevo in Sarajevo, Bosnia and Herzegovina, and Computer Science at Mälardalen University in Västerås, Sweden. Outside academia, Amar enjoys travelling, photography, and exploring connections between technology and society.

Congratulations to the #AAMAS2025 best paper, best demo, and distinguished dissertation award winners


The AAMAS 2025 best paper and demo awards were presented at the 24th International Conference on Autonomous Agents and Multiagent Systems, which took place from 19-23 May 2025 in Detroit. The Distinguished Dissertation Award was also recently announced. The winners in the various categories are as follows:


Best Paper Award

Winner

  • Soft Condorcet Optimization for Ranking of General Agents, Marc Lanctot, Kate Larson, Michael Kaisers, Quentin Berthet, Ian Gemp, Manfred Diaz, Roberto-Rafael Maura-Rivero, Yoram Bachrach, Anna Koop, Doina Precup

Finalists

  • Azorus: Commitments over Protocols for BDI Agents, Amit K. Chopra, Matteo Baldoni, Samuel H. Christie V, Munindar P. Singh
  • Curiosity-Driven Partner Selection Accelerates Convention Emergence in Language Games, Chin-Wing Leung, Paolo Turrini, Ann Nowe
  • Reinforcement Learning-based Approach for Vehicle-to-Building Charging with Heterogeneous Agents and Long Term Rewards, Fangqi Liu, Rishav Sen, Jose Paolo Talusan, Ava Pettet, Aaron Kandel, Yoshinori Suzue, Ayan Mukhopadhyay, Abhishek Dubey
  • Ready, Bid, Go! On-Demand Delivery Using Fleets of Drones with Unknown, Heterogeneous Energy Storage Constraints, Mohamed S. Talamali, Genki Miyauchi, Thomas Watteyne, Micael Santos Couceiro, Roderich Gross

Pragnesh Jay Modi Best Student Paper Award

Winners

  • Decentralized Planning Using Probabilistic Hyperproperties, Francesco Pontiggia, Filip Macák, Roman Andriushchenko, Michele Chiari, Milan Ceska
  • Large Language Models for Virtual Human Gesture Selection, Parisa Ghanad Torshizi, Laura B. Hensel, Ari Shapiro, Stacy Marsella

Runner-up

  • ReSCOM: Reward-Shaped Curriculum for Efficient Multi-Agent Communication Learning, Xinghai Wei, Tingting Yuan, Jie Yuan, Dongxiao Liu, Xiaoming Fu

Finalists

  • Explaining Facial Expression Recognition, Sanjeev Nahulanthran, Leimin Tian, Dana Kulic, Mor Vered
  • Agent-Based Analysis of Green Disclosure Policies and Their Market-Wide Impact on Firm Behavior, Lingxiao Zhao, Maria Polukarov, Carmine Ventre

Blue Sky Ideas Track Best Paper Award

Winner

  • Grounding Agent Reasoning in Image Schemas: A Neurosymbolic Approach to Embodied Cognition, François Olivier, Zied Bouraoui

Finalist

  • Towards Foundation-model-based multiagent system to Accelerate AI for social impact, Yunfan Zhao, Niclas Boehmer, Aparna Taneja, Milind Tambe

Best Demo Award

Winner

  • Serious Games for Ethical Preference Elicitation, Jayati Deshmukh, Zijie Liang, Vahid Yazdanpanah, Sebastian Stein, Sarvapali Ramchurn

Victor Lesser Distinguished Dissertation Award

The Victor Lesser Distinguished Dissertation Award is given for dissertations in the field of autonomous agents and multiagent systems that show originality, depth, impact, as well as quality of writing, supported by high-quality publications.

Winner

  • Jannik Peters. Thesis title: Facets of Proportionality: Selecting Committees, Budgets, and Clusters

Runner-up

  • Lily Xu. Thesis title: High-stakes decisions from low-quality data: AI decision-making for planetary health