
Ultra-light robotic prosthetic hand enables efficient and stable grasping through simple control

An ultra-light robotic prosthetic hand has been developed that enables both precision fingertip control and shape-adaptive gripping through simple motion commands. The hand features an innovative mechanism that allows two degrees of freedom in the thumb (adduction/abduction and flexion/extension) to be independently controlled by a single actuator, achieving exceptional user convenience and weight reduction.

Congratulations to the #AAMAS2025 best paper, best demo, and distinguished dissertation award winners


The AAMAS 2025 best paper and demo awards were presented at the 24th International Conference on Autonomous Agents and Multiagent Systems, held on 19-23 May 2025 in Detroit. The Distinguished Dissertation Award was also recently announced. The winners in the various categories are as follows:


Best Paper Award

Winner

  • Soft Condorcet Optimization for Ranking of General Agents, Marc Lanctot, Kate Larson, Michael Kaisers, Quentin Berthet, Ian Gemp, Manfred Diaz, Roberto-Rafael Maura-Rivero, Yoram Bachrach, Anna Koop, Doina Precup

Finalists

  • Azorus: Commitments over Protocols for BDI Agents, Amit K. Chopra, Matteo Baldoni, Samuel H. Christie V, Munindar P. Singh
  • Curiosity-Driven Partner Selection Accelerates Convention Emergence in Language Games, Chin-Wing Leung, Paolo Turrini, Ann Nowe
  • Reinforcement Learning-based Approach for Vehicle-to-Building Charging with Heterogeneous Agents and Long Term Rewards, Fangqi Liu, Rishav Sen, Jose Paolo Talusan, Ava Pettet, Aaron Kandel, Yoshinori Suzue, Ayan Mukhopadhyay, Abhishek Dubey
  • Ready, Bid, Go! On-Demand Delivery Using Fleets of Drones with Unknown, Heterogeneous Energy Storage Constraints, Mohamed S. Talamali, Genki Miyauchi, Thomas Watteyne, Micael Santos Couceiro, Roderich Gross

Pragnesh Jay Modi Best Student Paper Award

Winners

  • Decentralized Planning Using Probabilistic Hyperproperties, Francesco Pontiggia, Filip Macák, Roman Andriushchenko, Michele Chiari, Milan Ceska
  • Large Language Models for Virtual Human Gesture Selection, Parisa Ghanad Torshizi, Laura B. Hensel, Ari Shapiro, Stacy Marsella

Runner-up

  • ReSCOM: Reward-Shaped Curriculum for Efficient Multi-Agent Communication Learning, Xinghai Wei, Tingting Yuan, Jie Yuan, Dongxiao Liu, Xiaoming Fu

Finalists

  • Explaining Facial Expression Recognition, Sanjeev Nahulanthran, Leimin Tian, Dana Kulic, Mor Vered
  • Agent-Based Analysis of Green Disclosure Policies and Their Market-Wide Impact on Firm Behavior, Lingxiao Zhao, Maria Polukarov, Carmine Ventre

Blue Sky Ideas Track Best Paper Award

Winner

  • Grounding Agent Reasoning in Image Schemas: A Neurosymbolic Approach to Embodied Cognition, François Olivier, Zied Bouraoui

Finalist

  • Towards Foundation-model-based multiagent system to Accelerate AI for social impact, Yunfan Zhao, Niclas Boehmer, Aparna Taneja, Milind Tambe

Best Demo Award

Winner

  • Serious Games for Ethical Preference Elicitation, Jayati Deshmukh, Zijie Liang, Vahid Yazdanpanah, Sebastian Stein, Sarvapali Ramchurn

Victor Lesser Distinguished Dissertation Award

The Victor Lesser Distinguished Dissertation Award is given for dissertations in the field of autonomous agents and multiagent systems that show originality, depth, and impact, as well as quality of writing, supported by high-quality publications.

Winner

  • Jannik Peters. Thesis title: Facets of Proportionality: Selecting Committees, Budgets, and Clusters

Runner-up

  • Lily Xu. Thesis title: High-stakes decisions from low-quality data: AI decision-making for planetary health

Mid-air transformation helps flying, rolling robot to transition smoothly

Engineers have developed a real-life Transformer that has the 'brains' to morph in midair, allowing the drone-like robot to smoothly roll away and begin its ground operations without pause. The increased agility and robustness of such robots could be particularly useful for commercial delivery systems and robotic explorers.

AI meets game theory: How language models perform in human-like social scenarios

Large language models (LLMs) -- the advanced AI behind tools like ChatGPT -- are increasingly integrated into daily life, assisting with tasks such as writing emails, answering questions, and even supporting healthcare decisions. But can these models collaborate with others in the same way humans do? Can they understand social situations, make compromises, or establish trust? A new study reveals that while today's AI is smart, it still has much to learn about social intelligence.

Designing Pareto-optimal GenAI workflows with syftr

You’re not short on tools. Or models. Or frameworks.

What you’re short on is a principled way to use them — at scale.

Building effective generative AI workflows, especially agentic ones, means navigating a combinatorial explosion of choices.

Every new retriever, prompt strategy, text splitter, embedding model, or synthesizing LLM multiplies the space of possible workflows, resulting in a search space with over 10²³ possible configurations. 
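
As a rough illustration of how the multiplication plays out, here is a back-of-the-envelope count. The first six per-component counts below mirror the search-space inventory listed later in this post; the two discretized numeric hyperparameters are assumed values for illustration.

```python
from math import prod

# Illustrative count: the first six entries mirror the component counts
# listed later in this post; chunk_size and top_k discretizations are
# assumptions. Hierarchical and continuous hyperparameters multiply the
# space far further, toward the ~10^23 figure cited above.
choices = {
    "synthesizing_llm": 9,
    "embedding_model": 11,
    "retriever": 3,
    "prompt_strategy": 4,
    "text_splitter": 4,
    "flow_type": 5,        # 4 agentic RAG flows + 1 non-agentic flow
    "chunk_size": 8,       # assumed discretization
    "top_k": 10,           # assumed retrieval-depth options
}

total = prod(choices.values())
print(f"{total:,} coarse combinations")  # ~1.9 million before nested and
                                         # continuous hyperparameters
```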

Trial-and-error doesn’t scale. And model-level benchmarks don’t reflect how components behave when stitched into full systems.

That’s why we built syftr — an open source framework for automatically identifying Pareto-optimal workflows across accuracy, cost, and latency constraints.

The complexity behind generative AI workflows

To illustrate how quickly complexity compounds, consider even a relatively simple RAG pipeline like the one shown in Figure 1.

Each component—retriever, prompt strategy, embedding model, text splitter, synthesizing LLM—requires careful selection and tuning. And beyond those decisions, there’s an expanding landscape of end-to-end workflow strategies, from single-agent workflows like ReAct and LATS to multi-agent workflows like CaptainAgent and Magentic-One.

Figure 1. Even a simple AI workflow requires selecting and testing multiple components and hyperparameters.
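
For reference, here is a minimal sketch of this kind of pipeline in LlamaIndex (one of the libraries syftr builds on). The corpus path and every numeric value are placeholder choices, and a default LLM/embedding setup (e.g., an OpenAI API key in the environment) is assumed.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Each hard-coded value below is one axis of the workflow search space.
documents = SimpleDirectoryReader("./corpus").load_data()      # placeholder path

splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)  # text splitter choice
index = VectorStoreIndex.from_documents(documents, transformations=[splitter])

query_engine = index.as_query_engine(similarity_top_k=5)       # retriever depth
print(query_engine.query("Who won the 2024 final?"))           # synthesizing LLM answers
```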

What’s missing is a scalable, principled way to explore this configuration space.

That’s where syftr comes in.

The framework uses multi-objective Bayesian Optimization to efficiently search for Pareto-optimal RAG workflows, balancing cost, accuracy, and latency across configurations that would be impossible to test manually.
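
In spirit, the search can be written as a multi-objective Optuna study, since Optuna is the optimization engine syftr builds on. This is a simplified sketch, not syftr’s actual code; `run_rag_benchmark` is a dummy stand-in for building and scoring a real workflow.

```python
import optuna

def run_rag_benchmark(llm: str, top_k: int, chunk_size: int) -> tuple[float, float]:
    """Hypothetical stand-in: build the workflow and score it on a QA set,
    returning (accuracy, cost). Here, just a dummy response surface."""
    accuracy = 0.5 + 0.02 * top_k - (0.1 if llm == "llama-3-70b" else 0.0)
    cost = 0.01 * top_k * (chunk_size / 256) * (5.0 if llm == "o3-mini" else 1.0)
    return min(accuracy, 0.9), cost

def evaluate_workflow(trial: optuna.Trial) -> tuple[float, float]:
    llm = trial.suggest_categorical("llm", ["gpt-4o-mini", "o3-mini", "llama-3-70b"])
    top_k = trial.suggest_int("top_k", 1, 20)
    chunk_size = trial.suggest_categorical("chunk_size", [256, 512, 1024])
    return run_rag_benchmark(llm, top_k, chunk_size)

# Two objectives: maximize accuracy, minimize cost. The non-dominated
# trials (the Pareto frontier) end up in study.best_trials.
study = optuna.create_study(directions=["maximize", "minimize"])
study.optimize(evaluate_workflow, n_trials=500)  # ~500 evaluations, as in the post

for t in study.best_trials:
    print(t.values, t.params)
```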

Benchmarking Pareto-optimal workflows with syftr

Once syftr is applied to a workflow configuration space, it surfaces candidate pipelines that achieve strong tradeoffs across key performance metrics.

The example below shows syftr’s output on the CRAG (Comprehensive RAG) Sports benchmark, highlighting workflows that maintain high accuracy while significantly reducing cost.

Figure 2. syftr searches across a large workflow configuration space to identify Pareto-optimal RAG workflows — agentic and non-agentic — that balance accuracy and cost. On the CRAG Sports benchmark, syftr identifies workflows that match the accuracy of top-performing configurations while reducing cost by nearly two orders of magnitude.

While Figure 2 shows what syftr can deliver, it’s equally important to understand how those results are achieved. 

At the core of syftr is a multi-objective search process designed to efficiently navigate vast workflow configuration spaces. The framework prioritizes both performance and computational efficiency – essential requirements for real-world experimentation at scale.

Figure 3. syftr uses multi-objective Bayesian Optimization (BO) to search across a space of approximately 10²³ unique workflows.

Since evaluating every workflow in this space isn’t feasible, we typically evaluate around 500 workflows per run.

To make this process even more efficient, syftr includes a novel early stopping mechanism — Pareto Pruner — which halts evaluation of workflows that are unlikely to improve the Pareto frontier. This significantly reduces computational cost and search time while preserving result quality. 
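
The paper gives the precise pruning rule; as a hedged sketch, the core of any such pruner is a dominance test against the current frontier, roughly like this:

```python
Point = tuple[float, float]  # (accuracy, cost)

def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as accurate and at least as cheap as b,
    and strictly better on at least one axis."""
    acc_a, cost_a = a
    acc_b, cost_b = b
    return acc_a >= acc_b and cost_a <= cost_b and (acc_a > acc_b or cost_a < cost_b)

def should_prune(optimistic_estimate: Point, frontier: list[Point]) -> bool:
    """Hypothetical early-stopping rule: stop evaluating a workflow if even
    an optimistic estimate of its final (accuracy, cost) is already dominated
    by a frontier point -- it cannot improve the Pareto frontier."""
    return any(dominates(p, optimistic_estimate) for p in frontier)

frontier = [(0.82, 1.00), (0.75, 0.10), (0.60, 0.01)]
print(should_prune((0.70, 0.50), frontier))  # True: dominated by (0.75, 0.10)
```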

Why current benchmarks aren’t enough

While model benchmarks like MMLU, LiveBench, Chatbot Arena, and the Berkeley Function-Calling Leaderboard have advanced our understanding of isolated model capabilities, foundation models rarely operate alone in real-world production environments.

Instead, they’re typically one component — albeit an essential one — within larger, sophisticated AI systems.

Measuring intrinsic model performance is necessary, but it leaves open critical system-level questions:

  • How do you construct a workflow that meets task-specific goals for accuracy, latency, and cost?
  • Which models should you use—and in which parts of the pipeline?


syftr addresses this gap by enabling automated, multi-objective evaluation across entire workflows.

It captures nuanced tradeoffs that emerge only when components interact within a broader pipeline, and systematically explores configuration spaces that are otherwise impractical to evaluate manually.

syftr is the first open-source framework specifically designed to automatically identify Pareto-optimal generative AI workflows that balance multiple competing objectives simultaneously — not just accuracy, but latency and cost as well.

It draws inspiration from existing research, including:

  • AutoRAG, which focuses solely on optimizing for accuracy
  • Kapoor et al.’s work, AI Agents That Matter, which emphasizes cost-controlled evaluation to prevent incentivizing overly costly, leaderboard-focused agents. This principle serves as one of our core research inspirations.


Importantly, syftr is also orthogonal to LLM-as-optimizer frameworks like Trace and TextGrad, and generic flow optimizers like DSPy. Such frameworks can be combined with syftr to further optimize prompts in workflows. 

In early experiments, syftr first identified Pareto-optimal workflows on the CRAG Sports benchmark.

We then applied Trace to optimize prompts across all of those configurations — taking a two-stage approach: multi-objective workflow search followed by fine-grained prompt tuning.

The result: notable accuracy improvements, especially in low-cost workflows that initially exhibited lower accuracy (those clustered in the lower-left of the Pareto frontier). These gains suggest that post-hoc prompt optimization can meaningfully boost performance, even in highly cost-constrained settings.

This two-stage approach — first multi-objective configuration search, then prompt refinement — highlights the benefits of combining syftr with specialized downstream tools, enabling modular and flexible workflow optimization strategies.

Figure 4. Prompt optimization with Trace further improves Pareto-optimal flows identified by syftr. In the CRAG Sports benchmark shown here, using Trace significantly enhanced the accuracy of lower-cost workflows, shifting the Pareto frontier upward.
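
In pseudocode-level terms, the two stages compose as below. Both functions are illustrative stubs, not real syftr or Trace APIs; they stand in for “run the multi-objective workflow search” and “tune prompts with the structure held fixed.”

```python
def run_workflow_search(n_trials: int) -> list[dict]:
    """Stage 1 stub: returns hypothetical Pareto-optimal configurations."""
    return [
        {"llm": "gpt-4o-mini", "top_k": 3, "prompt": "Answer concisely."},
        {"llm": "o3-mini", "top_k": 10, "prompt": "Answer concisely."},
    ]

def tune_prompt(flow: dict) -> dict:
    """Stage 2 stub: refine only the textual prompt (e.g., with Trace),
    leaving the workflow structure untouched."""
    tuned = dict(flow)
    tuned["prompt"] += " Cite the retrieved passage you relied on."
    return tuned

pareto_flows = run_workflow_search(n_trials=500)        # stage 1: structure search
tuned_flows = [tune_prompt(f) for f in pareto_flows]    # stage 2: prompt tuning
print(tuned_flows)
```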

Building and extending syftr’s search space

syftr cleanly separates the workflow search space from the underlying optimization algorithm. This modular design enables users to easily extend or customize the space, adding or removing flows, models, and components by editing configuration files.

The default implementation uses Multi-Objective Tree-of-Parzen-Estimators (MOTPE), but syftr supports swapping in other optimization strategies.
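
At the Optuna level, that swap is a one-line change (a sketch; syftr itself is assumed to expose this choice through its configuration files). Optuna’s TPESampler implements multi-objective TPE for studies with multiple directions, and alternatives such as NSGA-II drop in directly:

```python
import optuna
from optuna.samplers import NSGAIISampler, TPESampler

# Multi-objective TPE (the MOTPE family): recent Optuna versions handle
# multi-objective studies directly in TPESampler.
study = optuna.create_study(directions=["maximize", "minimize"],
                            sampler=TPESampler(seed=0))

# Swapping strategies, e.g. to the genetic NSGA-II sampler:
study = optuna.create_study(directions=["maximize", "minimize"],
                            sampler=NSGAIISampler(seed=0))
```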

Contributions of new flows, modules, or algorithms are welcomed via pull request at github.com/datarobot/syftr.

Figure 5. The current search space includes both agentic workflows (e.g., SubQuestion RAG, Critique RAG, ReAct RAG, LATS) and non-agentic RAG pipelines. Agentic workflows use non-agentic flows as subcomponents. The full space contains ~10²³ configurations.

Built on the shoulders of open source

syftr builds on a number of powerful open source libraries and frameworks:

  • Ray for distributing and scaling search over large clusters of CPUs and GPUs
  • Ray Serve for autoscaling model hosting
  • Optuna for its flexible define-by-run interface (similar to PyTorch’s eager execution) and support for state-of-the-art multi-objective optimization algorithms
  • LlamaIndex for building sophisticated agentic and non-agentic RAG workflows
  • HuggingFace Datasets for a fast, collaborative, and uniform dataset interface
  • Trace for optimizing textual components within workflows, such as prompts


syftr is framework-agnostic: workflows can be constructed using any orchestration library or modeling stack. This flexibility allows users to extend or adapt syftr to fit a wide variety of tooling preferences.

Case study: syftr on CRAG Sports

Benchmark setup

The CRAG benchmark dataset was introduced by Meta for the KDD Cup 2024 and includes three tasks:

  • Task 1: Retrieval summarization
  • Task 2: Knowledge graph and web retrieval
  • Task 3: End-to-end RAG

syftr was evaluated on Task 3 (CRAG3), which includes 4,400 QA pairs spanning a wide range of topics. The official benchmark performs RAG over 50 webpages retrieved for each question. 

To increase difficulty, we combined all webpages across all questions into a single corpus, creating a more realistic, challenging retrieval setting.
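
Conceptually, the pooling step is simple. The `examples` layout below is hypothetical, not CRAG’s actual schema; it just shows the difference between per-question retrieval and a pooled corpus.

```python
# Hypothetical data layout: one entry per QA pair (4,400 in CRAG Task 3),
# each paired with the 50 webpages the official benchmark retrieves over.
examples = [
    {"question": "Who won the 2024 final?", "pages": ["<html>page text</html>"] * 50},
    # ... one entry per QA pair
]

corpus = [page for ex in examples for page in ex["pages"]]  # one shared pool

# Official setting: retrieve within ex["pages"] for each question.
# Harder setting used here: retrieve over `corpus` for every question.
print(f"{len(examples)} questions, {len(corpus)} documents in the pooled corpus")
```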

Figure 6. Pareto-optimal flows discovered by syftr on CRAG Task 3 (Sports dataset). syftr identifies workflows that are both more accurate and significantly cheaper than a default RAG pipeline built in LlamaIndex (white box). It also outperforms Amazon Q on the same task—an expected result, given that Q is built for general-purpose usage while syftr is tuned for the dataset. This highlights a key insight: custom flows can meaningfully outperform off-the-shelf solutions, especially in cost-sensitive, accuracy-critical applications.

Note: Amazon Q pricing uses a per-user/month pricing model, which differs from the per-query token-based cost estimates used for syftr workflows.

Key observations and insights

Across datasets, syftr consistently surfaces meaningful optimization patterns:

  • Non-agentic workflows dominate the Pareto frontier. They’re faster and cheaper, leading the optimizer to favor these configurations more frequently than agentic ones.
  • GPT-4o-mini frequently appears in Pareto-optimal flows, suggesting it offers a strong balance of quality and cost as a synthesizing LLM.
  • Reasoning models like o3-mini perform well on quantitative tasks (e.g., FinanceBench, InfiniteBench), likely due to their multi-hop reasoning capabilities.
  • Pareto frontiers eventually flatten after an initial rise, with diminishing returns in accuracy relative to steep cost increases, underscoring the need for tools like syftr that help pinpoint efficient operating points.

    We routinely find that the workflow at the knee point of the Pareto frontier loses just a few percentage points in accuracy compared to the most accurate setup — while being 10x cheaper.

    syftr makes it easy to find that sweet spot; one common way to locate such a knee point is sketched below.
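
As an illustration of what “knee point” means here (a generic heuristic, not syftr’s own selection rule): pick the frontier point farthest from the straight line joining the frontier’s endpoints. A minimal sketch:

```python
def knee_point(frontier: list[tuple[float, float]]) -> tuple[float, float]:
    """Return the (cost, accuracy) point with the largest perpendicular
    distance from the line joining the frontier's endpoints -- a common
    knee heuristic. `frontier` is assumed sorted by cost."""
    (x0, y0), (x1, y1) = frontier[0], frontier[-1]
    dx, dy = x1 - x0, y1 - y0
    norm = (dx * dx + dy * dy) ** 0.5

    def dist(p: tuple[float, float]) -> float:
        return abs(dy * (p[0] - x0) - dx * (p[1] - y0)) / norm

    return max(frontier[1:-1], key=dist)

# Illustrative frontier: cost rises 100x for a few points of accuracy.
frontier = [(0.01, 0.60), (0.05, 0.72), (0.10, 0.78), (0.50, 0.80), (1.00, 0.82)]
print(knee_point(frontier))  # (0.1, 0.78): ~4 points below peak accuracy, 10x cheaper
```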

Cost of running syftr

In our experiments, we allocated a budget of ~500 workflow evaluations per task. Although exact costs vary based on the dataset and search space complexity, we consistently identified strong Pareto frontiers with a one-time search cost of approximately $500 per use case.

We expect this cost to decrease as more efficient search algorithms and space definitions are developed.

Importantly, this initial investment is minimal relative to the long-term gains from deploying optimized workflows, whether through reduced compute usage, improved accuracy, or better user experience in high-traffic systems.

For detailed results across six benchmark tasks, including datasets beyond CRAG, refer to the full syftr paper. 

Getting started and contributing

To get started with syftr, clone or fork the repository on GitHub. Benchmark datasets are available on HuggingFace, and syftr also supports user-defined datasets for custom experimentation.
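
For instance, with the HuggingFace `datasets` library (the Hub path in the comment is a placeholder; see the repository for the actual dataset names):

```python
from datasets import load_dataset

# Benchmark datasets live on the HuggingFace Hub; see the syftr repository
# for exact names. "org/dataset-name" below is a placeholder:
# benchmark = load_dataset("org/dataset-name", split="test")

# User-defined datasets work too, e.g. local QA pairs in JSON Lines format:
custom = load_dataset("json", data_files={"test": "my_qa_pairs.jsonl"})["test"]
print(custom.column_names)
```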

The current search space includes:

  • 9 proprietary LLMs
  • 11 embedding models
  • 4 general prompt strategies
  • 3 retrievers
  • 4 text splitters (with parameter configurations)
  • 4 agentic RAG flows and 1 non-agentic RAG flow, each with associated hierarchical hyperparameters


New components, such as models, flows, or search modules, can be added or modified via configuration files. Detailed walkthroughs are available to support customization.

syftr is developed fully in the open. We welcome contributions via pull requests, feature proposals, and benchmark reports. We’re particularly interested in ideas that advance the research direction or improve the framework’s extensibility.

What’s ahead for syftr

syftr is still evolving, with several active areas of research designed to extend its capabilities and practical impact:

  • Meta-learning
    Currently, each search is performed from scratch. We’re exploring meta-learning techniques that leverage prior runs across similar tasks to accelerate and guide future searches.
  • Multi-agent workflow evaluation
    While multi-agent systems are gaining traction, they introduce additional complexity and cost. We’re investigating how these workflows compare to single-agent and non-agentic pipelines, and when their tradeoffs are justified.
  • Composability with prompt optimization frameworks
    syftr is complementary to tools like DSPy, Trace, and TextGrad, which optimize textual components within workflows. We’re exploring ways to more deeply integrate these systems to jointly optimize structure and language.
  • More agentic tasks
    We started with question-answer tasks, a critical production use case for agents. Next, we plan to rapidly expand syftr’s task repertoire to code generation, data analysis, and interpretation. We also invite the community to suggest additional tasks for syftr to prioritize.


As these efforts progress, we aim to expand syftr’s value as a research tool, a benchmarking framework, and a practical assistant for system-level generative AI design.

If you’re working in this space, we welcome your feedback, ideas, and contributions.

Try the code, read the research

To explore syftr further, check out the GitHub repository or read the full paper on arXiv for details on methodology and results.

syftr has been accepted to appear at the International Conference on Automated Machine Learning (AutoML) in September 2025 in New York City.

We look forward to seeing what you build and discovering what’s next, together.

