RecursiveMAS

Scaling agent collaboration through latent-space recursion.

💡 News

[2026.06.28] Complete training implementations & data (🤗 HF Datasets), and inference & evaluation pipelines for RecursiveMAS are now available! Also check out our updated project website!

[2026.06.25] Try it out! RecursiveMAS now has a 🧩YouTube tutorial and an 🎮interactive playground demo! Special thanks to @TwoMinutePapers and @vishalmysore!

[2026.05.24] Check out the VentureBeat article featuring our research on RecursiveMAS!

[2026.05.01] Our paper was featured as the 🤗 HuggingFace 1st Paper of the Week/Day!

[2026.04.28] All collaboration styles and model checkpoints, with example downstream inference, are now available. Stay tuned for the complete training/inference pipeline and additional features!

[2026.04.28] We have released the RecursiveMAS paper!

🌟 Overview

RecursiveMAS is a multi-agent framework that scales agent collaboration through latent-space recursion. Rather than treating each LLM agent as an isolated module, RecursiveMAS casts the whole multi-agent system as a unified recursive computation.

[Figure: RecursiveMAS overview]

Heterogeneous agents are connected by lightweight RecursiveLink modules that let them exchange, refine, and evolve latent states across recursion rounds.
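
As a rough mental model of the recursion described above, the toy sketch below passes a latent vector through a chain of agents for several rounds, with a link function between consecutive agents. Everything in it (`make_agent`, `make_link`, `run_recursion`, the vector size, the arithmetic) is a hypothetical stand-in chosen for illustration. It is not the RecursiveLink architecture and not the released code.

```python
from typing import Callable, List

Vector = List[float]


def make_agent(scale: float) -> Callable[[Vector], Vector]:
    """Hypothetical agent: a stand-in for an LLM that transforms a latent state."""
    def agent(state: Vector) -> Vector:
        return [scale * x + 1.0 for x in state]
    return agent


def make_link(weight: float) -> Callable[[Vector, Vector], Vector]:
    """Hypothetical link: blends the incoming state with the agent's output."""
    def link(previous: Vector, produced: Vector) -> Vector:
        return [(1.0 - weight) * p + weight * q for p, q in zip(previous, produced)]
    return link


def run_recursion(agents, links, state: Vector, rounds: int) -> List[Vector]:
    """Run `rounds` passes over the agent chain and record the state after each pass."""
    history = [list(state)]
    for _ in range(rounds):
        for agent, link in zip(agents, links):
            # Each agent refines the shared state; the link decides how much to keep.
            state = link(state, agent(state))
        history.append(list(state))
    return history


agents = [make_agent(0.5), make_agent(0.5)]
links = [make_link(0.5), make_link(0.5)]
history = run_recursion(agents, links, [0.0], rounds=50)
print(history[1], history[-1])
```

In this toy setting the state settles toward a fixed point as the number of rounds grows, which is one way to picture "refining" a shared latent state across recursion rounds.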

Correspondingly, we design an Inner-Outer Loop training paradigm for progressive co-optimization. The inner loop provides a preliminary model-level warm start for each agent. The outer loop then trains the outer RecursiveLink across agents at the system level.

Across 9 benchmarks spanning mathematics, science, medicine, search, and code generation, RecursiveMAS improves multi-agent coordination by recursively refining shared latent states, delivering stronger performance across the sequential, mixture, distillation, and deliberation collaboration styles.

📋 Supported Features

✅ Release All Collaboration Patterns (Sequential, Mixture, Deliberation, Distillation).

✅ Release Demo Code for Inference (Commands Provided Below).

✅ Release Complete Inference Pipeline Across All Downstreams.

✅ Release All Training Data & Pipeline Implementation.

☑️ Add Support for Additional Model Families & MAS Collaboration Patterns.

🧩 Repository Roadmap

RecursiveMAS/
├── README.md
├── requirements.txt
├── inference/              # inference pipeline and downstream tasks evaluation
│   ├── run.py
│   ├── README.md
│   ├── dataset/
│   └── inference_utils/
└── train/                  # inner-outer loop training pipeline
    ├── train_inner.py
    ├── train_outer.py
    ├── README.md
    ├── data/
    └── outer/

🛠️ Environment Setup

Create a clean Python environment and install all project requirements from the repository root:

conda create -n recursivemas python=3.10 -y
conda activate recursivemas

Install the required packages:

pip install -r requirements.txt

For Deliberation-style runs on the search datasets (bamboogle, hotpotqa), the Tool-Caller agent queries a real web-search API (e.g., Tavily). Put your search API key in a plain-text file and pass the file path with --tavily_keys_file:

# e.g., keys.txt
tvly-xxxxxxxxxxxxxxxxxxxxxxxx
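
If you want to sanity-check a key file before launching a run, a minimal sketch could look like the following. The helper name `load_api_keys` is hypothetical, and the format it assumes (one key per line, blank lines and `#` comment lines ignored) is a guess based on the example above, not a documented contract of inference/run.py.

```python
from pathlib import Path
from typing import List


def load_api_keys(path: str) -> List[str]:
    """Return the non-empty, non-comment lines of a plain-text key file.

    Assumption: one key per line; lines starting with '#' are comments.
    """
    keys = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            keys.append(line)
    return keys
```

For example, `len(load_api_keys("keys.txt"))` gives a quick check that at least one key was found before you pass the file with --tavily_keys_file.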

To enable grading of open-ended questions by an LLM judge (e.g., through an OpenAI-compatible API), configure the judge through the following environment variables:

export API_KEY=...          # bearer token for the judge endpoint
export API_BASE_URL=...     # OpenAI-compatible base or chat-completions URL
export API_MODEL=...        # judge model id
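
Before a long evaluation run, it can help to confirm that all three variables are set. The sketch below reads them and fails early with a clear message. `JudgeConfig` and `read_judge_config` are hypothetical names introduced here for illustration; only the three variable names come from the setup above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class JudgeConfig:
    api_key: str
    base_url: str
    model: str


def read_judge_config(env: Mapping[str, str] = os.environ) -> JudgeConfig:
    """Collect the judge settings and fail early if any of them is unset or empty."""
    names = ("API_KEY", "API_BASE_URL", "API_MODEL")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing judge settings: " + ", ".join(missing))
    return JudgeConfig(env["API_KEY"], env["API_BASE_URL"], env["API_MODEL"])
```

Calling `read_judge_config()` with no arguments checks the current process environment.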

🚀 Quick Start

🤗 Plug-and-Play Reference Checkpoints

To play around with RecursiveMAS, you can download our reference checkpoints from the RecursiveMAS Hugging Face organization.

📌 Kind Note: The released Hugging Face checkpoints are provided for quick, plug-and-play exploration and as reference systems. They are NOT a drop-in replacement for the task-specific training setups used across the paper.

The paper covers different collaboration styles and task-specific data settings. To reproduce the full paper results, please follow the training and inference pipeline below for complete downstream task evaluation.

The checkpoints are organized by MAS collaboration style. Each collection contains (i) the individual role-specific agents and (ii) their (inner/outer) RecursiveLink modules:

| Agent Organization | Download |
| --- | --- |
| Sequential-Light-Planner-Qwen3-1.7B | 🤗 HuggingFace |
| Sequential-Light-Critic-Llama3.2-1B | 🤗 HuggingFace |
| Sequential-Light-Solver-Qwen2.5-Math-1.5B | 🤗 HuggingFace |
| Sequential-Light-Outerlinks | 🤗 HuggingFace |

| Agent Organization | Download |
| --- | --- |
| Sequential-Scaled-Planner-Gemma3-4B | 🤗 HuggingFace |
| Sequential-Scaled-Critic-Llama3.2-3B | 🤗 HuggingFace |
| Sequential-Scaled-Solver-Qwen3.5-4B | 🤗 HuggingFace |
| Sequential-Scaled-Outerlinks | 🤗 HuggingFace |

| Agent Organization | Download |
| --- | --- |
| Mixture-Math-DeepSeek-R1-Distill-Qwen-1.5B | 🤗 HuggingFace |
| Mixture-Code-Qwen2.5-Coder-3B | 🤗 HuggingFace |
| Mixture-Science-BioMistral-7B | 🤗 HuggingFace |
| Mixture-Summarizer-Qwen3.5-2B | 🤗 HuggingFace |
| Mixture-Outerlinks | 🤗 HuggingFace |

| Agent Organization | Download |
| --- | --- |
| Distillation-Expert-Qwen3.5-9B | 🤗 HuggingFace |
| Distillation-Learner-Qwen3.5-4B | 🤗 HuggingFace |
| Distillation-Outerlinks | 🤗 HuggingFace |

| Agent Organization | Download |
| --- | --- |
| Deliberation-Reflector-Qwen3.5-4B | 🤗 HuggingFace |
| Deliberation-Toolcaller-Qwen3.5-4B | 🤗 HuggingFace |
| Deliberation-Outerlinks | 🤗 HuggingFace |

Here is an example of how to load the RecursiveMAS pipeline:

from system_loader import load_mas_system

mas = load_mas_system(
    style="sequential_light",
    device="cuda",
    trust_remote_code=True,
)

planner = mas.agents["planner"].model
critic = mas.agents["critic"].model
solver = mas.agents["solver"].model

To play around, you can run any collaboration style by passing --style and --dataset. For example,

python inference/run.py \
  --style sequential_scaled \
  --dataset math500 \
  --device cuda

🧪 RecursiveMAS Training

To reproduce our experiments with task-specific configurations, please train the inner and outer RecursiveLink modules with the matching collaboration style and training data. The overall training includes two phases:

  1. Inner-Loop Training (train/train_inner.py): train each agent's role-specific inner RecursiveLink (frozen base model + a small ln_res_adapter).
  2. Outer-Loop Training (train/train_outer.py): connect all agents together and train the outer RecursiveLink between agents through recursion.

An example of the complete training pipeline is:

# Inner-Loop Training
python train/train_inner.py \
  --model_name_or_path Qwen/Qwen3-1.7B \
  --mas_design sequential \
  --mas_role planner \
  --mas_task math \
  --dataset_name RecursiveMAS/Sequential-Math \
  --save_dir train/ckpts/seq_light/planner_math

# Outer-Loop Training
python train/train_outer.py \
  --style sequential_light \
  --agent1_model_name_or_path Qwen/Qwen3-1.7B \
  --agent2_model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
  --agent3_model_name_or_path Qwen/Qwen2.5-Math-1.5B-Instruct \
  --agent1_inner_aligner_path train/ckpts/seq_light/planner_math \
  --agent2_inner_aligner_path train/ckpts/seq_light/refiner_math \
  --agent3_inner_aligner_path train/ckpts/seq_light/solver_math \
  --mas_task math \
  --dataset_name RecursiveMAS/Sequential-Math \
  --save_dir train/ckpts/seq_light/outer_math

Additional detailed per-style commands are provided in our training guide (train/README.md).
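
Because the outer-loop command consumes the inner-loop outputs through the --agentN_inner_aligner_path flags, a small pre-flight check can catch a missing inner-loop run before the outer loop starts. The sketch below is only an illustration: `missing_inner_checkpoints` is a hypothetical helper, and it assumes each --save_dir is written as a directory, which this README does not state explicitly.

```python
from pathlib import Path
from typing import Dict, List


def missing_inner_checkpoints(inner_paths: Dict[str, str], root: str = ".") -> List[str]:
    """Return the agent names whose inner-loop output directory does not exist yet."""
    return [name for name, rel in inner_paths.items() if not (Path(root) / rel).is_dir()]


# Paths copied from the outer-loop example command.
inner_paths = {
    "agent1": "train/ckpts/seq_light/planner_math",
    "agent2": "train/ckpts/seq_light/refiner_math",
    "agent3": "train/ckpts/seq_light/solver_math",
}

missing = missing_inner_checkpoints(inner_paths)
if missing:
    print("Run train/train_inner.py first for:", ", ".join(missing))
```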

🗂️ Training Data

We host all training data on Hugging Face Datasets. The table below lists each training set and the training phase that uses it.

| Dataset | Used by |
| --- | --- |
| 🤗 RecursiveMAS/Sequential-Math | Sequential inner & outer loop training |
| 🤗 RecursiveMAS/Sequential-Code | Sequential inner & outer loop training |
| 🤗 RecursiveMAS/Distillation-Math | Distillation inner & outer loop training |
| 🤗 RecursiveMAS/Distillation-Code | Distillation inner & outer loop training |
| 🤗 RecursiveMAS/Mixture-Math | Mixture math expert inner loop training |
| 🤗 RecursiveMAS/Mixture-Code | Mixture code expert inner loop training |
| 🤗 RecursiveMAS/Mixture-Science | Mixture science expert inner loop training |
| 🤗 RecursiveMAS/Mixture-Summarizer | Mixture summarizer inner loop training |
| 🤗 RecursiveMAS/Mixture-Outer | Mixture outer loop training |
| 🤗 RecursiveMAS/Deliberation | Deliberation inner & outer loop training |

For complete details, please kindly refer to our training data guide (train/data/README.md).
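
For scripting, the dataset overview can be kept as a small lookup structure. The sketch below simply transcribes the overview into Python. `TrainingSet` and `datasets_for` are hypothetical names, and the lowercase style and phase labels are a convention chosen here, not identifiers taken from the training scripts.

```python
from typing import List, NamedTuple, Tuple


class TrainingSet(NamedTuple):
    dataset_id: str
    style: str
    phases: Tuple[str, ...]


BOTH = ("inner", "outer")

# Transcribed from the training data overview.
TRAINING_SETS = [
    TrainingSet("RecursiveMAS/Sequential-Math", "sequential", BOTH),
    TrainingSet("RecursiveMAS/Sequential-Code", "sequential", BOTH),
    TrainingSet("RecursiveMAS/Distillation-Math", "distillation", BOTH),
    TrainingSet("RecursiveMAS/Distillation-Code", "distillation", BOTH),
    TrainingSet("RecursiveMAS/Mixture-Math", "mixture", ("inner",)),
    TrainingSet("RecursiveMAS/Mixture-Code", "mixture", ("inner",)),
    TrainingSet("RecursiveMAS/Mixture-Science", "mixture", ("inner",)),
    TrainingSet("RecursiveMAS/Mixture-Summarizer", "mixture", ("inner",)),
    TrainingSet("RecursiveMAS/Mixture-Outer", "mixture", ("outer",)),
    TrainingSet("RecursiveMAS/Deliberation", "deliberation", BOTH),
]


def datasets_for(style: str, phase: str) -> List[str]:
    """List the dataset IDs used by a collaboration style in a given training phase."""
    return [t.dataset_id for t in TRAINING_SETS if t.style == style and phase in t.phases]


print(datasets_for("mixture", "outer"))
```

A returned ID is the kind of value the training examples pass to --dataset_name.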

🔎 Inference and Evaluation

Use inference/run.py to evaluate a released reference system or a locally trained, task-specific configuration.

For example,

# Evaluate Sequential Light Style RecursiveMAS on Math500
python inference/run.py \
  --style sequential_light \
  --dataset math500 \
  --device cuda \
  --ckpt_override planner=train/ckpts/seq_light/planner_math \
  --ckpt_override critic=train/ckpts/seq_light/refiner_math \
  --ckpt_override solver=train/ckpts/seq_light/solver_math \
  --ckpt_override outer=train/ckpts/seq_light/outer_math

🧪 Supported Downstream Tasks

| Benchmark | Task | Metric |
| --- | --- | --- |
| math500 | math reasoning | accuracy |
| gpqa | graduate-level science | accuracy |
| medqa | medical QA | accuracy |
| mbppplus | code generation | test pass rate |
| aime25, aime26 | competition math | pass@10 |
| livecodebench | code generation | pass@1 |
| bamboogle, hotpotqa | open-domain search QA | EM/LLM-as-Judge |

For complete inference and evaluation details, please kindly refer to our inference guide (inference/README.md).
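
For readers unfamiliar with the pass@k metrics listed among the supported tasks, the sketch below shows the widely used unbiased estimator from the code-generation literature: with n samples per problem, of which c are correct, pass@k = 1 - C(n-c, k) / C(n, k). This README does not say how the repository computes pass@10 or pass@1, so treat this as background on the metric rather than a description of the evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

When n equals k, for example 10 samples scored with pass@10, the estimate reduces to 1.0 if at least one sample is correct and 0.0 otherwise.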

📊 Experiment Results

To reproduce the paper’s results, train the corresponding collaboration style and data configuration, then run the provided inference pipeline using the resulting checkpoints.

In the following tables, we provide single-run results across different RecursiveMAS collaboration styles and downstream tasks as a reference.

Sequential-Scaled

| math500 | gpqa | medqa | aime25 | aime26 | livecodebench |
| --- | --- | --- | --- | --- | --- |
| 88.5 | 65.7 | 82.7 | 86.7 | 90.0 | 42.1 |

Sequential-Light

| math500 | gpqa | medqa | mbppplus | aime25 | aime26 |
| --- | --- | --- | --- | --- | --- |
| 78.0 | 32.3 | 32.0 | 37.3 | 33.3 | 20.0 |

Distillation

| gpqa | medqa | mbppplus | aime26 | livecodebench |
| --- | --- | --- | --- | --- |
| 68.7 | 82.7 | 72.6 | 86.7 | 43.0 |

Mixture

| gpqa | medqa | aime26 | livecodebench |
| --- | --- | --- | --- |
| 42.7 | 61.3 | 46.7 | 22.8 |

Deliberation

| gpqa | aime26 | bamboogle | hotpotqa |
| --- | --- | --- | --- |
| 65.3 | 90.0 | 54.4 | 43.6 |

🙏 Acknowledgements

This project is built upon the excellent open-source community, including vLLM, ARPO, and TextGrad.

We welcome discussions and contributions to RecursiveMAS! If you would like to suggest improvements, please feel free to send a pull request or contact us through email!

📚 Citation

@misc{recursivemas,
      title={Recursive Multi-Agent Systems},
      author={Xiyuan Yang and Jiaru Zou and Rui Pan and Ruizhong Qiu and Pan Lu and Shizhe Diao and Jindong Jiang and Hanghang Tong and Tong Zhang and Markus J. Buehler and Jingrui He and James Zou},
      year={2026},
      eprint={2604.25917},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2604.25917},
}

🌟 Star History

Please kindly give us a GitHub Star ⭐️ if you find our project helpful!

Thanks a lot for your interest in our project! 😊