RecursiveMAS

Scaling agent collaboration through latent-space recursion.

9f57f7fce0bc345ea787405f6c5fa162_raw.mp4

💡 News

[2026.06.28] Complete training implementations & data (🤗 HF Datasets), and inference & evaluation pipelines for RecursiveMAS are now available! Also checkout our updated project website!

[2026.06.25] Try it out! RecursiveMAS now has a 🧩YouTube tutorial and an 🎮interactive playground demo! Special thanks to @TwoMinutePapers and @vishalmysore!

[2026.05.24] Check out the VentureBeat article featuring our research on RecursiveMAS!

[2026.05.01] Ours paper is featured as 🤗 HuggingFace 1st Paper of the Week/Day!

[2026.04.28] All collaboration styles and model checkpoints, with examplified downstream inference are now available. Stay tuned for the complete training/inference pipeline and additional features!

[2026.04.28] We have released the RecursiveMAS paper!

🌟 Overview

RecursiveMAS is a multi-agent framework that scales agent collaboration through latent-space recursion. Rather than treating each LLM agent as an isolated module, RecursiveMAS casts the whole multi-agent system as a unified recursive computation.

Heterogeneous agents are connected by lightweight RecursiveLink modules that let them exchange, refine, and evolve latent states across recursion rounds.

Correspondingly, we design an Inner-Outer Loop training paradigm for progressive co-optimization. The inner loop provides a preliminary model-level warm start for each agent. The outer loop then trains the outer RecursiveLink across agents at the system-level.

Across 9 benchmarks spanning mathematics, science, medicine, search, and code generation, RecursiveMAS improves multi-agent coordination by recursively refining shared latent states, delivering stronger performance across sequential, mixture, distillation, and deliberation MAS systems.

📋 Supported Features

✅ Release All Collaboration Patterns (Sequential, Mixture, Deliberation, Distillation).

✅ Release Demo Code for Inference (Commands Provided Below).

✅ Release Complete Inference Pipeline Across All Downstreams.

✅ Release All Training Data & Pipeline Implementation.

☑️ Add Additional Supported Model Family & MAS Collaboration Patterns.

🧩 Repository Roadmap

RecursiveMAS/
├── README.md
├── requirements.txt
├── inference/              # inference pipeline and downstream tasks evaluation
│   ├── run.py
│   ├── README.md
│   ├── dataset/
│   └── inference_utils/
└── train/                  # inner-outer loop training pipeline
    ├── train_inner.py
    ├── train_outer.py
    ├── README.md
    ├── data/
    └── outer/

🛠️ Environment Setup

Create a clean Python environment and install all project requirements from the repository root:

conda create -n recursivemas python=3.10 -y
conda activate recursivemas

Install the required packages:

pip install -r requirements.txt

For Deliberation-style runs on the search datasets (bamboogle, hotpotqa), the Tool-Caller agent queries a real web-search API (e.g., Tavily). Please put your Search API key in a plain-text file and pass it with --tavily_keys_file:

# e.g., keys.txt
tvly-xxxxxxxxxxxxxxxxxxxxxxxx

To enable open-ended questions grading by an LLM judge (e.g., OpenAI-compatible API). Configure the LLM judge through the following environment variables:

export API_KEY=...          # bearer token for the judge endpoint
export API_BASE_URL=...     # OpenAI-compatible base or chat-completions URL
export API_MODEL=...        # judge model id

🚀 Quick Start

🤗 Plug-and-Play Reference Checkpoints

To play around with RecursiveMAS, you can download our reference checkpoints under the RecursiveMAS Hugging Face organization.

📌 Kind Note: The released Hugging Face checkpoints are provided for quick, plug-and-play exploration and as reference systems, but NOT a single replacement for the task-specific training setups used across the paper.

The paper covers different collaboration styles and task-specific data settings; To repduce full paper results, please follow the training and inference pipeline below for complete downstream tasks evaluation.

The checkpoints are organized by MAS collaboration styles. Each collection contains (i) the individual role-specific agent, and (ii) their (inner/outer) RecursiveLink modules:

1. Sequential-Style (Light) RecursiveMAS Collection

Agent Organization	Download
Sequential-Light-Planner-Qwen3-1.7B	🤗 HuggingFace
Sequential-Light-Critic-Llama3.2-1B	🤗 HuggingFace
Sequential-Light-Solver-Qwen2.5-Math-1.5B	🤗 HuggingFace
Sequential-Light-Outerlinks	🤗 HuggingFace

2. Sequential-Style (Scaled) RecursiveMAS Collection

Agent Organization	Download
Sequential-Scaled-Planner-Gemma3-4B	🤗 HuggingFace
Sequential-Scaled-Critic-Llama3.2-3B	🤗 HuggingFace
Sequential-Scaled-Solver-Qwen3.5-4B	🤗 HuggingFace
Sequential-Scaled-Outerlinks	🤗 HuggingFace

3. Mixture-Style RecursiveMAS Collection

Agent Organization	Download
Mixture-Math-DeepSeek-R1-Distill-Qwen-1.5B	🤗 HuggingFace
Mixture-Code-Qwen2.5-Coder-3B	🤗 HuggingFace
Mixture-Science-BioMistral-7B	🤗 HuggingFace
Mixture-Summarizer-Qwen3.5-2B	🤗 HuggingFace
Mixture-Outerlinks	🤗 HuggingFace

4. Distillation-Style RecursiveMAS Collection

Agent Organization	Download
Distillation-Expert-Qwen3.5-9B	🤗 HuggingFace
Distillation-Learner-Qwen3.5-4B	🤗 HuggingFace
Distillation-Outerlinks	🤗 HuggingFace

5. Deliberation-Style RecursiveMAS Collection

Agent Organization	Download
Deliberation-Reflector-Qwen3.5-4B	🤗 HuggingFace
Deliberation-Toolcaller-Qwen3.5-4B	🤗 HuggingFace
Deliberation-Outerlinks	🤗 HuggingFace

Here is an example of how to load the RecursiveMAS pipeline:

from system_loader import load_mas_system

mas = load_mas_system(
    style="sequential_light",
    device="cuda",
    trust_remote_code=True,
)

planner = mas.agents["planner"].model
critic = mas.agents["critic"].model
solver = mas.agents["solver"].model

To play around, you can run any collaboration styles by passing --style and --dataset. For example,

python inference/run.py \
  --style sequential_scaled \
  --dataset math500 \
  --device cuda

🧪 RecursiveMAS Training

To reproduce our experiments with task-specific configurations, please train the inner and outer RecursiveLink modules with the matching collaboration style and training data. The overall training includes two phases:

Inner-Loop Training (train/train_inner.py): train each agent role-specific inner RecursiveLink (frozen base model + a small ln_res_adapter).
Outer-Loop Training (train/train_outer.py): Connect all agents together and train the outer RecursiveLink between agents through recursion.

An example of the complete training pipeline is:

# Inner-Loop Training
python train/train_inner.py \
  --model_name_or_path Qwen/Qwen3-1.7B \
  --mas_design sequential \
  --mas_role planner \
  --mas_task math \
  --dataset_name RecursiveMAS/Sequential-Math \
  --save_dir train/ckpts/seq_light/planner_math

# Outer-Loop Training
python train/train_outer.py \
  --style sequential_light \
  --agent1_model_name_or_path Qwen/Qwen3-1.7B \
  --agent2_model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
  --agent3_model_name_or_path Qwen/Qwen2.5-Math-1.5B-Instruct \
  --agent1_inner_aligner_path train/ckpts/seq_light/planner_math \
  --agent2_inner_aligner_path train/ckpts/seq_light/refiner_math \
  --agent3_inner_aligner_path train/ckpts/seq_light/solver_math \
  --mas_task math \
  --dataset_name RecursiveMAS/Sequential-Math \
  --save_dir train/ckpts/seq_light/outer_math

Additional detailed per-style commands are provided in our training guide (train/README.md).

🗂️ Training Data

We store all training data through Hugging Face datasets. Below is a concise overview of each training set, along with its corresponding description.

Dataset	Used by
🤗 RecursiveMAS/Sequential-Math	Sequential inner & outer loop training
🤗 RecursiveMAS/Sequential-Code	Sequential inner & outer loop training
🤗 RecursiveMAS/Distillation-Math	Distillation inner & outer loop training
🤗 RecursiveMAS/Distillation-Code	Distillation inner & outer loop training
🤗 RecursiveMAS/Mixture-Math	Mixture math expert inner loop training
🤗 RecursiveMAS/Mixture-Code	Mixture code expert inner loop training
🤗 RecursiveMAS/Mixture-Science	Mixture science expert inner loop training
🤗 RecursiveMAS/Mixture-Summarizer	Mixture summarizer inner loop training
🤗 RecursiveMAS/Mixture-Outer	Mixture outer loop training
🤗 RecursiveMAS/Deliberation	Deliberation inner & outer loop training

For complete details, please kindly refer to our training data guide (train/data/README.md).

🔎 Inference and Evaluation

Use inference/run.py to evaluate a released reference system or a locally trained, task-specific configuration.

For example,

# Evaluate Sequential Light Style RecursiveMAS on Math500
python inference/run.py \
  --style sequential_light \
  --dataset math500 \
  --device cuda \
  --ckpt_override planner=train/ckpts/seq_light/planner_math \
  --ckpt_override critic=train/ckpts/seq_light/refiner_math \
  --ckpt_override solver=train/ckpts/seq_light/solver_math \
  --ckpt_override outer=train/ckpts/seq_light/outer_math

🧪 Supported Downstream Tasks

Benchmark	Task	Metric
`math500`	math reasoning	accuracy
`gpqa`	graduate-level science	accuracy
`medqa`	medical QA	accuracy
`mbppplus`	code generation	test pass rate
`aime25`, `aime26`	competition math	pass@10
`livecodebench`	code generation	pass@1
`bamboogle`, `hotpotqa`	open-domain search QA	EM/LLM-as-Judge

For complete influence and evaluation details, please kindly refer to our inference guide (inference/README.md).

📊 Experiment Results

To reproduce the paper’s results, train the corresponding collaboration style and data configuration, then run the provided inference pipeline using the resulting checkpoints.

In the following tables, we provide one single-run results across different RecursiveMAS collaboration styles and downstream tasks as references.

Sequential-Scaled

math500	gpqa	medqa	aime25	aime26	livecodebench
88.5	65.7	82.7	86.7	90.0	42.1

Sequential-Light

math500	gpqa	medqa	mbppplus	aime25	aime26
78.0	32.3	32.0	37.3	33.3	20.0

Distillation

gpqa	medqa	mbppplus	aime26	livecodebench
68.7	82.7	72.6	86.7	43.0

Mixture

gpqa	medqa	aime26	livecodebench
42.7	61.3	46.7	22.8

Deliberation

gpqa	aime26	bamboogle	hotpotqa
65.3	90.0	54.4	43.6

🙏 Acknowledgements

This project is built upon the excellent open-source community, including vLLM, ARPO, and TextGrad.

We welcome discussions and contributions to RecursiveMAS! If you would like to suggest improvements, please feel free to send a pull request or contact us through email!

📚 Citation

@misc{recursivemas,
      title={Recursive Multi-Agent Systems},
      author={Xiyuan Yang and Jiaru Zou and Rui Pan and Ruizhong Qiu and Pan Lu and Shizhe Diao and Jindong Jiang and Hanghang Tong and Tong Zhang and Markus J. Buehler and Jingrui He and James Zou},
      year={2026},
      eprint={2604.25917},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2604.25917},
}

🌟 Star History

Please kindly give us a GitHub Star ⭐️ if you find our project is helpful!

Thanks a lot for your interest in our project! 😊

Provide feedback

Saved searches

Use saved searches to filter your results more quickly