Swarm AI for prediction markets replaces single-model forecasting with thousands of AI agents that interact, form opinions, and produce emergent consensus. MiroFish—an open-source engine built in 10 days by a 20-year-old student, now at 44,200+ GitHub stars—is the most visible implementation. Whether collective intelligence reliably outperforms individual models remains an open empirical question with no published benchmarks.
What Is Swarm AI for Prediction Markets?
Traditional AI forecasting asks a single model: “What probability do you assign to X?” The model reasons from its training data, returns a number, and that’s the prediction. The problems are well-documented: individual models share systematic biases, report confidently precise probabilities with poor calibration, and have no mechanism for reconciling conflicting private signals.
Swarm intelligence takes a different approach. Modeled on how biological systems—bird flocks, ant colonies, neural networks—reach collective decisions no individual member could compute alone, swarm AI deploys many agents with independent perspectives and lets consensus emerge from their interactions. The output is not an average of individual estimates; it is what survives the social dynamics.
Applied to prediction markets, this means simulating thousands of agents with distinct personas, memories, and social positions. Rather than asking “what will happen?” directly, a swarm system asks “what would a realistic population of people with varying beliefs and information asymmetries conclude after debating this question?” The distinction matters: emergence is not aggregation.
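The distinction can be sketched in a few lines: aggregation is a mean over independent estimates, while emergent consensus depends on who talks to whom. The toy model below (a simple DeGroot-style belief update, not MiroFish’s actual algorithm) shows topology changing the answer:

```python
def aggregate(estimates):
    """Aggregation: the prediction is simply the mean of independent estimates."""
    return sum(estimates) / len(estimates)

def emerge(estimates, neighbors, rounds=10, weight=0.3):
    """Toy emergent consensus: each agent repeatedly nudges its belief toward
    the agents it can see, so the outcome depends on the social topology,
    not just on the initial numbers."""
    beliefs = list(estimates)
    for _ in range(rounds):
        updated = []
        for i, belief in enumerate(beliefs):
            visible = [beliefs[j] for j in neighbors[i]]
            social_mean = sum(visible) / len(visible)
            updated.append((1 - weight) * belief + weight * social_mean)
        beliefs = updated
    return sum(beliefs) / len(beliefs)

estimates = [0.2, 0.3, 0.8, 0.9]
# Star topology: agent 0 sees everyone, the others only see agent 0.
neighbors = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(aggregate(estimates))          # ~0.55, the plain mean
print(emerge(estimates, neighbors))  # lower: pulled toward the well-connected skeptic
```

With identical inputs, the emergent consensus lands below the plain mean because the skeptical agent occupies the high-centrality position. That sensitivity to social structure is the whole point of the approach.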
How MiroFish Works: The Five-Stage Pipeline
MiroFish is built on OASIS (Open Agent Social Interaction Simulations), a simulation engine from CAMEL-AI that demonstrated emergent social behaviors at 1 million agents with 23 defined social action types.2 MiroFish packages OASIS into a deployable tool with a Vue frontend, adding domain-specific components for prediction workflows.
The architecture runs five stages:
Stage 1 — Graph Construction: GraphRAG extracts entities, relationships, and contextual signals from seed documents. For a prediction about a regulatory outcome, you might feed in recent policy drafts, lobbying filings, and news coverage. For a market event, you’d seed with earnings reports, analyst notes, and social sentiment.
Stage 2 — Environment Setup: The system generates agent personas from the extracted entity graph, assigns behavioral parameters, and constructs a social network topology. Each agent carries an independent identity, knowledge state, and social position.
Stage 3 — Simulation: Agents interact across dual simulated platforms—one Twitter-like, one Reddit-like—with dynamic memory managed via Zep Cloud. Agents do not follow scripted dialogue trees; they respond to each other, update beliefs, and exhibit emergent phenomena including polarization, herd effects, and coalition formation.
Stage 4 — Report Generation: A dedicated ReportAgent observes the simulation output, analyzes emergent patterns, and produces a structured prediction covering key outcome probabilities, opinion dynamics, and coalition states.
Stage 5 — Deep Interaction: Users can query individual simulated agents directly, inject new variables (simulate a breaking news event, for instance), and re-run scenarios from a “God’s-eye view.”
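The five stages can be caricatured as a function pipeline. Everything below (names, data shapes, the trivial belief update) is invented for illustration and does not reflect MiroFish’s actual internals:

```python
from dataclasses import dataclass, field

@dataclass
class Run:
    seed_docs: list                                   # Stage 1 input documents
    graph: dict = field(default_factory=dict)         # extracted entities
    agents: list = field(default_factory=list)        # generated personas
    transcript: list = field(default_factory=list)    # simulated posts
    report: dict = field(default_factory=dict)        # final prediction

def build_graph(run):
    # Stage 1: extract "entities" from seed documents (trivially, by word).
    run.graph = {word: [] for doc in run.seed_docs for word in doc.split()}
    return run

def setup_environment(run, n_agents=3):
    # Stage 2: derive personas and starting beliefs from the entity graph.
    entities = list(run.graph)
    run.agents = [{"id": i, "focus": entities[i % len(entities)], "belief": 0.5}
                  for i in range(n_agents)]
    return run

def simulate(run, rounds=2):
    # Stage 3: agents post and shift beliefs in response to each other.
    for r in range(rounds):
        mean = sum(a["belief"] for a in run.agents) / len(run.agents)
        for a in run.agents:
            a["belief"] = 0.8 * a["belief"] + 0.2 * mean
            run.transcript.append((r, a["id"], a["belief"]))
    return run

def generate_report(run):
    # Stage 4: summarize the emergent state into a structured prediction.
    run.report = {"p": sum(a["belief"] for a in run.agents) / len(run.agents)}
    return run

def query_agent(run, agent_id):
    # Stage 5: deep interaction, inspect one simulated agent directly.
    return next(a for a in run.agents if a["id"] == agent_id)

run = generate_report(simulate(setup_environment(build_graph(
    Run(seed_docs=["policy draft", "earnings report"])))))
print(run.report["p"])
```

The real system replaces each stub with heavy machinery (GraphRAG, LLM calls, Zep memory), but the data flow from seed documents to queryable agents follows this shape.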
MiroFish quickstart (from official documentation)
```shell
git clone https://github.com/666ghj/MiroFish.git
cd MiroFish
cp .env.example .env
# Set LLM_API_KEY and ZEP_API_KEY in .env
docker-compose up -d
```
The recommended LLM backend is Alibaba’s Qwen-plus via Bailian—a pragmatic choice given the creator’s location and the token economics of running hundreds of agents per simulation round.
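The token math explains the concern. A back-of-envelope sketch, where every price and per-call token count is an assumption for illustration, not a measured figure (check current provider pricing before relying on any number):

```python
# Rough cost model for one full swarm simulation. All constants are assumed.
AGENTS = 700
ROUNDS = 40
TOKENS_PER_CALL = 1_500       # assumed prompt + completion per agent action
PRICE_PER_1K_TOKENS = 0.0008  # hypothetical blended USD price per 1K tokens

total_tokens = AGENTS * ROUNDS * TOKENS_PER_CALL
cost = total_tokens / 1_000 * PRICE_PER_1K_TOKENS
print(f"{total_tokens:,} tokens, roughly ${cost:,.2f} per full simulation")
```

Even at aggressively cheap token prices, a single deep simulation burns tens of millions of tokens, which is why backend choice dominates the operating economics.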
The Academic Foundation: What Collective Intelligence Research Actually Shows
The most rigorous published evidence for swarm AI forecasting superiority comes not from MiroFish, but from Unanimous AI, the company founded by Louis Rosenberg (Stanford PhD, 300+ patents) around the concept of “Collective Superintelligence.”
Rosenberg’s IEEE-published research on financial market prediction showed measurable gains:3
- Individual forecasters averaged 56.6% accuracy predicting weekly financial trends
- The same forecasters working in real-time AI-mediated swarms achieved 77.0% accuracy—a 36% relative improvement
- Hypothetical investments based on swarm forecasts produced a 13.3% ROI over 19 weeks versus 0.7% ROI for individual predictions
The mechanism differs significantly from MiroFish. Unanimous AI’s approach is human-in-the-loop: real human forecasters participate in real-time swarm sessions mediated by AI algorithms that apply biological flocking dynamics to converge on collective answers. No pure LLM agents—the humans bring genuine diverse priors.
MiroFish uses only LLM agents. Whether synthetic personas with generated memories produce genuinely diverse epistemic states—or merely simulate diversity while sharing the same underlying model weights—is the central unanswered question.
Swarm, Ensemble, and Individual LLMs: A Technical Comparison
These three approaches to AI forecasting are frequently conflated. They represent distinct architectures with different strengths:
| Dimension | Individual LLM | Ensemble ML | Swarm AI (MiroFish) |
|---|---|---|---|
| Architecture | Single model, single prompt | Multiple models combined statistically | Many agents interacting socially |
| Output type | Probability estimate | Weighted average of model outputs | Emergent consensus from simulation |
| Bias handling | Shared model biases | Reduces variance via averaging | Theoretically surfaces private signals |
| Best for | Fast, structured questions | Structured prediction with historical data | Complex social/geopolitical dynamics |
| Published benchmarks | Yes (ForecastBench, Metaculus) | Yes (narrow structured tasks) | None as of March 2026 |
| Typical accuracy | Brier score 0.101 (GPT-4.5)4 | Up to 97%+ on structured tasks5 | Unknown |
| Computational cost | Low | Medium | High (token-intensive per agent) |
| Setup complexity | Minimal | Moderate | High (GraphRAG, Zep, Docker) |
| Open source | Partially | Yes | Yes (AGPL-3.0) |
The “97%+” accuracy figures for ensemble ML methods—XGBoost, Random Forest with SMOTE balancing—apply to narrow classification tasks with stable historical patterns (energy demand, churn prediction). They are not comparable to Brier scores from geopolitical or financial event forecasting. The domains are incompatible.
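The “Bias handling” row can be made concrete with a toy simulation: averaging many models shrinks their independent noise, but any systematic bias they share survives the average intact. All numbers below are invented:

```python
import random

random.seed(1)

TRUTH = 0.30        # true probability of the event (toy setup)
SHARED_BIAS = 0.15  # systematic error every model inherits together
NOISE = 0.10        # independent per-model noise (std dev)

def model_estimate():
    return TRUTH + SHARED_BIAS + random.gauss(0, NOISE)

single = model_estimate()
ensemble = sum(model_estimate() for _ in range(50)) / 50

# Averaging 50 models shrinks the noise term, but the shared bias remains:
print(f"single error:   {abs(single - TRUTH):.3f}")
print(f"ensemble error: {abs(ensemble - TRUTH):.3f}  (close to the shared bias)")
```

This is the theoretical opening for swarm approaches: if agent interaction can surface information that shifts beliefs away from a shared prior, it attacks the bias term that averaging cannot touch. Whether LLM agents actually do so is the open question.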
What the Benchmarks Actually Show (and What They Don’t)
The honest picture of AI forecasting capability in early 2026:
The best LLM forecaster (GPT-4.5) achieves a Brier score of 0.101 on ForecastBench—lower is better. The average Metaculus human crowd scores 0.149. The superforecaster panel mean is 0.1222. At this rate of improvement (~0.020 Brier points per year), AI is projected to reach parity with human superforecasters by November 2026.4
One critical caveat: the forecasts of multiple top LLMs correlate at 0.994 with prediction market prices when those prices are supplied as context. This raises the possibility that leading LLM forecasters are partially arbitraging existing market prices rather than reasoning independently about outcomes: a sophisticated circular reference, not genuine forecasting.
For MiroFish specifically: no published benchmarks exist. The creator explicitly warns against using MiroFish for short-term price prediction. The most-cited “accuracy” evidence is a single developer’s claim of $4,266 profit over 338 Polymarket trades after using MiroFish to simulate 2,847 digital humans per trade—unverified, unaudited, and lacking methodology disclosure.
The prediction market bot landscape provides adjacent context: 14 of the 20 most profitable Polymarket wallets are bots—but their dominance comes primarily from arbitrage (exploiting price inconsistencies between platforms), not superior predictive accuracy. Arbitrage bots extracted approximately $40 million from Polymarket between April 2024 and April 2025.6
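The arbitrage mechanic behind those bot profits is simple to sketch: if the YES price on one venue plus the NO price on another sums to less than $1, buying both legs locks in the difference whichever way the event resolves. The prices below are invented:

```python
def arbitrage_profit(yes_price_a, no_price_b, stake=1.0):
    """Guaranteed profit per unit stake from buying YES on venue A and NO on
    venue B, if the combined cost is below the $1 payout; zero otherwise."""
    cost = yes_price_a + no_price_b
    return stake - cost if cost < stake else 0.0

# YES trades at $0.55 on venue A while NO trades at $0.40 on venue B:
print(round(arbitrage_profit(0.55, 0.40), 2))  # 0.05 locked in per $1
print(round(arbitrage_profit(0.55, 0.48), 2))  # 0.0: no mispricing to exploit
```

Note that this edge requires no opinion about the event at all, which is why bot dominance on Polymarket says little about machine forecasting skill.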
The Investment Signal: Why Chen Tianqiao Moved in 24 Hours
The business story around MiroFish is as notable as the technical one. Guo Hangjiang—a 20-year-old senior at Beijing University of Posts and Telecommunications who describes his development approach as “vibe coding: fast, intuitive, and not over-designed”—built MiroFish in 10 days after his previous project, BettaFish (a multi-agent sentiment analyzer), hit 20,000 GitHub stars in a single week.
Within 24 hours of receiving a demo video, Chen Tianqiao—founder of Shanda Group and formerly the wealthiest person in China—committed 30 million yuan (~$4.1 million USD). Guo became CEO overnight.
The investment rationale is worth parsing: Chen is not betting on MiroFish’s current benchmark performance. He is making an architectural bet: that collective AI simulation will become a foundational tool for social prediction, public opinion modeling, and market intelligence, applications where simulating realistic population dynamics is inherently more tractable than training for specific outcomes.
That’s a reasonable thesis. It’s also a bet on infrastructure, not proven accuracy.
The Documented Limitations
Technical critics have raised several substantive concerns about MiroFish that practitioners should evaluate before deployment:
The persona depth problem: Critics characterize agents as “system prompts wearing costumes”—each agent interaction is a single LLM call with generated context, not a persistent cognitive entity. Whether this produces genuinely diverse reasoning or correlated outputs from a shared model is unresolved.
Environment fidelity: Simulated social platforms lack the algorithmic amplification, feed personalization, and engagement mechanics of real social media. An agent “going viral” on a MiroFish Reddit-like board is structurally different from actual virality dynamics.
Scale constraints: Zep Cloud’s free tier is limited to 5 API requests per minute. At practical scale (MiroFish’s own documentation suggests meaningful simulations max out around 40 interaction rounds), token costs constrain simulation depth. A 700-agent simulation with multiple rounds rapidly becomes expensive.
Version maturity: Version 0.1.2 is a functional prototype. Production deployments should expect sharp edges, limited error handling, and dependency volatility.
The offline fork: MiroFish-Offline (by developer nikmcfly) removes the cloud dependencies entirely, using Neo4j and Ollama local models. For practitioners concerned about API costs or data privacy, this variant addresses the most acute operational constraints—at the expense of model quality.
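For deployments stuck on the free Zep tier, a client-side throttle is the standard workaround. A minimal sketch (not part of MiroFish) that blocks until the next call would stay within the quota:

```python
import time
from collections import deque

class RateLimiter:
    """Block before each API call so that no more than `max_calls` happen in
    any rolling `period`-second window (e.g. 5 per 60s for Zep's free tier)."""

    def __init__(self, max_calls=5, period=60.0):
        self.max_calls = max_calls
        self.period = period
        self.calls = deque()  # monotonic timestamps of recent calls

    def wait(self):
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires.
            time.sleep(self.period - (now - self.calls[0]))
            self.calls.popleft()
        self.calls.append(time.monotonic())

limiter = RateLimiter(max_calls=5, period=60.0)
# Before each memory read/write: limiter.wait(), then make the API call.
```

At 5 requests per minute, 700 agents each touching memory once per round implies hours of wall-clock time per round, which is the practical argument for either a paid tier or the offline fork.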
What Collective AI Forecasting Needs to Prove
The theoretical case for swarm AI forecasting is coherent: diverse agents with independent beliefs, interacting through structured social dynamics, should surface private information and average out individual biases. The Unanimous AI research demonstrates this works for human participants.
The open empirical question is whether LLM-generated personas provide genuine epistemic diversity or synthetic diversity—the appearance of varied perspectives from agents that share underlying model weights and training distributions. If agent diversity is cosmetic, swarm AI produces elaborate-looking outputs with no forecasting advantage over individual models.
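One cheap diagnostic for synthetic diversity: collect each agent’s probability estimates across a batch of questions and measure how correlated the agents are. Persistently near-1.0 pairwise correlations would suggest the personas are cosmetic. The data below is invented:

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length series of estimates."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

# Each list: one "persona's" probability estimates across four questions.
# These two agents move in lockstep despite different nominal personas.
agent_a = [0.2, 0.7, 0.6, 0.9]
agent_b = [0.25, 0.75, 0.65, 0.95]

print(round(pearson(agent_a, agent_b), 3))  # near 1.0: diversity is cosmetic
```

Genuinely diverse priors should produce visibly decorrelated estimates on questions where private information differs; lockstep estimates mean the swarm is an expensive way to query one model many times.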
The field needs controlled prospective studies: identical prediction questions routed to individual LLMs, ensemble methods, and swarm systems like MiroFish, evaluated against actual outcomes over six to twelve months. Until that evidence exists, swarm AI for prediction markets is a compelling architectural hypothesis, not a validated forecasting method.
The hypothesis may prove correct. The architecture maps onto real mechanisms in collective intelligence theory. The viral adoption of MiroFish has created a community capable of generating that evidence. But as of today, practitioners should build with swarm AI for its qualitative outputs—scenario exploration, stakeholder modeling, opinion dynamics—rather than as a source of calibrated probability estimates.
Frequently Asked Questions
Q: Does MiroFish have published accuracy benchmarks for prediction markets?
A: No. As of March 2026, no controlled studies compare MiroFish predictions to actual outcomes. All accuracy claims in circulation are anecdotal. The creator explicitly warns against using MiroFish for short-term price prediction.
Q: How does swarm AI differ from ensemble methods?
A: Ensemble methods combine the statistical outputs of multiple independently trained models. Swarm AI produces predictions through emergent agent interaction—the consensus is what survives social dynamics, not an average. Ensemble methods have strong benchmarks on structured tasks; swarm AI’s advantage (if validated) would be on complex, social, and geopolitical prediction.
Q: What does MiroFish actually require to run?
A: Python 3.11–3.12, Node.js 18+, the uv package manager, Docker Compose, an OpenAI SDK-compatible LLM API key, and a Zep Cloud API key. The free Zep tier is limited to 5 API requests per minute, which constrains simulation scale without a paid subscription.
Q: Is swarm AI ready for production trading applications?
A: Not on current evidence. MiroFish v0.1.2 is a prototype. Polymarket bot dominance comes from arbitrage, not predictive accuracy. Swarm AI’s theoretical advantages require empirical validation before deployment in production forecasting pipelines.
Q: What is the best-validated collective intelligence forecasting system?
A: Unanimous AI’s human-in-the-loop swarm platform has IEEE-published results showing 36% relative improvement in financial trend prediction accuracy (56.6% individual → 77.0% swarm). It requires real human participants, not LLM agents, and represents a different—and more validated—approach than fully synthetic agent simulations.
Footnotes
1. Park, J.S., O’Brien, J.C., Cai, C.J., Morris, M.R., Liang, P., Bernstein, M.S. “Generative Agents: Interactive Simulacra of Human Behavior.” arXiv:2304.03442, 2023. https://arxiv.org/abs/2304.03442
2. Yang, Z., Zhang, Z., et al. “OASIS: Open Agent Social Interaction Simulations with One Million Agents.” arXiv:2411.11581, November 2024. https://arxiv.org/abs/2411.11581
3. Rosenberg, L.B., Willcox, G. “Artificial Swarm Intelligence to predict Financial Markets.” IEEE UEMCON, 2017. https://unanimous.ai/ieee-financial-markets/
4. PredictStreet. “The Great Forecast Convergence: AI Closing the 20% Gap on Human Superforecasters.” FinancialContent, January 18, 2026. https://markets.financialcontent.com/stocks/article/predictstreet-2026-1-18-the-great-forecast-convergence-ai-closing-the-20-gap-on-human-superforecasters
5. Hybrid ensemble accuracy figures apply to narrow structured classification tasks (energy demand, churn prediction) using methods including DWT-PSO-RBFNN, XGBoost with SMOTE, and SSA-optimized Random Forest. Not comparable to open-ended event forecasting benchmarks.
6. FinanceMagnates. “Prediction Markets Are Turning Into a Bot Playground.” 2025. https://www.financemagnates.com/trending/prediction-markets-are-turning-into-a-bot-playground/