groundy
models & research

Can RoboSSM's State-Space Backbone Replace Transformer Imitation Policies?

The RoboSSM preprint swaps the transformer backbone of in-context robot imitation for a Longhorn state-space model and claims LIBERO gains. The full paper is still pending.


The arXiv abstract for RoboSSM has surfaced since the last review, so the question is no longer completely evidence-free. The full PDF still needs to be read, but the abstract already states the authors’ central claim: a Longhorn state-space model can outperform transformer policies on the LIBERO benchmark by handling longer demonstration contexts at test time.

The RoboSSM preprint

RoboSSM (arXiv 2509.19658) was submitted in September 2025 and revised in June 2026. The abstract frames it as a recipe for in-context robot imitation learning that swaps the usual Transformer backbone for Longhorn, an SSM that promises linear-time inference and strong extrapolation. It reports experiments on LIBERO and claims improved generalization to unseen and long-horizon tasks by processing longer test-time prompts than transformer ICIL methods. The abstract calls this the first demonstration that SSMs can serve as an efficient and scalable backbone for ICIL. It does not, however, include success rates, memory curves, or the exact transformer baselines.

Longer test-time contexts matter because ICIL methods are usually trained on short demonstration prompts. When the deployed prompt exceeds the training length, transformers run into the computational limitations the RoboSSM abstract notes, and performance degrades. RoboSSM’s bet is that an SSM’s linear-time inference and stronger length extrapolation avoid both the compute cost and the degradation.
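The scaling argument can be made concrete with a back-of-the-envelope sketch. It only counts operations: causal attention scores every token against all earlier tokens, while a recurrent scan does one fixed-size state update per token. The function names, the `TOKENS_PER_DEMO` value, and the toy scalar recurrence are illustrative assumptions, not RoboSSM's architecture or Longhorn's actual update rule.

```python
def attention_pair_count(num_tokens: int) -> int:
    """Query-key pairs scored by causal self-attention over the whole prompt."""
    return num_tokens * (num_tokens + 1) // 2


def recurrent_step_count(num_tokens: int) -> int:
    """State updates performed by a recurrent scan: one per token."""
    return num_tokens


def toy_linear_scan(inputs, decay=0.9):
    """Toy scalar recurrence h_t = decay * h_{t-1} + x_t.

    Illustrative only: it shows that the carried state stays the same size
    no matter how long the prompt is. It is NOT Longhorn's update rule.
    """
    state = 0.0
    for x in inputs:
        state = decay * state + x
    return state


TOKENS_PER_DEMO = 200  # hypothetical prompt length per demonstration

for num_demos in (1, 4, 16):
    n = num_demos * TOKENS_PER_DEMO
    print(num_demos, attention_pair_count(n), recurrent_step_count(n))
```

Doubling the number of demonstrations roughly quadruples the attention pair count but only doubles the recurrent step count. Whether that gap matters in practice depends on prompt lengths the abstract does not report.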

The other two fetched sources are unrelated

The retrieval returned two unrelated June 2026 arXiv papers. One applies Sparse Identification of Nonlinear Dynamics (SINDy) to learn control-effectiveness mappings for overactuated aircraft from flight data. The other frames Information Lattice Learning (ILL) as probabilistic graphical-model structure learning, interpreting rules learned by projecting signals onto a partition lattice. Both are control or learning papers, but neither addresses robot manipulation, in-context imitation, or state-space-model policies.

The nearest control-learning result

The SINDy result is the closest practical neighbor. The authors show that a learned, physics-constrained analytical model of control effectiveness can match the accuracy of a full nonlinear onboard model on a high-fidelity aircraft benchmark across aggressive maneuvers, while cutting computational cost relative to established baselines. The learned mapping also admits analytical derivatives, which helps inside nonlinear solvers that include actuator dynamics, and an online residual-monitoring step refreshes the model when plant changes are detected.

That pattern matters for the robot question because it shows one way to escape expensive structure at inference: replace a heavy first-principles computation with a compact learned surrogate that is still compatible with downstream optimization. But the SINDy paper’s setting is aircraft control allocation, not imitation learning, and the efficiency gain comes from sparse analytical structure, not from a linear-cost sequence model.
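To illustrate what "sparse analytical structure" means here, the sketch below runs sequentially thresholded least squares, the regression commonly associated with SINDy, on synthetic data. Everything in it is an assumption made for illustration: the single deflection variable, the polynomial library, the ground-truth coefficients, and the threshold. It does not reproduce the aircraft paper's model, constraints, or data.

```python
import numpy as np


def stlsq(theta, y, threshold=0.1, iterations=10):
    """Sequentially thresholded least squares over a candidate library."""
    coef = np.linalg.lstsq(theta, y, rcond=None)[0]
    for _ in range(iterations):
        small = np.abs(coef) < threshold
        coef[small] = 0.0                      # prune weak library terms
        active = ~small
        if not active.any():
            break
        # Refit using only the terms that survived pruning.
        coef[active] = np.linalg.lstsq(theta[:, active], y, rcond=None)[0]
    return coef


def moment_slope(coef, d):
    """Analytical derivative of c0 + c1*d + c2*d**2 + c3*d**3 with respect to d."""
    return coef[1] + 2 * coef[2] * d + 3 * coef[3] * d ** 2


rng = np.random.default_rng(0)
delta = rng.uniform(-1.0, 1.0, size=200)       # hypothetical actuator deflection
library = np.column_stack([np.ones_like(delta), delta, delta ** 2, delta ** 3])
moment = 2.0 * delta - 0.5 * delta ** 3        # synthetic, noise-free ground truth

coef = stlsq(library, moment)
print(coef)
```

Because the fitted model is a short closed-form expression, its derivative is also closed-form, which is the property that makes such a surrogate convenient inside a gradient-based solver.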

The remaining gaps

Three pieces are still missing. First, the actual RoboSSM architecture and training setup: how the Longhorn backbone is configured, whether it is a pure recurrent stack or a hybrid with attention layers, and how it ingests multi-demo context. Second, task-level results: success rates on training tasks and generalization to held-out tasks, with transformer baselines run under the same conditions. Third, the cost profile: memory and compute scaling with demonstration length, which is the central economic argument for swapping attention for recurrence in robot policies.

We also need to know whether the reported gains come from the SSM backbone itself or from simply feeding more demonstrations at test time.

The RoboSSM abstract is a hypothesis, not a measurement. Until the PDF is read and the benchmark tables are checked, the evidence remains partial: the question has a venue, but not a verdict. The adjacent SINDy result is interesting on its own, but it cannot fill that gap.

Frequently Asked Questions

Does the SINDy result generalize beyond aircraft control allocation?

Not directly. The SINDy paper learns a physics-constrained control-effectiveness mapping from flight data and adds an online residual monitor that refreshes the model after actuator failures or other plant changes. Its gains come from sparse analytical structure, not from a linear-cost sequence model, so robot imitation would need its own validation on manipulation benchmarks.

How does SINDy’s compactness differ from what RoboSSM proposes?

SINDy learns sparse analytical equations for control effectiveness from flight data. Those equations stand in for the full nonlinear onboard model, and their derivatives feed into nonlinear solvers. RoboSSM, according to its abstract, replaces Transformer attention with a Longhorn state-space model to process longer demonstration prompts at lower inference cost. One exploits physical sparsity; the other exploits linear-time sequence processing.

What should teams verify before switching a robot ICIL policy to an SSM backbone?

They need to confirm training stability when the policy ingests multi-modal observations such as proprioception and vision through a recurrent backbone, and they need to check how variable-length demonstration prompts are padded or packed without wasting memory. They also need benchmark success rates and an ablation separating backbone effects from simply using longer test-time contexts.
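The padding-versus-packing question can be quantified with a small sketch. It compares the token slots allocated when every prompt in a batch is padded to the longest one against the slots needed when prompts are concatenated end to end. The prompt lengths and function names are hypothetical, and nothing here reflects how RoboSSM actually batches its prompts.

```python
def padded_slots(lengths):
    """Token slots allocated when every prompt is padded to the longest one."""
    return len(lengths) * max(lengths)


def packed_slots(lengths):
    """Token slots allocated when prompts are concatenated back to back."""
    return sum(lengths)


def cumulative_offsets(lengths):
    """Start offset of each prompt in the packed buffer, plus the total length."""
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return offsets


prompt_lengths = [120, 480, 60, 300]  # hypothetical demonstration prompt lengths

wasted_fraction = 1 - packed_slots(prompt_lengths) / padded_slots(prompt_lengths)
print(padded_slots(prompt_lengths), packed_slots(prompt_lengths), wasted_fraction)
print(cumulative_offsets(prompt_lengths))
```

With these made-up lengths, half of the padded buffer holds padding. Packing removes that waste, but a recurrent backbone then has to reset its state at each offset so that one prompt does not leak into the next.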

What could invalidate RoboSSM’s claimed advantage over transformers?

LIBERO tasks may not require contexts long enough to stress a standard transformer, in which case the attention cost gap shrinks. If the improvement is mostly from feeding more demonstrations, a transformer with chunked retrieval or a larger context budget could match it. The abstract does not provide the ablations needed to rule that out.

What is the strongest near-term risk in covering RoboSSM now?

The risk is overstating results that have not been checked: the full PDF is still unread, so any benchmark margin or scaling claim is speculative. Groundy can add value by retrieving and verifying the actual RoboSSM tables first, while treating the SINDy result as a separate, aircraft-only demonstration that learned physics-constrained surrogates can cut inference cost.

sources · 3 cited

  1. RoboSSM: State-Space Models for In-Context Robot Imitation Learning (primary; accessed 2026-06-20)
  2. An integrated interpretable control effectiveness learning and nonlinear control allocation methodology for overactuated aircrafts (primary; accessed 2026-06-20)
  3. Information Lattice Learning as Probabilistic Graphical Model Structure Learning (primary; accessed 2026-06-20)