Can a Language Model Work Without a Deep Neural Network? A New arXiv Paper Says Yes

A language model with no deep neural network

A single-author preprint on arXiv claims to have built a working language model whose core component is not a deep neural network. Submitted May 28 by Vincent Granville, the paper proposes using radial basis function (RBF) networks, a classical machine learning architecture dating back to the late 1980s, as the foundation for language modeling. The headline claim: the model solves for the global optimum of its loss function in closed form, in a single iteration, eliminating the backpropagation loop that every transformer, state-space model, and hybrid architecture depends on.

What the paper claims

Granville’s paper, titled “LLMs Without Deep Neural Networks: New Architecture, Benefits and Case Study,” argues that the key machinery of a language model can be built from RBF-network components rather than stacked layers of attention and feed-forward matrices trained by gradient descent. The abstract states that the model achieves its optimum in one step, offers increased explainability over standard deep networks, and includes a case study comparing the approach to existing methods.

The paper is 9 pages with 5 figures, categorized under Machine Learning (cs.LG) and Artificial Intelligence (cs.AI). It has not been peer-reviewed.

RBF networks and the closed-form claim

Radial basis function networks are not new. They have been used for interpolation, classification, and function approximation since at least the work of Broomhead and Lowe in 1988. A standard RBF network has a single hidden layer whose units use radial basis functions (typically Gaussians) centered on prototype vectors in the input space. The output layer is linear.
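In textbook notation, the Gaussian RBF network described above can be written as follows. The symbols (M centers c_j, widths σ_j, output weights w_j, bias b) are conventional choices assumed here for illustration, not notation taken from Granville's paper.

```latex
f(x) = b + \sum_{j=1}^{M} w_j \,\phi_j(x),
\qquad
\phi_j(x) = \exp\!\left( -\frac{\lVert x - c_j \rVert^2}{2\sigma_j^2} \right)
```

The output f(x) is linear in the weights w_j and the bias b, and nonlinear only in the centers and widths.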

This structure matters because, once the centers and widths of the radial functions are fixed, the output weights can be computed analytically. This is the closed-form solution Granville references. In conventional RBF usage, this means training reduces to a linear algebra problem rather than an iterative optimization over a non-convex loss landscape.
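A minimal sketch of that textbook property, assuming NumPy, a toy one-dimensional regression target, evenly spaced centers, and one shared width. The function names and all numeric settings are illustrative assumptions. This is the classical RBF fit, not a reconstruction of the model in the paper.

```python
import numpy as np

def rbf_features(X, centers, width):
    """Gaussian activations: one row per input, one column per center."""
    # Squared Euclidean distance between every input and every center.
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))

def fit_output_weights(X, Y, centers, width):
    """Solve for the linear output layer in one least-squares step."""
    Phi = rbf_features(X, centers, width)
    # lstsq returns the minimum-norm least-squares solution,
    # the same quantity a pseudoinverse would give.
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return W

def predict(X, centers, width, W):
    return rbf_features(X, centers, width) @ W

# Toy data (assumed for illustration): learn sin(x) on [-3, 3].
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
Y = np.sin(X)

# Centers and width are fixed up front; only W is solved for.
centers = np.linspace(-3.0, 3.0, 15).reshape(-1, 1)
width = 0.6
W = fit_output_weights(X, Y, centers, width)

max_err = float(np.abs(predict(X, centers, width, W) - Y).max())
print("max training error:", max_err)
```

No learning rate, epochs, or gradient appears anywhere in the fit. The catch is that the centers and width were chosen by hand, and choosing them well is itself a nontrivial problem.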

Whether this property transfers cleanly to the scale and complexity of language modeling is the open question. Standard LLMs use backpropagation through dozens or hundreds of transformer layers precisely because the mapping from token sequences to probability distributions over vocabularies is high-dimensional and non-convex. Granville’s claim that the same task admits a closed-form global optimum, if correct, would challenge the assumption that iterative gradient-based training is a structural requirement for language modeling rather than a consequence of choosing deep architectures.

The case study: what we know and don’t know

The paper includes a case study with comparisons to similar methods, according to the abstract. The specific benchmarks, accuracy figures, and throughput measurements reside in the full PDF, not in the abstract, and were not available for independent verification at the time of writing.

That gap matters. The central claim that an RBF-based model can perform LLM-class tasks rests entirely on whatever tables and figures appear in those nine pages. Without knowing whether the case study evaluates the model on next-token prediction over a standard corpus, on downstream NLP benchmarks, or on a smaller synthetic task, the result’s significance remains uncertain. A model that performs well on a constrained vocabulary with short sequences is a different claim from one that approaches transformer baselines on, say, perplexity over OpenWebText.

The Chinese RBF research context

Granville notes in the abstract that Chinese researchers have recently shown significant interest in RBF networks as substitutes for deep neural networks, citing increased explainability and higher accuracy. The abstract does not cite specific Chinese publications by name. Whether this is a coordinated research thread, independent convergence on an underused architecture, or an anecdotal observation is unclear from the abstract alone.

Architecture-bound or scale-bound?

The deeper question this paper raises is independent of whether Granville’s specific approach holds up. Which capabilities we associate with LLMs are properties of the transformer architecture (or deep nets more broadly), and which are properties of training on large text corpora?

State-space models like Mamba and architectures like RWKV have already demonstrated that transformer-specific mechanisms (full self-attention, positional encodings) are not strictly necessary for competitive language modeling. But those alternatives are still deep neural networks trained with backpropagation. Granville’s claim is more radical: no deep net at all.

If an RBF-based model, even a limited one, can approximate the input-output mapping of a language model without gradient-based training, the implication is that some of what we attribute to the inductive biases of deep architectures may instead be recoverable from the statistical structure of natural language itself, given enough data and the right parameterization. That does not mean RBF networks will replace transformers. It means the boundary between “what the architecture contributes” and “what the data contributes” is less settled than the current consensus assumes.

Conversely, if the case study shows the model lagging transformers on tasks requiring long-range dependencies, compositional reasoning, or in-context learning, that gap becomes evidence for which capabilities are genuinely architecture-bound.

Caveats

Several caveats deserve flagging. The paper is single-authored and not peer-reviewed. RBF networks are a well-studied classical technique; the novelty here is the application to language modeling and the closed-form optimization claim, not the underlying math. The abstract provides no benchmark numbers, model sizes, or dataset descriptions, making independent assessment impossible without the full PDF. The reference to Chinese RBF research cites no specific papers in the abstract. And the claim of a closed-form global optimum in one iteration, while mathematically plausible for certain RBF configurations, is the kind of result that has historically held in restricted settings and proved difficult to extend to the scale at which modern language models operate.

None of this means the paper is wrong. It means the burden of proof is on the full case study, and that burden has not yet been met in public.

Frequently Asked Questions

What happens to compute requirements if you train an RBF language model with billions of parameters?

The closed-form solution requires computing a pseudoinverse of the hidden-layer activation matrix, an operation whose cost grows cubically with the number of centers. At the scale of billions of parameters, that single solve could exceed the combined cost of hundreds of gradient-descent steps. The one-iteration claim may be correct in principle, but the single step is not necessarily cheaper than iterative training.
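A back-of-the-envelope sketch of that scaling, assuming the weights are solved through the normal equations and counting only leading-order terms with constants dropped. The function name and the example sizes are hypothetical; the paper's actual solver and model dimensions are not known from the abstract.

```python
def closed_form_cost(n_samples, n_centers, n_outputs):
    """Leading-order operation counts for a normal-equations solve.

    Constants are dropped, so treat the numbers as orders of magnitude.
    """
    return {
        # Form the M x M Gram matrix Phi^T Phi from N rows.
        "gram": n_samples * n_centers ** 2,
        # Form the right-hand side Phi^T Y.
        "rhs": n_samples * n_centers * n_outputs,
        # Factor and solve the M x M system: cubic in the centers.
        "solve": n_centers ** 3,
    }

# Hypothetical sizes: 1M training rows, 10k centers, 50k-token vocabulary.
cost = closed_form_cost(n_samples=1_000_000, n_centers=10_000, n_outputs=50_000)
for name, ops in cost.items():
    print(f"{name:>5}: {ops:.1e}")
```

Multiplying the number of centers by ten multiplies the solve term by a thousand. Which term dominates depends on the sizes: with far more training rows than centers, forming the matrices can cost more than solving the system.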

How do arXiv’s recent submission policy changes relate to this paper?

In November 2025, arXiv stopped accepting unvetted computer science review and position papers due to a rise in AI-generated submissions. Granville’s paper, categorized under cs.LG and cs.AI, passed that tightened moderation. arXiv is also scheduled to separate from Cornell University and become an independent nonprofit on July 1, 2026.

Where would an RBF architecture most likely fall short compared to a transformer?

RBF networks compare every input against a fixed set of radial centers and have no native mechanism for variable-length sequences or positional information. Tasks that require tracking token order over long contexts, such as multi-hop question answering or code generation, depend on the layered, depth-dependent computation that stacked transformer layers provide. Radial-basis interpolation is fundamentally a spatial similarity measure, not a sequential one.
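A toy illustration of the ordering problem, assuming NumPy and a deliberately naive encoding: token embeddings are mean-pooled into one fixed-size vector before the radial units see them. The embedding table, the centers, and the pooling step are all hypothetical and say nothing about how Granville's model encodes sequences.

```python
import numpy as np

rng = np.random.default_rng(0)
embed = rng.normal(size=(10, 4))    # hypothetical table: 10 token ids, 4 dims each
centers = rng.normal(size=(3, 4))   # hypothetical radial centers

def pooled_rbf(token_ids, width=1.0):
    """Radial activations for a sequence squeezed into one fixed-size vector."""
    # Mean-pooling accepts any sequence length but discards token order.
    x = embed[token_ids].mean(axis=0)
    sq_dist = ((centers - x) ** 2).sum(axis=1)
    return np.exp(-sq_dist / (2.0 * width ** 2))

a = pooled_rbf([1, 2, 3])
b = pooled_rbf([3, 2, 1])           # same tokens, reversed order
same = bool(np.allclose(a, b))
print("order-invariant:", same)
```

Under this encoding the two sequences are indistinguishable to the radial units. Any RBF-based language model has to put order information into the input representation some other way, for example by concatenating a fixed-length window, which then caps the context length.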

Has anyone independently replicated or benchmarked this approach?

As of early June 2026, no mainstream research group or tech outlet appears to have covered or replicated Granville’s results. The non-transformer architectures that have gained adoption, specifically Mamba and RWKV, both retain the deep-network-plus-backpropagation structure that Granville’s model rejects. The abstract’s reference to Chinese RBF research names no specific publications, making convergent results difficult to assess.