Kotlin creator Andrey Breslav built a language where you write what software should do, not how to do it—and an LLM handles the rest. CodeSpeak compiles structured English specifications into Python, Go, JavaScript, TypeScript, Kotlin, or Swift. On four real open source projects, it reduced code volume by 5.9x to 9.9x while maintaining passing test suites.

What Is CodeSpeak?

CodeSpeak is a specification language where developers describe software behavior in structured English, and LLMs compile those specs into production-grade code. Created by Andrey Breslav—the original designer of Kotlin at JetBrains—it launched in alpha (version 0.1.0) via PyPI in late 2025 and reached version 0.3.4 in early March 2026.1

The design philosophy sits between two extremes that Breslav explicitly rejects: traditional programming languages (too much low-level detail) and ad hoc prompting (too ambiguous). CodeSpeak aims to be “neither a formal language, nor just prompting”—a structured middle ground engineered for professional developers building production systems, not casual users looking for quick scripts.2

The core premise: developers should only write what they uniquely know. Everything the LLM can infer—data structures, validation logic, boilerplate, test scaffolding—should be generated, not hand-authored.

```shell
# Install CodeSpeak CLI
uv tool install codespeak-cli

# Convert existing code to a specification
codespeak takeover <file>

# Compile specs to production code
codespeak build
```

How Does CodeSpeak Work?

A CodeSpeak specification is a plain-text markdown file. You describe what each component should do; the CLI sends it to a configured LLM and writes back generated code that passes a test suite. The project supports mixed-mode development: some files can be spec-generated, others hand-coded, and they coexist in the same repository.

A simple API endpoint specification illustrates the syntax:

```
endpoint POST /auth/login {
  request {
    body {
      email: string @format(email) @required
      password: string @minLength(8) @required
    }
  }
  response 200 {
    body {
      token: string @format(jwt)
      expiresIn: number @unit(seconds)
    }
  }
}
```

Rather than describing implementation steps, the developer describes constraints: field types, validation rules, format requirements. The LLM handles the actual HTTP handler, input parsing, error responses, and middleware wiring.
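To make that concrete, here is a hand-written sketch of the kind of validation code an LLM might generate from the constraints above. Nothing here is actual CodeSpeak output; the function name and the email regex are illustrative assumptions.

```python
import re

# Illustrative email check; generated code might use a stricter pattern.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_login_request(body: dict) -> list:
    """Check a request body against the spec's declared constraints:
    email @format(email) @required, password @minLength(8) @required."""
    errors = []
    email = body.get("email")
    if not isinstance(email, str) or not EMAIL_RE.match(email):
        errors.append("email: required, must be a valid email address")
    password = body.get("password")
    if not isinstance(password, str) or len(password) < 8:
        errors.append("password: required, minimum length 8")
    return errors

print(validate_login_request({"email": "a@b.co", "password": "hunter22"}))  # []
print(validate_login_request({"email": "nope", "password": "short"}))       # two error messages
```

The point is the division of labor: the spec states the constraints once, and everything surrounding them is generated.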

Version 0.3.4 introduced modularity: specs can now import each other, and CodeSpeak builds them in dependency order—dependencies compile first, then specs that reference them.4
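Dependency-ordered builds amount to a topological sort. CodeSpeak's actual resolver is not public, but the "dependencies compile first" behavior can be sketched with Python's standard library (the spec filenames here are hypothetical):

```python
from graphlib import TopologicalSorter

# Hypothetical spec dependency graph: each spec maps to the specs it imports.
spec_imports = {
    "api.md": ["models.md", "auth.md"],
    "auth.md": ["models.md"],
    "models.md": [],
}

# static_order() yields each spec only after everything it imports.
build_order = list(TopologicalSorter(spec_imports).static_order())
print(build_order)  # ['models.md', 'auth.md', 'api.md']
```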

The codespeak takeover command works in the opposite direction: it takes existing code and reverse-engineers a specification from it. A February 2026 blog post demonstrated this against Microsoft’s MarkItDown library, fixing a GitHub issue by converting existing Python to a spec, editing the spec to add missing fields, then calling codespeak build. The result: 23 added spec lines produced 221 generated code lines—approximately a 10x expansion ratio.5

The Core Problem: Why English Prompts Break Down

The ambiguity problem is fundamental to natural language, not incidental. A 2025 survey on formal requirements engineering and LLMs found that even precise docstrings produce underdetermined behavior—an instruction to “remove all duplicates from a list” doesn’t specify whether all copies or just the extra copies get removed.6 LLMs make different choices depending on phrasing, context window contents, temperature settings, and model version.
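The duplicate-removal example is easy to make concrete. Both functions below are faithful implementations of the same English instruction, yet they return different results:

```python
from collections import Counter

def keep_first_occurrence(items):
    """Reading 1: drop only the extra copies, keeping one of each value."""
    seen = set()
    result = []
    for x in items:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result

def drop_duplicated_values(items):
    """Reading 2: drop every value that appears more than once."""
    counts = Counter(items)
    return [x for x in items if counts[x] == 1]

data = [1, 2, 2, 3, 3, 3, 4]
print(keep_first_occurrence(data))   # [1, 2, 3, 4]
print(drop_duplicated_values(data))  # [1, 4]
```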

For exploratory prototyping, this variability is tolerable. For production systems that get maintained, upgraded, and audited, non-determinism compounds over time. Every model update can silently change behavior in spec-generated code.

Breslav’s answer is structure that removes interpretive latitude at the margins. “The CodeSpeak compiler makes sure there’s no significant ambiguity,” according to the project documentation. “If something can be interpreted in two or more substantially different ways, CodeSpeak will ask you what you meant and help you change the spec accordingly.”1

This is materially different from prompt engineering. Prompt engineering tries to guide LLM behavior through clever phrasing; CodeSpeak tries to eliminate the categories of questions the LLM would otherwise have to guess at.

Real-World Benchmarks

CodeSpeak has published case studies converting modules from four real open source Python projects:1

| Project Module | Original LOC | Spec LOC | Reduction | Tests |
| --- | --- | --- | --- | --- |
| WebVTT support (yt-dlp) | 255 | 38 | 6.7x | Passing + new tests added |
| Italian SSN generator (Faker) | 165 | 21 | 7.9x | Passing + new tests added |
| Encoding detection (BeautifulSoup4) | 826 | 141 | 5.9x | Passing + new tests added |
| EML converter (MarkItDown) | 139 | 14 | 9.9x | Passing + new tests added |

All four conversions maintained passing test suites, and in each case the test count increased post-conversion—the LLM added coverage beyond what the original code had.

The 5.9x to 9.9x range tracks with CodeSpeak’s stated 5–10x code reduction target. The variance is meaningful: highly algorithmic code with complex logic compresses less than procedural boilerplate. Encoding detection in BeautifulSoup4 involves heuristics that require more explicit spec guidance; a simple data converter like MarkItDown’s EML handler has fewer edge cases to specify.
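The reduction figures follow directly from the published line counts:

```python
# Reproduce the reduction ratios in the table above from the published LOC counts.
cases = {
    "WebVTT support (yt-dlp)": (255, 38),
    "Italian SSN generator (Faker)": (165, 21),
    "Encoding detection (BeautifulSoup4)": (826, 141),
    "EML converter (MarkItDown)": (139, 14),
}
for name, (original_loc, spec_loc) in cases.items():
    print(f"{name}: {original_loc / spec_loc:.1f}x")
```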

Why This Approach Is Gaining Traction Now

Programming abstraction has followed a consistent pattern. Assembly gave way to C; procedural code gave way to object-oriented languages; scripting languages pushed higher still. Each layer moved routine decisions out of developer scope. JetBrains CEO Kirill Skrygan articulated this lineage explicitly: “And now it’s time to move even higher.”3

The academic research community reached a similar conclusion independently. A 2025 survey of formal requirements engineering and LLMs found that models can already translate natural-language intent into formal postconditions accurately enough to catch 64 real-world historical bugs in the Defects4J dataset—evidence that machine formalization is increasingly viable.6 The researchers note that barriers remain: semantic ambiguity, tool interoperability, and LLM instability between versions. But the direction of progress is clear.

CodeSpeak’s timing reflects this maturity window. The LLMs capable of reliably producing production Python from a short English description didn’t exist at scale three years ago. Now they do.

The Criticisms Worth Taking Seriously

The Hacker News thread on CodeSpeak surfaced three legitimate objections that the project has not fully resolved.7

The spec-is-as-hard-as-code problem. Joel Spolsky’s 2007 argument survives the LLM era: if your spec is precise enough to drive correct code generation, it approaches the complexity of the code itself. Breslav’s answer is that specs don’t need to capture everything—only what the developer uniquely knows. But critics argue that for non-trivial systems, the “unique knowledge” surface area is large.

Non-determinism persists below the spec. CodeSpeak reduces ambiguity in the instruction; it doesn’t make LLMs deterministic. The same spec can generate different code between runs, between model updates, and when context window contents change. One commenter in the thread noted: “Models aren’t deterministic—every time you would try to re-apply you’d likely get different output.” Spec drift—where the generated code diverges from spec intent over model iterations—is a real maintenance concern.

The bottleneck may not be ambiguity. Several developers building production agent systems pushed back on the core premise: “the model’s context understanding” is the actual constraint, not prompt phrasing. Precise specifications don’t help if the model lacks sufficient context about the broader system to make correct implementation decisions.

These aren’t fatal objections, but they shape where CodeSpeak is likely to work well versus where it won’t. Isolated, well-bounded modules with clear input/output contracts are strong candidates. Complex systems with many interdependencies and unclear requirements are weaker candidates.
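On the non-determinism objection specifically, one mitigation is procedural rather than linguistic: pin a checksum of each generated file and force human review whenever a rebuild diverges. A minimal sketch of that workflow idea, which is not a CodeSpeak feature:

```python
import hashlib
import pathlib

def checksum(path):
    """SHA-256 of a generated file's contents."""
    return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()

def detect_drift(pinned, generated_dir):
    """Return the generated files whose contents no longer match their pinned hash.

    `pinned` maps filename -> expected SHA-256, recorded at the last reviewed build.
    """
    return [
        name for name, expected in pinned.items()
        if checksum(pathlib.Path(generated_dir) / name) != expected
    ]
```

A CI job could run `detect_drift` after every rebuild and block the merge until the diff is reviewed.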

How CodeSpeak Compares to Alternative Approaches

| Approach | Ambiguity Handling | Determinism | Maintenance | Learning Curve |
| --- | --- | --- | --- | --- |
| Natural language prompts | None—interpreted by LLM | Low (varies per run) | High spec drift risk | Minimal |
| CodeSpeak specs | Structured—compiler flags ambiguity | Medium (reduced surface area) | Lower—maintain spec, not code | 1–2 weeks |
| Traditional code | Complete—every detail explicit | High | Standard | Language-dependent |
| Formal methods (TLA+, Alloy) | Complete—mathematical spec | High (verified) | Low—proven correctness | High |

CodeSpeak occupies a specific niche: more structured than natural language, significantly less verbose than hand-written code, and practically approachable unlike formal verification tools. Whether that niche is large enough to anchor a durable workflow is the open question.

What Practitioners Need to Know

CodeSpeak is currently alpha software. The CLI installs via uv tool install codespeak-cli, and version 0.3.4 is the latest release. The project supports Python, Go, JavaScript, TypeScript, Kotlin, and Swift as output targets.1

For teams evaluating it, the most credible use case is greenfield utility modules with well-understood behavior: data transformers, API clients, validators, and converters. The MarkItDown and yt-dlp case studies are representative of where the tool performs. Complex domain logic, state machines, and cross-cutting architectural concerns are better candidates for manual implementation—or at minimum, for CodeSpeak mixed-mode projects where humans write the spec and the LLM fills in the scaffolding.

The maintenance model deserves serious thought before adoption. CodeSpeak’s premise is that you maintain specs, not code—but this requires discipline to keep specs synchronized with intent, especially as product requirements evolve. Teams that already struggle to keep documentation current should plan for this as a first-class process concern.

Frequently Asked Questions

Q: Is CodeSpeak the same as prompt engineering? A: No. Prompt engineering optimizes natural language instructions to guide LLM behavior. CodeSpeak defines a structured syntax with a compiler that actively flags ambiguity—it’s closer to a type system than a prompting technique. The specs are versioned files in your repository, not chat inputs.

Q: Does CodeSpeak require a specific LLM? A: CodeSpeak is LLM-agnostic; the CLI delegates to a configurable model. The project has tested against major code-capable LLMs. Performance varies by model, and the project’s benchmarks don’t specify which model was used—a relevant caveat for reproducibility.

Q: Can I adopt CodeSpeak incrementally in an existing project? A: Yes. The codespeak takeover command converts existing code files to specs, and CodeSpeak supports mixed projects where spec-generated and hand-written files coexist. The project recommends starting with isolated utility modules rather than attempting wholesale conversion.

Q: What happens when the LLM generates incorrect code from a spec? A: CodeSpeak requires test suites and validates that generated code passes them. If it doesn’t, the build fails. You modify the spec to add constraints that prevent the error and rebuild. The failure mode is explicit rather than silent—but it requires maintained test coverage to be meaningful.

Q: How does this relate to JetBrains’ own language work? A: Andrey Breslav created CodeSpeak independently after leaving his role at JetBrains. The company separately announced work on an unnamed higher-abstraction language in July 2025 under CEO Kirill Skrygan. Both efforts pursue similar ideas about specification-driven development, but they are distinct projects with no announced coordination.



Footnotes

  1. CodeSpeak. “Software Engineering with AI.” codespeak.dev. March 2026. https://codespeak.dev/

  2. The Pragmatic Engineer. “The programming language after Kotlin – with the creator of Kotlin.” newsletter.pragmaticengineer.com. 2026. https://newsletter.pragmaticengineer.com/p/the-programming-language-after-kotlin

  3. Krill, Paul. “JetBrains working on higher-abstraction programming language.” InfoWorld. July 2025. https://www.infoworld.com/article/4029053/jetbrains-working-on-higher-abstraction-programming-language.html

  4. CodeSpeak Blog. “First glimpse of codespeak takeover: Transition from Code to Specs in Real Projects.” codespeak.dev. February 2026. https://codespeak.dev/blog/codespeak-takeover-20260223

  5. UBOS Tech. “CodeSpeak Introduces Innovative Mixed-Project Workflow Platform.” ubos.tech. 2026. https://ubos.tech/news/codespeak-introduces-innovative-mixed%E2%80%91project-workflow-platform/

  6. Kande, M. et al. “Formal requirements engineering and large language models: A two-way roadmap.” Science of Computer Programming. 2025. https://www.sciencedirect.com/article/pii/S0950584925000369

  7. Hacker News. “Kotlin creator’s new language: talk to LLMs in specs, not English.” Hacker News. 2026. https://news.ycombinator.com/item?id=47350931
