Hugging Face Skills are structured instruction packages that give any compatible coding agent—Claude Code, OpenAI Codex, Google Gemini CLI, or Cursor—the procedural knowledge to execute AI/ML workflows end-to-end. Released in November 2025 under Apache 2.0, the library accumulated 7,500 GitHub stars by early 2026, signaling rapid practitioner adoption. Skills solve a narrow but critical problem: a model that can write code doesn’t automatically know the best practices for training on HF infrastructure.
What Is the HF Skills Library?
The Hugging Face Skills repository is a curated collection of Agent Skills-formatted instruction packages. Each skill is a directory with a SKILL.md file at its core: a markdown document with YAML frontmatter (name, description, optional compatibility metadata) followed by the task guidance the agent reads upon activation.
Think of each skill as a domain-aware runbook. Where MCP tools give agents access to live data and API calls, skills give agents the judgment to use that data correctly—which GPU to select for a 3B parameter model, when to apply LoRA versus full fine-tuning, how to chain SFT into DPO training runs.
The Agent Skills format itself, now hosted at agentskills.io, was open-sourced by Anthropic as a cross-platform specification. Hugging Face’s repository was among the first major adopters, bringing nine skills that cover the full ML lifecycle.
How Do Skills Work?
The SKILL.md Structure
Every skill follows a deliberate three-tier information architecture designed for token efficiency:
- **Metadata** (~100 tokens): The `name` and `description` fields are loaded at startup across all installed skills. Agents use these to decide which skill applies to a given request.
- **Instructions** (<5,000 tokens recommended): The full SKILL.md body loads when the agent activates the skill. This contains step-by-step workflows, decision trees, and best practices.
- **Supporting resources** (on demand): Scripts in `scripts/`, reference docs in `references/`, and templates in `assets/` load only when the agent needs them.
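This tiering can be sketched in a few lines of Python. The parsing below is a simplified illustration of progressive disclosure (read only the frontmatter at startup, defer the body until activation), not the loader of any particular agent:

```python
# Illustrative sketch of progressive disclosure: at startup an agent reads
# only the YAML frontmatter of each SKILL.md; the larger instruction body
# is loaded later, when the skill is activated.
def read_skill_metadata(skill_md_text: str) -> dict:
    """Extract only the frontmatter fields, skipping the body."""
    lines = skill_md_text.splitlines()
    assert lines[0].strip() == "---", "SKILL.md must start with YAML frontmatter"
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; do NOT read the body yet
        if ":" in line and not line.startswith(" "):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

def read_skill_body(skill_md_text: str) -> str:
    """Load the full instruction body only on activation."""
    return skill_md_text.split("---", 2)[2].strip()

example = """---
name: hugging-face-model-trainer
description: Train or fine-tune language models using TRL on Hugging Face Jobs.
---
Full workflow instructions go here (loaded only on activation).
"""

meta = read_skill_metadata(example)
print(meta["name"])  # ~100 tokens of metadata, used for skill selection
print(read_skill_body(example))  # loaded only when the skill activates
```

The point of the split is token economy: an agent with many skills installed pays only the metadata cost per skill until one is actually needed.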
A minimal SKILL.md looks like:
```yaml
---
name: hugging-face-model-trainer
description: Train or fine-tune language models using TRL on Hugging Face Jobs. Covers SFT, DPO, GRPO, and reward modeling. Use when the user wants to train a model, fine-tune on custom data, or set up an RL training pipeline.
license: Apache-2.0
metadata:
  author: huggingface
  version: "1.0"
---
```

The body then contains the actual workflow: how to validate the dataset format, select hardware, generate the training script, submit the job, and monitor it through Trackio.
The Cross-Platform Interoperability Model
The defining characteristic of HF Skills is that the same skill works across four major coding agents without modification:
| Coding Agent | Installation Method |
|---|---|
| Claude Code | `/plugin marketplace add huggingface/skills`, then `/plugin install <skill-name>@huggingface/skills` |
| OpenAI Codex | `$skill-installer install <skill>` or from a cloned local directory |
| Google Gemini CLI | `gemini extensions install` |
| Cursor | `.cursor-plugin/plugin.json` + `.mcp.json` manifests |
This interoperability is non-trivial. Before the Agent Skills standard, teams building with multiple coding agents had to maintain parallel documentation or agent-specific configuration for every domain capability. A single SKILL.md eliminates that duplication.
The Nine Available Skills
As of early 2026, the repository contains nine skills, split across two categories:
Domain-Specific Skills
These target AI/ML workflows directly:
| Skill | Core Capability |
|---|---|
| `hugging-face-model-trainer` | Fine-tune LLMs via TRL (SFT, DPO, GRPO), hardware selection, LoRA configuration |
| `hugging-face-datasets` | Create, validate, and push datasets to HF Hub |
| `hugging-face-evaluation` | Add structured evaluation results to model cards |
| `hugging-face-jobs` | Submit and monitor compute jobs on HF infrastructure |
| `hugging-face-trackio` | Track training metrics with real-time Trackio visualizations |
| `hugging-face-paper-publisher` | Index arXiv papers on HF Hub, link to models and datasets |
Tool Skills
These teach agents how to use HF’s own tooling:
| Skill | Core Capability |
|---|---|
| `hugging-face-cli` | Hub operations (upload, download, auth) via the `hf` CLI |
| `gradio` | Build interactive ML demos and web UIs |
| `hugging-face-tool-builder` | Generate reusable scripts for repeated HF API operations |
A Concrete Workflow: Fine-Tuning With One Instruction
The model trainer skill is the most documented capability, and it illustrates the practical gap Skills bridge. A user can issue:
> Fine-tune Qwen3-0.6B on the open-r1/codeforces-cots dataset for instruction following

The agent, with the `hugging-face-model-trainer` skill active, will:
- Validate the dataset schema against TRL’s expected `messages` or `prompt`/`completion` format
- Select hardware based on model size (t4-small for <1B parameter models)
- Generate a TRL training script with appropriate hyperparameters
- Submit the job to HF Jobs infrastructure
- Return a Trackio dashboard URL for real-time loss and learning rate monitoring
- Push the final model checkpoint to HF Hub under the user’s namespace
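The first step, schema validation, is where most failed runs are caught early. The sketch below shows the kind of check the skill instructs the agent to perform: TRL’s SFT tooling accepts either a conversational `messages` column or `prompt`/`completion` columns. This is illustrative logic, not code shipped in the skill itself:

```python
# Hedged sketch of a pre-submission dataset check: classify each record as
# one of TRL's supported SFT formats, or fail fast with a useful error.
def detect_trl_format(record: dict) -> str:
    """Classify one dataset record as a supported TRL format, or raise."""
    if "messages" in record:
        msgs = record["messages"]
        if (isinstance(msgs, list) and msgs
                and all({"role", "content"} <= set(m) for m in msgs)):
            return "conversational (messages)"
        raise ValueError("'messages' must be a list of {role, content} dicts")
    if {"prompt", "completion"} <= set(record):
        return "prompt/completion"
    raise ValueError(f"unrecognized schema: {sorted(record)}")

print(detect_trl_format(
    {"messages": [{"role": "user", "content": "hi"},
                  {"role": "assistant", "content": "hello"}]}
))  # conversational (messages)
print(detect_trl_format({"prompt": "2+2=", "completion": "4"}))  # prompt/completion
```

Catching a malformed dataset locally costs nothing; discovering it after job submission costs GPU time.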
According to Hugging Face’s blog post announcing the capability, a complete training run on a 0.6B model costs approximately $0.30 at roughly $0.75/hour on a t4-small instance.1
The hardware decision tree embedded in the skill reflects production knowledge that would otherwise require reading multiple docs pages:
| Model Size | Recommended Hardware | Approach |
|---|---|---|
| <1B parameters | t4-small (~$0.75/hr) | Full fine-tune |
| 1–3B parameters | t4-medium or a10g-small | Full fine-tune |
| 3–7B parameters | a10g-large | LoRA (auto-applied) |
| >7B parameters | Not supported via this skill | Requires custom setup |
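Transcribed into code, the decision tree above looks roughly like the following. The thresholds and instance names mirror the table; the actual skill encodes this guidance as prose in SKILL.md, not as this exact function, and the boundary handling here is an illustrative choice:

```python
# The hardware decision tree from the table, as an illustrative function.
def pick_hardware(params_billions: float) -> tuple[str, str]:
    """Return (hardware flavor, training approach) for a given model size."""
    if params_billions < 1:
        return ("t4-small", "full fine-tune")
    if params_billions <= 3:
        return ("t4-medium or a10g-small", "full fine-tune")
    if params_billions <= 7:
        return ("a10g-large", "LoRA (auto-applied)")
    raise ValueError(">7B models are not supported via this skill")

print(pick_hardware(0.6))  # ('t4-small', 'full fine-tune')
print(pick_hardware(7))    # ('a10g-large', 'LoRA (auto-applied)')
```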
Skills vs. Smolagents: Complementary Architectures
A common point of confusion is how HF Skills relates to smolagents, Hugging Face’s Python framework for building code-writing agents. They address different layers of the stack:
| Dimension | HF Skills | smolagents |
|---|---|---|
| Type | Instruction packages | Python framework |
| Primary user | Developers using existing coding agents | Developers building new agents |
| Format | Markdown + YAML (SKILL.md) | Python code |
| Agent compatibility | Claude Code, Codex, Gemini CLI, Cursor | Any LLM via HF Inference or API |
| Composability mechanism | Skill activation via plugin/extension system | Code generation with tool nesting |
| Hub integration | Skills shared via GitHub repo | Tools/agents shared as Gradio Spaces |
| Distribution | Apache 2.0 GitHub repo | PyPI package |
smolagents introduced a key architectural insight: having agents write actions in Python code—rather than JSON tool calls—enables natural composability through function nesting, loops, and conditionals. A smolagents CodeAgent can call a tool, process the result, branch conditionally, and call another tool—all in a single generated code block.
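The composability claim is easiest to see in miniature. The snippet below is plain Python, not the smolagents API; `search_models` is a stand-in for a tool the agent would be given. The point is that one generated block can call a tool, filter the result, and branch on it, which a single JSON tool call cannot express:

```python
# Illustration of why code-as-actions composes: a single generated block
# can call a tool, post-process, and conditionally re-query. search_models()
# is a hypothetical stand-in tool, not a real smolagents or HF API.
def search_models(query: str) -> list[dict]:
    # Stand-in for a Hub-search tool; returns fake results for illustration.
    return [{"id": f"{query}-{n}", "downloads": n * 1000} for n in range(3)]

# One "action" the agent could emit: call, filter, and fall back inline.
results = search_models("qwen")
popular = [m for m in results if m["downloads"] > 500]
if not popular:
    popular = search_models("llama")
print([m["id"] for m in popular])  # ['qwen-1', 'qwen-2']
```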
HF Skills operate at a higher abstraction level: they provide the domain knowledge that makes either a smolagents-powered agent or a third-party coding agent effective at ML tasks, without requiring the practitioner to write agent infrastructure code.
The Agent Skills Standard: Context
The Agent Skills format HF Skills uses was open-sourced by Anthropic and is now defined at agentskills.io. The spec follows a progressive disclosure model:
- Metadata fields (`name`, `description`) are token-efficient identifiers loaded at startup
- Instruction body content is loaded only when a skill is activated
- Reference files in `scripts/` and `references/` subdirectories are fetched on demand
The specification includes a skills-ref validation tool for checking SKILL.md compliance:
```shell
skills-ref validate ./my-skill
```

This positions Agent Skills as an interoperability layer analogous to what MCP became for tool connectivity—a format that benefits from wide adoption because skill packs become reusable across the ecosystem rather than locked to specific agents.
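As a rough illustration of what such a validator checks, the sketch below enforces a few plausible rules: required frontmatter fields and a lowercase, hyphen-separated name. These specific rules are assumptions for illustration, not the actual rule set of the `skills-ref` tool:

```python
# Minimal sketch of SKILL.md compliance checking. The concrete rules
# (required fields, lowercase-hyphen name) are assumed for illustration,
# not taken from the skills-ref implementation.
import re

def validate_skill_md(text: str) -> list[str]:
    """Return a list of human-readable validation errors (empty = valid)."""
    if not text.startswith("---"):
        return ["missing YAML frontmatter"]
    frontmatter = text.split("---", 2)[1]
    fields = dict(
        line.split(":", 1) for line in frontmatter.splitlines() if ":" in line
    )
    fields = {k.strip(): v.strip() for k, v in fields.items()}
    errors = []
    if "name" not in fields:
        errors.append("missing required field: name")
    elif not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", fields["name"]):
        errors.append("name should be lowercase, hyphen-separated")
    if not fields.get("description"):
        errors.append("missing required field: description")
    return errors

ok = "---\nname: my-skill\ndescription: Does a thing.\n---\nBody."
bad = "---\nname: My Skill\n---\nBody."
print(validate_skill_md(ok))   # []
print(validate_skill_md(bad))  # two errors
```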
Failure Modes and Practical Limitations
The library is young and carries meaningful limitations practitioners should understand before relying on it in production.
Model size ceiling: The model trainer skill’s documented upper limit for practical use is approximately 7B parameters. The HF blog post initially claimed 70B support, but later clarified the actual ceiling is smaller—agents selecting hardware for large models will encounter job failures or need manual intervention.1
Job account requirements: Cloud compute submission via HF Jobs requires a paid HF account. Teams evaluating the library on free-tier accounts will hit this constraint immediately when attempting training workflows.
Skill quality variance: The nine skills vary in maturity. The model trainer and CLI skills have detailed reference documentation and tested scripts; newer additions like hugging-face-paper-publisher are thinner. As with any open-source repository at this stage, skills are maintained by different contributors with different documentation standards.
No execution sandboxing: Skills load instructions into the agent’s context—the agent then executes code in your environment. There is no built-in sandboxing. The spec includes an experimental `allowed-tools` frontmatter field for pre-approving specific tool calls, but support varies by agent implementation.
Who Should Use HF Skills Today?
The library is well-suited for three practitioner profiles:
- **ML engineers iterating on models**: The model trainer skill removes the friction of looking up TRL configuration syntax and HF Jobs submission patterns every fine-tuning run.
- **Researchers publishing on HF Hub**: The datasets, evaluation, and paper-publisher skills systematize the Hub housekeeping that often gets deferred—model card eval tables, arXiv paper linking, dataset schema validation.
- **Teams standardizing agent workflows**: Organizations using multiple coding agents (some developers on Claude Code, others on Cursor or Codex) get consistent AI/ML workflow guidance without maintaining parallel documentation.
The library is less suitable as a production automation backbone today. Skills provide guidance, not guarantees—agent execution still involves stochastic behavior, and the lack of built-in error recovery means complex multi-skill pipelines require human oversight.
Frequently Asked Questions
Q: Do HF Skills require Hugging Face’s own agents or smolagents? A: No. Skills work with any Agent Skills-compatible coding agent: Claude Code, OpenAI Codex, Google Gemini CLI, and Cursor are all supported. Smolagents is a separate HF framework for building agents, not a prerequisite for using Skills.
Q: How do Skills differ from MCP tools? A: MCP gives agents live access to data—Hub search results, model metadata, API responses. Skills give agents procedural knowledge—the judgment to use those tools correctly for ML tasks like fine-tuning or evaluation. They are complementary; the HF Skills repo includes MCP configuration so both can run together.
Q: Can I write a custom skill for my organization’s ML workflow?
A: Yes. The Agent Skills specification is open, and the skills-ref CLI validates custom SKILL.md files. HF’s own blog recommends using the official skills as building blocks for more domain-specific capabilities, rather than as exhaustive coverage of every workflow.
Q: What does it cost to run the model training skill? A: According to Hugging Face’s documentation, a complete SFT run on a 0.6B model costs approximately $0.30 using a t4-small instance at ~$0.75/hour. Costs scale with model size and hardware tier. HF Jobs requires a Pro or Team account.
Q: Are there limitations on model size? A: The model trainer skill practically supports models up to approximately 7B parameters, with LoRA automatically applied for models above 3B to manage memory. Larger models require infrastructure not covered by the current skill set.
Sources:
- GitHub - huggingface/skills
- Hugging Face Hub: Skills
- Hugging Face Hub: Agents Overview
- We Got Claude to Fine-Tune an Open Source LLM
- Agent Skills Specification
- Introducing smolagents
- smolagents documentation
- Agent Skills: Anthropic’s Next Bid to Define AI Standards - The New Stack
Footnotes
1. Hugging Face Blog. “We Got Claude to Fine-Tune an Open Source LLM.” Hugging Face, 2025. https://huggingface.co/blog/hf-skills-training