Hugging Face Skills are structured instruction packages that give any compatible coding agent—Claude Code, OpenAI Codex, Google Gemini CLI, or Cursor—the procedural knowledge to execute AI/ML workflows end-to-end. Released in November 2025 under Apache 2.0, the library accumulated 7,500 GitHub stars by early 2026, signaling rapid practitioner adoption. Skills solve a narrow but critical problem: a model that can write code doesn’t automatically know the best practices for training on HF infrastructure.

What Is the HF Skills Library?

The Hugging Face Skills repository is a curated collection of Agent Skills-formatted instruction packages. Each skill is a directory with a SKILL.md file at its core: a markdown document with YAML frontmatter (name, description, optional compatibility metadata) followed by the task guidance the agent reads upon activation.

Think of each skill as a domain-aware runbook. Where MCP tools give agents access to live data and API calls, skills give agents the judgment to use that data correctly—which GPU to select for a 3B parameter model, when to apply LoRA versus full fine-tuning, how to chain SFT into DPO training runs.

The Agent Skills format itself, now hosted at agentskills.io, was open-sourced by Anthropic as a cross-platform specification. Hugging Face’s repository was among the first major adopters, bringing nine skills that together cover the full ML lifecycle.

How Do Skills Work?

The SKILL.md Structure

Every skill follows a deliberate three-tier information architecture designed for token efficiency:

  1. Metadata (~100 tokens): The name and description fields are loaded at startup across all installed skills. Agents use these to decide which skill applies to a given request.
  2. Instructions (<5,000 tokens recommended): The full SKILL.md body loads when the agent activates the skill. This contains step-by-step workflows, decision trees, and best practices.
  3. Supporting resources (on demand): Scripts in scripts/, reference docs in references/, and templates in assets/ load only when the agent needs them.

A minimal SKILL.md looks like:

---
name: hugging-face-model-trainer
description: >-
  Train or fine-tune language models using TRL on Hugging Face Jobs.
  Covers SFT, DPO, GRPO, and reward modeling. Use when the user wants to train
  a model, fine-tune on custom data, or set up an RL training pipeline.
license: Apache-2.0
metadata:
  author: huggingface
  version: "1.0"
---

The body then contains the actual workflow: how to validate the dataset format, select hardware, generate the training script, submit the job, and monitor it through Trackio.
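Mechanically, the two-tier split is simple to consume. The sketch below shows how an agent host might separate the frontmatter (tier 1) from the instruction body (tier 2); it handles only a flat key-value subset of YAML for illustration, where a real host would use a proper YAML parser:

```python
def split_skill_md(text: str) -> tuple[dict, str]:
    """Split a SKILL.md file into (frontmatter fields, instruction body).

    Illustrative sketch: parses only flat `key: value` lines, not full YAML.
    """
    parts = text.split("---", 2)
    if len(parts) < 3:
        raise ValueError("missing YAML frontmatter delimiters")
    _, raw_meta, body = parts
    meta = {}
    for line in raw_meta.strip().splitlines():
        # Skip indented continuation/child lines in this minimal sketch.
        if ":" in line and not line.startswith(" "):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta, body.strip()

skill = """---
name: hugging-face-model-trainer
description: Train or fine-tune language models using TRL.
---
Validate the dataset, select hardware, submit the job.
"""
meta, body = split_skill_md(skill)
# meta["name"] == "hugging-face-model-trainer"
```

The point of the split is the token budget: only the small meta dict is held in context across all installed skills, while the body loads on activation.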

The Cross-Platform Interoperability Model

The defining characteristic of HF Skills is that the same skill works across four major coding agents without modification:

| Coding Agent | Installation Method |
| --- | --- |
| Claude Code | /plugin marketplace add huggingface/skills, then /plugin install <skill-name>@huggingface/skills |
| OpenAI Codex | $skill-installer install <skill> or from a cloned local directory |
| Google Gemini CLI | gemini extensions install |
| Cursor | .cursor-plugin/plugin.json + .mcp.json manifests |

This interoperability is non-trivial. Before the Agent Skills standard, teams building with multiple coding agents had to maintain parallel documentation or agent-specific configuration for every domain capability. A single SKILL.md eliminates that duplication.

The Nine Available Skills

As of early 2026, the repository contains nine skills, split across two categories:

Domain-Specific Skills

These target AI/ML workflows directly:

| Skill | Core Capability |
| --- | --- |
| hugging-face-model-trainer | Fine-tune LLMs via TRL (SFT, DPO, GRPO), hardware selection, LoRA configuration |
| hugging-face-datasets | Create, validate, and push datasets to HF Hub |
| hugging-face-evaluation | Add structured evaluation results to model cards |
| hugging-face-jobs | Submit and monitor compute jobs on HF infrastructure |
| hugging-face-trackio | Track training metrics with real-time Trackio visualizations |
| hugging-face-paper-publisher | Index arXiv papers on HF Hub, link to models and datasets |

Tool Skills

These teach agents how to use HF’s own tooling:

| Skill | Core Capability |
| --- | --- |
| hugging-face-cli | Hub operations (upload, download, auth) via the hf CLI |
| gradio | Build interactive ML demos and web UIs |
| hugging-face-tool-builder | Generate reusable scripts for repeated HF API operations |

A Concrete Workflow: Fine-Tuning With One Instruction

The model trainer skill is the most documented capability, and it illustrates the practical gap Skills bridge. A user can issue:

Fine-tune Qwen3-0.6B on the open-r1/codeforces-cots dataset for instruction following

The agent, with the hugging-face-model-trainer skill active, will:

  1. Validate the dataset schema against TRL’s expected messages or prompt/completion format
  2. Select hardware based on model size (t4-small for <1B parameter models)
  3. Generate a TRL training script with appropriate hyperparameters
  4. Submit the job to HF Jobs infrastructure
  5. Return a Trackio dashboard URL for real-time loss and learning rate monitoring
  6. Push the final model checkpoint to HF Hub under the user’s namespace

According to Hugging Face’s blog post announcing the capability, a complete training run on a 0.6B model costs approximately $0.30 at roughly $0.75/hour on a t4-small instance.1

The hardware decision tree embedded in the skill reflects production knowledge that would otherwise require reading multiple docs pages:

| Model Size | Recommended Hardware | Approach |
| --- | --- | --- |
| <1B parameters | t4-small (~$0.75/hr) | Full fine-tune |
| 1–3B parameters | t4-medium or a10g-small | Full fine-tune |
| 3–7B parameters | a10g-large | LoRA (auto-applied) |
| >7B parameters | Not supported via this skill | Requires custom setup |
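The decision table above reduces to a short lookup. A minimal sketch, mirroring the table's tiers (the flavor strings and thresholds are taken from the table, not from the skill's source):

```python
def pick_hardware(num_params_b: float) -> tuple[str, str]:
    """Map model size in billions of parameters to a
    (hardware flavor, training approach) pair per the table above."""
    if num_params_b < 1:
        return "t4-small", "full fine-tune"
    if num_params_b <= 3:
        return "t4-medium", "full fine-tune"
    if num_params_b <= 7:
        return "a10g-large", "LoRA"
    raise ValueError("models above ~7B need a custom setup")

# pick_hardware(0.6) -> ("t4-small", "full fine-tune")
# pick_hardware(5.0) -> ("a10g-large", "LoRA")
```

Encoding this in the skill means the agent never has to rediscover GPU memory limits by trial and error on paid compute.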

Skills vs. Smolagents: Complementary Architectures

A common point of confusion is how HF Skills relates to smolagents, Hugging Face’s Python framework for building code-writing agents. They address different layers of the stack:

| Dimension | HF Skills | smolagents |
| --- | --- | --- |
| Type | Instruction packages | Python framework |
| Primary user | Developers using existing coding agents | Developers building new agents |
| Format | Markdown + YAML (SKILL.md) | Python code |
| Agent compatibility | Claude Code, Codex, Gemini CLI, Cursor | Any LLM via HF Inference or API |
| Composability mechanism | Skill activation via plugin/extension system | Code generation with tool nesting |
| Hub integration | Skills shared via GitHub repo | Tools/agents shared as Gradio Spaces |
| Distribution | Apache 2.0 GitHub repo | PyPI package |

smolagents introduced a key architectural insight: having agents write actions in Python code—rather than JSON tool calls—enables natural composability through function nesting, loops, and conditionals. A smolagents CodeAgent can call a tool, process the result, branch conditionally, and call another tool—all in a single generated code block.
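The composability argument can be illustrated without the framework itself. The snippet below uses two hypothetical tools (search_models and summarize are stand-ins, not real APIs) to show what one generated code action expresses that a flat JSON tool call cannot: call, filter, branch, call again.

```python
# Hypothetical tools an agent might have in scope (illustrative stubs).
def search_models(query: str) -> list[dict]:
    return [{"id": "qwen-0.6b", "downloads": 900},
            {"id": "qwen-3b", "downloads": 120}]

def summarize(items: list[dict]) -> str:
    return ", ".join(m["id"] for m in items)

# A single generated code action: tool call, intermediate processing,
# a conditional branch, and a second tool call -- no round-trips needed.
results = search_models("qwen instruct")
popular = [m for m in results if m["downloads"] > 500]
answer = summarize(popular) if popular else "no popular match"
# answer == "qwen-0.6b"
```

With JSON tool calls, each of those steps would cost a separate model round-trip; as code, the whole pipeline executes in one pass.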

HF Skills operate at a higher abstraction level: they provide the domain knowledge that makes either a smolagents-powered agent or a third-party coding agent effective at ML tasks, without requiring the practitioner to write agent infrastructure code.

The Agent Skills Standard: Context

The Agent Skills format HF Skills uses was open-sourced by Anthropic and is now defined at agentskills.io. The spec follows a progressive disclosure model:

  • Metadata fields (name, description) are token-efficient identifiers loaded at startup
  • Instruction body content is loaded only when a skill is activated
  • Reference files in scripts/ and references/ subdirectories are fetched on demand

The specification includes a skills-ref validation tool for checking SKILL.md compliance:

skills-ref validate ./my-skill

This positions Agent Skills as an interoperability layer analogous to what MCP became for tool connectivity—a format that benefits from wide adoption because skill packs become reusable across the ecosystem rather than locked to specific agents.

Failure Modes and Practical Limitations

The library is young and carries meaningful limitations practitioners should understand before relying on it in production.

Model size ceiling: The model trainer skill’s documented upper limit for practical use is approximately 7B parameters. The HF blog post initially claimed 70B support, but later clarified the actual ceiling is smaller—agents selecting hardware for large models will encounter job failures or need manual intervention.1

Job account requirements: Cloud compute submission via HF Jobs requires a paid HF account. Teams evaluating the library on free-tier accounts will hit this constraint immediately when attempting training workflows.

Skill quality variance: The nine skills vary in maturity. The model trainer and CLI skills have detailed reference documentation and tested scripts; newer additions like hugging-face-paper-publisher are thinner. As with any open-source repository at this stage, skills are maintained by different contributors with different documentation standards.

No execution sandboxing: Skills load instructions into the agent’s context—the agent then executes code in your environment. There is no built-in sandboxing. The spec includes an experimental allowed-tools frontmatter field for pre-approving specific tool calls, but support varies by agent implementation.
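A host that does honor allowed-tools could gate execution with a check like the one below. This is a sketch under stated assumptions: that the field deserializes to a list of tool names and that an absent field means no skill-level restriction; actual semantics depend on the agent implementation.

```python
def tool_permitted(frontmatter: dict, tool_name: str) -> bool:
    """Gate a tool call on the experimental `allowed-tools` field.

    Assumptions for illustration: the field is a list of tool-name
    strings, and omitting it imposes no restriction from the skill.
    """
    allowed = frontmatter.get("allowed-tools")
    if allowed is None:
        return True  # no allow-list declared by the skill
    return tool_name in allowed

# tool_permitted({"allowed-tools": ["read", "grep"]}, "bash") -> False
```

Even with such a gate, the tools that are permitted still run unsandboxed in your environment, so the field narrows exposure rather than containing it.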

Who Should Use HF Skills Today?

The library is well-suited for three practitioner profiles:

  1. ML engineers iterating on models: The model trainer skill removes the friction of looking up TRL configuration syntax and HF Jobs submission patterns every fine-tuning run.

  2. Researchers publishing on HF Hub: The datasets, evaluation, and paper-publisher skills systematize the Hub housekeeping that often gets deferred—model card eval tables, arXiv paper linking, dataset schema validation.

  3. Teams standardizing agent workflows: Organizations using multiple coding agents (some developers on Claude Code, others on Cursor or Codex) get consistent AI/ML workflow guidance without maintaining parallel documentation.

The library is less suitable as a production automation backbone today. Skills provide guidance, not guarantees—agent execution still involves stochastic behavior, and the lack of built-in error recovery means complex multi-skill pipelines require human oversight.


Frequently Asked Questions

Q: Do HF Skills require Hugging Face’s own agents or smolagents? A: No. Skills work with any Agent Skills-compatible coding agent: Claude Code, OpenAI Codex, Google Gemini CLI, and Cursor are all supported. Smolagents is a separate HF framework for building agents, not a prerequisite for using Skills.

Q: How do Skills differ from MCP tools? A: MCP gives agents live access to data—Hub search results, model metadata, API responses. Skills give agents procedural knowledge—the judgment to use those tools correctly for ML tasks like fine-tuning or evaluation. They are complementary; the HF Skills repo includes MCP configuration so both can run together.

Q: Can I write a custom skill for my organization’s ML workflow? A: Yes. The Agent Skills specification is open, and the skills-ref CLI validates custom SKILL.md files. HF’s own blog recommends using the official skills as building blocks for more domain-specific capabilities, rather than as exhaustive coverage of every workflow.

Q: What does it cost to run the model training skill? A: According to Hugging Face’s documentation, a complete SFT run on a 0.6B model costs approximately $0.30 using a t4-small instance at ~$0.75/hour. Costs scale with model size and hardware tier. HF Jobs requires a Pro or Team account.

Q: Are there limitations on model size? A: The model trainer skill practically supports models up to approximately 7B parameters, with LoRA automatically applied for models above 3B to manage memory. Larger models require infrastructure not covered by the current skill set.



Footnotes

  1. Hugging Face Blog. “We Got Claude to Fine-Tune an Open Source LLM.” Hugging Face, 2025. https://huggingface.co/blog/hf-skills-training
