Hugging Face Skills are structured instruction packages that give any compatible coding agent—Claude Code, OpenAI Codex, Google Gemini CLI, or Cursor—the procedural knowledge to execute AI/ML workflows end-to-end. Released in November 2025 under Apache 2.0, the library accumulated 7,500 GitHub stars by early 2026, signaling rapid practitioner adoption. Skills solve a narrow but critical problem: a model that can write code doesn’t automatically know the best practices for training on HF infrastructure.

What Is the HF Skills Library?

The Hugging Face Skills repository is a curated collection of Agent Skills-formatted instruction packages. Each skill is a directory with a SKILL.md file at its core: a markdown document with YAML frontmatter (name, description, optional compatibility metadata) followed by the task guidance the agent reads upon activation.

Think of each skill as a domain-aware runbook. Where MCP tools give agents access to live data and API calls, skills give agents the judgment to use that data correctly—which GPU to select for a 3B parameter model, when to apply LoRA versus full fine-tuning, how to chain SFT into DPO training runs.

The Agent Skills format itself, now hosted at agentskills.io, was open-sourced by Anthropic as a cross-platform specification. Hugging Face’s repository was among the first major adopters, bringing nine skills that together cover the full ML lifecycle.

How Do Skills Work?

The SKILL.md Structure

Every skill follows a deliberate three-tier information architecture designed for token efficiency:

  1. Metadata (~100 tokens): The name and description fields are loaded at startup across all installed skills. Agents use these to decide which skill applies to a given request.
  2. Instructions (<5,000 tokens recommended): The full SKILL.md body loads when the agent activates the skill. This contains step-by-step workflows, decision trees, and best practices.
  3. Supporting resources (on demand): Scripts in scripts/, reference docs in references/, and templates in assets/ load only when the agent needs them.

A minimal SKILL.md looks like:

---
name: hugging-face-model-trainer
description: >-
  Train or fine-tune language models using TRL on Hugging Face Jobs.
  Covers SFT, DPO, GRPO, and reward modeling. Use when the user wants to train
  a model, fine-tune on custom data, or set up an RL training pipeline.
license: Apache-2.0
metadata:
  author: huggingface
  version: "1.0"
---

The body then contains the actual workflow: how to validate the dataset format, select hardware, generate the training script, submit the job, and monitor it through Trackio.
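Mechanically, the two-tier split is simple to consume. The sketch below shows how an agent host might separate the frontmatter (tier 1) from the instruction body (tier 2); it handles only a flat key-value subset of YAML for illustration, where a real host would use a proper YAML parser:

```python
def split_skill_md(text: str) -> tuple[dict, str]:
    """Split a SKILL.md file into (frontmatter fields, instruction body).

    Illustrative sketch: parses only flat `key: value` lines, not full YAML.
    """
    parts = text.split("---", 2)
    if len(parts) < 3:
        raise ValueError("missing YAML frontmatter delimiters")
    _, raw_meta, body = parts
    meta = {}
    for line in raw_meta.strip().splitlines():
        # Skip indented continuation/child lines in this minimal sketch.
        if ":" in line and not line.startswith(" "):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta, body.strip()

skill = """---
name: hugging-face-model-trainer
description: Train or fine-tune language models using TRL.
---
Validate the dataset, select hardware, submit the job.
"""
meta, body = split_skill_md(skill)
# meta["name"] == "hugging-face-model-trainer"
```

The point of the split is the token budget: only the small meta dict is held in context across all installed skills, while the body loads on activation.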

The Cross-Platform Interoperability Model

The defining characteristic of HF Skills is that the same skill works across four major coding agents without modification:

| Coding Agent | Installation Method |
| --- | --- |
| Claude Code | /plugin marketplace add huggingface/skills, then /plugin install <skill-name>@huggingface/skills |
| OpenAI Codex | $skill-installer install <skill> or from a cloned local directory |
| Google Gemini CLI | gemini extensions install |
| Cursor | .cursor-plugin/plugin.json + .mcp.json manifests |

This interoperability is non-trivial. Before the Agent Skills standard, teams building with multiple coding agents had to maintain parallel documentation or agent-specific configuration for every domain capability. A single SKILL.md eliminates that duplication.

The Nine Available Skills

As of early 2026, the repository contains nine skills, split across two categories:

Domain-Specific Skills

These target AI/ML workflows directly:

| Skill | Core Capability |
| --- | --- |
| hugging-face-model-trainer | Fine-tune LLMs via TRL (SFT, DPO, GRPO), hardware selection, LoRA configuration |
| hugging-face-datasets | Create, validate, and push datasets to HF Hub |
| hugging-face-evaluation | Add structured evaluation results to model cards |
| hugging-face-jobs | Submit and monitor compute jobs on HF infrastructure |
| hugging-face-trackio | Track training metrics with real-time Trackio visualizations |
| hugging-face-paper-publisher | Index arXiv papers on HF Hub, link to models and datasets |

Tool Skills

These teach agents how to use HF’s own tooling:

| Skill | Core Capability |
| --- | --- |
| hugging-face-cli | Hub operations (upload, download, auth) via the hf CLI |
| gradio | Build interactive ML demos and web UIs |
| hugging-face-tool-builder | Generate reusable scripts for repeated HF API operations |

A Concrete Workflow: Fine-Tuning With One Instruction

The model trainer skill is the most documented capability, and it illustrates the practical gap Skills bridge. A user can issue:

Fine-tune Qwen3-0.6B on the open-r1/codeforces-cots dataset for instruction following

The agent, with the hugging-face-model-trainer skill active, will:

  1. Validate the dataset schema against TRL’s expected messages or prompt/completion format
  2. Select hardware based on model size (t4-small for <1B parameter models)
  3. Generate a TRL training script with appropriate hyperparameters
  4. Submit the job to HF Jobs infrastructure
  5. Return a Trackio dashboard URL for real-time loss and learning rate monitoring
  6. Push the final model checkpoint to HF Hub under the user’s namespace

According to Hugging Face’s blog post announcing the capability, a complete training run on a 0.6B model costs approximately $0.30 at roughly $0.75/hour on a t4-small instance.1

The hardware decision tree embedded in the skill reflects production knowledge that would otherwise require reading multiple docs pages:

| Model Size | Recommended Hardware | Approach |
| --- | --- | --- |
| <1B parameters | t4-small (~$0.75/hr) | Full fine-tune |
| 1–3B parameters | t4-medium or a10g-small | Full fine-tune |
| 3–7B parameters | a10g-large | LoRA (auto-applied) |
| >7B parameters | Not supported via this skill | Requires custom setup |
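The decision table above reduces to a short lookup. A minimal sketch, mirroring the table's tiers (the flavor strings and thresholds are taken from the table, not from the skill's source):

```python
def pick_hardware(num_params_b: float) -> tuple[str, str]:
    """Map model size in billions of parameters to a
    (hardware flavor, training approach) pair per the table above."""
    if num_params_b < 1:
        return "t4-small", "full fine-tune"
    if num_params_b <= 3:
        return "t4-medium", "full fine-tune"
    if num_params_b <= 7:
        return "a10g-large", "LoRA"
    raise ValueError("models above ~7B need a custom setup")

# pick_hardware(0.6) -> ("t4-small", "full fine-tune")
# pick_hardware(5.0) -> ("a10g-large", "LoRA")
```

Encoding this in the skill means the agent never has to rediscover GPU memory limits by trial and error on paid compute.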

Skills vs. Smolagents: Complementary Architectures

A common point of confusion is how HF Skills relates to smolagents, Hugging Face’s Python framework for building code-writing agents. They address different layers of the stack:

| Dimension | HF Skills | smolagents |
| --- | --- | --- |
| Type | Instruction packages | Python framework |
| Primary user | Developers using existing coding agents | Developers building new agents |
| Format | Markdown + YAML (SKILL.md) | Python code |
| Agent compatibility | Claude Code, Codex, Gemini CLI, Cursor | Any LLM via HF Inference or API |
| Composability mechanism | Skill activation via plugin/extension system | Code generation with tool nesting |
| Hub integration | Skills shared via GitHub repo | Tools/agents shared as Gradio Spaces |
| Distribution | Apache 2.0 GitHub repo | PyPI package |

smolagents introduced a key architectural insight: having agents write actions in Python code—rather than JSON tool calls—enables natural composability through function nesting, loops, and conditionals. A smolagents CodeAgent can call a tool, process the result, branch conditionally, and call another tool—all in a single generated code block.
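The composability argument can be illustrated without the framework itself. The snippet below uses two hypothetical tools (search_models and summarize are stand-ins, not real APIs) to show what one generated code action expresses that a flat JSON tool call cannot: call, filter, branch, call again.

```python
# Hypothetical tools an agent might have in scope (illustrative stubs).
def search_models(query: str) -> list[dict]:
    return [{"id": "qwen-0.6b", "downloads": 900},
            {"id": "qwen-3b", "downloads": 120}]

def summarize(items: list[dict]) -> str:
    return ", ".join(m["id"] for m in items)

# A single generated code action: tool call, intermediate processing,
# a conditional branch, and a second tool call -- no round-trips needed.
results = search_models("qwen instruct")
popular = [m for m in results if m["downloads"] > 500]
answer = summarize(popular) if popular else "no popular match"
# answer == "qwen-0.6b"
```

With JSON tool calls, each of those steps would cost a separate model round-trip; as code, the whole pipeline executes in one pass.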

HF Skills operate at a higher abstraction level: they provide the domain knowledge that makes either a smolagents-powered agent or a third-party coding agent effective at ML tasks, without requiring the practitioner to write agent infrastructure code.

The Agent Skills Standard: Context

The Agent Skills format HF Skills uses was open-sourced by Anthropic and is now defined at agentskills.io. The spec follows a progressive disclosure model:

  • Metadata fields (name, description) are token-efficient identifiers loaded at startup
  • Instruction body content is loaded only when a skill is activated
  • Reference files in scripts/ and references/ subdirectories are fetched on demand

The specification includes a skills-ref validation tool for checking SKILL.md compliance:

skills-ref validate ./my-skill

This positions Agent Skills as an interoperability layer analogous to what MCP became for tool connectivity—a format that benefits from wide adoption because skill packs become reusable across the ecosystem rather than locked to specific agents.

Failure Modes and Practical Limitations

The library is young and carries meaningful limitations practitioners should understand before relying on it in production.

Model size ceiling: The model trainer skill’s documented upper limit for practical use is approximately 7B parameters. The HF blog post initially claimed 70B support, but later clarified the actual ceiling is smaller—agents selecting hardware for large models will encounter job failures or need manual intervention.1

Job account requirements: Cloud compute submission via HF Jobs requires a paid HF account. Teams evaluating the library on free-tier accounts will hit this constraint immediately when attempting training workflows.

Skill quality variance: The nine skills vary in maturity. The model trainer and CLI skills have detailed reference documentation and tested scripts; newer additions like hugging-face-paper-publisher are thinner. As with any open-source repository at this stage, skills are maintained by different contributors with different documentation standards.

No execution sandboxing: Skills load instructions into the agent’s context—the agent then executes code in your environment. There is no built-in sandboxing. The spec includes an experimental allowed-tools frontmatter field for pre-approving specific tool calls, but support varies by agent implementation.
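A host that does honor allowed-tools could gate execution with a check like the one below. This is a sketch under stated assumptions: that the field deserializes to a list of tool names and that an absent field means no skill-level restriction; actual semantics depend on the agent implementation.

```python
def tool_permitted(frontmatter: dict, tool_name: str) -> bool:
    """Gate a tool call on the experimental `allowed-tools` field.

    Assumptions for illustration: the field is a list of tool-name
    strings, and omitting it imposes no restriction from the skill.
    """
    allowed = frontmatter.get("allowed-tools")
    if allowed is None:
        return True  # no allow-list declared by the skill
    return tool_name in allowed

# tool_permitted({"allowed-tools": ["read", "grep"]}, "bash") -> False
```

Even with such a gate, the tools that are permitted still run unsandboxed in your environment, so the field narrows exposure rather than containing it.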

Who Should Use HF Skills Today?

The library is well-suited for three practitioner profiles:

  1. ML engineers iterating on models: The model trainer skill removes the friction of looking up TRL configuration syntax and HF Jobs submission patterns every fine-tuning run.

  2. Researchers publishing on HF Hub: The datasets, evaluation, and paper-publisher skills systematize the Hub housekeeping that often gets deferred—model card eval tables, arXiv paper linking, dataset schema validation.

  3. Teams standardizing agent workflows: Organizations using multiple coding agents (some developers on Claude Code, others on Cursor or Codex) get consistent AI/ML workflow guidance without maintaining parallel documentation.

The library is less suitable as a production automation backbone today. Skills provide guidance, not guarantees—agent execution still involves stochastic behavior, and the lack of built-in error recovery means complex multi-skill pipelines require human oversight.


Frequently Asked Questions

Q: Do HF Skills require Hugging Face’s own agents or smolagents? A: No. Skills work with any Agent Skills-compatible coding agent: Claude Code, OpenAI Codex, Google Gemini CLI, and Cursor are all supported. Smolagents is a separate HF framework for building agents, not a prerequisite for using Skills.

Q: How do Skills differ from MCP tools? A: MCP gives agents live access to data—Hub search results, model metadata, API responses. Skills give agents procedural knowledge—the judgment to use those tools correctly for ML tasks like fine-tuning or evaluation. They are complementary; the HF Skills repo includes MCP configuration so both can run together.

Q: Can I write a custom skill for my organization’s ML workflow? A: Yes. The Agent Skills specification is open, and the skills-ref CLI validates custom SKILL.md files. HF’s own blog recommends using the official skills as building blocks for more domain-specific capabilities, rather than as exhaustive coverage of every workflow.

Q: What does it cost to run the model training skill? A: According to Hugging Face’s documentation, a complete SFT run on a 0.6B model costs approximately $0.30 using a t4-small instance at ~$0.75/hour. Costs scale with model size and hardware tier. HF Jobs requires a Pro or Team account.

Q: Are there limitations on model size? A: The model trainer skill practically supports models up to approximately 7B parameters, with LoRA automatically applied for models above 3B to manage memory. Larger models require infrastructure not covered by the current skill set.



Footnotes

  1. Hugging Face Blog. “We Got Claude to Fine-Tune an Open Source LLM.” Hugging Face, 2025. https://huggingface.co/blog/hf-skills-training
