Data IS Your PRD: Andrew Ng's Framework for AI Product Management

In traditional software development, the Product Requirements Document (PRD) has been the north star: a meticulous blueprint that dictates what the product should do, how it should behave, and what success looks like. But as artificial intelligence reshapes how we build software, a provocative new framework is emerging from one of AI’s most influential voices: Andrew Ng’s “Data is Your PRD.”

This isn’t just a catchy slogan. It’s a shift in practice that challenges many of the assumptions product managers bring to building AI applications. And for organizations still clinging to traditional specification documents, it may be the difference between AI products that thrive and those that fail.

The Death of the Traditional PRD

For decades, software development followed a predictable pattern: product managers wrote detailed PRDs, engineers implemented the specifications, and QA teams verified the outputs matched the requirements. This worked because traditional software is deterministic—given the same inputs, the same code produces the same outputs every time.

AI systems break this contract.

Large Language Models (LLMs) and other AI systems are probabilistic by nature. The same prompt can yield different responses. Because inputs are open-ended, edge cases can no longer be enumerated in a test plan the way they are in traditional software testing. And perhaps most importantly, the “code” that determines system behavior isn’t written by engineers; it’s learned from data.

As Andrej Karpathy famously observed in his seminal 2017 essay “Software 2.0,” neural networks represent a fundamental shift in how we develop software. In Software 1.0, humans write explicit instructions. In Software 2.0, humans define objectives and provide data, and the system learns the implementation. The source code of Software 2.0 “comprises 1) the dataset that defines the desirable behavior and 2) the neural net architecture that gives the rough skeleton of the code.”

This observation, made years before the generative AI explosion, now forms the theoretical foundation for Ng’s practical framework.

What “Data is Your PRD” Actually Means

Andrew Ng’s framework, articulated through his work at DeepLearning.AI and various public appearances, rests on three fundamental principles that every AI product manager must internalize:

1. Specifications Become Data Collections

In traditional product management, you define what the product should do through written specifications. In AI product management, you define what the product should do through carefully curated datasets.

Instead of writing “the chatbot should provide helpful customer service responses,” you collect thousands of examples of helpful customer service interactions. Instead of specifying “the code review tool should identify security vulnerabilities,” you assemble datasets of code with and without security issues.

The data becomes the specification. The quality, diversity, and relevance of your data directly determine the quality, diversity, and relevance of your product’s capabilities.

2. Success Metrics Shift from Features to Performance

Traditional PRDs measure success through feature completion: “Did we build the login system?” “Does the dashboard display analytics?” AI PRDs must measure success through performance on representative tasks: “Does our model achieve 95% accuracy on our evaluation set?” “Do human evaluators rate 90% of outputs as ‘good’ or better?”

This requires product managers to become fluent in evaluation metrics that were once the domain of ML engineers. Accuracy, precision, recall, F1 scores, perplexity, BLEU scores, human evaluation protocols—these become the language of product requirements.
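
To make a few of these metrics concrete, here is a minimal sketch of how accuracy, precision, recall, and F1 fall out of a set of binary predictions. The names `BinaryMetrics` and `binary_metrics` are illustrative, not part of any particular library, and a real team would more likely lean on an established evaluation package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true: list[int], y_pred: list[int]) -> BinaryMetrics:
    """Compute accuracy, precision, recall, and F1 for 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)
```

A requirement such as “95% accuracy on our evaluation set” then becomes a check on `binary_metrics(labels, predictions).accuracy`, with precision and recall showing which kind of error is driving a miss.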

3. Iteration Cycles Accelerate Dramatically

Traditional software releases happen in weeks or months. AI systems can iterate in hours or days—if you have the right data infrastructure. When your PRD is a dataset, “updating requirements” means collecting new examples, not rewriting documents.

This acceleration is both an opportunity and a challenge. Product managers must develop new muscles for rapid experimentation, A/B testing at scale, and continuous deployment of model improvements.

Concrete Examples: Data as PRD in Practice

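To understand how this works in practice, let’s examine three representative scenarios where traditional PRDs tend to fail and data-driven requirements tend to succeed. The dataset sizes and thresholds below are illustrative targets, not benchmarks.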

Example 1: Customer Support Chatbots

Traditional PRD Approach:

  • Write detailed conversation flows
  • Define keyword triggers
  • Specify response templates
  • Result: Brittle systems that break when users deviate from expected patterns

Data-as-PRD Approach:

  • Collect 10,000 real customer support conversations
  • Label ideal responses and common failure modes
  • Define evaluation: 85% of responses rated “helpful” by human evaluators
  • Continuously collect new conversations and retrain
  • Result: Adaptive systems that handle unexpected queries gracefully
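
The evaluation bullet above can be sketched as a simple release gate. This assumes each sampled response has already been given a rating string by a human evaluator; the label `"helpful"`, the function names, and the default 0.85 target are taken from the example or invented for illustration.

```python
def helpful_rate(ratings: list[str], positive: frozenset[str] = frozenset({"helpful"})) -> float:
    """Fraction of human ratings that fall in the set of positive labels."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    return sum(1 for r in ratings if r in positive) / len(ratings)


def meets_target(ratings: list[str], target: float = 0.85) -> bool:
    """True when the share of positive ratings reaches the target threshold."""
    return helpful_rate(ratings) >= target
```

In this framing the “requirement” is the threshold plus the rating protocol, and a candidate model either clears the gate on a fresh sample of conversations or it does not.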

Example 2: Code Generation Assistants

Traditional PRD Approach:

  • Specify supported programming languages
  • Define code style guidelines
  • Document API integrations
  • Result: Limited to explicitly programmed capabilities

Data-as-PRD Approach:

  • Curate millions of high-quality code examples from open-source repositories
  • Label code by functionality, quality, and security characteristics
  • Define evaluation: Generated code passes test cases 90% of the time
  • Result: Systems that generate novel solutions not seen in training
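
As a rough sketch of the “passes test cases 90% of the time” criterion, the snippet below scores a batch of candidate solutions against a shared set of test cases. It assumes each candidate is already available as a Python callable; the names `Case`, `passes_all`, and `pass_rate` are hypothetical, and a production harness would run untrusted generated code in a sandbox rather than calling it directly.

```python
from typing import Callable, Sequence

Case = tuple[tuple, object]  # (positional arguments, expected return value)


def passes_all(candidate: Callable, cases: Sequence[Case]) -> bool:
    """A candidate passes only if every test case returns the expected value."""
    for args, expected in cases:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            # Any runtime error counts as a failed test case.
            return False
    return True


def pass_rate(candidates: Sequence[Callable], cases: Sequence[Case]) -> float:
    """Fraction of candidates that pass every test case."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    return sum(1 for c in candidates if passes_all(c, cases)) / len(candidates)
```

Comparing `pass_rate(...)` against 0.90 turns the bullet into a number the team can track from one training run to the next.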

Example 3: Document Analysis Tools

Traditional PRD Approach:

  • Define document types to support
  • Specify fields to extract
  • Write extraction rules
  • Result: Fragile parsers that fail on format variations

Data-as-PRD Approach:

  • Collect diverse document examples with human-annotated extractions
  • Define evaluation: 95% field extraction accuracy across document variations
  • Continuously expand dataset with new document types
  • Result: Robust systems that generalize to unseen formats
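
One way to read “95% field extraction accuracy” is as the share of human-annotated fields that the system reproduces. The sketch below assumes extractions are stored as one dictionary per document and that matching ignores case and extra whitespace; both choices, and the function names, are assumptions made for illustration.

```python
def normalize(value: str) -> str:
    """Collapse whitespace and ignore case so trivial differences do not count as errors."""
    return " ".join(value.split()).casefold()


def field_accuracy(predicted: list[dict[str, str]], gold: list[dict[str, str]]) -> float:
    """Fraction of annotated (gold) fields whose predicted value matches after normalization."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same documents")
    total = 0
    correct = 0
    for pred_doc, gold_doc in zip(predicted, gold):
        for field, gold_value in gold_doc.items():
            total += 1
            # A missing field is treated as an empty, and therefore wrong, prediction.
            if normalize(pred_doc.get(field, "")) == normalize(gold_value):
                correct += 1
    if total == 0:
        raise ValueError("gold annotations contain no fields")
    return correct / total
```

Reporting this number separately for each document type or layout is what shows whether the system really generalizes across format variations.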

The Infrastructure Required

Embracing “Data is Your PRD” requires building infrastructure that traditional software teams rarely need:

Data Collection Pipelines

You need systematic ways to collect, clean, and annotate data. This often means building human-in-the-loop systems where domain experts label examples, review model outputs, and provide feedback that becomes training data.

Evaluation Frameworks

You need rigorous evaluation protocols that measure performance on tasks that matter to users. This includes both automated metrics (accuracy, F1, etc.) and human evaluation protocols that capture subjective quality.

Version Control for Data

Just as code has version control, your datasets need versioning. Tools like DVC (Data Version Control) version datasets alongside code, while Weights & Biases and MLflow track experiments, model performance, and results.
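
The core idea behind dataset versioning can be illustrated with a content fingerprint: if the data changes, the identifier changes. The sketch below is a hand-rolled illustration using only the standard library. It is not how DVC or any other tool works internally, and `dataset_fingerprint` is a hypothetical helper.

```python
import hashlib
import json


def dataset_fingerprint(records: list[dict]) -> str:
    """SHA-256 over canonically serialized records, independent of record order."""
    # sort_keys makes each record's serialization stable; sorting the lines
    # makes the fingerprint independent of the order records were collected in.
    lines = sorted(json.dumps(r, sort_keys=True, ensure_ascii=False) for r in records)
    digest = hashlib.sha256()
    for line in lines:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()
```

Logging this fingerprint next to every evaluation result makes it possible to say exactly which version of the “PRD” a given model was trained and scored against.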

Continuous Training Pipelines

You need infrastructure to retrain models as new data becomes available. This includes data validation, model training, evaluation, and deployment pipelines that can run automatically or on demand.
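
The stages named above can be sketched as a single function that validates data, trains a candidate, evaluates it, and recommends deployment only if it beats the current model. Everything here is a toy under stated assumptions: `train_lookup` stands in for real training by memorizing examples, and the names and the simple “score must improve” rule are illustrative.

```python
from typing import Callable, Sequence

Example = tuple[str, str]      # (input text, expected label)
Model = Callable[[str], str]   # maps input text to a predicted label


def validate(examples: Sequence[Example]) -> list[Example]:
    """Data validation: drop examples with a blank input or a blank label."""
    return [(x, y) for x, y in examples if x.strip() and y.strip()]


def train_lookup(examples: Sequence[Example]) -> Model:
    """Toy stand-in for training: memorize the examples in a lookup table."""
    table = dict(examples)
    return lambda text: table.get(text, "")


def evaluate(model: Model, eval_set: Sequence[Example]) -> float:
    """Accuracy of the model on a held-out evaluation set."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    return sum(1 for x, y in eval_set if model(x) == y) / len(eval_set)


def run_pipeline(
    train_fn: Callable[[Sequence[Example]], Model],
    train_set: Sequence[Example],
    eval_set: Sequence[Example],
    current_score: float,
) -> tuple[bool, float]:
    """Validate, train, evaluate; recommend deployment only if the score improves."""
    clean = validate(train_set)
    if not clean:
        raise ValueError("no valid training examples after validation")
    candidate = train_fn(clean)
    score = evaluate(candidate, eval_set)
    return score > current_score, score
```

The design choice worth noting is that deployment is gated on the evaluation set rather than on the act of retraining, so new data only ships when it measurably improves the product.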

Challenges and Pitfalls

The “Data is Your PRD” framework is powerful but not without challenges:

Data Quality vs. Quantity

More data isn’t always better. Poor-quality data leads to poor-quality models. Product managers must develop a taste for what constitutes “good” training data, which often requires deep domain expertise.
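
Some quality problems can be caught mechanically before any domain expert looks at the data. As a minimal sketch, the hypothetical `audit_labels` helper below flags two common ones in a labeled text dataset: repeated examples and identical inputs that were given different labels. Treating inputs as equal when they differ only in case or spacing is an assumption.

```python
from collections import defaultdict


def audit_labels(examples: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Report inputs that appear more than once and inputs with conflicting labels."""
    labels_by_text: dict[str, set[str]] = defaultdict(set)
    counts: dict[str, int] = defaultdict(int)
    for text, label in examples:
        # Normalize so that case and spacing differences count as the same input.
        key = " ".join(text.split()).casefold()
        labels_by_text[key].add(label)
        counts[key] += 1
    return {
        "duplicates": sorted(k for k, n in counts.items() if n > 1),
        "conflicts": sorted(k for k, labels in labels_by_text.items() if len(labels) > 1),
    }
```

Conflicting labels are often the more valuable finding, since they tend to point at a requirement the team has not actually agreed on.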

The Cold Start Problem

New AI products face a chicken-and-egg problem: you need data to build the product, but you need the product to collect data. Successful teams often bootstrap with synthetic data, open datasets, or manual data collection before launching.

Bias and Fairness

Your data reflects the world as it is, including its biases. If your training data comes from historical hiring decisions, your AI is likely to replicate the biases in those decisions. Product managers must actively work to identify and mitigate these issues, for example by comparing model outcomes across groups before launch.
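
A first step toward identifying such issues is simply measuring outcomes per group. The sketch below assumes each record is a (group, outcome) pair with outcome 1 for a positive decision and 0 otherwise; the function names are illustrative, and a large gap is a prompt for investigation rather than a complete definition of fairness.

```python
from collections import defaultdict


def selection_rates(records: list[tuple[str, int]]) -> dict[str, float]:
    """Share of positive outcomes (1) per group."""
    totals: dict[str, int] = defaultdict(int)
    positives: dict[str, int] = defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        positives[group] += outcome
    return {group: positives[group] / totals[group] for group in totals}


def max_rate_gap(rates: dict[str, float]) -> float:
    """Difference between the highest and lowest group selection rate."""
    if not rates:
        raise ValueError("rates must not be empty")
    return max(rates.values()) - min(rates.values())
```

Running the same comparison on the training data and on model outputs helps separate bias inherited from the dataset from bias introduced or amplified by the model.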

Regulatory Compliance

As AI regulation increases (from the EU AI Act to various U.S. state laws), maintaining documentation of what data was used to train models becomes a compliance requirement, not just a best practice.

The Role of the AI Product Manager

In this new paradigm, what does an AI product manager actually do? The role evolves in three key directions:

Data Strategist

AI PMs must think deeply about what data to collect, how to collect it ethically, and how to maintain data quality over time. They become stewards of their organization’s most valuable AI asset: its datasets.

Evaluation Designer

Instead of writing feature specifications, AI PMs design evaluation protocols. They define what “good” looks like, create test sets that represent real user needs, and establish metrics that correlate with user satisfaction.

Iteration Orchestrator

AI PMs orchestrate rapid iteration cycles. They decide when to collect more data, when to retrain models, when to run experiments, and when to deploy improvements. They balance the tradeoffs between model performance, latency, cost, and user experience.

Looking Forward: The Future of AI Product Management

As AI capabilities continue to advance, the “Data is Your PRD” framework will likely evolve in several directions:

Automated Data Generation

Synthetic data generation—creating training examples through AI rather than human collection—is becoming increasingly sophisticated. This may reduce the burden of manual data collection while raising new questions about data quality and diversity.

Real-Time Adaptation

Future AI systems may adapt in real time based on user interactions, continuously updating their “PRDs” without explicit retraining. This raises both technical challenges (such as catastrophic forgetting, where new learning overwrites earlier capabilities) and product challenges (such as keeping behavior predictable for users).

Multi-Modal Requirements

As AI systems handle text, images, audio, and video, the “data” in “Data is Your PRD” becomes increasingly complex. Product managers will need to think in terms of multi-modal datasets and cross-modal evaluation.

Conclusion

Andrew Ng’s “Data is Your PRD” framework represents more than a new way of documenting requirements—it’s a fundamental rethinking of what product management means in the age of AI. It shifts the focus from prescriptive specifications to empirical performance, from feature lists to dataset curation, from deterministic outcomes to probabilistic excellence.

For product managers entering the AI space, the message is clear: your most important deliverable isn’t a document—it’s a dataset. Your most critical skill isn’t writing specifications—it’s designing evaluations. And your most valuable contribution isn’t defining what the product should do—it’s curating the data that teaches the product how to succeed.

The organizations that master this shift will build AI products that adapt, improve, and delight users in ways that traditional software never could. Those that cling to old paradigms will find themselves writing PRDs for products that never quite work as specified.

In the end, data isn’t just your PRD—it’s your product’s DNA, its curriculum, and its conscience. Treat it accordingly.


Sources and References:

  • Karpathy, Andrej. “Software 2.0.” Medium, 2017.
  • Ng, Andrew. Various lectures and writings on AI product management via DeepLearning.AI.
  • Industry practices from leading AI companies including OpenAI, Anthropic, Google, and Meta.
  • Academic research on ML evaluation and data-centric AI.