The AI Bill of Materials: Governing Data, Features, Models, and Prompts
We've all seen the AI demos. In 15 minutes, someone spins up a slick RAG app or fine-tunes a model to automate a repetitive task. It's exciting stuff. But as a data architect helping teams bring these projects into production, I keep seeing the same thing happen: everything slows down when governance, security, and reliability show up.
That gap between a cool demo and a stable production system? It's wide—and often underestimated. This post is about how my team and I bridge that gap by applying a concept from manufacturing—the Bill of Materials—and how we bring it to life using Unity Catalog as our governance backbone.
Where Things Break Down
Once a project leaves the sandbox and enters a shared environment, things get real. Here are the patterns I see over and over:
Unclear Ownership
Who's responsible if a prompt starts hallucinating or a model's performance drifts? Usually, the person who built it has moved on, and the ops team doesn't have the context to fix it.
Missing Lineage
Someone asks, "Did this model train on restricted data?" and it turns into a days-long investigation.
Untracked Assets
Prompts and eval sets live untracked in notebooks or local files, making audits and reproducibility a nightmare.
No Promotion Gates
Models get pushed to prod without checks for bias, data quality, or safety.
Introducing the AI Bill of Materials
To get a handle on this, we created an AI Bill of Materials (AI-BoM)—a structured inventory of everything that goes into building, deploying, and monitoring AI systems. It gives us the transparency we need to move faster without cutting corners.
Think of the AI-BoM as more than a list—it's a connected map of every moving part. This approach transforms governance from a bottleneck into an accelerator, providing the visibility and control needed for production AI systems.
What's in My AI-BoM
Datasets
Tables, views, volumes—raw and transformed, training sets, inference logs. We record schema, sensitivity, and retention policies.
Features & Embeddings
Where they come from, how they're calculated, and where they're stored (e.g., a Vector Search index).
Models
The artifact, hyperparameters, training code, version, stage (Staging, Prod), and eval metrics.
Prompts & Templates
Treated like code. We track changes, ownership, and usage patterns.
Eval Datasets
Ground truth data and performance scores. We need to know which model version was tested on what.
Serving Endpoints
We log config, uptime, and the models behind them for complete visibility.
For each of these, we add metadata: owner, purpose, sensitivity, retention period, lifecycle stage. This metadata turns a plain inventory into a usable governance tool.
Unity Catalog: The Governance Backbone
If the AI-BoM is the what, Unity Catalog (UC) is the where. I've come to see UC as more than a data catalog—it's the backbone of our AI governance.
Three-Level Structure
We organize everything using UC's three-level namespace: catalog.schema.object, where the object can be a table, model, or volume. Catalogs encode domain and environment, and schemas map to a product or project, like finance_prod.risk_modeling.loan_applications.
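Standing up that structure takes only a few statements. A minimal sketch, using the illustrative names above:

-- Catalog encodes domain + environment; schema encodes the product
CREATE CATALOG IF NOT EXISTS finance_prod;
CREATE SCHEMA IF NOT EXISTS finance_prod.risk_modeling;

-- Every asset then lives at a predictable three-level address
CREATE TABLE IF NOT EXISTS finance_prod.risk_modeling.loan_applications (
  application_id BIGINT,
  submitted_at TIMESTAMP
);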
Automatic Lineage
When someone trains a model using features from a table and registers it in UC, that connection is captured. This is gold when you're debugging, auditing, or just trying to answer "what changed?"
Consistent tagging across all these assets makes them discoverable and enforceable. Because we can tag tables, models, and volumes in one place, we can create rules that apply across the board.
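Lineage is also queryable, which is what makes those "what changed?" questions fast to answer. A hedged sketch, assuming your workspace has UC's lineage system tables enabled (column names can vary by platform release):

-- Everything downstream of a table, most recent first
SELECT target_table_full_name, entity_type, event_time
FROM system.access.table_lineage
WHERE source_table_full_name = 'finance_prod.risk_modeling.loan_applications'
ORDER BY event_time DESC;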
Tagging That People Actually Use
A governance system only works if people actually use it. We make tagging easy and useful by standardizing just a few labels:
Sensitivity: public, internal, confidential, restricted
PII tags: email, phone, government_id
Purpose: training, eval, inference, monitoring
Owner: team email alias
Retention: number of days
ALTER TABLE finance.customers SET TBLPROPERTIES (
  'uc.owner' = '[email protected]',
  'uc.sensitivity' = 'restricted',
  'uc.pii' = 'email,phone',
  'uc.purpose' = 'inference',
  'uc.retention_days' = '365'
);
This makes it easy to filter assets, apply access controls, and avoid surprises during audits.
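Reading the labels back is a one-liner, which is what keeps audits calm:

-- Inspect the governance properties on any table
SHOW TBLPROPERTIES finance.customers;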
Policies as Code = Less Stress
Good governance should enable teams, not block them. We manage all access policies through Infrastructure as Code (usually Terraform), so every change is versioned, reviewed, and easy to roll back.
One key practice: separating dev identities from runtime identities. A dev might debug a pipeline, but the pipeline runs under a scoped service principal.
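In UC terms, that separation is just narrow grants on the runtime identity. A minimal sketch, where support-assistant-sp stands in for the service principal's application ID:

-- The pipeline's identity gets exactly what it needs, nothing more
GRANT USE CATALOG ON CATALOG finance_prod TO `support-assistant-sp`;
GRANT USE SCHEMA ON SCHEMA finance_prod.risk_modeling TO `support-assistant-sp`;
GRANT SELECT ON TABLE finance_prod.risk_modeling.loan_applications TO `support-assistant-sp`;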
We also use UC's dynamic masking and row filtering to protect sensitive data without making copies. This approach ensures data protection while maintaining usability for authorized users.
CREATE OR REPLACE FUNCTION mask_email(e STRING)
RETURNS STRING
RETURN CASE
  WHEN is_account_group_member('pii_readers') THEN e
  ELSE regexp_replace(e, '(^.).+(@.+)$', '$1***$2')
END;
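The function only takes effect once it's attached to a column, and row filters work the same way. A sketch, assuming finance.customers has email and region columns:

-- Attach the mask so every query sees redacted emails
-- unless the caller is in pii_readers
ALTER TABLE finance.customers ALTER COLUMN email SET MASK mask_email;

-- Row filtering: a boolean UDF attached to the table
CREATE OR REPLACE FUNCTION us_rows_only(region STRING)
RETURNS BOOLEAN
RETURN is_account_group_member('global_readers') OR region = 'US';

ALTER TABLE finance.customers SET ROW FILTER us_rows_only ON (region);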
Promotion Gates: Dev → Staging → Prod
We set up automated gates to control what gets promoted. These gates ensure quality and compliance at every stage:
1. Datasets & Features
Check data freshness, watch for null spikes, and validate schemas before promotion (a freshness sketch follows the gate query below).
2. Models & Prompts
Require passing eval scores, bias checks, and prompt regression tests before deployment.
3. Compliance
Block models trained on restricted data from being promoted to unauthorized use cases.
-- Model gate: allow promotion only if the candidate clears the eval bar
-- and none of its training sources are restricted
-- (model_evals and training_sources are the AI-BoM's bookkeeping tables)
SELECT CASE
    WHEN eval.f1 >= 0.80
     AND NOT EXISTS (
       SELECT 1 FROM training_sources
       WHERE sensitivity = 'restricted'
     )
    THEN 'ALLOW' ELSE 'BLOCK'
  END AS gate_result
FROM model_evals AS eval;
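The dataset gate from item 1 is just as lightweight. A minimal sketch, reusing the illustrative loan_applications table from earlier with made-up thresholds:

-- Dataset gate: block promotion when the table is stale
-- or a key column's null rate spikes
SELECT CASE
    WHEN max(submitted_at) >= current_timestamp() - INTERVAL 24 HOURS
     AND avg(CASE WHEN application_id IS NULL THEN 1 ELSE 0 END) < 0.01
    THEN 'ALLOW' ELSE 'BLOCK'
  END AS gate_result
FROM finance_prod.risk_modeling.loan_applications;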
And yes, we link all gate results back to the AI-BoM for auditing.
Real-World Example: Governed Support Assistant
One of our recent projects was a RAG-based support assistant. Here's how governance showed up throughout the system:
1. Data Layer
chat_history, the core dataset, was tagged restricted, with PII columns masked.
2. Feature Store
Embeddings were stored in a Vector Search index, with lineage tracked back to the source data.
3. Model Layer
Prompt templates were versioned and governed in UC, with evaluation sets clearly labeled.
4. Production
The final assistant ran behind a model serving endpoint using a scoped service principal with read-only access.
We can trace any bad response back to the original data, prompt, and model version—complete transparency for debugging and auditing.
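Concretely, that trace is a join across the AI-BoM's link tables. A sketch with hypothetical table and column names, standing in for however you record inference logs and training lineage:

-- Hypothetical AI-BoM tables: trace one bad response back to its
-- model version, prompt version, and training data sources
SELECT r.response_id,
       r.model_version,
       r.prompt_version,
       l.source_table
FROM ai_bom.inference_log AS r
JOIN ai_bom.training_lineage AS l
  ON l.model_version = r.model_version
WHERE r.response_id = 'resp-12345';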
Your AI-BoM Starter Checklist
You don't need to build the perfect system on day one. Start small with these foundational steps:
1. Define a Tag Taxonomy
Keep it simple and focused on your most critical governance needs; a starter taxonomy appears after this list.
2. Structure UC Namespaces
Use domains and environments for clear organization.
3. Write Policies as Code
Track all changes in Git for version control and rollback.
4. Set Up Promotion Gates
Start with basic checks and expand over time.
5. Learn Lineage Queries
Practice answering real audit questions with lineage data.
{ "taxonomy_version": "1.0", "labels": { "sensitivity": ["public", "internal", "confidential", "restricted"], "pii": ["none", "name", "email", "phone", "address", "government_id"], "purpose": ["training", "eval", "inference", "monitoring", "analytics"], "retention_days": [30, 90, 365, 1825], "owner": ["[email protected]"], "risk_level": ["low", "medium", "high"] } }
By treating Unity Catalog as the central nervous system of our AI-BoM, we've turned governance from a bottleneck into an accelerator. If you're on a similar journey and want templates for taxonomy, Terraform, or gate SQL, just reach out—happy to share.