Common Metadata Framework (CMF) is a metadata tracking and versioning system for ML pipelines. It tracks code, data, and pipeline metrics, offering Git-like metadata management across distributed environments.
- Track artifacts (datasets, models, metrics) using content-based hashes
- Automatically log code versions (Git) and data versions (DVC)
- Push/pull metadata via CLI across distributed sites
- REST API for direct server interaction
- Implicit and explicit tracking of pipeline execution
- Fine-grained or coarse-grained metric logging
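To illustrate what a content-based hash means in the first feature above, here is a minimal sketch that derives an identifier from a file's bytes rather than its name or timestamp. The function name `content_hash` is hypothetical, and SHA-256 is chosen only for illustration; the hashes CMF actually records come from DVC and may use a different algorithm.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file's bytes (illustrative only)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical bytes produce the same digest regardless of where they are stored, which is what lets an artifact be identified consistently across sites.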
- Linux/Ubuntu/Debian
- Python >=3.9, <3.11
- Git (latest)
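As a quick way to confirm the Python requirement above before installing, the following sketch checks an interpreter version against the stated range (>=3.9, <3.11). The helper name `python_supported` is hypothetical and not part of CMF.

```python
import sys
from typing import Tuple


def python_supported(version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """Return True if (major, minor) satisfies Python >=3.9, <3.11."""
    return (3, 9) <= tuple(version[:2]) < (3, 11)
```

Calling `python_supported()` with no arguments checks the running interpreter.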
Conda
conda create -n cmf python=3.10
conda activate cmf
Virtualenv
virtualenv --python=3.10 .cmf
source .cmf/bin/activate
Latest from GitHub
pip install git+https://github.com/HewlettPackard/cmf
Stable from PyPI
pip install cmflib
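After either install route, one way to confirm that the package is visible to the active environment is to query its installed version through the standard library. This is a sketch; the helper name `installed_version` is hypothetical, and the only CMF-specific assumption is the distribution name `cmflib` from the install command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "cmflib") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

A return value of `None` usually means the wrong environment is active or the install did not complete.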
To set up the CMF server, follow the guide in docs/cmf_server/cmf-server.md
CMF tracks pipeline stages, inputs/outputs, metrics, and code. It supports decentralized execution across datacenters, edge, and cloud.
- Artifacts are versioned using DVC (.dvc files).
- Code is tracked with Git.
- Metadata is logged to a relational database (e.g., SQLite, PostgreSQL).
- Sync metadata with cmf metadata push and cmf metadata pull.
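When the metadata store is a local SQLite file, it can be inspected with Python's standard library. The sketch below lists the tables in such a file without assuming anything about the schema. The function name `list_tables` is hypothetical, and whether a given metadata file is a SQLite database is an assumption to verify for your setup.

```python
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the table names in a SQLite database file, sorted by name."""
    # sqlite3.connect would silently create a missing file, so check first.
    if not Path(db_path).is_file():
        raise FileNotFoundError(db_path)
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

The table names returned depend on the underlying metadata schema and are not something this sketch relies on.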
CMF is composed of:
- cmflib - metadata library that provides an API to log and query metadata
- cmf-client - CLI to sync metadata with the server, push/pull artifacts to the user-specified repo, and push/pull code from Git
- cmf-server - REST API for metadata merge
- Central Repositories - Git (code), DVC (artifacts), CMF (metadata)
from cmflib.cmf import Cmf
from ml_metadata.proto import metadata_store_pb2 as mlpb

# Example values for illustration; substitute your own pipeline parameters.
split = 0.2
seed = 42

# Create the metadata writer for this pipeline.
cmf = Cmf(filepath="mlmd", pipeline_name="test_pipeline")

# Register the pipeline stage.
context: mlpb.Context = cmf.create_context(
    pipeline_stage="prepare",
    custom_properties={"user-metadata1": "metadata_value"},
)

# Register an execution of that stage.
execution: mlpb.Execution = cmf.create_execution(
    execution_type="Prepare",
    custom_properties={"split": split, "seed": seed},
)

# Log an input dataset artifact.
artifact: mlpb.Artifact = cmf.log_dataset(
    "artifacts/data.xml.gz", "input",
    custom_properties={"user-metadata1": "metadata_value"},
)
cmf # CLI to manage metadata and artifacts
cmf init # Initialize artifact repository
cmf init show # Show current CMF config
cmf metadata push # Push metadata to server
cmf metadata pull # Pull metadata from server
For the complete list of commands, please refer to the Command Reference.
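For pipelines that need to sync metadata from a script, the push and pull commands listed above can be invoked through `subprocess`. This is a sketch under stated assumptions: the helper names `metadata_command` and `run_metadata_command` are hypothetical, and any options the commands require are passed through `extra_args` unchanged, so consult the Command Reference for the actual flags.

```python
import shutil
import subprocess
from typing import List, Sequence


def metadata_command(action: str, extra_args: Sequence[str] = ()) -> List[str]:
    """Build the argument list for `cmf metadata push` or `cmf metadata pull`."""
    if action not in ("push", "pull"):
        raise ValueError(f"action must be 'push' or 'pull', got {action!r}")
    return ["cmf", "metadata", action, *extra_args]


def run_metadata_command(action: str, extra_args: Sequence[str] = ()) -> int:
    """Run the command and return its exit code; fail early if cmf is not on PATH."""
    if shutil.which("cmf") is None:
        raise FileNotFoundError("cmf executable not found on PATH")
    return subprocess.run(metadata_command(action, extra_args), check=False).returncode
```

Building the argument list separately from running it keeps the command easy to log or test without contacting a server.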
- Full ML pipeline observability
- Unified metadata, artifact, and code tracking
- Scalable metadata syncing
- Team collaboration on metadata
- Join CMF on Slack
- Contact: annmary.roy@hpe.com
Licensed under the Apache 2.0 License
© Hewlett Packard Enterprise. Built for reproducibility in ML.