Common Metadata Framework (CMF)

Common Metadata Framework (CMF) is a metadata tracking and versioning system for ML pipelines. It tracks code, data, and pipeline metrics, and offers Git-like metadata management across distributed environments.


🚀 Features

  • ✅ Track artifacts (datasets, models, metrics) using content-based hashes
  • ✅ Automatically log code versions (Git) and data versions (DVC)
  • ✅ Push/pull metadata via CLI across distributed sites
  • ✅ REST API for direct server interaction
  • ✅ Implicit & explicit tracking of pipeline execution
  • ✅ Fine-grained or coarse-grained metric logging
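
The content-based hashing named in the first feature can be illustrated with a minimal sketch. It assumes an MD5 digest over the raw file bytes, which is the digest DVC records for tracked files; `content_hash` and `same_content` are hypothetical helper names for illustration and are not part of cmflib.

```python
import hashlib


def content_hash(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file's bytes, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Treat two files as the same artifact version when their digests match."""
    return content_hash(path_a) == content_hash(path_b)
```

Because the identity comes from the bytes rather than the file name or location, a copy of a dataset on another site resolves to the same version.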

📦 Installation

Requirements

  • Linux/Ubuntu/Debian
  • Python >=3.9, <3.11
  • Git (latest)
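
To confirm that an interpreter falls inside the supported Python range before installing, a small check along these lines may help. It is only a sketch; `is_supported` is a hypothetical helper, not something CMF ships.

```python
import sys

MIN_VERSION = (3, 9)   # inclusive lower bound from the requirements list
MAX_VERSION = (3, 11)  # exclusive upper bound from the requirements list


def is_supported(version=None):
    """Return True if the (major, minor) version lies in [3.9, 3.11)."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION


if __name__ == "__main__":
    print("supported" if is_supported() else "unsupported Python version")
```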

Virtual Environment

Conda
conda create -n cmf python=3.10
conda activate cmf
Virtualenv
virtualenv --python=3.10 .cmf
source .cmf/bin/activate

Install CMF

Latest from GitHub
pip install git+https://github.com/HewlettPackard/cmf
Stable from PyPI
pip install cmflib

Server Setup

📖 Follow the guide in docs/cmf_server/cmf-server.md


📘 Documentation


🧠 How It Works

CMF tracks pipeline stages, inputs/outputs, metrics, and code. It supports decentralized execution across datacenters, edge, and cloud.

  • Artifacts are versioned using DVC (.dvc files).
  • Code is tracked with Git.
  • Metadata is logged to a relational database (e.g., SQLite, PostgreSQL).
  • Metadata is synced between sites with cmf metadata push and cmf metadata pull.
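
To make the idea of logging pipeline metadata to a relational database concrete, here is a toy SQLite sketch. The tables, columns, and function names are hypothetical and heavily simplified; this is not the schema CMF actually uses. It only shows how stages, executions, and input/output artifacts can be linked and queried.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
SCHEMA = """
CREATE TABLE stage (
    id INTEGER PRIMARY KEY, pipeline TEXT NOT NULL, name TEXT NOT NULL);
CREATE TABLE execution (
    id INTEGER PRIMARY KEY,
    stage_id INTEGER NOT NULL REFERENCES stage(id),
    type TEXT NOT NULL);
CREATE TABLE artifact_event (
    execution_id INTEGER NOT NULL REFERENCES execution(id),
    path TEXT NOT NULL,
    content_hash TEXT NOT NULL,
    direction TEXT NOT NULL CHECK (direction IN ('input', 'output')));
"""


def open_store(db_path=":memory:"):
    """Open a SQLite database and create the toy schema."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def log_stage(conn, pipeline, name):
    cur = conn.execute(
        "INSERT INTO stage (pipeline, name) VALUES (?, ?)", (pipeline, name))
    return cur.lastrowid


def log_execution(conn, stage_id, exec_type):
    cur = conn.execute(
        "INSERT INTO execution (stage_id, type) VALUES (?, ?)",
        (stage_id, exec_type))
    return cur.lastrowid


def log_artifact(conn, execution_id, path, content_hash, direction):
    conn.execute(
        "INSERT INTO artifact_event VALUES (?, ?, ?, ?)",
        (execution_id, path, content_hash, direction))


def inputs_of_stage(conn, pipeline, stage_name):
    """List the input artifact paths recorded for one pipeline stage."""
    rows = conn.execute(
        """SELECT a.path FROM artifact_event a
           JOIN execution e ON e.id = a.execution_id
           JOIN stage s ON s.id = e.stage_id
           WHERE s.pipeline = ? AND s.name = ? AND a.direction = 'input'
           ORDER BY a.path""",
        (pipeline, stage_name))
    return [row[0] for row in rows]
```

Keeping artifacts in an event table keyed by execution is one way to answer lineage questions ("which data fed this stage?") with a plain join.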

🏛 Architecture

CMF is composed of:

  • cmflib – metadata library that provides an API to log and query metadata
  • cmf-client – CLI to sync metadata with the server, push/pull artifacts to the user-specified repo, and push/pull code from Git
  • cmf-server – REST API for metadata merge
  • Central Repositories – Git (code), DVC (artifacts), CMF (metadata)
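
The server's role of merging metadata pushed from several sites can be pictured with a toy sketch. It assumes records are keyed by content hash and that entries already on the server are kept unchanged; `merge_metadata` is a hypothetical function and does not reflect how cmf-server actually implements merging.

```python
def merge_metadata(server_records, client_records):
    """Merge client records into server records, keyed by content hash.

    Returns the merged mapping and the list of hashes that were newly added.
    Existing server entries are never overwritten in this sketch.
    """
    merged = dict(server_records)  # copy so the caller's mapping is untouched
    added = []
    for content_hash, record in client_records.items():
        if content_hash not in merged:
            merged[content_hash] = record
            added.append(content_hash)
    return merged, added


if __name__ == "__main__":
    server = {"abc123": {"path": "artifacts/data.xml.gz"}}
    client = {
        "abc123": {"path": "data/copy.xml.gz"},   # same content, already known
        "def456": {"path": "artifacts/model.pkl"},
    }
    merged, added = merge_metadata(server, client)
    print(sorted(merged), added)
```

Keying on content hashes means two sites that logged the same file under different paths contribute a single record after the merge.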


🔧 Sample Usage

Python API
from cmflib.cmf import Cmf
from ml_metadata.proto import metadata_store_pb2 as mlpb

# Example values; replace with your own split ratio and random seed.
split, seed = 0.2, 42

# Create (or open) the metadata store file for this pipeline.
cmf = Cmf(filepath="mlmd", pipeline_name="test_pipeline")

# Register the pipeline stage.
context: mlpb.Context = cmf.create_context(
    pipeline_stage="prepare",
    custom_properties={"user-metadata1": "metadata_value"}
)

# Register one execution (run) of that stage.
execution: mlpb.Execution = cmf.create_execution(
    execution_type="Prepare",
    custom_properties={"split": split, "seed": seed}
)

# Log an input dataset for the execution.
artifact: mlpb.Artifact = cmf.log_dataset(
    "artifacts/data.xml.gz", "input",
    custom_properties={"user-metadata1": "metadata_value"}
)

CLI
cmf                          # CLI to manage metadata and artifacts
cmf init                     # Initialize artifact repository
cmf init show                # Show current CMF config
cmf metadata push            # Push metadata to server
cmf metadata pull            # Pull metadata from server

➡️ For the complete list of commands, please refer to the Command Reference


✅ Benefits

  • Full ML pipeline observability
  • Unified metadata, artifact, and code tracking
  • Scalable metadata syncing
  • Team collaboration on metadata

🎤 Talks & Publications


🌐 Related Projects


🤝 Community


📄 License

Licensed under the Apache 2.0 License


© Hewlett Packard Enterprise. Built for reproducibility in ML.
