
The CMF library helps collect and store information associated with ML pipelines. It tracks the lineage of artifacts and executions in distributed AI pipelines and provides APIs to record and query their metadata. The framework adopts a data-first approach, and all artifacts recorded in the framework are versioned and…


Common Metadata Framework (CMF)


Common Metadata Framework (CMF) is a metadata tracking and versioning system for ML pipelines. It tracks code, data, and pipeline metrics, offering Git-like metadata management across distributed environments.


🚀 Features

  • ✅ Track artifacts (datasets, models, metrics) using content-based hashes
  • ✅ Automatically log code versions (Git) and data versions (DVC)
  • ✅ Push/pull metadata via the CLI across distributed sites
  • ✅ REST API for direct server interaction
  • ✅ Implicit and explicit tracking of pipeline executions
  • ✅ Fine-grained or coarse-grained metric logging
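The content-based hashing mentioned above can be illustrated with a minimal sketch. This is not CMF's own code: it identifies a file by the MD5 digest of its bytes, the digest algorithm DVC records in `.dvc` files. The helper name `content_hash` is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file's contents (hypothetical helper).

    Identical bytes always produce the same digest, so an unchanged dataset
    maps to the same version regardless of its file name or location.
    """
    digest = hashlib.md5()
    with Path(path).open("rb") as f:
        # Read in chunks so large datasets never have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage: two files with identical bytes share one hash, whatever their names.
with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp, "data_v1.csv")
    a.write_bytes(b"id,label\n1,cat\n")
    b = Path(tmp, "copy_of_data.csv")
    b.write_bytes(b"id,label\n1,cat\n")
    print(content_hash(a) == content_hash(b))  # True
```

Hashing content rather than paths is what lets the same artifact be recognized across runs and sites.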

πŸ“¦ Installation

Requirements

  • Linux/Ubuntu/Debian
  • Python >=3.9, <3.11
  • Git (latest)
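To confirm that the active interpreter falls inside the supported range before installing, a check like the following works. `python_supported` is a hypothetical helper, not part of CMF; it encodes only the `>=3.9, <3.11` bound listed above.

```python
import sys


def python_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version satisfies >=3.9, <3.11."""
    return (3, 9) <= tuple(version_info[:2]) < (3, 11)


print(f"Python {sys.version.split()[0]} supported: {python_supported()}")
```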

Virtual Environment

Conda
conda create -n cmf python=3.10
conda activate cmf
Virtualenv
virtualenv --python=3.10 .cmf
source .cmf/bin/activate

Install CMF

Latest from GitHub
pip install git+https://github.com/HewlettPackard/cmf
Stable from PyPI
pip install cmflib
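After either install command, you can check which version, if any, is present in the active environment with the standard library's `importlib.metadata`. `installed_version` is a hypothetical helper, not a CMF API, and it assumes the distribution name is `cmflib`, as in the PyPI command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "cmflib") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print("cmflib version:", installed_version() or "not installed")
```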

Server Setup

πŸ“– Follow the guide in docs/cmf_server/cmf-server.md


πŸ“˜ Documentation


🧠 How It Works

CMF tracks pipeline stages, their inputs and outputs, metrics, and code. It supports decentralized execution across datacenters, edge devices, and the cloud.

  • Artifacts are versioned with DVC (.dvc files).
  • Code is tracked with Git.
  • Metadata is logged to a relational database (e.g., SQLite, PostgreSQL).
  • Metadata is synced between sites with cmf metadata push and cmf metadata pull.
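CMF builds on ML Metadata (MLMD), whose relational schema is considerably more detailed. The sketch below is only a simplified illustration, using a made-up three-table schema, of how artifacts, executions, and the input/output events linking them fit into a relational database such as SQLite. The table names, the `artifacts/train.tsv` output, and the commit and hash strings are all placeholders.

```python
import sqlite3

# Hypothetical, simplified schema; MLMD's real schema differs.
SCHEMA = """
CREATE TABLE artifact  (id INTEGER PRIMARY KEY, uri TEXT, content_hash TEXT UNIQUE);
CREATE TABLE execution (id INTEGER PRIMARY KEY, stage TEXT, git_commit TEXT);
CREATE TABLE event (
    execution_id INTEGER REFERENCES execution(id),
    artifact_id  INTEGER REFERENCES artifact(id),
    direction    TEXT CHECK (direction IN ('input', 'output'))
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Record one "prepare" execution that reads a raw dataset and writes a split.
exe = conn.execute(
    "INSERT INTO execution (stage, git_commit) VALUES (?, ?)", ("prepare", "abc123")
).lastrowid
raw = conn.execute(
    "INSERT INTO artifact (uri, content_hash) VALUES (?, ?)",
    ("artifacts/data.xml.gz", "hash-raw"),
).lastrowid
out = conn.execute(
    "INSERT INTO artifact (uri, content_hash) VALUES (?, ?)",
    ("artifacts/train.tsv", "hash-train"),
).lastrowid
conn.executemany(
    "INSERT INTO event VALUES (?, ?, ?)", [(exe, raw, "input"), (exe, out, "output")]
)

# Query: which artifacts did the "prepare" stage consume?
rows = conn.execute(
    """SELECT a.uri FROM artifact a
       JOIN event ev ON ev.artifact_id = a.id
       JOIN execution e ON e.id = ev.execution_id
       WHERE e.stage = 'prepare' AND ev.direction = 'input'"""
).fetchall()
print(rows)  # [('artifacts/data.xml.gz',)]
```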

πŸ› Architecture

CMF is composed of:

  • cmflib – metadata library that provides APIs to log and query metadata
  • cmf-client – CLI to sync metadata with the server, push/pull artifacts to and from the user-specified repository, and push/pull code from Git
  • cmf-server – REST API for merging metadata
  • Central Repositories – Git (code), DVC (artifacts), CMF (metadata)
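In principle, identifying artifacts by content hash lets metadata pushed from several sites be merged without duplicating the artifacts they share. The following is a conceptual sketch only: the record format (`hash`, `uri`, `site` keys) and the `merge_artifacts` function are hypothetical, and this is not how cmf-server actually implements its merge.

```python
from typing import Dict, List


def merge_artifacts(*sites: List[dict]) -> Dict[str, dict]:
    """Merge artifact records from several sites, keyed by content hash.

    An artifact reported by more than one site is stored once, and the set of
    sites that reported it is accumulated.
    """
    merged: Dict[str, dict] = {}
    for records in sites:
        for rec in records:
            entry = merged.setdefault(rec["hash"], {"uri": rec["uri"], "sites": set()})
            entry["sites"].add(rec["site"])
    return merged


# Usage: the edge and cloud sites both logged the same raw dataset ("h1").
edge = [{"hash": "h1", "uri": "data/raw.csv", "site": "edge"}]
cloud = [
    {"hash": "h1", "uri": "data/raw.csv", "site": "cloud"},
    {"hash": "h2", "uri": "models/model.pkl", "site": "cloud"},
]
merged = merge_artifacts(edge, cloud)
print(len(merged))                    # 2
print(sorted(merged["h1"]["sites"]))  # ['cloud', 'edge']
```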


πŸ”§ Sample Usage

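from cmflib.cmf import Cmf
from ml_metadata.proto import metadata_store_pb2 as mlpb

# Example parameter values for the "prepare" stage (placeholders).
split = 0.2
seed = 42

# Metadata for this pipeline is stored in the local file "mlmd".
cmf = Cmf(filepath="mlmd", pipeline_name="test_pipeline")

# A context groups the executions of one pipeline stage.
context: mlpb.Context = cmf.create_context(
    pipeline_stage="prepare",
    custom_properties={"user-metadata1": "metadata_value"},
)

# An execution records one run of the stage together with its parameters.
execution: mlpb.Execution = cmf.create_execution(
    execution_type="Prepare",
    custom_properties={"split": split, "seed": seed},
)

# Log the stage's input dataset as an artifact of the current execution.
artifact: mlpb.Artifact = cmf.log_dataset(
    "artifacts/data.xml.gz", "input",
    custom_properties={"user-metadata1": "metadata_value"},
)
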
cmf                          # CLI to manage metadata and artifacts
cmf init                     # Initialize artifact repository
cmf init show                # Show current CMF config
cmf metadata push            # Push metadata to server
cmf metadata pull            # Pull metadata from server
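Executions and artifacts recorded as in the sample above form a lineage graph: each execution links its input artifacts to its output artifacts. The sketch below is independent of CMF; it uses hypothetical in-memory records (the stage names and every artifact path except `artifacts/data.xml.gz` are made up) and assumes each artifact is produced by exactly one execution. It walks the graph upstream to find everything a model was derived from.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical execution records: stage name -> its input and output artifacts.
EXECUTIONS: Dict[str, Dict[str, List[str]]] = {
    "prepare": {"inputs": ["artifacts/data.xml.gz"], "outputs": ["artifacts/train.tsv"]},
    "featurize": {"inputs": ["artifacts/train.tsv"], "outputs": ["artifacts/features.pkl"]},
    "train": {"inputs": ["artifacts/features.pkl"], "outputs": ["artifacts/model.pkl"]},
}


def upstream(artifact: str) -> Set[str]:
    """Return every artifact the given artifact was (transitively) derived from."""
    # Map each output artifact to the inputs of the execution that produced it.
    produced_from = {
        out: exe["inputs"] for exe in EXECUTIONS.values() for out in exe["outputs"]
    }
    seen: Set[str] = set()
    queue = deque([artifact])
    # Breadth-first walk from the artifact back toward the raw inputs.
    while queue:
        for parent in produced_from.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(upstream("artifacts/model.pkl")))
# ['artifacts/data.xml.gz', 'artifacts/features.pkl', 'artifacts/train.tsv']
```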

➡️ For the complete list of commands, refer to the Command Reference.


✅ Benefits

  • Full ML pipeline observability
  • Unified metadata, artifact, and code tracking
  • Scalable metadata syncing
  • Team collaboration on metadata

🎤 Talks & Publications


🌐 Related Projects


🤝 Community


πŸ“„ License

Licensed under the Apache 2.0 License


© Hewlett Packard Enterprise. Built for reproducibility in ML.
