Neural Magic
neuralmagic / nm-vllm (forked from vllm-project/vllm): A high-throughput and memory-efficient inference and serving engine for LLMs
Sparsity-aware deep learning inference runtime for CPUs
Refine high-quality datasets and visual AI models
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
ML model optimization product to accelerate inference
Top-level directory for documentation and general content
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes