Stars
Reproductions of classic CTR prediction algorithms: LR, FM, FFM, AFM, DeepFM, xDeepFM, PNN, DCN, DCNv2, DIFM, AutoInt, FiBiNet, AFN, ONN, DIN, DIEN ... (PyTorch, TF2.0)
A high-throughput and memory-efficient inference and serving engine for LLMs
Learn GPU Programming in Mojo🔥 by Solving Puzzles
JAX-like Neural Network Training Library in Python with CPU/GPU Acceleration via Mojo and MAX
Examples for Recommenders - easy to train and deploy on accelerated infrastructure.
Backward compatible ML compute opset inspired by HLO/MHLO
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
Flax is a neural network library for JAX that is designed for flexibility.
The Torch-MLIR project aims to provide first-class support from the PyTorch ecosystem to the MLIR ecosystem.
Fast ML inference & training for ONNX models in Rust
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
SGLang is a fast serving framework for large language models and vision language models.
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
Intel® Extension for TensorFlow*
A personal experimental C++ Syntax 2 -> Syntax 1 compiler
A retargetable MLIR-based machine learning compiler and runtime toolkit.
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Enabling PyTorch on XLA Devices (e.g. Google TPU)
Development repository for the Triton language and compiler
A list of awesome compiler projects and papers for tensor computation and deep learning.
A machine learning compiler for GPUs, CPUs, and ML accelerators
An implementation of a deep learning recommendation model (DLRM)
Zerocopy makes zero-cost memory manipulation effortless. We write `unsafe` so you don’t have to.