- Bay Area
- http://idning.github.io/
Stars
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Implementation of papers in 100 lines of code.
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Instead of running one environment at a time or one per thread, run everything in batch using numpy on a single core.
Fully open reproduction of DeepSeek-R1
🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton
A PyTorch native library for large model training
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…
FlagGems is an operator library for large language models implemented in Triton Language.
PyTorch native quantization and sparsity for training and inference
Development repository for the Triton language and compiler
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
A very simple shared memory dict implementation
Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
Seamless operability between C++11 and Python
Implementation of Denoising Diffusion Probabilistic Model in PyTorch
Denoising Diffusion Probabilistic Models
A minimal PyTorch implementation of probabilistic diffusion models for 2D datasets.
An open source implementation of CLIP.
PyTorch Implementation of OpenAI's Image GPT
Large Language Model-enhanced Recommender System Papers
An unnecessarily tiny implementation of GPT-2 in NumPy.
Code and model for the paper "Improving Language Understanding by Generative Pre-Training"