ShlokVFX
💭
cooking GPU's


Modular RDMA Interface

C++ 42 6 Updated Sep 30, 2025

Kanban board to manage your AI coding agents

Rust 5,294 508 Updated Sep 30, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,135 83 Updated Aug 28, 2025

Perplexity GPU Kernels

C++ 476 60 Updated Sep 19, 2025

Introduction to Machine Learning Systems

Python 2,494 280 Updated Sep 29, 2025

ArcticInference: vLLM plugin for high-throughput, low-latency inference

Python 267 36 Updated Sep 30, 2025

Manage your GPUs across NVIDIA, AMD, Intel, and Apple Silicon systems.

Rust 230 6 Updated Sep 28, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 3,823 522 Updated Sep 30, 2025

Real-time terminal monitor for InfiniBand networks - htop for high-speed interconnects

Rust 42 1 Updated Sep 3, 2025

RDMA core userspace libraries and daemons

C 1,967 783 Updated Sep 21, 2025

Open MPI main development repository

C 2,425 922 Updated Sep 29, 2025
Python 78 8 Updated Sep 21, 2025

LLM inference in C/C++

C++ 87,096 13,192 Updated Sep 30, 2025

Notes for using Julia while learning calculus

Typst 164 31 Updated Aug 29, 2025

MIT IAP short course: Matrix Calculus for Machine Learning and Beyond

Jupyter Notebook 536 80 Updated Feb 3, 2025

Code to create the figures and solve the exercises in the textbook

Jupyter Notebook 36 7 Updated May 26, 2025

Master calculus 1 using Python: derivatives and applications

Jupyter Notebook 73 52 Updated May 12, 2025

Distributed View Extension for Kokkos

C++ 48 20 Updated Dec 2, 2024

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda 807 139 Updated Sep 26, 2025

NCCL Examples from Official NVIDIA NCCL Developer Guide.

CMake 19 10 Updated May 29, 2018

Pretrained model hub for Keras 3.

Python 936 299 Updated Sep 26, 2025

Example code for using DC QP to provide RDMA READ and WRITE operations to remote GPU memory

C 144 36 Updated Jul 30, 2024

Examples of how to call collective operation functions in multi-GPU environments, including simple uses of the broadcast, reduce, allGather, reduceScatter and sendRecv operations.

35 8 Updated Aug 28, 2023

High Performance Computing Project 2021-2022.

C 2 Updated May 2, 2023

Implementation of the allreduce algorithm using only MPI point to point communication routines (MPI_Send, MPI_Recv).

C++ 7 Updated Feb 26, 2018
Python 40 6 Updated Aug 27, 2024

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 418 68 Updated Sep 30, 2025

Microsoft Collective Communication Library

C++ 360 31 Updated Sep 20, 2023