Stars
Kanban board to manage your AI coding agents
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
Introduction to Machine Learning Systems
ArcticInference: vLLM plugin for high-throughput, low-latency inference
Manage your GPUs across NVIDIA, AMD, Intel, and Apple Silicon systems.
FlashInfer: Kernel Library for LLM Serving
Real-time terminal monitor for InfiniBand networks: an htop for high-speed interconnects
Notes for using Julia while learning calculus
MIT IAP short course: Matrix Calculus for Machine Learning and Beyond
Code to create the figures and solve the exercises in the textbook
Master calculus 1 using Python: derivatives and applications
Examples demonstrating the available options for programming multiple GPUs in a single node or across a cluster
NCCL examples from the official NVIDIA NCCL Developer Guide.
Example code that uses DC QPs (Dynamically Connected Queue Pairs) to issue RDMA READ and WRITE operations against remote GPU memory
Examples of how to call collective operation functions in multi-GPU environments: simple uses of broadcast, reduce, allGather, reduceScatter, and sendRecv operations (a minimal NCCL-style sketch appears after this list).
High Performance Computing Project 2021-2022.
Implementation of the allreduce algorithm using only MPI point-to-point communication routines (MPI_Send, MPI_Recv); a point-to-point sketch also appears after this list.
MSCCL++: A GPU-driven communication stack for scalable AI applications
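The collective-operations entry above names broadcast, reduce, allGather, reduceScatter, and sendRecv, which match NCCL's collective API. The following minimal sketch therefore assumes NCCL and shows a single-process, multi-GPU broadcast; the buffer size and eight-device cap are arbitrary illustration choices, error checking is omitted, and this is not code from the starred repository.

```c
// Minimal sketch: single-process, multi-GPU broadcast with NCCL.
// Assumes CUDA and NCCL are installed; error handling omitted for brevity.
#include <stdio.h>
#include <cuda_runtime.h>
#include <nccl.h>

int main(void) {
    int nDev = 0;
    cudaGetDeviceCount(&nDev);
    if (nDev > 8) nDev = 8;          // cap to the fixed array sizes below

    ncclComm_t comms[8];
    cudaStream_t streams[8];
    float *buf[8];
    const size_t count = 1 << 20;    // 1M floats per device (arbitrary)

    // One communicator per visible GPU, all driven from this single process.
    ncclCommInitAll(comms, nDev, NULL);

    for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(i);
        cudaStreamCreate(&streams[i]);
        cudaMalloc((void **)&buf[i], count * sizeof(float));
    }

    // Broadcast device 0's buffer to every other device; calls between
    // GroupStart/GroupEnd are issued together as one fused operation.
    ncclGroupStart();
    for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(i);
        ncclBroadcast(buf[i], buf[i], count, ncclFloat, /*root=*/0,
                      comms[i], streams[i]);
    }
    ncclGroupEnd();

    for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        cudaFree(buf[i]);
        ncclCommDestroy(comms[i]);
    }
    printf("broadcast of %zu floats across %d GPUs done\n", count, nDev);
    return 0;
}
```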
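The allreduce entry above builds the operation from MPI_Send and MPI_Recv alone. The sketch below does the same with the simplest possible scheme, a naive reduce-to-root followed by a send-back; the repository's actual algorithm (for example a ring or recursive doubling) may differ, and the helper name my_allreduce_sum is made up for illustration.

```c
/* Minimal sketch: allreduce (sum) built only from MPI_Send/MPI_Recv. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

static void my_allreduce_sum(const double *sendbuf, double *recvbuf,
                             int count, MPI_Comm comm) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    if (rank == 0) {
        double *tmp = malloc(count * sizeof(double));
        for (int i = 0; i < count; ++i) recvbuf[i] = sendbuf[i];
        /* Accumulate contributions from every other rank. */
        for (int src = 1; src < size; ++src) {
            MPI_Recv(tmp, count, MPI_DOUBLE, src, 0, comm, MPI_STATUS_IGNORE);
            for (int i = 0; i < count; ++i) recvbuf[i] += tmp[i];
        }
        /* Send the final result back to every rank. */
        for (int dst = 1; dst < size; ++dst)
            MPI_Send(recvbuf, count, MPI_DOUBLE, dst, 1, comm);
        free(tmp);
    } else {
        MPI_Send(sendbuf, count, MPI_DOUBLE, 0, 0, comm);
        MPI_Recv(recvbuf, count, MPI_DOUBLE, 0, 1, comm, MPI_STATUS_IGNORE);
    }
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double in = (double)rank, out = 0.0;
    my_allreduce_sum(&in, &out, 1, MPI_COMM_WORLD);

    /* Every rank should print the same sum: 0 + 1 + ... + (size-1). */
    printf("rank %d: sum = %g\n", rank, out);
    MPI_Finalize();
    return 0;
}
```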