Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
A High-Efficiency System of Large Language Model Based Search Agents
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
Official PyTorch implementation of the paper "Dataset Distillation via the Wasserstein Metric" (ICCV 2025).
Dynamic Attention Mask (DAM) generates adaptive sparse attention masks per layer and head for Transformer models, enabling long-context inference with lower compute and memory overhead, without fine-tuning.
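As a hedged illustration of the general idea only (not DAM's actual algorithm or API), a per-head, per-query sparse mask can be built by keeping the top-k attention logits; the function name, `keep_ratio` knob, and tensor shapes below are all hypothetical.

```python
import torch

def topk_sparse_mask(scores: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    # scores: (heads, q_len, k_len) attention logits for one layer.
    k_len = scores.size(-1)
    k = max(1, int(k_len * keep_ratio))
    topk = scores.topk(k, dim=-1).indices             # indices of kept keys per query
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask.scatter_(-1, topk, True)                     # mark kept key positions
    return mask

logits = torch.randn(8, 16, 1024)                     # 8 heads, 16 queries, 1024 keys
mask = topk_sparse_mask(logits, keep_ratio=0.1)       # keep 10% of keys per query
attn = logits.masked_fill(~mask, float("-inf")).softmax(dim=-1)
```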
MOCA-Net: a neural architecture combining sparse mixture-of-experts (MoE), external memory, and budget-aware computation, built for efficient sequence modeling. Includes integration with the real Stanford SST-2 dataset, O(L) complexity, and 96.40% accuracy.
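For readers unfamiliar with sparse MoE layers, the toy sketch below shows top-1 routing in plain PyTorch; it is purely illustrative and does not reproduce MOCA-Net's actual routing, memory, or budget mechanisms.

```python
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    """Toy sparse mixture-of-experts layer with top-1 routing (illustrative only)."""
    def __init__(self, dim: int, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim); each token is dispatched to exactly one expert.
        gates = self.router(x).softmax(dim=-1)         # (tokens, num_experts)
        expert_idx = gates.argmax(dim=-1)              # chosen expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            sel = expert_idx == i
            if sel.any():
                out[sel] = expert(x[sel]) * gates[sel, i].unsqueeze(-1)
        return out

moe = Top1MoE(dim=64)
y = moe(torch.randn(10, 64))  # 10 tokens through the sparse layer
```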
QuantLab-8bit is a reproducible benchmark of 8-bit quantization on compact vision backbones. It includes FP32 baselines, post-training quantization (dynamic and static), quantization-aware training (QAT), ONNX exports, parity checks, ONNX Runtime CPU latency measurements, and visual diagnostics.
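As a minimal sketch of the FP32-vs-int8 parity-check idea (using PyTorch's built-in dynamic PTQ, not necessarily QuantLab-8bit's pipeline; MobileNetV2 is an assumed stand-in backbone):

```python
import torch
from torchvision.models import mobilenet_v2

# FP32 baseline backbone (placeholder for one of the compact vision backbones).
fp32_model = mobilenet_v2(weights=None).eval()

# Dynamic post-training quantization: Linear-layer weights become int8,
# activations are quantized on the fly at inference time (conv layers stay FP32).
int8_model = torch.ao.quantization.quantize_dynamic(
    fp32_model, {torch.nn.Linear}, dtype=torch.qint8
)

# Parity check on a dummy batch: compare FP32 and int8 logits.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    diff = (fp32_model(x) - int8_model(x)).abs().max()
print(f"max |fp32 - int8| logit difference: {diff:.4f}")
```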
Code for the paper "Automated Design for Hardware-aware Graph Neural Networks on Edge Devices".