diff --git a/CNAME b/CNAME
index c101f6da020d..583993f7b85f 100644
--- a/CNAME
+++ b/CNAME
@@ -1 +1 @@
-pytorch.org
\ No newline at end of file
+docs.pytorch.org
diff --git a/_community_stories/57.md b/_community_stories/57.md
new file mode 100644
index 000000000000..7e717dfd000b
--- /dev/null
+++ b/_community_stories/57.md
@@ -0,0 +1,8 @@
+---
+title: 'How IBM Research Uses PyTorch and TerraTorch to Make Geospatial Computer Vision Accessible for Everyone'
+ext_url: /blog/how-ibm-uses-pt-terratorch/
+date: May 1, 2025
+tags: ["Computer Vision"]
+---
+
+Geospatial computer vision is essential for understanding our planet — from monitoring deforestation to tracking urban development and analyzing the impacts of climate change. However, the coding and deep learning skills required to apply AI models to satellite imagery and earth observation data have traditionally been a major barrier for many practitioners.
diff --git a/_events/docathon-2025.md b/_events/docathon-2025.md
new file mode 100644
index 000000000000..88bc55a52724
--- /dev/null
+++ b/_events/docathon-2025.md
@@ -0,0 +1,16 @@
+---
+category: event
+title: "Docathon 2025"
+date: Jun 3, 2025
+---
+
+**Date**: June 3-18, 2025
+**Location**: Online
+
+
+
+
+
+The PyTorch Docathon 2025, akin to a hackathon, is an event dedicated to enhancing the quality of the PyTorch documentation with the invaluable assistance of our community. This is an inclusive event designed to be accessible to all levels of expertise, from newcomers to experienced ML/PyTorch users. It offers a rewarding experience as participants can see the direct impact of their contributions on the project's usability and accessibility. The Docathon promotes a collaborative environment, allowing participants to work with other contributors and PyTorch maintainers, fostering the exchange of ideas and networking. It also provides a rich learning experience, offering the opportunity to explore PyTorch modules, update docstrings, and test tutorials.
+
+[RSVP Now](https://community.linuxfoundation.org/events/details/lfhq-pytorch-foundation-presents-pytorch-docathon-june-3rd-18th-2025/)
\ No newline at end of file
diff --git a/_get_started/installation/linux.md b/_get_started/installation/linux.md
index 35deaa0cde02..7461e2dfcd26 100644
--- a/_get_started/installation/linux.md
+++ b/_get_started/installation/linux.md
@@ -40,26 +40,10 @@ If you decide to use APT, you can run the following command to install it:
sudo apt install python
```
-> If you use [Anaconda](#anaconda) to install PyTorch, it will install a sandboxed version of Python that will be used for running PyTorch applications.
-
### Package Manager
{: #linux-package-manager}
-To install the PyTorch binaries, you will need to use one of two supported package managers: [Anaconda](https://www.anaconda.com/download/#linux) or [pip](https://pypi.org/project/pip/). Anaconda is the recommended package manager as it will provide you all of the PyTorch dependencies in one, sandboxed install, including Python.
-
-#### Anaconda
-
-To install Anaconda, you will use the [command-line installer](https://www.anaconda.com/download/#linux). Right-click on the 64-bit installer link, select `Copy Link Location`, and then use the following commands:
-
-```bash
-# The version of Anaconda may be different depending on when you are installing`
-curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
-sh Miniconda3-latest-Linux-x86_64.sh
-# and follow the prompts. The defaults are generally good.`
-```
-
-> You may have to open a new terminal or re-source your `~/.bashrc `to get access to the `conda` command.
-
+To install the PyTorch binaries, you will need to use the supported package manager: [pip](https://pypi.org/project/pip/).
#### pip
*Python 3*
@@ -75,24 +59,6 @@ sudo apt install python3-pip
## Installation
{: #linux-installation}
-### Anaconda
-{: #linux-anaconda}
-
-#### No CUDA/ROCm
-
-To install PyTorch via Anaconda, and do not have a [CUDA-capable](https://developer.nvidia.com/cuda-zone) or [ROCm-capable](https://rocm.docs.amd.com/) system or do not require CUDA/ROCm (i.e. GPU support), in the above selector, choose OS: Linux, Package: Conda, Language: Python and Compute Platform: CPU.
-Then, run the command that is presented to you.
-
-#### With CUDA
-
-To install PyTorch via Anaconda, and you do have a [CUDA-capable](https://developer.nvidia.com/cuda-zone) system, in the above selector, choose OS: Linux, Package: Conda and the CUDA version suited to your machine. Often, the latest CUDA version is better.
-Then, run the command that is presented to you.
-
-#### With ROCm
-
-PyTorch via Anaconda is not supported on ROCm currently. Please use pip instead.
-
-
### pip
{: #linux-pip}
@@ -148,7 +114,7 @@ For the majority of PyTorch users, installing from a pre-built binary via a pack
### Prerequisites
{: #linux-prerequisites-2}
-1. Install [Anaconda](#anaconda) or [Pip](#pip)
+1. Install [Pip](#pip)
2. If you need to build PyTorch with GPU support
a. for NVIDIA GPUs, install [CUDA](https://developer.nvidia.com/cuda-downloads), if your machine has a [CUDA-enabled GPU](https://developer.nvidia.com/cuda-gpus).
b. for AMD GPUs, install [ROCm](https://rocm.docs.amd.com/), if your machine has a [ROCm-enabled GPU](https://rocm.docs.amd.com/)
diff --git a/_get_started/installation/mac.md b/_get_started/installation/mac.md
index 5294b865a38c..d3803a1ee278 100644
--- a/_get_started/installation/mac.md
+++ b/_get_started/installation/mac.md
@@ -1,7 +1,7 @@
# Installing on macOS
{:.no_toc}
-PyTorch can be installed and used on macOS. Depending on your system and GPU capabilities, your experience with PyTorch on a Mac may vary in terms of processing time.
+PyTorch can be installed and used on macOS. Depending on your system and GPU capabilities, your experience with PyTorch on macOS may vary in terms of processing time.
## Prerequisites
{: #mac-prerequisites}
@@ -14,24 +14,13 @@ PyTorch is supported on macOS 10.15 (Catalina) or above.
{: #mac-python}
It is recommended that you use Python 3.9 - 3.12.
-You can install Python either through the Anaconda
-package manager (see [below](#anaconda)), [Homebrew](https://brew.sh/), or
+You can install Python either through [Homebrew](https://brew.sh/) or
the [Python website](https://www.python.org/downloads/mac-osx/).
### Package Manager
{: #mac-package-manager}
-To install the PyTorch binaries, you will need to use one of two supported package managers: [pip](https://pypi.org/project/pip/) or [Anaconda](https://www.anaconda.com/download/#macos).
-#### Anaconda
-
-To install Anaconda, you can [download graphical installer](https://www.anaconda.com/download/#macos) or use the command-line installer. If you use the command-line installer, you can right-click on the installer link, select `Copy Link Address`, or use the following commands on Mac computer with Apple silicon:
-
-```bash
-# The version of Anaconda may be different depending on when you are installing`
-curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh
-sh Miniconda3-latest-MacOSX-arm64.sh
-# and follow the prompts. The defaults are generally good.`
-```
+To install the PyTorch binaries, you will need to use the supported package manager: [pip](https://pypi.org/project/pip/).
#### pip
*Python 3*
@@ -43,19 +32,10 @@ If you installed Python via Homebrew or the Python website, `pip` was installed
## Installation
{: #mac-installation}
-### Anaconda
-{: #mac-anaconda}
-
-To install PyTorch via Anaconda, use the following conda command:
-
-```bash
-conda install pytorch torchvision -c pytorch
-```
-
### pip
{: #mac-pip}
-To install PyTorch via pip, use one of the following two commands, depending on your Python version:
+To install PyTorch via pip, use the following command:
```bash
# Python 3.x
@@ -91,7 +71,7 @@ For the majority of PyTorch users, installing from a pre-built binary via a pack
### Prerequisites
{: #mac-prerequisites-2}
-1. [Optional] Install [Anaconda](#anaconda)
+1. [Optional] Install [pip](https://pypi.org/project/pip/)
2. Follow the steps described here: [https://github.com/pytorch/pytorch#from-source](https://github.com/pytorch/pytorch#from-source)
You can verify the installation as described [above](#mac-verification).
diff --git a/_get_started/installation/windows.md b/_get_started/installation/windows.md
index 2d0ec5c041f5..8000e9cddbc6 100644
--- a/_get_started/installation/windows.md
+++ b/_get_started/installation/windows.md
@@ -24,9 +24,6 @@ As it is not installed by default on Windows, there are multiple ways to install
* [Chocolatey](https://chocolatey.org/)
* [Python website](https://www.python.org/downloads/windows/)
-* [Anaconda](#anaconda)
-
-> If you use Anaconda to install PyTorch, it will install a sandboxed version of Python that will be used for running PyTorch applications.
> If you decide to use Chocolatey, and haven't installed Chocolatey yet, ensure that you are running your command prompt as an administrator.
@@ -39,12 +36,7 @@ choco install python
### Package Manager
{: #windows-package-manager}
-To install the PyTorch binaries, you will need to use at least one of two supported package managers: [Anaconda](https://www.anaconda.com/download/#windows) and [pip](https://pypi.org/project/pip/). Anaconda is the recommended package manager as it will provide you all of the PyTorch dependencies in one, sandboxed install, including Python and `pip.`
-
-#### Anaconda
-
-To install Anaconda, you will use the [64-bit graphical installer](https://www.anaconda.com/download/#windows) for PyTorch 3.x. Click on the installer link and select `Run`. Anaconda will download and the installer prompt will be presented to you. The default options are generally sane.
-
+To install the PyTorch binaries, you will need to use the supported package manager: [pip](https://pypi.org/project/pip/).
#### pip
If you installed Python by any of the recommended ways [above](#windows-python), [pip](https://pypi.org/project/pip/) will have already been installed for you.
@@ -52,22 +44,6 @@ If you installed Python by any of the recommended ways [above](#windows-python),
## Installation
{: #windows-installation}
-### Anaconda
-{: #windows-anaconda}
-
-To install PyTorch with Anaconda, you will need to open an Anaconda prompt via `Start | Anaconda3 | Anaconda Prompt`.
-
-#### No CUDA
-
-To install PyTorch via Anaconda, and do not have a [CUDA-capable](https://developer.nvidia.com/cuda-zone) system or do not require CUDA, in the above selector, choose OS: Windows, Package: Conda and CUDA: None.
-Then, run the command that is presented to you.
-
-#### With CUDA
-
-To install PyTorch via Anaconda, and you do have a [CUDA-capable](https://developer.nvidia.com/cuda-zone) system, in the above selector, choose OS: Windows, Package: Conda and the CUDA version suited to your machine. Often, the latest CUDA version is better.
-Then, run the command that is presented to you.
-
-
### pip
{: #windows-pip}
@@ -126,7 +102,7 @@ For the majority of PyTorch users, installing from a pre-built binary via a pack
### Prerequisites
{: #windows-prerequisites-2}
-1. Install [Anaconda](#anaconda)
+1. Install [pip](https://pypi.org/project/pip/)
2. Install [CUDA](https://developer.nvidia.com/cuda-downloads), if your machine has a [CUDA-enabled GPU](https://developer.nvidia.com/cuda-gpus).
3. If you want to build on Windows, Visual Studio with MSVC toolset, and NVTX are also needed. The exact requirements of those dependencies could be found out [here](https://github.com/pytorch/pytorch#from-source).
4. Follow the steps described here: [https://github.com/pytorch/pytorch#from-source](https://github.com/pytorch/pytorch#from-source)
diff --git a/_includes/quick_start_local.html b/_includes/quick_start_local.html
index b542a56821ad..81bd69fbf1d4 100644
--- a/_includes/quick_start_local.html
+++ b/_includes/quick_start_local.html
@@ -58,16 +58,13 @@
Package
-
-
Conda
-
-
+
Pip
-
+
LibTorch
-
+
Source
@@ -107,7 +104,7 @@
Run this Command:
-
conda install pytorch torchvision -c pytorch
+
pip install torch torchvision
diff --git a/_posts/2025-04-25-pytorch-2-7-intel-gpus.md b/_posts/2025-04-25-pytorch-2-7-intel-gpus.md
index 945dee36e224..7643d20ae51b 100644
--- a/_posts/2025-04-25-pytorch-2-7-intel-gpus.md
+++ b/_posts/2025-04-25-pytorch-2-7-intel-gpus.md
@@ -48,7 +48,7 @@ Graph model (torch.compile) is enabled in Windows 11 for the first time across I
-* Optimize the performance of PyTorch 2 Export Post Training Quantization (PT2E) on Intel GPU to provide full graph mode quantization pipelines with enhanced computational efficiency. Refer to[ PT2E tutorial](https://pytorch.org/tutorials/prototype/inductor_windows.html) for details.
+* Optimize the performance of PyTorch 2 Export Post Training Quantization (PT2E) on Intel GPU to provide full graph mode quantization pipelines with enhanced computational efficiency. Refer to [PT2E tutorial](https://pytorch.org/tutorials/prototype/pt2e_quant_xpu_inductor.html) for details.
* Enable AOTInductor and torch.export on Linux to simplify deployment workflows. Refer to[ AOTInductor tutorial](https://pytorch.org/docs/main/torch.compiler_aot_inductor.html) for details.
* Enable profiler on both Windows and Linux to facilitate model performance analysis. Refer to the[ PyTorch profiler tutorial](https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html#pytorch-profiler) for details.
diff --git a/_posts/2025-04-28-accelerating-training-float8-rowwise-crusoe.md b/_posts/2025-04-28-accelerating-training-float8-rowwise-crusoe.md
index 8c0ad7057ce4..245688c07605 100644
--- a/_posts/2025-04-28-accelerating-training-float8-rowwise-crusoe.md
+++ b/_posts/2025-04-28-accelerating-training-float8-rowwise-crusoe.md
@@ -4,7 +4,7 @@ title: "Accelerating Large Scale Training and Convergence with PyTorch Float8 Ro
author: Meta and Crusoe
---
-**Meta**: Less Wright, Hamid Shojanazeri, Vasiliy Kuznetsov, Daniel Vega-Myhre, Gokul Nadathur, Will Constable, Tianyu Liu, Tristan Rice, Driss Guessous, Josh Fromm, Luca Wehrstedt, Jiecao Yu, Sandeep Parab
+**Meta**: Less Wright, Hamid Shojanazeri, Vasiliy Kuznetsov, Daniel Vega-Myhre, Gokul Nadathur, Will Constable, Tianyu Liu, Tristan Rice, Driss Guessous, Josh Fromm, Luca Wehrstedt, Jiecao Yu
**Crusoe**: Ethan Petersen, Martin Cala, Chip Smith
Working with [Crusoe.AI](http://Crusoe.AI) we were provided access to one of their new 2K H200 clusters in Iceland, which enabled us to showcase training accelerations of 34 - 43% at scale by leveraging TorchTitan’s HSDP2 and TorchAO’s new float8 rowwise, with comparable convergence and stability vs BF16.
diff --git a/_posts/2025-04-30-6x-faster-async-checkpointing.md b/_posts/2025-04-30-6x-faster-async-checkpointing.md
new file mode 100644
index 000000000000..12a2f9e1b1de
--- /dev/null
+++ b/_posts/2025-04-30-6x-faster-async-checkpointing.md
@@ -0,0 +1,108 @@
+---
+layout: blog_detail
+title: "6x faster Async Checkpointing in PyTorch, using Cached Plans, no GIL contention"
+author: Meta and Crusoe
+---
+
+**Meta**: Less Wright, Meet Vadakkanchery, Saurabh Mishra, Ela Krepska, Hamid Shojanazeri, Pradeep Fernando
+**Crusoe**: Ethan Petersen, Martin Cala, Chip Smith
+
+PyTorch DCP (Distributed Checkpointing) has recently enabled new optimizations in asynchronous checkpointing to reduce GPU utilization drop by minimizing collective overhead and improving overall checkpointing efficiency.
+
+Using Crusoe’s 2K H200 cluster, with TorchTitan and training a Llama3-70B, we were able to verify these new features deliver substantial speedups at 1856 GPU scale, reducing the background processing time for async DCP checkpoints from ~436 seconds to ~67 seconds.
+
+This is roughly a 6.5x reduction in background checkpoint processing time, enabling even more total training time to proceed at full training throughput.
+
+{:style="width:100%"}
+
+
+*Fig 1: 1856 training run with high frequency checkpointing. The first checkpoint (drop down in tps) does not have a cached save plan, and the background processing takes far longer than the rest where the cached plan is used.*
+
+
+## Background: What is Asynchronous Checkpointing?
+
+In a standard checkpointing workflow, GPUs are blocked while the checkpointing data is offloaded from GPU to CPU and then written to storage. After the save to physical media is complete, training can resume.
+
+Asynchronous checkpointing greatly reduces this downtime by enabling the actual saving to storage to be done via CPU threads, allowing GPU-based training to continue while the checkpoint data is being persisted in parallel. It is used primarily for intermediate/fault-tolerant checkpoints as it unblocks the GPUs much faster compared to synchronous checkpoints. \
+For example, in our large-scale experiment, GPU training was blocked for less than a second (0.78 seconds at 1856 GPU scale) while checkpoint data was moved from GPU to CPU (staging). At that point, GPU training immediately continues, which is a substantial training time improvement over traditional checkpointing. For reference, Async Checkpointing is covered in more detail [here](https://pytorch.org/blog/reducing-checkpointing-times/).
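+
+For readers who want to try this pattern, below is a minimal sketch of an asynchronous save loop built on DCP's `async_save` API (the training-loop helpers `train_step`, `dataloader`, and `save_every` are illustrative placeholders, not part of the DCP API):
+
+```
+import torch.distributed.checkpoint as dcp
+
+checkpoint_future = None
+for step, batch in enumerate(dataloader):
+    train_step(model, optimizer, batch)  # GPU training work
+
+    if step % save_every == 0:
+        # Wait for the previous background save, if any, to finish.
+        if checkpoint_future is not None:
+            checkpoint_future.result()
+        state_dict = {"model": model.state_dict(), "optim": optimizer.state_dict()}
+        # Blocks only while checkpoint data is staged from GPU to CPU, then returns a
+        # future while the write to storage continues in the background.
+        checkpoint_future = dcp.async_save(state_dict, checkpoint_id=f"step-{step}")
+```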
+
+
+## Challenges with Asynchronous Checkpointing
+
+However, the background processing inherent in Asynchronous Checkpointing has additional challenges that result in a temporary reduction of training throughput while the storage phase is being completed. These are highlighted below.
+
+
+### GPU utilization drop from GIL contention:
+
+The Global Interpreter Lock (GIL) in Python is a mechanism that prevents multiple native threads from executing Python bytecode at the same time. This lock is necessary mainly because CPython's memory management is not thread-safe.
+
+DCP currently uses background threads for metadata collectives and uploading to storage. Although these expensive steps are done asynchronously, they lead to contention for the GIL with the trainer threads. This causes GPU utilization (QPS) to suffer significantly and also increases the e2e upload latency. For large-scale checkpoints, the overhead of the CPU parallel processing has a suppressive effect on net GPU training speed since CPUs also drive the training process via GPU kernel launches.
+
+Please refer to the following figure from our experiments:
+
+{:style="width:100%"}
+
+
+*Fig 2: One can see a sustained drop in training QPS even after staging (i.e. blocking operation to trainer) is complete.*
+
+The first dip in Figure 2 (marked by the purple line) indicates that staging is complete, and training can continue. However, a second drop is evident (marked by the area between the purple and yellow lines) which is due to trainer thread and checkpointing threads contending for the Python GIL, leading to degraded training QPS until the checkpoint thread completes execution.
+
+
+### Collective communications cost:
+
+DCP performs multiple collectives today for various reasons: dedupe, global metadata for the checkpoint, resharding, and distributed exception handling. Collectives are costly as these require network I/O and pickling/unpickling of the large metadata being sent across the GPU network. These collectives become extremely expensive as the job scale grows, leading to significantly higher e2e latency and potential for collective timeouts.
+
+
+## Solutions
+
+
+### Process based async checkpointing
+
+DCP now supports async checkpoint save via a background process. This helps avoid the training QPS drop by eliminating Python GIL contention with the trainer threads. Please see Fig 2 for checkpointing via threads and Fig 3 for checkpointing via a background process.
+
+
+### Caching of the save plans
+
+DCP has a clear boundary between the planning and storage I/O steps. SavePlanner in DCP is a stateful component which acts as an access proxy to the state_dict. Planner manages save plans prepared by individual ranks, which carry metadata information necessary to do the write I/O. The planning step involves a collective operation to gather a comprehensive view of the checkpoint on the coordinator rank. The coordinator rank is responsible for de-duplicating parameters/weights to eliminate redundancies, validating the global plan to ensure accuracy and consistency, and creating the global metadata structs. This is followed by a scatter collective where the coordinator rank assigns I/O tasks to each rank. Any transformations done on the plans affect how the storage components finally write the data.
+
+During the course of a training job, multiple checkpoints are saved. In the majority of these cases, only the checkpoint data changes between different save instances, and thus, the plan remains the same. This presented an opportunity for us to cache the plans, pay the planning cost only on the first save, and then amortize that cost across all the subsequent attempts. Only the updated plans (plans which changed in the next attempt) are sent via collective, thus reducing the collective overhead significantly.
+
+
+## Experiment Results
+
+**Set up:** 1856 H200 GPUs, Llama3-70B, HSDP2 with TorchTitan
+
+After deploying both the solutions above, the following are the key results:
+
+* TPS drop has significantly narrowed, with a peak dip to 372 vs 315 tps, and for a greatly reduced time window (~67 seconds vs ~437 seconds). This time window is now mostly attributed to the blocking for CPU processing.
+* Subsequent checkpoint save attempts also continue to be much faster due to very low overhead at the planning stage. E2E latency is thus improved by over 6.5x. This will allow our partners to increase the checkpointing frequency and reduce the lost training progress (i.e. wasted training time).
+
+If you look at the very first downspike in Figure 1, this drawdown in GPU processing time takes training throughput from 700 down to 320 tps, and suppresses it for roughly 7 minutes (467 seconds). Once the CPUs have finished processing, training continues again at full speed.
+
+Previously, this ~7 minute suppression would be repeated at *every* checkpoint. However, with the new process-based checkpointing feature, only the first checkpoint has the full drawdown time (mainly due to overhead from daemon process initialization), as all future checkpoints are executed via the background process, mitigating GIL contention with the trainer threads.
+
+This is visually shown in all the subsequent checkpoints where the average MFU suppression time drops to just over a minute, reflected by the sharp spikes that almost immediately revert to full MFU throughput.
+
+
+{:style="width:100%"}
+
+
+*Fig 3: The red box shows the non-cached plan checkpoint, which also includes Checkpoint Background Init process overhead, while the purple box highlights the first checkpoint to run with the cached plan.*
+
+This means that even large-scale checkpointing, such as shown in Fig 2 at 1856 GPU scale, can be done with ~6x reduced training throughput impact. This enables Asynchronous DCP checkpointing to be run more frequently (thus better rollback protection) while enhancing total training throughput relative to previous Async Checkpointing overhead.
+
+**Using DCP’s cached checkpointing:**
+
+This feature is already available as part of the PyTorch nightly builds, and you can test out PyTorch’s Asynchronous DCP checkpointing directly in TorchTitan. Following are the instructions to enable these features, with a combined sketch shown after the list:
+
+* Process-based asynchronous checkpointing:
+ * Set the **async_checkpointer_type** to AsyncCheckpointerType.PROCESS in the [async_save](https://github.com/pytorch/pytorch/blob/main/torch/distributed/checkpoint/state_dict_saver.py#L193) API. (*file*: pytorch/torch/distributed/checkpoint/state_dict_saver.py)
+* Save plan caching:
+ * Set the **enable_plan_caching** flag to true in the [DefaultSavePlanner](https://github.com/pytorch/pytorch/blob/main/torch/distributed/checkpoint/default_planner.py#L78C9-L78C28). (*file*: pytorch/torch/distributed/checkpoint/default_planner.py)
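+
+Putting both options together, a minimal sketch looks like the following (a sketch only: `state_dict` is prepared as in a normal DCP save, and the `AsyncCheckpointerType` import location and exact keyword names are based on the file references above and may differ across nightly builds):
+
+```
+import torch.distributed.checkpoint as dcp
+from torch.distributed.checkpoint.state_dict_saver import AsyncCheckpointerType
+from torch.distributed.checkpoint.default_planner import DefaultSavePlanner
+
+future = dcp.async_save(
+    state_dict,
+    checkpoint_id="step-1000",
+    planner=DefaultSavePlanner(enable_plan_caching=True),    # cache save plans across saves
+    async_checkpointer_type=AsyncCheckpointerType.PROCESS,   # save via a background process
+)
+```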
+
+
+## Future work
+
+DCP will be rolling out additional optimizations to further reduce the cost of checkpointing. Currently, even though the save plans are cached, the coordinator rank still prepares the metadata. For larger jobs and models with many tensors, this overhead is non-trivial. In the next iteration, DCP will eliminate the metadata overhead and improve the e2e latency further. DCP will also introduce additional optimizations, such as zero-overhead checkpointing, to enable efficient checkpointing in large-scale jobs.
+
+Stay tuned!
diff --git a/_posts/2025-04-30-flexattention-for-inference.md b/_posts/2025-04-30-flexattention-for-inference.md
new file mode 100644
index 000000000000..587aedf2158a
--- /dev/null
+++ b/_posts/2025-04-30-flexattention-for-inference.md
@@ -0,0 +1,380 @@
+---
+layout: blog_detail
+title: "FlexAttention Part II: FlexAttention for Inference"
+author: Joy Dong, Boyuan Feng, Driss Guessous, Joel Schlosser, Yanbo Liang, Horace He
+---
+
+## Overview
+
+In the PyTorch 2.5.0 release, we introduced [FlexAttention](https://pytorch.org/blog/flexattention/) `torch.nn.attention.flex_attention` for ML researchers who’d like to customize their attention kernels without writing kernel code. This blog introduces our decoding backend optimized for inference, supporting GQA and PagedAttention, along with feature updates including nested jagged tensor support, performance tuning guides, and trainable biases support.
+
+If you’re looking for an easy way to play around with FlexAttention in your post-training / inference pipeline, PyTorch native post-training library [torchtune](https://github.com/pytorch/torchtune) and inference codebase [gpt-fast](https://github.com/pytorch-labs/gpt-fast) already have FlexAttention integrated. Try it out!
+
+We are excited to share that our paper on FlexAttention has been accepted for presentation at the MLSys2025 Conference held from May 12-15th in Santa Clara, California.
+
+Title: **FlexAttention: A Programming Model for Generating Optimized Attention Kernels.** [Poster](https://mlsys.org/virtual/2025/poster/3007)
+
+
+## FlexAttention for Inference
+
+TL;DR: `torch.compile` lowers `flex_attention` to a fused [FlashDecoding](https://pytorch.org/blog/flash-decoding/) kernel when it runs on a very short query.
+
+One fused attention kernel does not suit all – especially in long-context LLM inference.
+
+The decoding phase of LLM inference is an iterative process: tokens are generated one at a time, requiring `N` forward passes to generate an `N`-token sentence. Fortunately, each iteration doesn’t need to recompute self-attention over the full sentence — previously calculated tokens are cached, so we only need to attend the newly generated token to the cached context.
+
+
+{:style="width:100%"}
+
+
+This results in a unique attention pattern where a short query sequence (1 token) attends to a long key-value cache (context length up to 128k). Traditional optimizations for square attention kernels (`q_len ≈ kv_len`) don’t directly apply here. This pattern poses new challenges for GPU memory utilization and occupancy. We build a dedicated FlexDecoding backend optimized for long-context LLM inference incorporating decoding-specific techniques from [FlashDecoding](https://pytorch.org/blog/flash-decoding/).
+
+FlexDecoding is implemented as an alternative backend for the `torch.nn.attention.flex_attention` operator. `flex_attention` automatically switches to the FlexDecoding backend for its JIT compilation when given a short query and a long KV cache. If the input shape changes significantly, for example transitioning from the prefill phase to decoding, JIT recompilation generates a separate kernel for each scenario.
+
+```
+flex_attention = torch.compile(flex_attention)
+
+k_cache = torch.randn(B, H, 16384, D)
+v_cache = torch.randn(B, H, 16384, D)
+
+...
+
+# Prefill Phase: query shape = [B, H, 8000, D]
+flex_attention(q_prefill, k_cache, v_cache, ...) # Uses FlexAttention backend optimized for prefill & training
+
+# Decoding Phase: q_last_token shape = [B, H, 1, D]
+flex_attention(q_last_token, k_cache, v_cache, ...) # Recompiles with the FlexDecoding backend
+
+# decode 2 tokens at the same time: q_last_2_tokens shape = [B, H, 2, D]
+flex_attention(q_last_2_tokens, k_cache, v_cache, ...) # No recompilation needed! Runs the decoding kernel again.
+```
+
+
+## Working with KV Cache
+
+One of the key optimizations for efficient inference is maintaining a preallocated KV cache that updates **in place** as new tokens are generated. Instead of enforcing a specific KV cache policy with a dedicated API, FlexDecoding allows users to define and manage the KV cache themselves.
+
+Similar to FlexAttention, FlexDecoding takes user-defined `mask_mod` and `score_mod` functions. These functions modify attention scores before the softmax operation.
+
+{:style="width:100%"}
+
+```
+score_mod(score, b, h, q_idx, kv_idx) -> tensor # return updated score
+```
+
+`score` is a scalar PyTorch tensor that represents the dot product of a query token and a key token. The rest of the arguments specify which score is being computed:
+
+
+
+* `b` batch index
+* `h` attention head index
+* `q_idx` token position in query tensor
+* `kv_idx` token position in key/value tensor
+
+In the decoding phase, previously calculated tokens are cached, and only the latest generated token (i-th) is used as the query. A naive causal mask on this one token query looks like this:
+
+```
+def causal(score, b, h, q_idx, kv_idx):
+ return torch.where(q_idx >= kv_idx, score, -float("inf"))
+```
+
+
+{:style="width:100%"}
+
+
+This is problematic: the new token “*saw*” should attend to all previously generated tokens, i.e. “*The cat sat on the mat and saw*”, not just the first entry in the KV cache. To correct this, the `score_mod` needs to **offset `q_idx` by i** for accurate decoding.
+
+
+{:style="width:100%"}
+
+
+Creating a new `score_mod` for each token to accommodate the offset is slow, since it means FlexAttention needs to be recompiled every iteration for a different `score_mod`. Instead, we define this `offset` as a tensor and increment its value at each iteration:
+
+```
+offset = torch.tensor(i, device="cuda")
+def causal_w_offset(score, b, h, q_idx, kv_idx):
+    return torch.where(q_idx + offset >= kv_idx, score, -float("inf"))
+
+# Attend the i-th token
+flex_attention(..., score_mod=causal_w_offset) # Compiles the kernel here
+...
+# Attend the i+1-th token
+offset = offset + 1 # Increment offset
+flex_attention(..., score_mod=causal_w_offset) # Doesn't need to recompile!
+```
+
+Notably, `offset` here becomes a captured tensor, and FlexAttention does not need to recompile when `offset` changes value.
+
+Manually rewriting your `score_mod` and `mask_mod` for offset handling isn't necessary. We can automate this process with a generic rewriter:
+
+```
+offset = torch.tensor(i, device="cuda")
+
+def get_score_mod_w_offset(score_mod: _score_mod_signature, _offset: torch.Tensor):
+    def _score_mod(score, b, h, q, kv):
+        return score_mod(score, b, h, q + _offset, kv)
+    return _score_mod
+
+def get_mask_mod_w_offset(mask_mod: _mask_mod_signature, _offset: torch.Tensor):
+ def _mask_mod(b, h, q, kv):
+ return mask_mod(b, h, q + _offset, kv)
+ return _mask_mod
+
+causal_w_offset = get_score_mod_w_offset(causal, offset)
+```
+
+## BlockMask for Inference
+
+We can also use BlockMask with inference to leverage mask sparsity. The idea is to precompute the BlockMask once during model setup and use slices of it during decoding.
+
+
+### Precomputing BlockMask
+
+During setup, we create a square BlockMask of shape `MAX_SEQ_LEN x MAX_SEQ_LEN`:
+
+```
+from torch.nn.attention.flex_attention import create_block_mask
+
+def causal_mask(b, h, q_idx, kv_idx):
+ return q_idx >= kv_idx
+
+block_mask = create_block_mask(causal_mask, B=None, H=None, Q_LEN=MAX_SEQ_LEN, KV_LEN=MAX_SEQ_LEN)
+```
+
+{:style="width:100%"}
+
+
+### Using BlockMask During Decoding
+
+For the i-th token, we use a slice of the mask:
+
+```
+block_offset = i // block_mask.BLOCK_SIZE[0]
+block_mask_slice = block_mask[:, :, block_offset]
+
+# don't forget to use the mask_mod with offset!
+block_mask_slice.mask_mod = get_mask_mod_w_offset(causal_mask, offset)
+```
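+
+The slice can then be passed to `flex_attention` for the decoding step, together with the offset-aware `score_mod` from earlier (a sketch; tensor names follow the previous examples):
+
+```
+# Decode the i-th token with the precomputed mask slice
+out = flex_attention(
+    q_last_token, k_cache, v_cache,
+    score_mod=causal_w_offset,
+    block_mask=block_mask_slice,
+)
+```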
+
+{:style="width:100%"}
+
+
+## Performance
+
+
+{:style="width:100%"}
+
+The FlexDecoding kernel performs on par with FlashDecoding (FAKV) and significantly outperforms PyTorch `scaled_dot_product_attention` ([code](https://github.com/pytorch/pytorch/blob/main/benchmarks/transformer/score_mod.py)).
+
+
+{:style="width:100%"}
+
+FlexDecoding boosts LLaMa3.1-8B serving performance by 1.22x-2.04x, and LLaMa3.1-70B performance by 0.99x - 1.66x compared to SDPA in gpt-fast. ([code](https://github.com/pytorch-labs/gpt-fast))
+
+
+## Paged Attention
+
+[vLLM](https://blog.vllm.ai/2023/06/20/vllm.html) is one of the most popular LLM serving engines, powered by efficient memory management from PagedAttention. The existing [PagedAttention](https://github.com/vllm-project/vllm/blob/main/csrc/attention/paged_attention_v2.cu) implementation requires dedicated CUDA kernels and shows limited flexibility in supporting emerging attention variants. In this section, we present a PT2-native PagedAttention implementation that is enabled by FlexAttention and torch.compile.
+
+PagedAttention scatters the KV cache to reduce memory fragmentation and support higher batch sizes. Without PagedAttention, the KV cache from the same request is stored in contiguous memory, requiring 2 tensors of shape *B x H x KV_LEN x D*. We call this the logical KV cache. Here, KV_LEN is the maximum sequence length over all requests in a batch. Considering Figure 1(a), KV_LEN is 9, so all requests must be padded to 9 tokens, leading to large memory waste. With PagedAttention, we can chunk each request into multiple pages of the same size `page_size` and scatter these pages into a physical KV cache of shape *1 x H x max_seq_len x D*, where max_seq_len = n_pages x page_size. This avoids padding requests to the same length and saves memory. Specifically, we provide an `assign` API to update the KV cache via index computations:
+
+```
+def assign(
+ batch_idx: torch.Tensor,
+ input_pos: torch.Tensor,
+ k_val: torch.Tensor,
+ v_val: torch.Tensor,
+ k_cache: torch.Tensor,
+ v_cache: torch.Tensor,
+) -> None
+```
+
+Behind this `assign` API is a page table, a tensor mapping logical KV cache to physical KV cache:
+
+`[batch_idx, logical_page_idx] -> physical_page_idx`
+
+`assign` takes `k_val` and `v_val` and scatters to physical KV cache guided by the mapping from the page table.
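+
+To make the indirection concrete, here is a toy, self-contained illustration of the page-table lookup behind `assign` (this is not the actual implementation; the sizes and page assignment are made up):
+
+```
+import torch
+
+page_size, n_pages, H, D = 4, 8, 2, 16
+# page table: [batch_idx, logical_page_idx] -> physical_page_idx (-1 = unassigned)
+page_table = torch.full((1, n_pages), -1, dtype=torch.long)
+page_table[0, :2] = torch.tensor([5, 1])  # request 0 owns physical pages 5 and 1
+
+# physical KV cache of shape 1 x H x (n_pages * page_size) x D
+k_cache = torch.zeros(1, H, n_pages * page_size, D)
+
+def toy_assign(batch_idx, input_pos, k_val, k_cache):
+    # translate logical token positions into physical positions via the page table
+    logical_page = input_pos // page_size
+    offset_in_page = input_pos % page_size
+    physical_pos = page_table[batch_idx, logical_page] * page_size + offset_in_page
+    # the physical cache has a single batch dim shared by all requests
+    k_cache[0, :, physical_pos] = k_val
+
+# write the key for logical position 6 of request 0 (it lands in physical page 1)
+toy_assign(0, torch.tensor([6]), torch.randn(H, 1, D), k_cache)
+```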
+
+
+{:style="width:100%"}
+
+
+**Paged Attention with Page Table**
+
+A natural question is: how do we integrate PagedAttention with FlexAttention to support diverse attention variants? A naive idea is to materialize the logical KV cache before computing with FlexAttention, but this leads to redundant memory copies and poor performance. Another idea is to build a dedicated CUDA or Triton kernel for paged attention, similar to the [existing PagedAttention implementation](https://github.com/vllm-project/vllm/blob/main/csrc/attention/paged_attention_v2.cu). However, this requires significant manual effort and adds code complexity.
+
+Instead, we design a fused indirect memory access by converting a logical block mask according to the page table. In FlexAttention, we exploit BlockMask to identify logical blocks and skip redundant computation. While Paged Attention adds an extra layer of indirect memory access, we can further convert the logical block mask to the physical block mask corresponding to the page table, as illustrated in Figure 2. Our PagedAttention implementation provides a `convert_logical_block_mask` via torch.gather calls:
+
+```
+def convert_logical_block_mask(
+ block_mask: BlockMask,
+ batch_idx: Optional[torch.Tensor] = None,
+) -> BlockMask
+```
+
+{:style="width:100%"}
+
+
+
+**Paged Attention via Block Mask Conversion**
+
+One remaining question is how to rewrite user-specified `mask_mod` and `score_mod` for PagedAttention. When users specify these modifications, they write them with logical indices, without knowledge of the page table maintained at runtime. The following code shows an automated conversion at runtime that rewrites user-specified modifications to use physical kv indices. The `new_mask_mod` takes the physical_kv_idx, converts it back to the logical_kv_idx, and applies the user-specified `mask_mod` on the logical_kv_idx to produce the correct mask.
+
+For efficiency, we maintain physical_to_logical as a mapping from physical_kv_block to logical_kv_block to facilitate the conversion. For correctness, we mask out-of-boundary blocks as False with a `torch.where` call. After batching logical KV caches from multiple requests into the same physical KV cache, there are many more physical blocks than logical blocks for each request. Thus, a physical block may not have a corresponding logical block for a specific request during block mask conversion. By masking these as False with `torch.where`, we ensure that data from different requests do not interfere with each other. Similarly, we can convert the [score_mod](https://github.com/pytorch/pytorch/blob/main/torch/nn/attention/experimental/_paged_attention.py#L308-L338) automatically.
+
+```
+def get_mask_mod(mask_mod: Optional[_mask_mod_signature]) -> _mask_mod_signature:
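+    # Note: page_size and physical_to_logical are assumed to be available from the
+    # enclosing scope; they are maintained by the paged attention implementation.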
+ if mask_mod is None:
+ mask_mod = noop_mask
+
+ def new_mask_mod(
+ b: torch.Tensor,
+ h: torch.Tensor,
+ q_idx: torch.Tensor,
+ physical_kv_idx: torch.Tensor,
+ ):
+ physical_kv_block = physical_kv_idx // page_size
+ physical_kv_offset = physical_kv_idx % page_size
+ logical_block_idx = physical_to_logical[b, physical_kv_block]
+ logical_kv_idx = logical_block_idx * page_size + physical_kv_offset
+ return torch.where(
+ logical_block_idx >= 0, mask_mod(b, h, q_idx, logical_kv_idx), False
+ )
+
+ return new_mask_mod
+```
+
+Figure 3 demonstrates the latency from Paged Attention ([code](https://github.com/pytorch-labs/attention-gym/blob/main/attn_gym/paged_attention/latency.py)). Overall, there is less than 5% overhead from Flex Attention with Paged Attention, compared with Flex Attention only. We also observe an on-par performance with Flash Attention v2. A [minimal serving example](https://github.com/pytorch-labs/attention-gym/blob/main/attn_gym/paged_attention/throughput.py) further shows that PagedAttention can support 76x higher batch size when evaluating on [OpenOrca dataset](https://huggingface.co/datasets/Open-Orca/OpenOrca) which includes 1M GPT-4 completions and 3.2M GPT-3.5 completions.
+
+
+{:style="width:100%"}
+
+
+**Paged Attention: Latency under diverse sequence length**
+
+
+## Ragged input sequences with Nested Jagged Tensors (NJTs)
+
+FlexAttention now supports ragged-sized input sequences through the use of Nested Jagged Tensors (NJTs). NJTs represent ragged-sized sequences by packing sequences into a single “stacked sequence” and maintaining a set of offsets delimiting sequence boundaries for each batch item.
+
+A block mask can be created for input NJTs through the new `create_nested_block_mask()` API. The returned block mask is compatible with the ragged structure of the given NJT, treating it as a single “stacked sequence” with inter-sequence attention automatically masked out. The mask_mod or score_mod function can be written as usual.
+
+```
+from torch.nn.attention.flex_attention import create_nested_block_mask, flex_attention
+
+BATCH = 8
+NUM_HEADS = 8
+D = 16
+device = "cuda"
+
+# Input NJTs of shape (BATCH, SEQ_LEN*, D) with ragged SEQ_LEN
+sequence_lengths = [torch.randint(5, 30, ()).item() for _ in range(BATCH)]
+query = torch.nested.nested_tensor([
+ torch.randn(seq_len, NUM_HEADS * D, device=device)
+ for seq_len in sequence_lengths
+], layout=torch.jagged)
+key = torch.randn_like(query)
+value = torch.randn_like(query)
+
+# View as shape (BATCH, NUM_HEADS, SEQ_LEN*, HEAD_DIM)
+query = query.unflatten(-1, [NUM_HEADS, D]).transpose(1, 2)
+key = key.unflatten(-1, [NUM_HEADS, D]).transpose(1, 2)
+value = value.unflatten(-1, [NUM_HEADS, D]).transpose(1, 2)
+
+# Simple causal mask
+def my_mask_mod(b, h, q_idx, kv_idx):
+ return q_idx >= kv_idx
+
+# Construct a block mask using the ragged structure of the
+# specified query NJT. Ragged-sized sequences are treated as a single
+# "stacked sequence" with inter-sequence attention masked out.
+block_mask = create_nested_block_mask(my_mask_mod, 1, 1, query)
+
+# For cross attention, create_nested_block_mask() also supports a
+# rectangular block mask using the ragged structures of both query / key.
+#block_mask = create_nested_block_mask(my_mask_mod, 1, 1, query, key)
+
+output = flex_attention(query, key, value, block_mask=block_mask)
+```
+
+## Trainable Biases
+
+FlexAttention now supports trainable parameters in `score_mod` functions. This feature enables users to reference tensors that require gradients within their `score_mod` implementations, with gradients automatically backpropagating through these parameters during training.
+
+
+### Memory-Efficient Gradient Accumulation
+
+Instead of materializing the full attention scores matrix, FlexAttention uses atomic additions (`tl.atomic_add`) to accumulate gradients. This approach significantly reduces memory usage at the cost of introducing some non-determinism in gradient calculations.
+
+
+### Handling Broadcasted Operations
+
+Broadcasting operations in the forward pass (e.g., `score + bias[h]`) require special consideration in the backward pass. When broadcasting a tensor across multiple attention scores within a head or other dimensions, we need to reduce these gradients back to the original tensor shape. Rather than materializing the full attention score matrix to perform this reduction, we use atomic operations. While this incurs some runtime overhead, it allows us to maintain memory efficiency by avoiding the materialization of large intermediate tensors.
+
+
+### Current Limitations
+
+The implementation currently allows only a single read from each input tensor in the `score_mod` function. For example, `bias[q_idx] + bias[kv_idx]` would not be supported as it reads from the same tensor twice. We hope to remove this restriction in the future.
+
+
+### Simple Example:
+
+```
+bias = torch.randn(num_heads, requires_grad=True)
+def score_mod(score, b, h, q_idx, kv_idx):
+ return score + bias[h]
+```
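+
+Gradients flow back to `bias` through the attention output as usual. Below is a minimal end-to-end sketch of this prototype feature (shapes, devices, and the use of `torch.compile` here are illustrative assumptions):
+
+```
+import torch
+from torch.nn.attention.flex_attention import flex_attention
+
+flex_attention = torch.compile(flex_attention)
+
+num_heads = 8
+bias = torch.randn(num_heads, device="cuda", requires_grad=True)
+
+def score_mod(score, b, h, q_idx, kv_idx):
+    return score + bias[h]
+
+q = torch.randn(2, num_heads, 128, 64, device="cuda")
+k = torch.randn_like(q)
+v = torch.randn_like(q)
+
+out = flex_attention(q, k, v, score_mod=score_mod)
+out.sum().backward()
+print(bias.grad.shape)  # torch.Size([8])
+```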
+
+## Performance Tuning for FlexAttention
+
+
+### TL;DR
+
+For optimal performance, compile FlexAttention using `max-autotune`, especially when dealing with complex `score_mods` and `mask_mods`:
+
+```
+flex_attention = torch.compile(flex_attention, dynamic=True, mode='max-autotune')
+```
+
+
+### What is `max-autotune`?
+
+`max-autotune` is a `torch.compile` mode in which TorchInductor sweeps many kernel parameters (e.g., tile size, `num_stages`) and selects the best-performing configuration. During this sweep, both successful and failing configurations can be tried safely, and the best viable configuration is chosen.
+
+While compilation takes longer with `max-autotune`, the optimal configuration is cached for future kernel executions.
+
+Here’s an example of FlexAttention compiled with `max-autotune`:
+
+```
+triton_flex_attention_backward_7 0.2528 ms 100.0% BLOCKS_ARE_CONTIGUOUS=False, BLOCK_M1=32, BLOCK_M2=32, BLOCK_N1=32, BLOCK_N2=32, FLOAT32_PRECISION="'ieee'", GQA_SHARED_HEADS=7, HAS_FULL_BLOCKS=False, IS_DIVISIBLE=False, OUTPUT_LOGSUMEXP=True, PRESCALE_QK=False, QK_HEAD_DIM=128, ROWS_GUARANTEED_SAFE=False, SM_SCALE=0.08838834764831843, SPARSE_KV_BLOCK_SIZE=1073741824, SPARSE_Q_BLOCK_SIZE=1073741824, V_HEAD_DIM=128, num_stages=4, num_warps=4
+```
+
+### Why Use `max-autotune` for FlexAttention?
+
+The amount of shared memory utilized in FlexAttention depends on the `score_mod` and `mask_mod` methods. This variability means that the preconfigured default kernel parameters may lead to performance cliffs or even out-of-shared-memory errors on certain hardware for some masks/mods.
+
+For instance, with document masks, default configurations can halve GPU occupancy, reducing performance to ~75% of its potential on some GPUs. To avoid such issues, we strongly recommend enabling `max-autotune`.
+
+
+## Updates and Enhancements
+
+* Now available as a prototype feature in PyTorch 2.5.0
+* Fixed critical correctness issues, including a bug affecting multiple calls to FlexAttention within the same call to torch.compile
+
+
+## Expanded Architecture Support
+
+* Arbitrary sequence length support - no longer requires multiples of 128
+* Added native grouped-query attention (GQA) support via `is_gqa=True`
+* Enhanced dimension flexibility:
+ * Different QK and V head dimensions
+ * Non-power-of-two head dimensions
+* Trainable attention biases (prototype)
+
+
+## Under the Hood
+
+* New fused CPU backend
+* Improved TF32 handling for float32 inputs
+* Resolved various dynamic shape issues
+* Output layout matching query strides
+
+These updates make FlexAttention more robust and flexible while maintaining its core promise of combining PyTorch's ease of use with FlashAttention's performance benefits.
\ No newline at end of file
diff --git a/_posts/2025-05-01-docathon-2025.md b/_posts/2025-05-01-docathon-2025.md
new file mode 100644
index 000000000000..1ad33370e775
--- /dev/null
+++ b/_posts/2025-05-01-docathon-2025.md
@@ -0,0 +1,54 @@
+---
+layout: blog_detail
+title: 'Announcing the PyTorch Docathon 2025'
+---
+
+{:style="max-width:600px; display: block; margin-left: auto; margin-right: auto"}
+
+
+We're thrilled to announce the [2025 PyTorch Docathon](https://community.linuxfoundation.org/events/details/lfhq-pytorch-foundation-presents-pytorch-docathon-june-3rd-18th-2025/)! This is a hackathon-style event aimed at enhancing PyTorch documentation with the support of the community. Documentation is a vital component of any technology, and by refining it, we can simplify the onboarding process for new users, help them effectively utilize PyTorch's features, and ultimately speed up the transition from research to production in machine learning.
+
+
+## WHY PARTICIPATE
+
+
+### Low Barrier to Entry
+
+Unlike many open-source projects that require deep knowledge of the codebase and previous contributions to join hackathon events, the Docathon is tailored for newcomers. While we expect participants to be familiar with Python, and have basic knowledge of PyTorch and machine learning, there are tasks related to website issues that don't even require that level of expertise.
+
+
+### Tangible Results
+
+A major advantage of the Docathon is witnessing the immediate impact of your contributions. Enhancing documentation significantly boosts a project's usability and accessibility, and you'll be able to observe these improvements directly. Seeing tangible outcomes can also be a strong motivator to continue contributing.
+
+
+### Collaborative Environment
+
+The Docathon fosters a collaborative atmosphere, offering you the chance to work alongside other contributors and PyTorch maintainers to improve the documentation. This is a fantastic opportunity to learn from peers, exchange ideas, and build connections.
+
+
+### Learning Opportunities
+
+Even if you're not a PyTorch expert, the Docathon offers a valuable learning experience. You'll have the chance to delve into PyTorch modules, test tutorials on your machine, and explore them in the CI environment.
+
+
+## WHO SHOULD PARTICIPATE
+
+Whether you’re a seasoned documentation expert or just starting out, we invite everyone to join the PyTorch Docathon to contribute and to develop your skills and knowledge while helping improve the documentation for everyone! We will have issues labelled by skill level, and the PyTorch Discord will be available for collaboration and help.
+
+
+## EVENT DETAILS
+
+
+
+* June 3: Kick-off 10 AM PT
+* June 4 - June 15: Submissions and Feedback
+* June 16 - June 17: Final Reviews
+* June 18: Winner Announcements
+
+Make sure to [RSVP](https://community.linuxfoundation.org/events/details/lfhq-pytorch-foundation-presents-pytorch-docathon-june-3rd-18th-2025/) to the event so you receive all the notifications and instructions on how to participate.
+
+Further details about the Docathon will be shared during the Kick-off call on June 3.
+
+
+**Don't forget to register for this year's event: [RSVP now](https://community.linuxfoundation.org/events/details/lfhq-pytorch-foundation-presents-pytorch-docathon-june-3rd-18th-2025/)**
\ No newline at end of file
diff --git a/_posts/2025-05-01-how-ibm-uses-pt-terratorch.md b/_posts/2025-05-01-how-ibm-uses-pt-terratorch.md
new file mode 100644
index 000000000000..db6955023bc0
--- /dev/null
+++ b/_posts/2025-05-01-how-ibm-uses-pt-terratorch.md
@@ -0,0 +1,90 @@
+---
+layout: blog_detail
+title: 'How IBM Research Uses PyTorch and TerraTorch to Make Geospatial Computer Vision Accessible for Everyone'
+hidden: true
+---
+
+Earth Observation-based analytics are becoming essential for understanding our planet — from monitoring deforestation to tracking urban development and analyzing the impacts of climate change. However, the coding and deep learning skills required to apply AI models to satellite imagery and earth observation data have traditionally been a major barrier for many practitioners.
+
+With IBM Research’s launch of TerraTorch 1.0, a PyTorch domain library for fine-tuning of Geospatial Computer Vision Foundation Models, we make geospatial AI not only more accessible but also more practical for the wider PyTorch community. Our goal: simplify the process so that any data scientist, researcher, or enthusiast can build powerful geospatial models with ease and low GPU and data processing requirements.
+
+{:style="width:100%"}
+
+
+**The power of foundation models: even with 75-95% of the input data removed, the models do a fantastic job of reconstructing the input data, thereby learning the underlying physics of our planet in a deep, latent space**
+
+## The Business Challenge
+
+Our goal was to remove the technical barriers that prevent people from working with satellite imagery, weather and climate data at scale. Together with NASA, we’ve developed the Prithvi family of foundation models. PyTorch’s clean API made it straightforward to integrate the latest innovations from AI research.
+
+We wanted to create a framework that anyone can use to go from raw data to inference-ready models in just a few steps.
+
+
+{:style="width:100%"}
+
+
+**How a weather and climate foundation model created and fine-tuned on PyTorch is used for weather forecasts**
+
+## How IBM Research Used PyTorch
+
+We’ve built TerraTorch on top of PyTorch, leveraging its dynamic ecosystem to integrate:
+
+
+
+* PyTorch Lightning for clean, scalable training loops
+* TorchGeo for geospatial data handling and transformations (PyTorch transforms)
+* For foundation models like the leading generative multimodal foundation model ['Terramind'](https://research.ibm.com/blog/terramind-esa-earth-observation-model), co-developed by IBM and ESA, and [the ‘Prithvi’ family](https://huggingface.co/ibm-nasa-geospatial), co-developed by IBM and NASA, TerraTorch has been used to fine-tune all of the downstream geospatial models for satellite imagery, weather and climate data. It includes the family of fine-tuned models that IBM has released as part of [Granite](https://huggingface.co/collections/ibm-granite/granite-geospatial-models-667dacfed21bdcf60a8bc982). In addition, other interesting foundation models and ecosystem components like Clay, SatMAE, Satlas, DeCur and DOFA are included in TerraTorch.
+* Powerful and state-of-the-art vision transformers to experiment with modern neural network architectures
+* TerraTorch-Iterate builds on top of PyTorch, Optuna, MLFlow and Ray Tune for Hyperparameter Optimization (HPO), Neural Architecture Search (NAS) and Foundation Model Benchmarking (GeoBench), where TerraTorch became the reference implementation
+
+
+{:style="width:100%"}
+
+**The fine-tuning and inference process is completely described in a single YAML config file. There, the architectural building blocks of the model (backbone, neck, decoder, head) are defined. The Model Factory assembles the model using the built-in and custom registries. In addition, the Optimizer and Data Modules are created as defined in the config. Finally, everything is passed to the Lightning Trainer, which executes the task.**
+
+
+With PyTorch’s flexibility, we were able to prototype quickly, iterate on model architectures, and deploy pipelines for a range of geospatial applications — from flood and biomass detection to increasing resolution of climate data, where some of our work became part of the [IBM Granite Geospatial Model Family](https://huggingface.co/collections/ibm-granite/granite-geospatial-models-667dacfed21bdcf60a8bc982).
+
+
+{:style="width:100%"}
+
+
+**Architecture of the Prithvi-EO-2.0-600M foundation model which IBM Research developed together with NASA**
+
+## Solving AI Challenges with PyTorch
+
+PyTorch helped us to tackle three major challenges:
+
+* Ease of experimentation: Dynamic computation graphs, automatic differentiation, full abstraction of CUDA and rich visualization tools made it simple to test different models and training strategies.
+* Scalability: With DDP, FSDP, PyTorch Lightning and TorchGeo, we could train models on large-scale datasets without worrying about infrastructure.
+* Community support: PyTorch - the de-facto standard in AI research - with its active community and excellent documentation made it easy to overcome hurdles and stay up to date with the latest advancements in AI research.
+
+## A Word from IBM Research
+
+*"PyTorch gave me the power to turn complex linear algebra and optimization problems into accessible, shareable solutions for the community. It feels empowering that we’re building and fine-tuning models for anyone curious about understanding our planet through AI."*
+
+— Romeo Kienzler, AI Research Engineer at IBM Research Zurich, Rueschlikon
+
+
+{:style="width:100%"}
+
+
+## The Benefits of Using PyTorch
+
+Using PyTorch allowed us to:
+
+
+
+* Build a reproducible, open-source framework for fine-tuning geospatial foundation models
+* Share our work with the community through easy-to-follow notebooks, TerraTorch configuration files, tutorials and model checkpoints on HuggingFace
+* Rapidly iterate over foundation model architectures and deploy fine-tuned models for inference, from research to real-world client products
+
+## Learn More
+
+For more information about this project and to explore the code, visit:
+
+* [GitHub Repository](https://github.com/IBM/terratorch)
+* [IBM Research: Simplifying Geospatial AI with TerraTorch 1.0](https://research.ibm.com/blog/simplifying-geospatial-ai-with-terra-torch-1-0)
+* [TerraTorch PrithviEOv2 example notebooks](https://github.com/IBM/terratorch/tree/main/examples/tutorials/PrithviEOv2)
+* [TerraMind example notebooks](https://github.com/IBM/terramind/tree/main/notebooks)
+* [Run TerraMind using TerraTorch on Colab](https://colab.research.google.com/github/IBM/terramind/blob/main/notebooks/terramind_v1_base_sen1floods11.ipynb)
diff --git a/_posts/2025-05-02-pt-day-france-featured-sessions.md b/_posts/2025-05-02-pt-day-france-featured-sessions.md
new file mode 100644
index 000000000000..36bd9bacd37b
--- /dev/null
+++ b/_posts/2025-05-02-pt-day-france-featured-sessions.md
@@ -0,0 +1,49 @@
+---
+layout: blog_detail
+title: 'PyTorch Day France Featured Sessions: A Defining Moment for Open Source AI'
+---
+
+[PyTorch Day France](https://events.linuxfoundation.org/pytorch-day-france/) offers a front-row seat to the future of open source AI. Taking place **7 May at Station F in Paris** and co-located with **[GOSIM AI Paris](https://paris2025.gosim.org/)**, this one-day event will bring together developers, researchers, and industry leaders for a day of technical sessions, real-world insights, and community exchange.
+
+
+## 🌍 A Major Milestone for the PyTorch Foundation
+
+This event marks the very first **PyTorch Day**, launching a new international series hosted annually in different regions to convene AI researchers, developers, engineers, and enthusiasts. PyTorch Days are designed to spotlight open source AI advancements, foster community collaboration, and provide a forum to learn about active, high-impact AI projects built using PyTorch.
+
+PyTorch Day France also represents a pivotal moment in the PyTorch Foundation’s journey. With its recent [expansion into an umbrella foundation]( https://pytorch.org/blog/pt-foundation-expands/), PyTorch is now positioned to support a broader ecosystem of trusted, community-driven AI projects across the full AI lifecycle.
+
+At PyTorch Day France, you’ll hear directly from PyTorch Foundation **Executive Director, Matt White,** about this transition—and get a first look at some exciting announcements.
+
+
+## 🎟️ Registration Details
+
+[Register now](https://www.eventbrite.com/e/gosim-ai-paris-tickets-1265928669729?aff=oddtdtcreator) with code **PYTORCH** for **free access** to the full day of **PyTorch Day France** sessions, **plus** **GOSIM AI Paris**.
+
+🔗Two events, one registration—double the sessions, double the innovation. \
+[Register here](https://www.eventbrite.com/e/gosim-ai-paris-tickets-1265928669729?aff=oddtdtcreator)
+
+
+## 📅 Featured Sessions
+
+The day’s agenda includes deep technical dives and applied AI use cases from across the community, including the following talks:
+
+
+
+* [Luca Antiga (Lightning AI)](https://sched.co/21nz4)
+ *Lightning Thunder: Supercharged PyTorch for Modern Hardware*
+* [Erwan Gallen & Eldar Kurtic (Red Hat)](https://sched.co/21nyd)
+ *Scaling LLM Inference with vLLM: Multi‑Accelerator Serving and Quantized LLMs*
+* [Pierre Rouanet (Pollen Robotics)](https://sched.co/21nyX)
+ *Real-World Robotics as the Next Frontier for AI?*
+* [Pablo Montalvo (Hugging Face)](https://sched.co/21nzG)
+ *PyTorch x Transformers: Pythonicity, Autodiff, and Modularity Defining Modern AI*
+* [Pedro Ortiz (Common Crawl)](https://sched.co/21nym)
+ *Harnessing Common Crawl for AI and ML Applications*
+* [Meriem Bendris (NVIDIA)](https://sched.co/21nys)
+ *Teaching Mistral to Reason: Post-Training with PyTorch and NVIDIA*
+* [Olatunji Ruwase (Snowflake)](https://sched.co/21nyy)
+ *DeepSpeed – Efficient Training Scalability for Deep Learning Models*
+
+[View the full schedule](https://pytorchdayfrance2025.sched.com/).
+
+Whether you’re a contributor, practitioner, or simply curious about what’s ahead, PyTorch Day France is an opportunity to connect with the community and shape what’s next for our ecosystem.
diff --git a/_posts/2025-05-02-pt-korea-user-group-recap.md b/_posts/2025-05-02-pt-korea-user-group-recap.md
new file mode 100644
index 000000000000..b5a2126271b8
--- /dev/null
+++ b/_posts/2025-05-02-pt-korea-user-group-recap.md
@@ -0,0 +1,87 @@
+---
+layout: blog_detail
+title: 'Recap of the PyTorch Korea User Group Meetup: A Technical Conference with a PyTorch Core Maintainer'
+author: 'Jiho Kim, PyTorch Korea User Group'
+---
+
+At the end of March, the PyTorch Korea User Group hosted a special meetup that brought together prominent speakers for deep discussions on the PyTorch core and its broader ecosystem. With the event more than doubling in size compared to past gatherings, we were able to connect with even more developers and share insights. Huge thanks to [goorm](https://goorm.co/) for sponsoring the fantastic venue! 😄
+
+
+{:style="width:100%"}
+
+
+
+This recap is for those who couldn’t attend in person, as well as for participants who want to revisit the energy and insights of the day. The event featured experts in core PyTorch, AI accelerators, inference optimization, and large language model development. Below is a quick overview of the key sessions that anchored the conference.
+
+
+
+## 1️⃣ Jerry Lee | PyTorch Foundation
+
+Representing the PyTorch Foundation, part of the Linux Foundation, Jaeung (Jerry) Lee provided an overview of how PyTorch is driving core open source technologies forward. He shared PyTorch's growth story, the many global projects currently in motion, and the ecosystem’s impressive 20%+ annual growth. The session also covered how the foundation operates, how member organizations are involved, and upcoming plans that are particularly relevant for practitioners.
+
+
+{:style="width:100%"}
+
+
+## 2️⃣ Alban Desmaison | PyTorch Roadmap
+
+Alban shared the design philosophy behind PyTorch and Meta’s official contribution roadmap ([link](https://dev-discuss.pytorch.org/t/meta-pytorch-team-2025-h1-roadmaps/2794)). He provided a deep technical dive into the differences between Eager and Compiled modes, especially breaking down the backend architecture of device Eager execution. Practical tools and improvements were also introduced—such as memory profilers, enhanced custom operator support, and pinned memory optimizations.
+
+
+{:style="width:100%"}
+
+
+
+
+## 3️⃣ Hongseok Kim | PyTorch on Rebellions AI Accelerators: Status
+
+Rebellions is building runtime integration for their proprietary NPU architecture, fully aligned with the structural changes in PyTorch 2.0. This talk introduced the performance and scalability of their upcoming chip, their integration strategy with the PyTorch runtime, and challenges in supporting Eager Mode. Hongseok also previewed their roadmap toward releasing these features within the year.
+
+{:style="width:100%"}
+
+
+
+## 4️⃣ Kyujin Cho | Backend.AI: A Unified Platform for All AI Accelerators
+
+Backend.AI abstracts and integrates various AI accelerators into a unified workflow. As the diversity of accelerator architectures grows, the need for portability and infrastructure unification becomes even more important. This session showcased features across development and operations—from NPU scheduling and resource allocation to monitoring. Backend.AI currently supports accelerators from NVIDIA, Intel, Tenstorrent, Rebellions, and more.
+
+{:style="width:100%"}
+
+
+
+## 5️⃣ Taeho Kim | Optimizing & Deploying Models Across Multiple Chipsets Using NetsPresso
+
+This talk focused on the challenges of inference in real-world industrial applications of AI models. As new state-of-the-art models emerge rapidly, there’s a growing need for environments that can quickly validate device compatibility—ideally with one-click ease. NetsPresso is actively working on a static graph representation compatible with PyTorch, offering efficient support for model development, optimization, and testing.
+
+
+{:style="width:100%"}
+
+
+## 6️⃣ Jungyeop Lee | The Journey to Reproduce DeepSeek-R1
+
+Jungyeop took us through his journey of reproducing DeepSeek-R1, a large language model, an effort that involved 201 experiments. He shared real-world lessons from training with Korean data, tokenizer modifications, and fine-tuning strategies. His practical insights and next steps were especially valuable for those building or re-implementing large models from scratch.
+
+
+{:style="width:100%"}
+
+
+## 7️⃣ Sol Kim | A journey from TCP architecture to production-level LLMs
+
+Sol presented an integrated optimization approach to deploying large models using the TCP (Tensor Contraction Processor) architecture, which supports tensor contraction at the hardware level. The talk highlighted optimization techniques built on hardware abstraction layers (HALs) and bottom-up integration strategies with PyTorch, offering a hybrid hardware-software perspective.
+
+
+{:style="width:100%"}
+
+## 💡 Panel Talk & Q&A 💡
+
+The event wrapped up with an engaging panel discussion. Attendees asked sharp questions, and the speakers offered insightful answers. It was a powerful moment that captured the community’s enthusiasm for PyTorch and its hunger for deeper technical understanding.
+
+
+{:style="width:100%"}
+
+
+## Final Thoughts
+
+Since our first offline meetup in October 2022, the PyTorch Korea User Group has held five major technical conferences. Each event deepens our appreciation for the scale and depth of the PyTorch ecosystem. With perspectives from users, contributors, and ecosystem builders, the stories we share are only growing—and we’re committed to continuing this journey together.
+
+See you at the next conference—with even more exciting talks to come! 🙌
\ No newline at end of file
diff --git a/assets/images/6x-faster-async-checkpointing/fg1.png b/assets/images/6x-faster-async-checkpointing/fg1.png
new file mode 100644
index 000000000000..4c3d3bf50d02
Binary files /dev/null and b/assets/images/6x-faster-async-checkpointing/fg1.png differ
diff --git a/assets/images/6x-faster-async-checkpointing/fg2.png b/assets/images/6x-faster-async-checkpointing/fg2.png
new file mode 100644
index 000000000000..1eaddbc43e68
Binary files /dev/null and b/assets/images/6x-faster-async-checkpointing/fg2.png differ
diff --git a/assets/images/6x-faster-async-checkpointing/fg3.png b/assets/images/6x-faster-async-checkpointing/fg3.png
new file mode 100644
index 000000000000..4c3d3bf50d02
Binary files /dev/null and b/assets/images/6x-faster-async-checkpointing/fg3.png differ
diff --git a/assets/images/docathon-2025.png b/assets/images/docathon-2025.png
new file mode 100644
index 000000000000..aad9c70d1f36
Binary files /dev/null and b/assets/images/docathon-2025.png differ
diff --git a/assets/images/flexattention-for-inference/fg1.png b/assets/images/flexattention-for-inference/fg1.png
new file mode 100644
index 000000000000..c42a3bf5717f
Binary files /dev/null and b/assets/images/flexattention-for-inference/fg1.png differ
diff --git a/assets/images/flexattention-for-inference/fg10.png b/assets/images/flexattention-for-inference/fg10.png
new file mode 100644
index 000000000000..70d9e441b97c
Binary files /dev/null and b/assets/images/flexattention-for-inference/fg10.png differ
diff --git a/assets/images/flexattention-for-inference/fg11.png b/assets/images/flexattention-for-inference/fg11.png
new file mode 100644
index 000000000000..94697c426b7e
Binary files /dev/null and b/assets/images/flexattention-for-inference/fg11.png differ
diff --git a/assets/images/flexattention-for-inference/fg2.png b/assets/images/flexattention-for-inference/fg2.png
new file mode 100644
index 000000000000..47ae6ab99d26
Binary files /dev/null and b/assets/images/flexattention-for-inference/fg2.png differ
diff --git a/assets/images/flexattention-for-inference/fg3.png b/assets/images/flexattention-for-inference/fg3.png
new file mode 100644
index 000000000000..06bc61656d47
Binary files /dev/null and b/assets/images/flexattention-for-inference/fg3.png differ
diff --git a/assets/images/flexattention-for-inference/fg4.png b/assets/images/flexattention-for-inference/fg4.png
new file mode 100644
index 000000000000..b78a15172977
Binary files /dev/null and b/assets/images/flexattention-for-inference/fg4.png differ
diff --git a/assets/images/flexattention-for-inference/fg5.png b/assets/images/flexattention-for-inference/fg5.png
new file mode 100644
index 000000000000..dbb7081efe98
Binary files /dev/null and b/assets/images/flexattention-for-inference/fg5.png differ
diff --git a/assets/images/flexattention-for-inference/fg6.png b/assets/images/flexattention-for-inference/fg6.png
new file mode 100644
index 000000000000..d2221e66d982
Binary files /dev/null and b/assets/images/flexattention-for-inference/fg6.png differ
diff --git a/assets/images/flexattention-for-inference/fg7.png b/assets/images/flexattention-for-inference/fg7.png
new file mode 100644
index 000000000000..6ec36ad490c5
Binary files /dev/null and b/assets/images/flexattention-for-inference/fg7.png differ
diff --git a/assets/images/flexattention-for-inference/fg8.png b/assets/images/flexattention-for-inference/fg8.png
new file mode 100644
index 000000000000..a6c6a5227db8
Binary files /dev/null and b/assets/images/flexattention-for-inference/fg8.png differ
diff --git a/assets/images/flexattention-for-inference/fg9.png b/assets/images/flexattention-for-inference/fg9.png
new file mode 100644
index 000000000000..8187641ba4b5
Binary files /dev/null and b/assets/images/flexattention-for-inference/fg9.png differ
diff --git a/assets/images/governing-board/Dwarak-Rajagopal.jpg b/assets/images/governing-board/Dwarak-Rajagopal.jpg
deleted file mode 100644
index dc8d4a070614..000000000000
Binary files a/assets/images/governing-board/Dwarak-Rajagopal.jpg and /dev/null differ
diff --git a/assets/images/governing-board/dwarakrajagopal2.jpg b/assets/images/governing-board/dwarakrajagopal2.jpg
new file mode 100644
index 000000000000..9036b956605d
Binary files /dev/null and b/assets/images/governing-board/dwarakrajagopal2.jpg differ
diff --git a/assets/images/governing-board/ricardo-aravena.jpg b/assets/images/governing-board/ricardo-aravena.jpg
new file mode 100644
index 000000000000..4c76381a73cf
Binary files /dev/null and b/assets/images/governing-board/ricardo-aravena.jpg differ
diff --git a/assets/images/how-ibm-uses-pt-terratorch/fg1.png b/assets/images/how-ibm-uses-pt-terratorch/fg1.png
new file mode 100644
index 000000000000..140186a272cf
Binary files /dev/null and b/assets/images/how-ibm-uses-pt-terratorch/fg1.png differ
diff --git a/assets/images/how-ibm-uses-pt-terratorch/fg2.png b/assets/images/how-ibm-uses-pt-terratorch/fg2.png
new file mode 100644
index 000000000000..7a37b893773d
Binary files /dev/null and b/assets/images/how-ibm-uses-pt-terratorch/fg2.png differ
diff --git a/assets/images/how-ibm-uses-pt-terratorch/fg3.png b/assets/images/how-ibm-uses-pt-terratorch/fg3.png
new file mode 100644
index 000000000000..bcbe77ea9eca
Binary files /dev/null and b/assets/images/how-ibm-uses-pt-terratorch/fg3.png differ
diff --git a/assets/images/how-ibm-uses-pt-terratorch/fg4.png b/assets/images/how-ibm-uses-pt-terratorch/fg4.png
new file mode 100644
index 000000000000..798947a41f20
Binary files /dev/null and b/assets/images/how-ibm-uses-pt-terratorch/fg4.png differ
diff --git a/assets/images/how-ibm-uses-pt-terratorch/fg5.png b/assets/images/how-ibm-uses-pt-terratorch/fg5.png
new file mode 100644
index 000000000000..a8306bf3ed84
Binary files /dev/null and b/assets/images/how-ibm-uses-pt-terratorch/fg5.png differ
diff --git a/assets/images/pt-korea-user-group-recap/fg1.jpg b/assets/images/pt-korea-user-group-recap/fg1.jpg
new file mode 100644
index 000000000000..dbd408d6baf8
Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg1.jpg differ
diff --git a/assets/images/pt-korea-user-group-recap/fg1.png b/assets/images/pt-korea-user-group-recap/fg1.png
new file mode 100644
index 000000000000..02caf444eb0b
Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg1.png differ
diff --git a/assets/images/pt-korea-user-group-recap/fg2.jpg b/assets/images/pt-korea-user-group-recap/fg2.jpg
new file mode 100644
index 000000000000..924780bb6a3c
Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg2.jpg differ
diff --git a/assets/images/pt-korea-user-group-recap/fg2.png b/assets/images/pt-korea-user-group-recap/fg2.png
new file mode 100644
index 000000000000..3153c75f603c
Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg2.png differ
diff --git a/assets/images/pt-korea-user-group-recap/fg3.jpg b/assets/images/pt-korea-user-group-recap/fg3.jpg
new file mode 100644
index 000000000000..b49e18e82f9d
Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg3.jpg differ
diff --git a/assets/images/pt-korea-user-group-recap/fg3.png b/assets/images/pt-korea-user-group-recap/fg3.png
new file mode 100644
index 000000000000..ed9e8ccadb3e
Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg3.png differ
diff --git a/assets/images/pt-korea-user-group-recap/fg4.jpg b/assets/images/pt-korea-user-group-recap/fg4.jpg
new file mode 100644
index 000000000000..b89e21c71d78
Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg4.jpg differ
diff --git a/assets/images/pt-korea-user-group-recap/fg4.png b/assets/images/pt-korea-user-group-recap/fg4.png
new file mode 100644
index 000000000000..4831610f703d
Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg4.png differ
diff --git a/assets/images/pt-korea-user-group-recap/fg5.jpg b/assets/images/pt-korea-user-group-recap/fg5.jpg
new file mode 100644
index 000000000000..5e41bbd65b32
Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg5.jpg differ
diff --git a/assets/images/pt-korea-user-group-recap/fg5.png b/assets/images/pt-korea-user-group-recap/fg5.png
new file mode 100644
index 000000000000..02e5b01ac852
Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg5.png differ
diff --git a/assets/images/pt-korea-user-group-recap/fg6.jpg b/assets/images/pt-korea-user-group-recap/fg6.jpg
new file mode 100644
index 000000000000..ce7c789c2385
Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg6.jpg differ
diff --git a/assets/images/pt-korea-user-group-recap/fg6.png b/assets/images/pt-korea-user-group-recap/fg6.png
new file mode 100644
index 000000000000..5e5650ab16cf
Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg6.png differ
diff --git a/assets/images/pt-korea-user-group-recap/fg7.jpg b/assets/images/pt-korea-user-group-recap/fg7.jpg
new file mode 100644
index 000000000000..e0b7f94f2a3b
Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg7.jpg differ
diff --git a/assets/images/pt-korea-user-group-recap/fg7.png b/assets/images/pt-korea-user-group-recap/fg7.png
new file mode 100644
index 000000000000..3371ea3c6073
Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg7.png differ
diff --git a/assets/images/pt-korea-user-group-recap/fg8.jpg b/assets/images/pt-korea-user-group-recap/fg8.jpg
new file mode 100644
index 000000000000..8d57c2a30bc4
Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg8.jpg differ
diff --git a/assets/images/pt-korea-user-group-recap/fg8.png b/assets/images/pt-korea-user-group-recap/fg8.png
new file mode 100644
index 000000000000..4ba3b87290b3
Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg8.png differ
diff --git a/assets/images/pt-korea-user-group-recap/fg9.jpg b/assets/images/pt-korea-user-group-recap/fg9.jpg
new file mode 100644
index 000000000000..e631c2a29545
Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg9.jpg differ
diff --git a/assets/images/pt-korea-user-group-recap/fg9.png b/assets/images/pt-korea-user-group-recap/fg9.png
new file mode 100644
index 000000000000..02d9224b8c7d
Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg9.png differ
diff --git a/assets/quick-start-module.js b/assets/quick-start-module.js
index 5088ab09d4fa..ab2929175469 100644
--- a/assets/quick-start-module.js
+++ b/assets/quick-start-module.js
@@ -11,8 +11,8 @@ var archInfoMap = new Map([
['accnone', {title: "CPU", platforms: new Set(['linux', 'macos', 'windows'])}]
]);
-let version_map={"nightly": {"accnone": ["cpu", ""], "cuda.x": ["cuda", "11.8"], "cuda.y": ["cuda", "12.6"], "cuda.z": ["cuda", "12.8"], "rocm5.x": ["rocm", "6.3"]}, "release": {"accnone": ["cpu", ""], "cuda.x": ["cuda", "11.8"], "cuda.y": ["cuda", "12.6"], "cuda.z": ["cuda", "12.8"], "rocm5.x": ["rocm", "6.3"]}}
-let stable_version="Stable (2.7.0)";
+let version_map={"nightly": {"accnone": ["cpu", ""], "cuda.x": ["cuda", "11.8"], "cuda.y": ["cuda", "12.6"], "cuda.z": ["cuda", "12.8"], "rocm5.x": ["rocm", "6.4"]}, "release": {"accnone": ["cpu", ""], "cuda.x": ["cuda", "11.8"], "cuda.y": ["cuda", "12.6"], "cuda.z": ["cuda", "12.8"], "rocm5.x": ["rocm", "6.3"]}}
+let stable_version="Stable (2.7.1)";
var default_selected_os = getAnchorSelectedOS() || getDefaultSelectedOS();
var opts = {
@@ -267,7 +267,7 @@ $("[data-toggle='cloud-dropdown']").on("click", function(e) {
});
function commandMessage(key) {
- var object = {"preview,pip,linux,accnone,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,linux,cuda.x,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118", "preview,pip,linux,cuda.y,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", "preview,pip,linux,cuda.z,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128", "preview,pip,linux,rocm5.x,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3", "preview,conda,linux,cuda.x,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "preview,conda,linux,cuda.y,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "preview,conda,linux,cuda.z,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "preview,conda,linux,rocm5.x,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "preview,conda,linux,accnone,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "preview,libtorch,linux,accnone,cplusplus": "Download here (cxx11 ABI): https://download.pytorch.org/libtorch/nightly/cpu/libtorch-cxx11-abi-shared-with-deps-latest.zip", "preview,libtorch,linux,cuda.x,cplusplus": "Download here (cxx11 ABI): https://download.pytorch.org/libtorch/nightly/cu118/libtorch-cxx11-abi-shared-with-deps-latest.zip", "preview,libtorch,linux,cuda.y,cplusplus": "Download here (cxx11 ABI): https://download.pytorch.org/libtorch/nightly/cu126/libtorch-cxx11-abi-shared-with-deps-latest.zip", "preview,libtorch,linux,cuda.z,cplusplus": "Download here (cxx11 ABI): https://download.pytorch.org/libtorch/nightly/cu128/libtorch-cxx11-abi-shared-with-deps-latest.zip", "preview,libtorch,linux,rocm5.x,cplusplus": "Download here (cxx11 ABI): https://download.pytorch.org/libtorch/nightly/rocm6.3/libtorch-cxx11-abi-shared-with-deps-latest.zip", "preview,pip,macos,cuda.x,python": "# CUDA is not available on MacOS, please use default package pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,macos,cuda.y,python": "# CUDA is not available on MacOS, please use default package pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,macos,cuda.z,python": "# CUDA is not available on MacOS, please use default package pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,macos,rocm5.x,python": "# ROCm is not available on MacOS, please use default package pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,macos,accnone,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,conda,macos,cuda.x,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "preview,conda,macos,cuda.y,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "preview,conda,macos,cuda.z,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "preview,conda,macos,rocm5.x,python": "NOTE: Conda packages are no longer available. 
Please use pip instead. ", "preview,conda,macos,accnone,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "preview,libtorch,macos,accnone,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported): https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", "preview,libtorch,macos,cuda.x,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported): https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", "preview,libtorch,macos,cuda.y,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported): https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", "preview,libtorch,macos,cuda.z,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported): https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", "preview,libtorch,macos,rocm5.x,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported): https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", "preview,pip,windows,accnone,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,windows,cuda.x,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118", "preview,pip,windows,cuda.y,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", "preview,pip,windows,cuda.z,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128", "preview,pip,windows,rocm5.x,python": "NOTE: ROCm is not available on Windows", "preview,conda,windows,cuda.x,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "preview,conda,windows,cuda.y,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "preview,conda,windows,cuda.z,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "preview,conda,windows,rocm5.x,python": "NOTE: ROCm is not available on Windows", "preview,conda,windows,accnone,python": "NOTE: Conda packages are no longer available. Please use pip instead. 
", "preview,libtorch,windows,accnone,cplusplus": "Download here (Release version): https://download.pytorch.org/libtorch/nightly/cpu/libtorch-win-shared-with-deps-latest.zip Download here (Debug version): https://download.pytorch.org/libtorch/nightly/cpu/libtorch-win-shared-with-deps-debug-latest.zip", "preview,libtorch,windows,cuda.x,cplusplus": "Download here (Release version): https://download.pytorch.org/libtorch/nightly/cu118/libtorch-win-shared-with-deps-latest.zip Download here (Debug version): https://download.pytorch.org/libtorch/nightly/cu118/libtorch-win-shared-with-deps-debug-latest.zip", "preview,libtorch,windows,cuda.y,cplusplus": "Download here (Release version): https://download.pytorch.org/libtorch/nightly/cu126/libtorch-win-shared-with-deps-latest.zip Download here (Debug version): https://download.pytorch.org/libtorch/nightly/cu126/libtorch-win-shared-with-deps-debug-latest.zip", "preview,libtorch,windows,cuda.z,cplusplus": "Download here (Release version): https://download.pytorch.org/libtorch/nightly/cu128/libtorch-win-shared-with-deps-latest.zip Download here (Debug version): https://download.pytorch.org/libtorch/nightly/cu128/libtorch-win-shared-with-deps-debug-latest.zip", "preview,libtorch,windows,rocm5.x,cplusplus": "NOTE: ROCm is not available on Windows", "stable,pip,linux,accnone,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu", "stable,pip,linux,cuda.x,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", "stable,pip,linux,cuda.y,python": "pip3 install torch torchvision torchaudio", "stable,pip,linux,cuda.z,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128", "stable,pip,linux,rocm5.x,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3", "stable,conda,linux,cuda.x,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "stable,conda,linux,cuda.y,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "stable,conda,linux,cuda.z,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "stable,conda,linux,rocm5.x,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "stable,conda,linux,accnone,python": "NOTE: Conda packages are no longer available. Please use pip instead. 
", "stable,libtorch,linux,accnone,cplusplus": "Download here (cxx11 ABI): https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.7.0%2Bcpu.zip", "stable,libtorch,linux,cuda.x,cplusplus": "Download here (cxx11 ABI): https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.7.0%2Bcu118.zip", "stable,libtorch,linux,cuda.y,cplusplus": "Download here (cxx11 ABI): https://download.pytorch.org/libtorch/cu126/libtorch-cxx11-abi-shared-with-deps-2.7.0%2Bcu126.zip", "stable,libtorch,linux,cuda.z,cplusplus": "Download here (cxx11 ABI): https://download.pytorch.org/libtorch/cu128/libtorch-cxx11-abi-shared-with-deps-2.7.0%2Bcu128.zip", "stable,libtorch,linux,rocm5.x,cplusplus": "Download here (cxx11 ABI): https://download.pytorch.org/libtorch/rocm6.3/libtorch-cxx11-abi-shared-with-deps-2.7.0%2Brocm6.3.zip", "stable,pip,macos,cuda.x,python": "# CUDA is not available on MacOS, please use default package pip3 install torch torchvision torchaudio", "stable,pip,macos,cuda.y,python": "# CUDA is not available on MacOS, please use default package pip3 install torch torchvision torchaudio", "stable,pip,macos,cuda.z,python": "# CUDA is not available on MacOS, please use default package pip3 install torch torchvision torchaudio", "stable,pip,macos,rocm5.x,python": "# ROCm is not available on MacOS, please use default package pip3 install torch torchvision torchaudio", "stable,pip,macos,accnone,python": "pip3 install torch torchvision torchaudio", "stable,conda,macos,cuda.x,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "stable,conda,macos,cuda.y,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "stable,conda,macos,cuda.z,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "stable,conda,macos,rocm5.x,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "stable,conda,macos,accnone,python": "NOTE: Conda packages are no longer available. Please use pip instead. 
", "stable,libtorch,macos,accnone,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported): https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.0.zip", "stable,libtorch,macos,cuda.x,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported): https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.0.zip", "stable,libtorch,macos,cuda.y,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported): https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.0.zip", "stable,libtorch,macos,cuda.z,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported): https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.0.zip", "stable,libtorch,macos,rocm5.x,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported): https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.0.zip", "stable,pip,windows,accnone,python": "pip3 install torch torchvision torchaudio", "stable,pip,windows,cuda.x,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", "stable,pip,windows,cuda.y,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126", "stable,pip,windows,cuda.z,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128", "stable,pip,windows,rocm5.x,python": "NOTE: ROCm is not available on Windows", "stable,conda,windows,cuda.x,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "stable,conda,windows,cuda.y,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "stable,conda,windows,cuda.z,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "stable,conda,windows,rocm5.x,python": "NOTE: ROCm is not available on Windows", "stable,conda,windows,accnone,python": "NOTE: Conda packages are no longer available. Please use pip instead. ", "stable,libtorch,windows,accnone,cplusplus": "Download here (Release version): https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-2.7.0%2Bcpu.zip Download here (Debug version): https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-debug-2.7.0%2Bcpu.zip", "stable,libtorch,windows,cuda.x,cplusplus": "Download here (Release version): https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-2.7.0%2Bcu118.zip Download here (Debug version): https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-debug-2.7.0%2Bcu118.zip", "stable,libtorch,windows,cuda.y,cplusplus": "Download here (Release version): https://download.pytorch.org/libtorch/cu126/libtorch-win-shared-with-deps-2.7.0%2Bcu126.zip Download here (Debug version): https://download.pytorch.org/libtorch/cu126/libtorch-win-shared-with-deps-debug-2.7.0%2Bcu126.zip", "stable,libtorch,windows,cuda.z,cplusplus": "Download here (Release version): https://download.pytorch.org/libtorch/cu128/libtorch-win-shared-with-deps-2.7.0%2Bcu128.zip Download here (Debug version): https://download.pytorch.org/libtorch/cu128/libtorch-win-shared-with-deps-debug-2.7.0%2Bcu128.zip", "stable,libtorch,windows,rocm5.x,cplusplus": "NOTE: ROCm is not available on Windows"};
+ var object = {"preview,pip,linux,accnone,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,linux,cuda.x,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118", "preview,pip,linux,cuda.y,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", "preview,pip,linux,cuda.z,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128", "preview,pip,linux,rocm5.x,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.4", "preview,libtorch,linux,accnone,cplusplus": "Download here (cxx11 ABI): https://download.pytorch.org/libtorch/nightly/cpu/libtorch-cxx11-abi-shared-with-deps-latest.zip", "preview,libtorch,linux,cuda.x,cplusplus": "Download here (cxx11 ABI): https://download.pytorch.org/libtorch/nightly/cu118/libtorch-cxx11-abi-shared-with-deps-latest.zip", "preview,libtorch,linux,cuda.y,cplusplus": "Download here (cxx11 ABI): https://download.pytorch.org/libtorch/nightly/cu126/libtorch-cxx11-abi-shared-with-deps-latest.zip", "preview,libtorch,linux,cuda.z,cplusplus": "Download here (cxx11 ABI): https://download.pytorch.org/libtorch/nightly/cu128/libtorch-cxx11-abi-shared-with-deps-latest.zip", "preview,libtorch,linux,rocm5.x,cplusplus": "Download here (cxx11 ABI): https://download.pytorch.org/libtorch/nightly/rocm6.4/libtorch-cxx11-abi-shared-with-deps-latest.zip", "preview,pip,macos,cuda.x,python": "# CUDA is not available on MacOS, please use default package pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,macos,cuda.y,python": "# CUDA is not available on MacOS, please use default package pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,macos,cuda.z,python": "# CUDA is not available on MacOS, please use default package pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,macos,rocm5.x,python": "# ROCm is not available on MacOS, please use default package pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,macos,accnone,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,libtorch,macos,accnone,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported): https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", "preview,libtorch,macos,cuda.x,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported): https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", "preview,libtorch,macos,cuda.y,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported): https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", "preview,libtorch,macos,cuda.z,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported): https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", "preview,libtorch,macos,rocm5.x,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported): https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", 
"preview,pip,windows,accnone,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,windows,cuda.x,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118", "preview,pip,windows,cuda.y,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", "preview,pip,windows,cuda.z,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128", "preview,pip,windows,rocm5.x,python": "NOTE: ROCm is not available on Windows", "preview,libtorch,windows,accnone,cplusplus": "Download here (Release version): https://download.pytorch.org/libtorch/nightly/cpu/libtorch-win-shared-with-deps-latest.zip Download here (Debug version): https://download.pytorch.org/libtorch/nightly/cpu/libtorch-win-shared-with-deps-debug-latest.zip", "preview,libtorch,windows,cuda.x,cplusplus": "Download here (Release version): https://download.pytorch.org/libtorch/nightly/cu118/libtorch-win-shared-with-deps-latest.zip Download here (Debug version): https://download.pytorch.org/libtorch/nightly/cu118/libtorch-win-shared-with-deps-debug-latest.zip", "preview,libtorch,windows,cuda.y,cplusplus": "Download here (Release version): https://download.pytorch.org/libtorch/nightly/cu126/libtorch-win-shared-with-deps-latest.zip Download here (Debug version): https://download.pytorch.org/libtorch/nightly/cu126/libtorch-win-shared-with-deps-debug-latest.zip", "preview,libtorch,windows,cuda.z,cplusplus": "Download here (Release version): https://download.pytorch.org/libtorch/nightly/cu128/libtorch-win-shared-with-deps-latest.zip Download here (Debug version): https://download.pytorch.org/libtorch/nightly/cu128/libtorch-win-shared-with-deps-debug-latest.zip", "preview,libtorch,windows,rocm5.x,cplusplus": "NOTE: ROCm is not available on Windows", "stable,pip,linux,accnone,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu", "stable,pip,linux,cuda.x,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", "stable,pip,linux,cuda.y,python": "pip3 install torch torchvision torchaudio", "stable,pip,linux,cuda.z,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128", "stable,pip,linux,rocm5.x,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3", "stable,libtorch,linux,accnone,cplusplus": "Download here (cxx11 ABI): https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Bcpu.zip", "stable,libtorch,linux,cuda.x,cplusplus": "Download here (cxx11 ABI): https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Bcu118.zip", "stable,libtorch,linux,cuda.y,cplusplus": "Download here (cxx11 ABI): https://download.pytorch.org/libtorch/cu126/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Bcu126.zip", "stable,libtorch,linux,cuda.z,cplusplus": "Download here (cxx11 ABI): https://download.pytorch.org/libtorch/cu128/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Bcu128.zip", "stable,libtorch,linux,rocm5.x,cplusplus": "Download here (cxx11 ABI): https://download.pytorch.org/libtorch/rocm6.3/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Brocm6.3.zip", "stable,pip,macos,cuda.x,python": "# CUDA is not available on MacOS, please use default package pip3 install torch torchvision 
torchaudio", "stable,pip,macos,cuda.y,python": "# CUDA is not available on MacOS, please use default package pip3 install torch torchvision torchaudio", "stable,pip,macos,cuda.z,python": "# CUDA is not available on MacOS, please use default package pip3 install torch torchvision torchaudio", "stable,pip,macos,rocm5.x,python": "# ROCm is not available on MacOS, please use default package pip3 install torch torchvision torchaudio", "stable,pip,macos,accnone,python": "pip3 install torch torchvision torchaudio", "stable,libtorch,macos,accnone,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported): https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.1.zip", "stable,libtorch,macos,cuda.x,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported): https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.1.zip", "stable,libtorch,macos,cuda.y,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported): https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.1.zip", "stable,libtorch,macos,cuda.z,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported): https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.1.zip", "stable,libtorch,macos,rocm5.x,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported): https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.1.zip", "stable,pip,windows,accnone,python": "pip3 install torch torchvision torchaudio", "stable,pip,windows,cuda.x,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", "stable,pip,windows,cuda.y,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126", "stable,pip,windows,cuda.z,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128", "stable,pip,windows,rocm5.x,python": "NOTE: ROCm is not available on Windows", "stable,libtorch,windows,accnone,cplusplus": "Download here (Release version): https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-2.7.1%2Bcpu.zip Download here (Debug version): https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-debug-2.7.1%2Bcpu.zip", "stable,libtorch,windows,cuda.x,cplusplus": "Download here (Release version): https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-2.7.1%2Bcu118.zip Download here (Debug version): https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-debug-2.7.1%2Bcu118.zip", "stable,libtorch,windows,cuda.y,cplusplus": "Download here (Release version): https://download.pytorch.org/libtorch/cu126/libtorch-win-shared-with-deps-2.7.1%2Bcu126.zip Download here (Debug version): https://download.pytorch.org/libtorch/cu126/libtorch-win-shared-with-deps-debug-2.7.1%2Bcu126.zip", "stable,libtorch,windows,cuda.z,cplusplus": "Download here (Release version): https://download.pytorch.org/libtorch/cu128/libtorch-win-shared-with-deps-2.7.1%2Bcu128.zip Download here (Debug version): https://download.pytorch.org/libtorch/cu128/libtorch-win-shared-with-deps-debug-2.7.1%2Bcu128.zip", "stable,libtorch,windows,rocm5.x,cplusplus": "NOTE: ROCm is not available on Windows"};
if (!object.hasOwnProperty(key)) {
$("#command").html(
diff --git a/governing-board.html b/governing-board.html
index 55fe3c48246d..588aa3db0d42 100644
--- a/governing-board.html
+++ b/governing-board.html
@@ -56,6 +56,20 @@
Brian Granger, Amazon Web Services
+
+
+
+
+
+
Dwarak Rajagopal, Snowflake
+
Dwarak Rajagopal is a distinguished technology leader with over two decades of experience at some of the world’s most innovative companies, including Google, Meta, Uber, Apple, and AMD. As Vice President of AI Engineering at Snowflake, Dwarak oversees the AI and ML engineering organizations, helping drive innovation across Snowflake Cortex AI, Snowflake’s AI Research Team, and open source offerings. He previously served as Senior Engineering Director at Google, where he spearheaded the AI Frameworks and On-Device Machine Learning teams, driving cutting-edge advancements in AI and large language models (LLMs) to enhance efficiency across server and mobile platforms. Earlier in his career, Dwarak led Meta’s PyTorch Core Frameworks team, bridging the gap between AI research and production. His career also includes developing autonomous driving software at Uber, pioneering graphics technology at Apple, and shaping compiler innovations at AMD.
+
+
+ Dwarak is passionate about democratizing AI through open source initiatives and ensuring the ethical advancement of technology. His visionary leadership continues to shape the future of AI and empower global innovation.
+
+
- Includes support for NVIDIA Blackwell GPUs, CUDA 12.8 wheels for Linux x86 and arm64, Torch Function Modes in torch.compile, Mega Cache for portable compilation artifacts, and new FlexAttention features for LLM inference.
-
+ Ricardo is a seasoned technology leader with over two decades of experience across the enterprise and startup landscape. He works at Snowflake as a Cloud Infrastructure and Open Source Lead, focusing on automating AI/ML infrastructure using cloud-native technologies at scale. A passionate open source advocate, Ricardo also serves as a shadow member of the CNCF Technical Oversight Committee and CNCF AI Sub-project, where he helps shape the future of AI computing infrastructure.
+
+ Throughout his career, Ricardo has held key engineering and leadership roles at major companies such as Rakuten, Cisco, and VMware, as well as at innovative startups including Truera, Branch Metrics, Coupa, HyTrust, Exablox, and SnapLogic. He's committed to community-driven innovation and regularly contributes to industry initiatives that bridge the gap between open source communities and enterprise adoption.
+
+
+
+
@@ -103,6 +117,10 @@
Shauheen Zahirazami, Google
Yikun Jiang, Huawei
+
+ Yikun Jiang is a principal software engineer on Huawei's computing open source development team, working on multi-arch and heterogeneous hardware support and on improving projects across the computing area. He has more than 10 years of experience in the field and leads an active and creative R&D team under the principle of “upstream first”, which aims to make diverse computing power ubiquitous.
+
+
+