[Ray LLM] Support TPUs in Ray Serve LLM #57137

@ryanaoleary

Description

This can serve as a tracking issue for the state of TPU support in Ray LLM.

Use case

A high-level overview of the use cases we'd like to document support for is as follows. Some of these may already work but need more testing, while others will require changes in Ray LLM to enable.

  • vLLM inference with single-host, single-slice TPU and pipeline/tensor parallelism (see the deployment sketch after this list)
  • multi-host and/or multi-slice scheduling support with Ray LLM for large models
  • multi-host TPU inference with pipeline/tensor parallelism
  • expert parallelism with TPU
  • batch inference with Ray LLM and Ray Data on TPU (see the batch inference sketch below)
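
For the single-host vLLM case, a deployment would most likely go through the existing `LLMConfig` / `build_openai_app` path in Ray Serve LLM. The sketch below shows that shape only; the `accelerator_type` string (`"TPU-V6E"` here), model name, and parallelism values are placeholders I'm assuming for illustration, not a verified TPU configuration.

```python
# Minimal sketch of a Ray Serve LLM deployment targeting a single-host TPU.
# The accelerator_type string ("TPU-V6E") and parallelism settings are
# placeholders; actual values depend on the Ray version and TPU topology.
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="llama-3.1-8b",                        # illustrative model
        model_source="meta-llama/Llama-3.1-8B-Instruct",
    ),
    # How TPUs should be expressed here is part of what this issue tracks.
    accelerator_type="TPU-V6E",
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=1),
    ),
    # vLLM engine arguments; for the single-host, single-slice case,
    # tensor_parallel_size would map to the number of TPU chips on the host.
    engine_kwargs=dict(tensor_parallel_size=4),
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```

Whether multi-host or multi-slice placement can be expressed through this same config, or needs additional scheduling support in Ray LLM, is what the multi-host/multi-slice items above cover.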

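For the batch inference item, the likely entry point is the Ray Data LLM processor API. The sketch below assumes the vLLM TPU backend works through the same `vLLMEngineProcessorConfig`; the model, batch size, and parallelism values are illustrative, parameter names have shifted between recent Ray releases, and how TPU resources get requested for the engine workers is one of the open questions here.

```python
# Sketch of Ray Data batch inference that would need to run on TPU workers.
import ray
from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

config = vLLMEngineProcessorConfig(
    model_source="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model
    engine_kwargs=dict(tensor_parallel_size=4),       # placeholder TP size
    concurrency=1,
    batch_size=32,
)

processor = build_llm_processor(
    config,
    # Map each input row to chat messages plus sampling parameters.
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(max_tokens=128),
    ),
    # Keep only the generated text from each output row.
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

ds = ray.data.from_items([{"prompt": "Summarize what a TPU slice is."}])
print(processor(ds).take_all())
```
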
Labels

community-backlog, enhancement, llm, performance, serve, triage
