Labels: community-backlog, enhancement, llm, performance, serve, triage
Description
This issue tracks the state of TPU support in Ray LLM.
Use case
A high-level overview of the support we'd like to document is as follows. Some of these use cases may already be supported but need more testing; others will require changes in Ray LLM to enable.
- vLLM inference with a single-host, single-slice TPU and pipeline/tensor parallelism (see the serving sketch after this list)
- multi-host and/or multi-slice scheduling support with Ray LLM for large models
- multi-host TPU inference with pipeline/tensor parallelism
- expert parallelism with TPU
- batch inference with Ray LLM and Ray Data on TPU (see the batch inference sketch after this list)
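
For the single-host serving case, a minimal sketch of what this could look like with Ray Serve LLM is below. The `LLMConfig`/`build_openai_app` usage follows the Ray Serve LLM docs, but the model, the `accelerator_type` string, and the 4-chip topology are illustrative assumptions, not a tested configuration:

```python
# Minimal sketch: serving a model on a single-host TPU slice with Ray Serve LLM.
# model_source, accelerator_type, and tensor_parallel_size are assumptions.
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-7b",                       # name exposed to clients
        model_source="Qwen/Qwen2.5-7B-Instruct",  # assumed example model
    ),
    accelerator_type="TPU-V6E",  # assumption: depends on the TPU type the cluster reports
    engine_kwargs=dict(
        tensor_parallel_size=4,  # shard across the chips of one host (assumed topology)
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=1),
    ),
)

# build_openai_app exposes an OpenAI-compatible endpoint backed by vLLM.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```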
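
For the Ray Data path, a rough sketch of offline batch inference with the vLLM engine processor is below. The model, prompts, and engine settings are placeholders, and how TPU resources get wired into the processor config is exactly the part this issue would need to pin down:

```python
# Sketch: offline batch inference with Ray Data's vLLM processor.
# Model and engine settings are illustrative assumptions; TPU-specific
# resource wiring is part of what this tracking issue needs to flesh out.
import ray
from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

config = vLLMEngineProcessorConfig(
    model_source="Qwen/Qwen2.5-7B-Instruct",  # assumed example model
    engine_kwargs=dict(max_model_len=8192),
    batch_size=64,
    concurrency=1,
)

processor = build_llm_processor(
    config,
    # Map each input row to a chat request for the engine.
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(temperature=0.0, max_tokens=128),
    ),
    # Keep only the generated text in the output rows.
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

ds = ray.data.from_items([{"prompt": "What is a TPU?"}])
ds = processor(ds)
ds.show()
```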