[Ray LLM] Support TPUs in Ray Serve LLM #57137

@ryanaoleary

Description

This can serve as a tracking issue for the state of TPU support in Ray LLM.

Use case

A high-level overview of the use cases we'd like to document support for is as follows. Some of these may already work but need more testing, while others will require changes in Ray LLM to enable.

  • vLLM inference with single-host, single-slice TPU and pipeline/tensor parallelism (see the deployment sketch after this list)
  • multi-host and/or multi-slice scheduling support with Ray LLM for large models
  • multi-host TPU inference with pipeline/tensor parallelism
  • expert parallelism with TPU
  • batch inference with Ray LLM and Ray Data on TPU (see the batch inference sketch below)
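
For the single-host vLLM case, a deployment would most likely go through the existing `LLMConfig` / `build_openai_app` path in Ray Serve LLM. The sketch below shows that shape only; the `accelerator_type` string (`"TPU-V6E"` here), model name, and parallelism values are placeholders I'm assuming for illustration, not a verified TPU configuration.

```python
# Minimal sketch of a Ray Serve LLM deployment targeting a single-host TPU.
# The accelerator_type string ("TPU-V6E") and parallelism settings are
# placeholders; actual values depend on the Ray version and TPU topology.
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="llama-3.1-8b",                        # illustrative model
        model_source="meta-llama/Llama-3.1-8B-Instruct",
    ),
    # How TPUs should be expressed here is part of what this issue tracks.
    accelerator_type="TPU-V6E",
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=1),
    ),
    # vLLM engine arguments; for the single-host, single-slice case,
    # tensor_parallel_size would map to the number of TPU chips on the host.
    engine_kwargs=dict(tensor_parallel_size=4),
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```

Whether multi-host or multi-slice placement can be expressed through this same config, or needs additional scheduling support in Ray LLM, is what the multi-host/multi-slice items above cover.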

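For the batch inference item, the likely entry point is the Ray Data LLM processor API. The sketch below assumes the vLLM TPU backend works through the same `vLLMEngineProcessorConfig`; the model, batch size, and parallelism values are illustrative, parameter names have shifted between recent Ray releases, and how TPU resources get requested for the engine workers is one of the open questions here.

```python
# Sketch of Ray Data batch inference that would need to run on TPU workers.
import ray
from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

config = vLLMEngineProcessorConfig(
    model_source="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model
    engine_kwargs=dict(tensor_parallel_size=4),       # placeholder TP size
    concurrency=1,
    batch_size=32,
)

processor = build_llm_processor(
    config,
    # Map each input row to chat messages plus sampling parameters.
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(max_tokens=128),
    ),
    # Keep only the generated text from each output row.
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

ds = ray.data.from_items([{"prompt": "Summarize what a TPU slice is."}])
print(processor(ds).take_all())
```
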
Labels

community-backlog, enhancement, llm, performance, serve, triage
