Is your feature request related to a problem? Please describe.
The current PyTorch backend uses libtorch, which the PyTorch community no longer actively develops or maintains. The experimental 2.0 backend uses Python, which:
- creates a separate process for each model instance and cannot share CUDA contexts, consuming a lot of GPU memory and making GPU memory management difficult.
- is slow due to Python overhead.
Describe the solution you'd like
Make the PyTorch backend support AOTInductor. It supports C++ inference and is the approach recommended by the PyTorch community.
https://docs.pytorch.org/docs/2.8/torch.compiler_aot_inductor.html