* **(Prototype) CUDA support in RPC using TensorPipe** - This feature should bring significant speed improvements for users of PyTorch RPC on multi-GPU machines, as TensorPipe automatically leverages NVLink when available and avoids costly copies to and from host memory when exchanging GPU tensors between processes. When the processes are not on the same machine, TensorPipe falls back to copying the tensor to host memory and sending it as a regular CPU tensor. This also improves the user experience, since users can treat GPU tensors like regular CPU tensors in their code. Refer to this [documentation](https://pytorch.org/docs/1.8.0/rpc.html) for more details.
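
To make the GPU tensor exchange described above concrete, here is a minimal sketch of how a caller might declare device mappings for its peers. The helper names `identity_device_map` and `make_rpc_options`, the worker names, and the thread count are hypothetical choices for illustration; the sketch assumes the `TensorPipeRpcBackendOptions.set_device_map` API from the linked RPC documentation, and that every worker exposes the same number of GPUs.

```python
def identity_device_map(num_gpus):
    """Map each local GPU to the GPU with the same index on the callee.

    Hypothetical helper: any other placement (for example
    {"cuda:0": "cuda:1"}) has the same shape.
    """
    return {f"cuda:{i}": f"cuda:{i}" for i in range(num_gpus)}


def make_rpc_options(peer_names, num_gpus):
    """Build TensorPipe backend options with one device map per peer.

    Assumes PyTorch 1.8 or later with CUDA available; torch is imported
    lazily so the pure-Python helper above works without it.
    """
    from torch.distributed import rpc

    options = rpc.TensorPipeRpcBackendOptions(num_worker_threads=16)
    for peer in peer_names:
        # Tells RPC where CUDA tensors sent to `peer` should be placed.
        options.set_device_map(peer, identity_device_map(num_gpus))
    return options


# Usage sketch (not executed here; needs a launched RPC group):
#   options = make_rpc_options(["worker1"], num_gpus=2)
#   rpc.init_rpc("worker0", rank=0, world_size=2,
#                rpc_backend_options=options)
#   rpc.rpc_sync("worker1", torch.add, args=(x_on_cuda0, 1))
```

With a device map in place, CUDA tensors passed as RPC arguments are placed on the mapped device of the callee rather than being moved to the CPU by user code first.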