-
@segmond, if I am understanding you and the PR correctly, #16276 from two days ago achieves this. Sorry if I am misunderstanding you. Documentation is coming soon in #16441.
-
Right now, we have to spin up as many RPC servers as there are GPUs on a remote node. Can we have just one RPC server per node instead? Could going that route improve performance? I get the feeling that the overhead of talking to N servers for every generated token will be much higher than talking to one. Of course, it means the remote node will need more logic and data to coordinate the computation across its GPUs. What do you all think?
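
To make the intuition about per-token overhead concrete, here is a rough back-of-envelope sketch. It is a toy model, not a measurement. It assumes the client contacts each RPC endpoint one after another for every token, that each endpoint costs a fixed number of network round trips, and that a single per-node server would handle all cross-GPU work locally. The function name and the numbers are hypothetical and chosen only for illustration.

```python
def per_token_rpc_overhead_us(num_endpoints: int,
                              round_trips_per_endpoint: int,
                              rtt_us: int) -> int:
    """Toy estimate of network overhead per generated token, in microseconds.

    Assumes endpoints are contacted sequentially and each round trip
    costs the same fixed latency (rtt_us). Real behavior may differ.
    """
    return num_endpoints * round_trips_per_endpoint * rtt_us


# Hypothetical node with 4 GPUs, 2 round trips per endpoint, 200 us per round trip.
one_server_per_gpu = per_token_rpc_overhead_us(4, 2, 200)
one_server_per_node = per_token_rpc_overhead_us(1, 2, 200)

print(f"one server per GPU:  {one_server_per_gpu} us per token")
print(f"one server per node: {one_server_per_node} us per token")
```

Under these assumptions the overhead scales linearly with the number of endpoints, so whether it matters in practice depends on how the round-trip latency compares with the per-token compute time.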