
Conversation

rgerganov
Collaborator

Update the README file to match the newly added functionality of exposing multiple devices from a single server.

@abc-nix

abc-nix commented Oct 6, 2025

Could you add an "Advanced" example mentioning the use of llama-server with --tensor-split and/or --override-tensor options connecting to different RPC servers? I think a lot of people expect --gpu-layers to magically distribute layers automatically to each device based on its available memory, and don't understand why it fails when "it should fit".

Sorry if this is not in the scope of the RPC documentation.

@tommarques56

Hi @rgerganov, I can run an automated high-severity-only LLM review on this PR and post a single focused inline comment. Reply with "approve" or add a comment saying "@tommarques56 approve" to proceed.

@abc-nix

abc-nix commented Oct 6, 2025

OT: tommarques56, dude, this is a documentation PR. What the heck are you doing with your bot?

No matter your PhD, you should talk with the repo owner first before doing this shit. The ggml organization has an email you can contact. Or open a discussion.

@rgerganov
Collaborator Author

> Could you add an "Advanced" example mentioning the use of llama-server with --tensor-split and/or --override-tensor options connecting to different RPC servers? I think a lot of people expect --gpu-layers to magically distribute layers automatically to each device based on its available memory, and don't understand why it fails when "it should fit".

Good point, I have added a note about how weights are distributed by default and how to change this with --tensor-split.
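
To illustrate the kind of setup being discussed, here is a minimal sketch of pointing llama-server at two RPC servers with an explicit split; the hostnames, ports and ratio below are placeholders, not values from this PR:

```sh
# Hypothetical addresses: an rpc-server instance is assumed to be listening on each.
# --rpc adds the remote machines as devices, -ngl offloads layers to them, and
# --tensor-split overrides the default weight distribution with a ~3:1 ratio.
llama-server -m model.gguf -ngl 99 \
    --rpc 192.168.1.10:50052,192.168.1.11:50052 \
    --tensor-split 3,1
```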

Co-authored-by: Diego Devesa <slarengh@gmail.com>
rgerganov merged commit c61ae20 into ggml-org:master on Oct 7, 2025
3 checks passed