
Distributed and federated modes should be just one networked mode #7083

@Expro

Description


Is your feature request related to a problem? Please describe.

Currently, there are two separate networked modes: federated (load balanced) and distributed (llama-cpp distributed inference). This split is wasteful, as it forces users to set up one load-balanced cluster to get HA for non-llama-cpp workloads, and then a separate cluster for llama-cpp distributed inference. Another issue is that switching between federated and distributed modes requires restarting the entire cluster, because the mode is defined at startup.

Describe the solution you'd like

There should be just one networked mode that load balances workloads by default, with a toggle that switches the llama-cpp backend between load balancing and distributed inference.

This way:

  • GPUs can be shared between backends even when llama-cpp is in distributed mode
  • llama-cpp mode can be changed without restarting all nodes
  • there is only one configuration path for networked mode (currently there are separate, outdated instructions for federated and distributed modes)
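A minimal Python sketch of the proposed routing, assuming a single shared node pool and a runtime flag that affects only llama-cpp. All names (`Node`, `Route`, `NetworkedCluster`, `route`, `set_llama_cpp_distributed`) are hypothetical and do not correspond to LocalAI's actual code; round-robin is used only as a stand-in for whatever load-balancing strategy federated mode applies.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, Iterator, List

# Hypothetical sketch: these names are illustrative only and are not
# part of LocalAI's code or API.


@dataclass
class Node:
    name: str


@dataclass
class Route:
    mode: str           # "load_balanced" or "distributed"
    nodes: List[Node]   # one node when load balancing, all nodes when distributed


class NetworkedCluster:
    """One networked mode: every backend is load balanced by default, and a
    runtime toggle switches only llama-cpp to distributed inference."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = list(nodes)
        self._cursors: Dict[str, Iterator[Node]] = {}
        self.llama_cpp_distributed = False  # runtime toggle, no restart needed

    def set_llama_cpp_distributed(self, enabled: bool) -> None:
        # Only routing state changes; nodes stay up and keep serving
        # the other backends.
        self.llama_cpp_distributed = enabled

    def _next_node(self, backend: str) -> Node:
        # Independent round-robin cursor per backend (assumed strategy).
        if backend not in self._cursors:
            self._cursors[backend] = cycle(self.nodes)
        return next(self._cursors[backend])

    def route(self, backend: str) -> Route:
        if backend == "llama-cpp" and self.llama_cpp_distributed:
            # Spread a single llama-cpp model across every node.
            return Route("distributed", list(self.nodes))
        # All other backends, and llama-cpp with the toggle off,
        # are load balanced over the same nodes.
        return Route("load_balanced", [self._next_node(backend)])


if __name__ == "__main__":
    cluster = NetworkedCluster([Node("a"), Node("b")])
    print(cluster.route("whisper").mode)
    cluster.set_llama_cpp_distributed(True)
    print(cluster.route("llama-cpp").mode)
```

Because the toggle is just routing state, flipping it affects only subsequent llama-cpp requests, while the same nodes (and their GPUs) keep serving other backends.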

I'm setting aside the fact that distributed mode is currently broken and doesn't work at all.

Metadata


Assignees

No one assigned

Labels

enhancement (New feature or request)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
