Preliminary implementation of the inference engine for OpenAssistant.
The services of the inference stack are prefixed with "inference-" in the
unified compose descriptor.
Prior to building
those, please ensure that you have Docker's new
BuildKit backend enabled. See the
for more info.
To build the services, run:
docker compose --profile inference build
Spin up the stack:
docker compose --profile inference up -d
Tail the logs:
docker compose logs -f \
inference-server \
inference-worker \
inference-text-client
Attach to the text-client, and start chatting:
docker attach open-assistant-inference-text-client-1
Note: In the last step, open-assistant-inference-text-client-1 refers to the name of the text-client
container started in step 2.
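If you prefer to script the build, start and log-tailing steps, the sketch below wraps them with Python's subprocess module. It is a minimal illustration and not part of the repository: the helper names compose_commands and run_all are hypothetical, and run_all defaults to a dry run that only prints the commands. The interactive docker attach step is left out because it needs your terminal.

```python
from __future__ import annotations

import shlex
import subprocess

PROFILE = "inference"
# Services whose logs are tailed in the steps above.
SERVICES = ("inference-server", "inference-worker", "inference-text-client")


def compose_commands(profile: str = PROFILE, services: tuple = SERVICES) -> list[list[str]]:
    """Return the build, up and logs commands as argv lists (hypothetical helper)."""
    return [
        ["docker", "compose", "--profile", profile, "build"],
        ["docker", "compose", "--profile", profile, "up", "-d"],
        # `logs -f` follows the output and does not return on its own.
        ["docker", "compose", "logs", "-f", *services],
    ]


def run_all(dry_run: bool = True) -> None:
    """Print each command; also execute it when dry_run is False."""
    for argv in compose_commands():
        print(shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing step.
            subprocess.run(argv, check=True)
```

Calling run_all(dry_run=False) runs the steps in order and then stays attached to the followed logs until you interrupt it.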
Note: The compose file contains the bind mounts enabling you to develop on the modules of the inference stack, and the
package, without rebuilding.
Note: You can spin up any number of workers by adjusting the number of replicas of the inference-worker
service to your liking.
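One way to change the replica count without editing the compose file is the --scale SERVICE=NUM option of docker compose up. The hypothetical helper below only assembles that command. It assumes the compose file does not pin a fixed container_name for the worker, since Compose cannot run more than one container of a service that has a fixed container name.

```python
from __future__ import annotations

import shlex


def scale_workers_command(replicas: int, service: str = "inference-worker",
                          profile: str = "inference") -> list[str]:
    """Build a `docker compose up` command that runs `replicas` copies of `service`."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return ["docker", "compose", "--profile", profile, "up", "-d",
            "--scale", f"{service}={replicas}"]


if __name__ == "__main__":
    # Prints: docker compose --profile inference up -d --scale inference-worker=3
    print(shlex.join(scale_workers_command(3)))
```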
Note: Please wait for the
service to output {"message":"Connected"}
before starting to chat.
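To script this wait, you can scan the log output for that JSON payload. The sketch below assumes each log line may carry a prefix (such as the service-name column that docker compose logs adds) followed by the JSON object; the function names are hypothetical.

```python
import json
from typing import Iterable

_decoder = json.JSONDecoder()


def is_connected(line: str) -> bool:
    """Return True if the line contains a JSON object whose "message" is "Connected"."""
    start = line.find("{")
    if start == -1:
        return False
    try:
        # raw_decode parses one JSON value starting at `start` and ignores trailing text.
        payload, _ = _decoder.raw_decode(line, start)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("message") == "Connected"


def wait_for_connected(lines: Iterable[str]) -> bool:
    """Consume lines until the Connected message appears; False if the stream ends first."""
    return any(is_connected(line) for line in lines)
```

Because wait_for_connected stops at the first match, you could feed it the stdout of a followed docker compose logs process for the relevant service and continue as soon as it returns True.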
Ensure you have tmux installed on your machine and the following packages
installed into the Python environment:
You can run development setup to start the full development setup.
cd inference
Make sure to wait until the 2nd terminal is ready and says
before entering input into the last terminal.
Run a postgres container:
docker run --rm -it -p 5432:5432 -e POSTGRES_PASSWORD=postgres --name postgres postgres
Run a redis container (or use the one from the general docker compose file):
docker run --rm -it -p 6379:6379 --name redis redis
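Both containers take a moment to start listening. The helper below is a generic sketch, not part of the repository, that polls a TCP port until it accepts connections. An open port only shows that the process is listening, not that Postgres has finished initialising, so treat it as a coarse check.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Each attempt waits at most 1 s for the TCP handshake.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False
```

With the port mappings used above, wait_for_port("localhost", 5432) covers Postgres and wait_for_port("localhost", 6379) covers Redis.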
Run the inference server:
cd server
pip install -r requirements.txt
DEBUG_API_KEYS='0000,0001,0002' uvicorn main:app --reload
Run one (or more) workers:
cd worker
pip install -r requirements.txt
API_KEY=0000 python
# to add another worker, simply run
API_KEY=0001 python
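Following the pattern above, where the server is started with DEBUG_API_KEYS='0000,0001,0002' and each worker gets its own API_KEY from that list, the sketch below derives one environment per worker. It assumes the server treats DEBUG_API_KEYS as a plain comma-separated list; parse_api_keys and worker_envs are hypothetical helpers.

```python
from __future__ import annotations

import os


def parse_api_keys(value: str) -> list[str]:
    """Split a comma-separated key list, dropping blanks and surrounding spaces."""
    return [key.strip() for key in value.split(",") if key.strip()]


def worker_envs(debug_api_keys: str, n_workers: int) -> list[dict[str, str]]:
    """Return one environment per worker, each with a distinct API_KEY."""
    keys = parse_api_keys(debug_api_keys)
    if n_workers > len(keys):
        raise ValueError(f"{n_workers} workers requested but only {len(keys)} keys configured")
    # Copy the current environment so PATH and friends stay available to the child process.
    return [{**os.environ, "API_KEY": key} for key in keys[:n_workers]]
```

Each returned dict can be passed as env= to subprocess.Popen when launching the worker command from the worker directory.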
For the worker, you'll also want to have the text-generation-inference server running:
docker run --rm -it -p 8001:80 -e MODEL_ID=distilgpt2 \
-v $HOME/.cache/huggingface:/root/.cache/huggingface \
--name text-generation-inference
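Once that container is up, you can query it directly before starting a worker. The sketch below assumes text-generation-inference's /generate route, which accepts a JSON body with inputs and parameters and returns generated_text; port 8001 comes from the -p 8001:80 mapping above. The helper names are hypothetical.

```python
import json
import urllib.request

# Host port taken from the -p 8001:80 mapping in the docker run command above.
TGI_URL = "http://localhost:8001/generate"


def build_generate_request(prompt: str, max_new_tokens: int = 20,
                           url: str = TGI_URL) -> urllib.request.Request:
    """Build a POST request for the /generate route without sending it."""
    body = json.dumps({"inputs": prompt,
                       "parameters": {"max_new_tokens": max_new_tokens}}).encode("utf-8")
    return urllib.request.Request(url, data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")


def generate(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text (needs the container running)."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs), timeout=60) as resp:
        return json.loads(resp.read())["generated_text"]
```

Calling generate("Hello") against the distilgpt2 container should return a short continuation of the prompt.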
Run the text client:
cd text-client
pip install -r requirements.txt
We run distributed load tests using the locust
Python package.
pip install locust
cd tests/locust
Navigate to the locust web UI in your browser (locust serves it on port 8089 by default).