Preliminary implementation of the inference engine for OpenAssistant.
The services of the inference stack are prefixed with "inference-" in the unified compose descriptor.
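For a quick check of which services belong to that profile, Compose can list them (this assumes a reasonably recent Docker Compose v2 with profile support):

docker compose --profile inference config --services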
Prior to building those, please ensure that you have Docker's new BuildKit backend enabled. See the FAQ for more info.
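If BuildKit is not already the default in your Docker installation (recent Docker versions enable it out of the box), one common way to turn it on for the current shell session is via environment variables; the second variable only matters for the older docker-compose v1 binary:

export DOCKER_BUILDKIT=1
export COMPOSE_DOCKER_CLI_BUILD=1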
To build the services, run:
docker compose --profile inference build
Spin up the stack:
docker compose --profile inference up -d
Tail the logs:
docker compose logs -f \
inference-server \
inference-worker \
inference-text-client \
inference-text-generation-server
Attach to the text-client, and start chatting:
docker attach open-assistant-inference-text-client-1
Note: In the last step, open-assistant-inference-text-client-1 refers to the name of the text-client container started in step 2.
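If Compose gave the container a different name on your machine (the numeric suffix in particular can vary), you can look up the actual name first:

docker compose --profile inference ps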
Note: The compose file contains bind mounts that enable you to develop on the modules of the inference stack and on the oasst-shared package without rebuilding.
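As a sketch of what this enables: after editing code on the host, picking up the change is at most a service restart rather than an image rebuild (and if the service runs with auto-reload, even the restart may be unnecessary):

docker compose --profile inference restart inference-server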
Note: You can spin up any number of workers by adjusting the number of replicas of the inference-worker service to your liking.
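One way to do that, assuming the compose file does not pin a fixed container_name for the worker, is Compose's --scale flag:

docker compose --profile inference up -d --scale inference-worker=3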
Note: Please wait for the inference-text-generation-server service to output {"message":"Connected"} before starting to chat.
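To watch for that message, you can follow the service's logs and filter for it:

docker compose logs -f inference-text-generation-server | grep --line-buffered Connected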
Alternatively, run ./full-dev-setup.sh to start the full development setup. Make sure to wait until the 2nd terminal is ready and says {"message":"Connected"} before entering input into the last terminal.
To set things up manually instead, run a redis container (or use the one from the general docker compose file):
docker run --rm -it -p 6379:6379 redis
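If you prefer reusing the project's compose setup for redis, something like the following should work (the service name redis is assumed here; check the compose file for the actual name):

docker compose up -d redis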
Run the inference server:
cd server
pip install -r requirements.txt
uvicorn main:app --reload
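By default, uvicorn serves on 127.0.0.1:8000. If you need a different bind address or port (for example, so a worker on another machine can reach the server), you can pass them explicitly:

uvicorn main:app --reload --host 0.0.0.0 --port 8000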
Run one (or more) workers:
cd worker
pip install -r requirements.txt
python __main__.py
For the worker, you'll also want to have the text-generation-inference server running:
docker run --rm -it -p 8001:80 -e MODEL_ID=distilgpt2 ykilcher/text-generation-inference
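The -p 8001:80 mapping exposes the generation server on localhost:8001, and MODEL_ID presumably selects which Hugging Face model it loads; distilgpt2 is just a small model that is convenient for local testing. Swapping in another model should only require changing that variable, e.g. (untested choice of model, assuming it is compatible with this image):

docker run --rm -it -p 8001:80 -e MODEL_ID=EleutherAI/pythia-70m ykilcher/text-generation-inference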
Run the client:
cd text-client
pip install -r requirements.txt
python __main__.py