Commit a18aa70

Chuseuiti, jcardenes-via, olliestanley, and Rudd-O authored

update inference readme (LAION-AI#2969)

Fixed the inference README with a working version and deleted much of the extra-variants information to simplify the document, since the variants are derivations of the same steps. The motivation: the inference-text-client container is not part of the docker compose file, nor does it exist in the main branch of the repository.

Co-authored-by: jcardenes <jcardenes@solvewithvia.com>
Co-authored-by: Oliver Stanley <olivergestanley@gmail.com>
Co-authored-by: Rudd-O <rudd-o@users.noreply.github.com>
1 parent ece0384 commit a18aa70

2 files changed: +13 −76 lines changed

2 files changed

+13
-76
lines changed

docker-compose.yaml (+2)

```diff
@@ -5,6 +5,8 @@ services:
 
 # Use `docker compose --profile frontend-dev up --build --attach-dependencies` to start the services needed to work on the frontend. If you want to also run the inference, add a second `--profile inference` argument.
 
+# If you update the containers used by the inference profile, please update inference/README.md. Thank you
+
 # The profile ci is used by CI automations. (i.e E2E testing)
 
 # This DB is for the FastAPI Backend.
```
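
Spelled out, the invocation that comment describes looks like this (a sketch built only from the flags the comment itself names):

```bash
# Start the frontend dev services plus the inference stack,
# as the docker-compose.yaml comment describes.
docker compose --profile frontend-dev --profile inference up --build --attach-dependencies
```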

inference/README.md (+11 −76)

```diff
@@ -2,7 +2,10 @@
 
 # OpenAssistant Inference
 
-Preliminary implementation of the inference engine for OpenAssistant.
+Preliminary implementation of the inference engine for OpenAssistant. This is
+strictly for local development, although you might find limited success for your
+self-hosting OA plan. There is no warranty that this will not change in the
+future — in fact, expect it to change.
 
 ## Development Variant 1 (docker compose)
 
```

````diff
@@ -30,99 +33,31 @@ Tail the logs:
 ```shell
 docker compose logs -f \
     inference-server \
-    inference-worker \
-    inference-text-client \
-    inference-text-generation-server
-```
-
-Attach to the text-client, and start chatting:
+    inference-worker
 
-```shell
-docker attach open-assistant-inference-text-client-1
 ```
 
-> **Note:** In the last step, `open-assistant-inference-text-client-1` refers to
-> the name of the `text-client` container started in step 2.
-
 > **Note:** The compose file contains the bind mounts enabling you to develop on
 > the modules of the inference stack, and the `oasst-shared` package, without
 > rebuilding.
 
+> **Note:** You can change the model by editing variable `MODEL_CONFIG_NAME` in
+> the `docker-compose.yaml` file. Valid model names can be found in
+> [model_configs.py](../oasst-shared/oasst_shared/model_configs.py).
+
 > **Note:** You can spin up any number of workers by adjusting the number of
 > replicas of the `inference-worker` service to your liking.
 
 > **Note:** Please wait for the `inference-text-generation-server` service to
 > output `{"message":"Connected"}` before starting to chat.
 
-## Development Variant 2 (tmux terminal multiplexing)
-
-Ensure you have `tmux` installed on you machine and the following packages
-installed into the Python environment;
-
-- `uvicorn`
-- `worker/requirements.txt`
-- `server/requirements.txt`
-- `text-client/requirements.txt`
-- `oasst_shared`
-
-You can run development setup to start the full development setup.
-
-```bash
-cd inference
-./full-dev-setup.sh
-```
-
-> Make sure to wait until the 2nd terminal is ready and says
-> `{"message":"Connected"}` before entering input into the last terminal.
-
-## Development Variant 3 (you'll need multiple terminals)
-
-Run a postgres container:
-
-```bash
-docker run --rm -it -p 5432:5432 -e POSTGRES_PASSWORD=postgres --name postgres postgres
-```
-
-Run a redis container (or use the one of the general docker compose file):
-
-```bash
-docker run --rm -it -p 6379:6379 --name redis redis
-```
-
-Run the inference server:
-
-```bash
-cd server
-pip install -r requirements.txt
-DEBUG_API_KEYS='0000,0001,0002' uvicorn main:app --reload
-```
-
-Run one (or more) workers:
-
-```bash
-cd worker
-pip install -r requirements.txt
-API_KEY=0000 python __main__.py
-
-# to add another worker, simply run
-API_KEY=0001 python __main__.py
-```
-
-For the worker, you'll also want to have the text-generation-inference server
-running:
-
-```bash
-docker run --rm -it -p 8001:80 -e MODEL_ID=distilgpt2 \
-    -v $HOME/.cache/huggingface:/root/.cache/huggingface \
-    --name text-generation-inference ghcr.io/yk/text-generation-inference
-```
-
-Run the text client:
+Run the text client and start chatting:
 
 ```bash
 cd text-client
 pip install -r requirements.txt
 python __main__.py
+# You'll soon see a `User:` prompt, where you can type your prompts.
 ```
 
 ## Distributed Testing
````
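
The added `MODEL_CONFIG_NAME` note is the one setting a reader has to act on. A minimal sketch of how you might locate the setting and the list of valid names, assuming only the paths the note itself gives (the diff does not show the YAML shape of the variable, so edit it by hand):

```bash
# Find where MODEL_CONFIG_NAME is set, then edit it by hand.
grep -n 'MODEL_CONFIG_NAME' docker-compose.yaml

# Skim the valid model names the note points to
# (path taken from the README's link, relative to the repo root).
less oasst-shared/oasst_shared/model_configs.py
```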

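For the replica note kept above, Compose's stock `--scale` flag is one way to run several workers without editing the file; a sketch, assuming the `inference` profile and the `inference-worker` service name used throughout the README:

```bash
# Bring the inference stack up with two worker replicas instead of one.
docker compose --profile inference up --build --scale inference-worker=2
```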