Description
What happened + What you expected to happen
If the GCS crashes before a ServeHandle's first request, the ServeHandle cannot process any requests, even if there are replicas alive. However, if the GCS crashes after the ServeHandle's first request, the ServeHandle can process requests as long as there are replicas alive.
I would expect the ServeHandle to fulfill requests whenever replicas are alive, even if the GCS crashes before its first request.
Note: this issue has fairly minor impact, since it's unlikely that the GCS crashes before the ServeHandle processes even one request.
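One plausible reading of these symptoms is that the handle fetches its replica set lazily on the first request and caches it afterwards. This is an assumption for illustration, not a description of Ray's internals. The toy model below reproduces that pattern in plain Python. `ToyRegistry` (a stand-in for the GCS), `ToyHandle`, `RegistryUnavailable`, and `make_replica` are all hypothetical names. Unlike the real ServeHandle, which hangs, the toy raises an exception.

```python
class RegistryUnavailable(Exception):
    """Raised by the toy registry once it has "crashed" (hypothetical stand-in for a dead GCS)."""


class ToyRegistry:
    """Hypothetical metadata service that knows which replicas exist (not Ray's GCS API)."""

    def __init__(self, replicas):
        self._replicas = list(replicas)
        self.alive = True

    def list_replicas(self):
        if not self.alive:
            raise RegistryUnavailable("registry is down")
        return list(self._replicas)


class ToyHandle:
    """Hypothetical handle that resolves replicas on its first request, then reuses the cache."""

    def __init__(self, registry):
        self._registry = registry
        self._replicas = None  # stays None until a lookup succeeds
        self._next = 0

    def remote(self, payload):
        if self._replicas is None:
            # First request: the replica list can only come from the registry.
            self._replicas = self._registry.list_replicas()
        # Later requests use only the cached list (simple round robin).
        replica = self._replicas[self._next % len(self._replicas)]
        self._next += 1
        return replica(payload)


def make_replica(name):
    """Build a fake replica that reports which name served the payload."""
    return lambda payload: f"{name} handled {payload}"


# Registry dies AFTER the first request: the cached replicas keep serving.
reg = ToyRegistry([make_replica("r0")])
h = ToyHandle(reg)
print(h.remote(1))  # r0 handled 1
reg.alive = False
print(h.remote(2))  # r0 handled 2

# Registry dies BEFORE the first request: no replica can be resolved.
reg2 = ToyRegistry([make_replica("r0")])
h2 = ToyHandle(reg2)
reg2.alive = False
try:
    h2.remote(1)
except RegistryUnavailable as exc:
    print("first request failed:", exc)  # first request failed: registry is down
```

Under this model, the behavior expected in this issue would require the handle to have some way to discover replicas that does not depend on the registry being reachable at first use.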
Versions / Dependencies
Ray on the latest master.
Reproduction script
Repro script:
```python
# File name: repro.py
import os
import signal

import psutil
import ray
from ray import serve

gcs_dead = False


def kill_gcs():
    """Send SIGTERM to the single local gcs_server process."""
    global gcs_dead
    print("Killing gcs...")
    gcs_pid = None
    for proc in psutil.process_iter():
        if "gcs_server" in proc.name():
            if gcs_pid is not None:
                raise ValueError("Got two pids!")
            gcs_pid = proc.pid
    if gcs_pid is None:
        raise ValueError("Could not find a gcs_server process!")
    os.kill(gcs_pid, signal.SIGTERM)
    gcs_dead = True
    print(
        "Killed gcs! If you did this before any requests succeeded, the "
        "first request afterwards should hang."
    )


ray.init()


@serve.deployment
class C:
    def __call__(self, *args):
        # Return the replica's PID so each response shows which process served it.
        return os.getpid()


graph = C.bind()
serve.run(graph)
handle = serve.get_deployment("C").get_handle()

# Comment this out if you want to kill GCS later:
kill_gcs()

ref = handle.remote()
print("Issued request 1...")
result = ray.get(ref)
print(f"Request 1 succeeded. Got: {result}")

# Kill the GCS between the two requests if it wasn't killed above.
if not gcs_dead:
    kill_gcs()

ref = handle.remote()
print("Issued request 2...")
result = ray.get(ref)
print(f"Request 2 succeeded. Got: {result}")
```
You can run the file like a regular Python script:
```
$ python repro.py
```
The file makes two requests to a deployment with one replica. There are two places where the file calls kill_gcs(). If you leave the first kill_gcs() call uncommented, the script hangs on the first request. If you comment it out, the GCS is killed only after the first request succeeds, and the script fulfills both requests.
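Rather than editing the script to comment the first kill_gcs() call in or out, the two scenarios could be selected from the command line. The sketch below assumes a hypothetical --kill-gcs-early flag, which is not part of the original script, parsed with the standard argparse module.

```python
import argparse


def parse_args(argv=None):
    """Parse the hypothetical --kill-gcs-early flag for repro.py.

    argv=None makes argparse read sys.argv; tests can pass a list instead.
    """
    parser = argparse.ArgumentParser(description="Ray Serve GCS-crash repro")
    parser.add_argument(
        "--kill-gcs-early",
        action="store_true",  # defaults to False when the flag is absent
        help="kill the GCS before the first request instead of before the second",
    )
    return parser.parse_args(argv)


# Wiring into repro.py (sketch): replace the unconditional first call with
#     if parse_args().kill_gcs_early:
#         kill_gcs()
# The existing "if not gcs_dead: kill_gcs()" then covers the other case.
```

With this change, `python repro.py --kill-gcs-early` would exercise the hanging case, and plain `python repro.py` would kill the GCS only between the two requests.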
Issue Severity
Low: It annoys or frustrates me.