
Add liveness and readiness probes to API deployment #256

Merged: 5 commits merged into dask:master on Apr 17, 2020

Conversation

@droctothorpe (Contributor)

Idiomatically, production pods should leverage liveness probes. Liveness probes ensure that pods are restarted if the application is unresponsive but the main process is still running.

For pods running in production, you should always define a liveness probe. Without one, Kubernetes has no way of knowing whether your app is still alive or not. As long as the process is still running, Kubernetes will consider the container to be healthy.
-- From Kubernetes in Action by Marko Luksa

Dask Gateway Server includes a health check endpoint, but kubelet isn't checking it.

This PR updates the Helm deployment for Dask Gateway Server to include liveness and readiness probes.
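
Roughly, the added stanza takes this shape (a sketch only; the /api/health path, port, and timing values shown here are illustrative assumptions and may not match the chart's actual defaults):

livenessProbe:
  httpGet:
    path: /api/health
    port: 8000            # assumed gateway API port
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 6
readinessProbe:
  httpGet:
    path: /api/health
    port: 8000            # assumed gateway API port
  initialDelaySeconds: 5
  periodSeconds: 10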

I tested this on my local machine using Docker Desktop Kubernetes and confirmed that everything works.

Major kudos to @suchitpandya for identifying this discrepancy.

@jcrist (Member) left a comment


Thanks @droctothorpe! I added the health check route, but completely forgot to add it to the chart. Good catch.

@droctothorpe (Contributor, Author)

Happy to contribute in whatever small way I can. Updated per your specifications and validated:
[screenshot]

@jcrist (Member) commented Apr 17, 2020

LGTM. Thanks!

@jcrist jcrist merged commit a2a6b82 into dask:master Apr 17, 2020
@droctothorpe droctothorpe deleted the probes branch April 17, 2020 19:15
@dhirschfeld

I'm not an expert (by any means) but I've read that you have to be cautious with using liveness probes:
https://srcco.de/posts/kubernetes-liveness-probes-are-dangerous.html

Not sure if it's relevant here (prob depends on the implementation of the /api/health endpoint), but perhaps worth keeping in mind...

@jcrist (Member) commented Apr 18, 2020

Yeah, the important thing is that the liveness probe's checks don't rely on any downstream services; they should only check the health of the pod itself. Otherwise you run the risk of cascading failures. We're fine here in that regard.

initialDelaySeconds: {{ .Values.gateway.livenessProbe.initialDelaySeconds }}
periodSeconds: {{ .Values.gateway.livenessProbe.periodSeconds }}
timeoutSeconds: {{ .Values.gateway.livenessProbe.timeoutSeconds }}
failureThreshold: {{ .Values.gateway.livenessProbe.failureThreshold }}
@consideRatio (Collaborator) commented Apr 21, 2020

To reduce the issues of working with Helm's templates, I suggest doing something more like this: move httpGet into values.yaml and drop enabled: true, letting whether the block is declared at all determine whether it gets rendered.

Hmm... or maybe not, if being able to switch it on and off easily is an important matter. But I wouldn't say it is.

If everything in a Helm chart is made configurable it otherwise becomes messy, and if it isn't, that becomes a big driver of new issues from users.

{{- with .Values.gateway.livenessProbe }}
livenessProbe:
  {{- . | toYaml | nindent 12 }}
{{- end }}
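
For illustration, the matching values.yaml entry could then look something like this (placeholder values only, not the chart's actual defaults; omitting gateway.livenessProbe entirely would leave the probe out of the rendered manifest):

gateway:
  livenessProbe:
    httpGet:
      path: /api/health
      port: 8000            # assumed gateway API port
    initialDelaySeconds: 5
    periodSeconds: 10
    timeoutSeconds: 2
    failureThreshold: 6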
