
Add liveness and readiness probes to API deployment #256

Merged: 5 commits merged into dask:master on Apr 17, 2020

Conversation

@droctothorpe (Contributor)

Idiomatically, production pods should leverage liveness probes. Liveness probes ensure that pods are restarted if the application is unresponsive but the main process is still running.

For pods running in production, you should always define a liveness probe. Without one, Kubernetes has no way of knowing whether your app is still alive or not. As long as the process is still running, Kubernetes will consider the container to be healthy.
-- From Kubernetes in Action by Marko Luksa

Dask Gateway Server includes a health check endpoint, but kubelet isn't checking it.

This PR updates the Helm deployment for Dask Gateway Server to include liveness and readiness probes.
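
Roughly, the added stanza takes this shape (a sketch only; the /api/health path, port, and timing values shown here are illustrative assumptions and may not match the chart's actual defaults):

livenessProbe:
  httpGet:
    path: /api/health
    port: 8000            # assumed gateway API port
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 6
readinessProbe:
  httpGet:
    path: /api/health
    port: 8000            # assumed gateway API port
  initialDelaySeconds: 5
  periodSeconds: 10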

I tested this on my local machine using Docker Desktop Kubernetes and confirmed that everything works.

Major kudos to @suchitpandya for identifying this discrepancy.

@jcrist (Member) left a comment


Thanks @droctothorpe! I added the health check route, but completely forgot to add it to the chart. Good catch.

@droctothorpe (Contributor, Author)

Happy to contribute in whatever small way I can. Updated per your specifications and validated:
[screenshot]

@jcrist (Member) commented Apr 17, 2020

LGTM. Thanks!

@jcrist jcrist merged commit a2a6b82 into dask:master Apr 17, 2020
@droctothorpe droctothorpe deleted the probes branch April 17, 2020 19:15
@dhirschfeld

I'm not an expert (by any means) but I've read that you have to be cautious with using liveness probes:
https://srcco.de/posts/kubernetes-liveness-probes-are-dangerous.html

Not sure if it's relevant here (prob depends on the implementation of the /api/health endpoint), but perhaps worth keeping in mind...

@jcrist (Member) commented Apr 18, 2020

Yeah, the important thing is that the liveness probe's checks don't rely on any downstream services; they should only check the health of the pod itself. Otherwise you run the risk of cascading failures. We're fine here in that regard.

initialDelaySeconds: {{ .Values.gateway.livenessProbe.initialDelaySeconds }}
periodSeconds: {{ .Values.gateway.livenessProbe.periodSeconds }}
timeoutSeconds: {{ .Values.gateway.livenessProbe.timeoutSeconds }}
failureThreshold: {{ .Values.gateway.livenessProbe.failureThreshold }}
@consideRatio (Collaborator) commented Apr 21, 2020

To reduce the issues of working with Helm's templates, I suggest doing something more like this: move httpGet into values.yaml and drop enabled: true, letting whether the block is declared at all determine whether it gets rendered.

Hmm... or maybe not, if being able to switch it on and off easily is an important matter. But I wouldn't say it is.

If everything in a Helm chart is made configurable it otherwise becomes messy, and if it isn't, that becomes a big driver of new issues from users.

{{- with .Values.gateway.livenessProbe }}
livenessProbe:
  {{- . | toYaml | nindent 12 }}
{{- end }}
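
For illustration, the matching values.yaml entry could then look something like this (placeholder values only, not the chart's actual defaults; omitting gateway.livenessProbe entirely would leave the probe out of the rendered manifest):

gateway:
  livenessProbe:
    httpGet:
      path: /api/health
      port: 8000            # assumed gateway API port
    initialDelaySeconds: 5
    periodSeconds: 10
    timeoutSeconds: 2
    failureThreshold: 6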
