diff --git a/docs/architecture/autoscaling.md b/docs/architecture/autoscaling.md index 74d30b8e..8156937a 100644 --- a/docs/architecture/autoscaling.md +++ b/docs/architecture/autoscaling.md @@ -282,6 +282,25 @@ The latency between accepting a request for an unavailable function and serving That shouldn't happen, providing that you've set an adequate value for the idle detection for your function. But if it does, the OpenFaaS watchdog and our official function templates will allow a graceful termination of the function. See also: [Improving long-running jobs for OpenFaaS users](https://www.openfaas.com/blog/long-running-jobs/) +## Smoothing out scaling down with a stable window + +The `com.openfaas.scale.down.window` label can be set to a Go duration of up to `5m` (`300s`). When set, the autoscaler records a recommendation on each cycle, and will only scale a function down to the highest recommendation recorded within the window. + +![Example of a stable window](/images/stable-window.png) +> The load varies every 2.5 minutes, but the autoscaler does not scale down, because the stable window picks the highest recommendation from the past 5 minutes. + +For example, a function receives a peak in traffic and scales to 10 replicas. The recorded recommendations may include 8, 8, 7, 6, 5, 4, 5, and 5 replicas. In this case, even if the current load would allow the autoscaler to pick a value as low as 2 replicas, it will only be allowed to scale down to 8 replicas. As the window moves along and the maximum recorded recommendation decreases, the replica count will eventually settle on a value that matches the current load. + +In the above scenario, if you turned on verbose autoscaling, you'd see the following log message, showing that the current traffic demands only 2 replicas, but the stable window is smoothing out the decrease.
+ +``` +2024/08/05 15:16:25 [Scaler] cows.openfaas-fn 10 => 2 (want: 8) +``` + +The purpose of this option is to slow down the rate of scaling down when a function receives variable traffic over a relatively long period of time. + +Scaling up and scale to zero are unaffected. This setting is turned off by default. + ## Legacy scaling for the Community Edition (CE) !!! warning "Legacy scaling for the Community Edition (CE)" diff --git a/docs/architecture/classic-watchdog.jpg b/docs/architecture/classic-watchdog.jpg new file mode 100644 index 00000000..36647c43 Binary files /dev/null and b/docs/architecture/classic-watchdog.jpg differ diff --git a/docs/architecture/metrics.md b/docs/architecture/metrics.md index bff07c25..646dd509 100644 --- a/docs/architecture/metrics.md +++ b/docs/architecture/metrics.md @@ -9,12 +9,16 @@ There are two main uses for the built-in Prometheus server: 1. To power scale to zero, and the horizontal Pod autoscaler. 2. To provide basic metrics to end-users, and to power the Grafana dashboards offered to OpenFaaS Standard customers. +### Viewing metrics + +See the various [Grafana dashboards](/openfaas-pro/grafana-dashboards) curated by our team. + ### Long term retention of metrics -* There is no persistence by default, so restarting the Prometheus Pod will reset all metrics. This is as designed, since the metrics are collected for autoscaling primarily. -* The default retention period is 15 days, so anything older than that will no longer be visible. This is as designed, and will allow for SRE/DevOps work and active monitoring. +* There is no persistence in the Prometheus Pod, so restarting Prometheus will remove all historic metrics. This is by design, since the metrics are collected for autoscaling and short-term monitoring. +* The default retention period is 15 days, so anything older than that will no longer be visible. 
This is by design, however the Helm chart offers a way to modify the retention period if disk space becomes an issue, or if you need to retain metrics for slightly longer. -What if you would like to enable long-term retention? +What if you would like to enable long-term retention of Prometheus metrics? Our recommendation is *not to* try to re-configure or alter the built-in Prometheus server, but to deploy your own, and to scrape the internal one via [Prometheus Federation](https://prometheus.io/docs/prometheus/latest/federation/). @@ -49,6 +53,16 @@ Advanced metrics for OpenFaaS Pro users: The `http_request*` metrics record the latency and statistics of `/system/*` routes to monitor the OpenFaaS gateway and its provider. The `/async-function` route is also recorded in these metrics to observe asynchronous ingestion rate and latency. +Additional metrics from the Operator: + +| Metric | Type | Description | Labels | Edition | +| ----------------------------------- | ---------- | ----------------------------------- | -------------------------- |--------------------| +| `faasnetes_scale_total` | counter | Number of times a function has been scaled (ignoring requests where current and desired replicas are equal) | `function_name`, `status` | Pro Edition | +| `faasnetes_sync_handler_gauge` | gauge | Number of reconciliation functions running at a given time | `status` | Pro Edition | +| `faasnetes_sync_handler_histogram` | histogram | Time taken to reconcile function Custom Resources into Kubernetes objects | `status` | Pro Edition | + +The `faasnetes_scale_total` metric is useful for tracking how often a function has been scaled up or down. The `faasnetes_sync_handler_gauge` and `faasnetes_sync_handler_histogram` metrics are useful for tracking the number of concurrent reconciliations, and the time spent reconciling function Custom Resources into Kubernetes objects, in large deployments of OpenFaaS.
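+As an illustrative sketch (not an official query; the aggregation shown is an assumption), the Operator's scaling counter could be charted in Prometheus with a query such as:
+
+```
+# Total scaling events per function and outcome over the past hour
+sum by (function_name, status) (increase(faasnetes_scale_total[1h]))
+```
+
+A sustained rise in this value for a single function can indicate flapping, which the stable window for scaling down is designed to reduce.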
+ ## CPU & RAM usage/consumption CPU & RAM usage/consumption metrics are available for OpenFaaS Pro users via Prometheus and the OpenFaaS REST API, OpenFaaS Pro Dashboard and OpenFaaS CLI via `faas-cli describe`. diff --git a/docs/architecture/watchdog.md b/docs/architecture/watchdog.md index 120879c7..be30f76a 100644 --- a/docs/architecture/watchdog.md +++ b/docs/architecture/watchdog.md @@ -8,7 +8,7 @@ The watchdog becomes an "init process" with an embedded HTTP server written in G The classic watchdog has historically been used for all of the official OpenFaaS templates, but the of-watchdog (mentioned below) is now becoming more popular and templates exist for both watchdogs for the common programming languages in the default [templates repository](https://github.com/openfaas/templates) and [community template store](https://github.com/openfaas/store/blob/master/templates.json). - + *Pictured: technical conceptual diagram of the OpenFaaS watchdog during an invocation* diff --git a/docs/images/stable-window.png b/docs/images/stable-window.png new file mode 100644 index 00000000..74418291 Binary files /dev/null and b/docs/images/stable-window.png differ diff --git a/docs/openfaas-pro/builder.md b/docs/openfaas-pro/builder.md index 9c6958a3..dde6b012 100644 --- a/docs/openfaas-pro/builder.md +++ b/docs/openfaas-pro/builder.md @@ -84,6 +84,14 @@ faas-cli publish --remote-builder http://127.0.0.1:8081/build \ --payload-secret $HOME/.openfaas/payload.txt ``` +The `--platforms` flag also works for cross-compilation or multi-arch builds: + +```bash +faas-cli publish --remote-builder http://127.0.0.1:8081/build \ + --platforms "linux/amd64,linux/arm64" \ + --payload-secret $HOME/.openfaas/payload.txt +``` + To deploy the image that you've just built: ```bash @@ -271,6 +279,8 @@ You may need to enable build arguments for the Dockerfile, these can be passed t You may wish to cross-compile a function to run on an arm64 host, if so, you can provide a `platform` key in the configuration file. 
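+As a minimal sketch, assuming `platform` is a top-level string key and omitting any other fields the configuration file may carry, such an entry could look like:
+
+```json
+{
+  "platform": "linux/arm64"
+}
+```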
+You will need to make sure your Dockerfile uses the proper syntax. The official templates are a good reference if you need guidance; otherwise, reach out to our team if you get stuck. + +The below will build an image for arm64 only and must be deployed only to an arm64 host using OpenFaaS Profiles to ensure it is scheduled correctly. ```json diff --git a/mkdocs.yml b/mkdocs.yml index 1b3b2dca..70805cef 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -190,12 +190,12 @@ nav: - Profiles: ./reference/profiles.md - Design & Architecture: - Invocations: ./architecture/invocations.md - - Production: ./architecture/production.md - - Stack: ./architecture/stack.md + - Autoscaling: ./architecture/autoscaling.md - Gateway: ./architecture/gateway.md - Watchdog: ./architecture/watchdog.md - - Autoscaling: ./architecture/autoscaling.md - Metrics: ./architecture/metrics.md + - Stack: ./architecture/stack.md + - Production: ./architecture/production.md - Performance: ./architecture/performance.md - FaaS Provider: ./architecture/faas-provider.md - Logs Provider: ./architecture/logs-provider.md