| 1 | +.. _ak8so-metrics: |
| 2 | + |
| 3 | +============================================= |
| 4 | +View Metrics and Troubleshoot Resource Issues |
| 5 | +============================================= |
| 6 | + |
| 7 | +.. meta:: |
| 8 | + :description: View and analyze performance metrics collected by Atlas Kubernetes Operator. |
| 9 | + |
| 10 | + |
| 11 | +.. default-domain:: mongodb |
| 12 | + |
| 13 | +.. contents:: On this page |
| 14 | + :local: |
| 15 | + :backlinks: none |
| 16 | + :depth: 1 |
| 17 | + :class: singlecol |
| 18 | + |
| 19 | +View and analyze performance metrics |
| 20 | +==================================== |
| 21 | + |
| 22 | +The |ak8so| binary exposes standard controller-runtime metrics at ``http://localhost:8080/metrics``. |
| 23 | +This endpoint reports the following: |
| 24 | + |
| 25 | +- Total number of reconciliation errors and successful reconciles per controller. |
| 26 | +- Length of reconcile queues per controller. |
| 27 | +- Reconciliation latency. |
| 28 | +- Standard resource metrics such as CPU, memory usage, and file descriptor usage. |
| 29 | +- Go runtime metrics such as the number of goroutines and garbage collection (GC) duration. |
| 30 | + |
| 31 | +To learn more, see `Controller Metrics <https://book-v1.book.kubebuilder.io/beyond_basics/controller_metrics>`__. |
| 32 | + |
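To collect these metrics, a Prometheus server can scrape the endpoint. The
following is a minimal sketch of a scrape configuration. It assumes that
Prometheus can reach the operator at ``localhost:8080`` (for example, through
a port-forward). The job name and the 30-second interval are illustrative
placeholders, not values defined by the operator.

.. code-block:: yaml

   scrape_configs:
     - job_name: atlas-operator        # hypothetical job name
       scrape_interval: 30s            # illustrative interval
       metrics_path: /metrics          # Prometheus default, shown for clarity
       static_configs:
         - targets:
             - localhost:8080          # assumes the endpoint is reachable here

In a cluster, you would typically replace the static target with a
service-discovery mechanism that suits your deployment.
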
| 33 | +SRE Runbook |
| 34 | +=========== |
| 35 | + |
| 36 | +Resource Stuck in Reconciliation |
| 37 | +-------------------------------- |
| 38 | + |
| 39 | +Problem: Resource stuck in reconciliation |
| 40 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 41 | + |
| 42 | +This problem occurs when a resource such as ``AtlasProject`` is not in a ``Ready`` state. |
| 43 | +This runbook uses ``AtlasProject`` as the example, but the problem can occur with any |ak8so| resource type. |
| 44 | + |
| 45 | +Symptoms |
| 46 | +````````` |
| 47 | + |
| 48 | +- The resource is not in a ``Ready`` state. |
| 49 | +- The reconciliation error rate is high. |
| 50 | + |
| 51 | +To monitor the error rate, create a query that calculates the |
| 52 | +reconciliation error rate for the ``AtlasProject`` controller as a percentage |
| 53 | +over the last minute. This metric helps you monitor the |
| 54 | +health and stability of the ``AtlasProject`` controller. A high or rising |
| 55 | +error percentage indicates problems in the reconciliation process. |
| 56 | + |
| 57 | +Example Query |
| 58 | +^^^^^^^^^^^^^ |
| 59 | + |
| 60 | +To calculate the error rate, use the following `Prometheus <https://prometheus.io/docs/introduction/overview/>`__ query: |
| 61 | + |
| 62 | +.. code-block:: prometheus |
| 63 | + |
| 64 | + 100 * rate(controller_runtime_reconcile_errors_total{controller="AtlasProject"}[1m]) / rate(controller_runtime_reconcile_total{controller="AtlasProject"}[1m]) |
| 65 | + |
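If you use Prometheus alerting, the same query can back an alert rule. The
following is a sketch only: the group name, alert name, 10% threshold,
five-minute hold period, and severity label are assumptions chosen for
illustration, so tune them to your environment.

.. code-block:: yaml

   groups:
     - name: atlas-operator-reconcile          # hypothetical group name
       rules:
         - alert: AtlasProjectReconcileErrors  # hypothetical alert name
           # Fires when more than 10% of reconciles fail (illustrative threshold).
           expr: |
             100 * rate(controller_runtime_reconcile_errors_total{controller="AtlasProject"}[1m])
               / rate(controller_runtime_reconcile_total{controller="AtlasProject"}[1m]) > 10
           for: 5m                             # condition must hold this long before firing
           labels:
             severity: warning
           annotations:
             summary: AtlasProject reconciliation error rate is above 10%

The ``for`` clause keeps a single failed reconcile from firing the alert. A
longer rate window than ``[1m]`` may also give a steadier signal if your
scrape interval is long.
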
| 66 | +Status |
| 67 | +``````` |
| 68 | + |
| 69 | +Check the resource status condition for further details: |
| 70 | + |
| 71 | +.. code-block:: yaml |
| 72 | + |
| 73 | +   status: |
| 74 | +     conditions: |
| 75 | +       - type: Ready |
| 76 | +         status: "False" |
| 77 | +         reason: .... |
| 78 | + |
| 79 | +Action Items |
| 80 | +```````````` |
| 81 | + |
| 82 | +1. **Verify Resource Status:** |
| 83 | + |
| 84 | +   - Check the status condition message for more detailed information. |
| 85 | +   - If the ``AtlasProject`` resource is not ready, proceed to the next troubleshooting steps. |
| 86 | + |
| 87 | +2. **Check Connection Secret:** |
| 88 | + |
| 89 | +   - Ensure that the connection secret referenced by ``spec.connectionSecretRef.name`` has the label ``atlas.mongodb.com/type=credentials``. |
| 90 | + |
| 91 | +3. **Investigate Logs:** |
| 92 | + |
| 93 | +   - Review the ``AtlasProject`` controller logs for errors or failed reconciliation attempts. |
| 94 | + |
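For the connection secret check in step 2, a correctly labeled secret might
look like the following sketch. The secret name, namespace, and placeholder
values are hypothetical. The label is the one named in step 2. The ``orgId``,
``publicApiKey``, and ``privateApiKey`` keys are assumed here from the usual
operator credentials format, so verify them against your deployment.

.. code-block:: yaml

   apiVersion: v1
   kind: Secret
   metadata:
     name: my-atlas-credentials             # hypothetical; must match spec.connectionSecretRef.name
     namespace: my-namespace                # hypothetical namespace
     labels:
       atlas.mongodb.com/type: credentials  # label checked in step 2
   type: Opaque
   stringData:
     orgId: "<your-org-id>"                 # placeholder values; do not commit real keys
     publicApiKey: "<your-public-key>"
     privateApiKey: "<your-private-key>"

If the label is missing from an existing secret, adding it under
``metadata.labels`` and re-applying the manifest is usually enough to correct
it.
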
| 95 | +Additional Resources |
| 96 | +````````````````````` |
| 97 | + |
| 98 | +- `AtlasProject resource <https://www.mongodb.com/docs/atlas/operator/upcoming/atlasproject-custom-resource/>`__ |