
Commit 6716c77

DOCSP-52468 -- Add Metrics page with runbook entry (#13416)
* DOCSP-52468 -- Add Metrics page with runbook entry
* DOCSP-52468 -- WIP
* DOCSP-52468 -- review revisions
1 parent e2d393f commit 6716c77

2 files changed, +100 -1 lines changed


content/atlas-operator/upcoming/source/ak8so-get-started.txt

Lines changed: 2 additions & 1 deletion
@@ -71,4 +71,5 @@ To learn more, see :ref:`ak8so-compatibility-ref`.
    Independent Custom Resource Definitions </ak8so-independent-crd>
    Migrate Parameters to CRDs </migrate-parameter-to-resource>
    Compatibility </ak8so-compatibility>
-
+   Metrics </ak8so-metrics>
+
Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
.. _ak8so-metrics:

=============================================
View Metrics and Troubleshoot Resource Issues
=============================================

.. meta::
   :description: View and analyze performance metrics collected by Atlas Kubernetes Operator.


.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 1
   :class: singlecol

View and Analyze Performance Metrics
====================================

The |ak8so| binary exposes standard controller-runtime metrics at
``http://localhost:8080/metrics``. These metrics include the following:

- Total number of successful and failed reconciliations per controller.
- Reconcile queue length per controller.
- Reconciliation latency per controller.
- Standard process metrics, such as CPU usage, memory usage, and open file descriptors.
- Go runtime metrics, such as the number of goroutines and garbage collection (GC) duration.

To learn more, see `Controller Metrics <https://book-v1.book.kubebuilder.io/beyond_basics/controller_metrics>`__.

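
If Prometheus already scrapes this endpoint, a recording rules file can
precompute some of the metrics listed above. The following is a minimal
sketch, not a configuration that ships with |ak8so|. It assumes the standard
controller-runtime and client-go workqueue metric names
(``controller_runtime_reconcile_total``,
``controller_runtime_reconcile_errors_total``,
``controller_runtime_reconcile_time_seconds``, ``workqueue_depth``) and the
standard process collector. The group and rule names are hypothetical. If
other controllers report to the same Prometheus, add a ``job`` or
``namespace`` label selector to scope the rules to the operator.

.. code-block:: yaml

   # Hypothetical recording rules for the AKO metrics endpoint.
   # Assumes Prometheus already scrapes the operator's /metrics endpoint.
   groups:
     - name: ako-controller-metrics          # hypothetical group name
       rules:
         # Reconciliation error percentage per controller over 5 minutes.
         - record: controller:reconcile_errors:percent_rate5m
           expr: |
             100 * sum by (controller) (rate(controller_runtime_reconcile_errors_total[5m]))
               / sum by (controller) (rate(controller_runtime_reconcile_total[5m]))
         # Current reconcile queue length, keyed by the workqueue name.
         - record: name:workqueue_depth:max
           expr: max by (name) (workqueue_depth)
         # 99th percentile reconciliation latency per controller, in seconds.
         - record: controller:reconcile_time_seconds:p99_rate5m
           expr: |
             histogram_quantile(0.99,
               sum by (controller, le) (rate(controller_runtime_reconcile_time_seconds_bucket[5m])))
         # CPU seconds consumed per second by each scraped process.
         - record: instance:process_cpu_seconds:rate5m
           expr: rate(process_cpu_seconds_total[5m])

To load a file like this, list it under the ``rule_files`` setting in your
Prometheus configuration.
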
SRE Runbook
===========

Resource Stuck in Reconciliation
--------------------------------

Problem: Resource stuck in reconciliation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This problem occurs when a resource, such as ``AtlasProject``, is not in a
``Ready`` state. It can occur with any |ak8so| resource type; this entry uses
``AtlasProject`` as an example.

Symptoms
`````````

- The resource is not in a ``Ready`` state.
- The resource's controller reports a high reconciliation error rate.

To monitor the error rate, you can create a query that calculates the
reconciliation error rate of the ``AtlasProject`` controller as a percentage
over the last minute. Use this metric to monitor the health and stability of
the ``AtlasProject`` controller: a high or rising error percentage indicates
problems in the reconciliation process.

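Example Query
^^^^^^^^^^^^^

To calculate the error rate, use the following `Prometheus <https://prometheus.io/docs/introduction/overview/>`__ query:

.. code-block:: prometheus

   100 * sum(rate(controller_runtime_reconcile_errors_total{controller="AtlasProject"}[1m]))
     / sum(rate(controller_runtime_reconcile_total{controller="AtlasProject"}[1m]))

The query aggregates each side with ``sum`` because
``controller_runtime_reconcile_total`` has a ``result`` label that
``controller_runtime_reconcile_errors_total`` lacks. Dividing the
unaggregated series returns no results because their label sets don't match.
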
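
To be notified when the error rate stays high, you can evaluate the same
calculation in a Prometheus alerting rule. The following is a sketch. The
alert name, the 10% threshold, the 5-minute ``for`` duration, and the
``severity`` label are hypothetical values to adjust for your environment.

.. code-block:: yaml

   groups:
     - name: ako-runbook-alerts                    # hypothetical group name
       rules:
         - alert: AtlasProjectReconcileErrorsHigh  # hypothetical alert name
           # Error percentage for the AtlasProject controller, compared
           # against a hypothetical 10% threshold.
           expr: |
             100 * sum(rate(controller_runtime_reconcile_errors_total{controller="AtlasProject"}[1m]))
               / sum(rate(controller_runtime_reconcile_total{controller="AtlasProject"}[1m]))
               > 10
           # Fire only if the condition holds for 5 minutes, to skip short spikes.
           for: 5m
           labels:
             severity: warning                     # hypothetical label value
           annotations:
             summary: "AtlasProject reconciliation error rate is above 10%"
             description: "See the Resource Stuck in Reconciliation runbook entry."
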
Status
```````

Check the resource's ``Ready`` status condition for further details. A
resource that is stuck in reconciliation reports a condition similar to the
following:

.. code-block:: yaml

   status:
     conditions:
       - type: Ready
         status: "False"
         reason: ....

Action Items
````````````

1. **Verify Resource Status:**

   - Check the status condition message for more detailed information.
   - If the ``AtlasProject`` is not ready, proceed with the next
     troubleshooting steps.

2. **Check Connection Secret:**

   - Ensure that the connection secret referenced by
     ``spec.connectionSecretRef.name`` has the
     ``atlas.mongodb.com/type=credentials`` label.

3. **Investigate Logs:**

   - Review the ``AtlasProject`` controller logs for errors or failed
     reconciliation attempts.

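
The following manifests sketch a connection secret with the required label
and an ``AtlasProject`` that references it. The secret name, namespace, Atlas
project name, and placeholder values are hypothetical. The sketch assumes
that the secret stores programmatic API key credentials under the ``orgId``,
``publicApiKey``, and ``privateApiKey`` keys.

.. code-block:: yaml

   apiVersion: v1
   kind: Secret
   metadata:
     name: my-atlas-credentials          # hypothetical secret name
     namespace: mongodb-atlas-system     # assumed operator namespace
     labels:
       # The label that step 2 asks you to verify.
       atlas.mongodb.com/type: credentials
   type: Opaque
   stringData:
     orgId: "<atlas-org-id>"
     publicApiKey: "<public-api-key>"
     privateApiKey: "<private-api-key>"
   ---
   apiVersion: atlas.mongodb.com/v1
   kind: AtlasProject
   metadata:
     name: my-project                    # hypothetical resource name
     namespace: mongodb-atlas-system
   spec:
     name: "My Project"                  # hypothetical Atlas project name
     connectionSecretRef:
       # Must match metadata.name of the labeled secret above.
       name: my-atlas-credentials

If the secret already exists without the label, you can add it with
``kubectl label secret <secret-name> atlas.mongodb.com/type=credentials -n <namespace>``.
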
Additional Resources
`````````````````````

- `AtlasProject resource <https://www.mongodb.com/docs/atlas/operator/upcoming/atlasproject-custom-resource/>`__
