This repository was archived by the owner on Sep 21, 2021. It is now read-only.
500_Cluster_Admin/20_health.asciidoc (+39 -35)
@@ -1,13 +1,14 @@
 
 === Cluster Health
 
-An Elasticsearch cluster may consist of a single node with a single index. Or it((("cluster health")))((("clusters", "administration", "Cluster Health API")))
-may have a hundred data nodes, three dedicated masters, a few dozen client nodes--all operating on a thousand indices (and tens of thousands of shards).
+An Elasticsearch cluster may consist of a single node with a single index. Or it
+may have a hundred data nodes, three dedicated masters, a few dozen client
+nodes--all operating on a thousand indices (and tens of thousands of shards).
 
 No matter the scale of the cluster, you'll want a quick way to assess the status
-of your cluster. The `Cluster Health` API fills that role. You can think of it
-as a 10,000-foot view of your cluster. It can reassure you that everything
-is all right, or alert you to a problem somewhere in your cluster.
+of your cluster. The `Cluster Health` API fills that role. You can think of it
+as a 10,000-foot view of your cluster. It can reassure you that everything is
+all right, or alert you to a problem somewhere in your cluster.
 
 Let's execute a `cluster-health` API and see what the response looks like:
 
@@ -45,7 +46,7 @@ operational.
 
 `yellow`::
 All primary shards are allocated, but at least one replica is missing.
-No data is missing, so search results will still be complete. However, your
+No data is missing, so search results will still be complete. However, your
 high availability is compromised to some degree. If _more_ shards disappear, you
 might lose data. Think of `yellow` as a warning that should prompt investigation.
 
@@ -66,10 +67,10 @@ includes replica shards.
 one node to another node. This number is often zero, but can increase when
 Elasticsearch decides a cluster is not properly balanced, a new node is added,
 or a node is taken down, for example.
-- `initializing_shards` is a count of shards that are being freshly created. For
+- `initializing_shards` is a count of shards that are being freshly created. For
 example, when you first create an index, the shards will all briefly reside in
 `initializing` state. This is typically a transient event, and shards shouldn't
-linger in `initializing` too long. You may also see initializing shards when a
+linger in `initializing` too long. You may also see initializing shards when a
 node is first restarted: as shards are loaded from disk, they start as `initializing`.
 - `unassigned_shards` are shards that exist in the cluster state, but cannot be
 found in the cluster itself. A common source of unassigned shards are unassigned
@@ -79,7 +80,7 @@ cluster is `red` (since primaries are missing).
 
 ==== Drilling Deeper: Finding Problematic Indices
 
-Imagine something goes wrong one day,((("indices", "problematic, finding"))) and you notice that your cluster health
+Imagine something goes wrong one day, and you notice that your cluster health
 looks like this:
 
 [source,js]
@@ -98,15 +99,15 @@ looks like this:
 }
 ----
 
-OK, so what can we deduce from this health status? Well, our cluster is `red`,
-which means we are missing data (primary + replicas). We know our cluster has
-10 nodes, but see only 8 data nodes listed in the health. Two of our nodes
-have gone missing. We see that there are 20 unassigned shards.
+OK, so what can we deduce from this health status? Well, our cluster is `red`,
+which means we are missing data (primary + replicas). We know our cluster has 10
+nodes, but see only 8 data nodes listed in the health. Two of our nodes have
+gone missing. We see that there are 20 unassigned shards.
 
 That's about all the information we can glean. The nature of those missing
 shards are still a mystery. Are we missing 20 indices with 1 primary shard each?
 Or 1 index with 20 primary shards? Or 10 indices with 1 primary + 1 replica?
-Which index?
+Which index?
 
 To answer these questions, we need to ask `cluster-health` for a little more
 information by using the `level` parameter:
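
The per-index breakdown that the `level` parameter unlocks is easy to filter in a script. Below is a minimal Python sketch, assuming the `level=indices` response carries an `indices` object keyed by index name, where each entry has a `status` and an `unassigned_shards` field. The helper name `problem_indices`, the sample response, and the index names are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, List, Tuple


def problem_indices(health: Dict[str, Any]) -> List[Tuple[str, str, int]]:
    """Return (name, status, unassigned_shards) for every non-green index."""
    rows = []
    for name, info in health.get("indices", {}).items():
        status = info.get("status", "unknown")
        if status != "green":
            rows.append((name, status, info.get("unassigned_shards", 0)))
    # Show red indices first, then yellow, then anything unrecognized;
    # ties are broken alphabetically by index name.
    severity = {"red": 0, "yellow": 1}
    return sorted(rows, key=lambda row: (severity.get(row[1], 2), row[0]))


# Hypothetical, hand-written response fragment (not real cluster output).
SAMPLE_HEALTH = {
    "status": "red",
    "unassigned_shards": 25,
    "indices": {
        "logs_a": {"status": "green", "unassigned_shards": 0},
        "logs_b": {"status": "red", "unassigned_shards": 20},
        "logs_c": {"status": "yellow", "unassigned_shards": 5},
    },
}

if __name__ == "__main__":
    for name, status, unassigned in problem_indices(SAMPLE_HEALTH):
        print(f"{name}: {status} ({unassigned} unassigned shards)")
```

Sorting by severity puts the indices that are actually missing primaries at the top, which is usually where an investigation should start.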
@@ -183,40 +184,43 @@ The `level` parameter accepts one more option:
 GET _cluster/health?level=shards
 ----
 
-The `shards` option will provide a very verbose output, which lists the status
+The `shards` option will provide a very verbose output, which lists the status
 and location of every shard inside every index. This output is sometimes useful,
 but because of the verbosity can be difficult to work with. Once you know the index
-that is having problems, other APIs that we discuss in this chapter will tend
+that is having problems, other APIs that we discuss in this chapter will tend
 to be more helpful.
 
 ==== Blocking for Status Changes
 
 The `cluster-health` API has another neat trick that is useful when building
 unit and integration tests, or automated scripts that work with Elasticsearch.
-You can specify a `wait_for_status` parameter, which will only return after the status is satisfied. For example:
+You can specify a `wait_for_status` parameter, which will only return after the
+status is satisfied. For example:
 
 [source,bash]
 ----
 GET _cluster/health?wait_for_status=green
 ----
 
-This call will _block_ (not return control to your program) until the `cluster-health` has turned `green`, meaning all primary and replica shards have been allocated.
-This is important for automated scripts and tests.
+This call will _block_ (not return control to your program) until the
+`cluster-health` has turned `green`, meaning all primary and replica shards have
+been allocated. This is important for automated scripts and tests.
 
 If you create an index, Elasticsearch must broadcast the change in cluster state
-to all nodes. Those nodes must initialize those new shards, and then respond to the
-master that the shards are `Started`. This process is fast, but because of network
-latency may take 10–20ms.
-
-If you have an automated script that (a) creates an index and then (b) immediately
-attempts to index a document, this operation may fail, because the index has not
-been fully initialized yet. The time between (a) and (b) will likely be less than 1ms--not nearly enough time to account for network latency.
-
-Rather than sleeping, just have your script/test call `cluster-health` with
-a `wait_for_status` parameter. As soon as the index is fully created, the `cluster-health` will change to `green`, the call will return control to your script, and you may
-begin indexing.
-
-Valid options are `green`, `yellow`, and `red`. The call will return when the
-requested status (or one "higher") is reached. For example, if you request `yellow`,
-a status change to `yellow` or `green` will unblock the call.
-
+to all nodes. Those nodes must initialize those new shards, and then respond to
+the master that the shards are `Started`. This process is fast, but because of
+network latency may take 10–20ms.
+
+If you have an automated script that (a) creates an index and then (b)
+immediately attempts to index a document, this operation may fail, because the
+index has not been fully initialized yet. The time between (a) and (b) will
+likely be less than 1ms--not nearly enough time to account for network latency.
+
+Rather than sleeping, just have your script/test call `cluster-health` with a
+`wait_for_status` parameter. As soon as the index is fully created, the
+`cluster-health` will change to `green`, the call will return control to your
+script, and you may begin indexing.
+
+Valid options are `green`, `yellow`, and `red`. The call will return when the
+requested status (or one "higher") is reached. For example, if you request
+`yellow`, a status change to `yellow` or `green` will unblock the call.
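
The blocking behavior and the "requested status or higher" rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive client: the function names (`satisfies`, `health_url`, `wait_for_status`) and the base URL are hypothetical, and the `timeout` query parameter is not mentioned in the text above. It is included on the assumption that your Elasticsearch version accepts it and answers with HTTP 408 when the wait expires.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Ordering used for the "requested status or higher" rule.
STATUS_RANK = {"red": 0, "yellow": 1, "green": 2}


def satisfies(current: str, requested: str) -> bool:
    """True if `current` is the requested status or a higher one."""
    return STATUS_RANK[current] >= STATUS_RANK[requested]


def health_url(base_url: str, status: str, timeout: str = "30s") -> str:
    """Build the blocking cluster-health URL (timeout is an assumed parameter)."""
    if status not in STATUS_RANK:
        raise ValueError(f"status must be one of {sorted(STATUS_RANK)}")
    query = urllib.parse.urlencode({"wait_for_status": status, "timeout": timeout})
    return f"{base_url.rstrip('/')}/_cluster/health?{query}"


def wait_for_status(base_url: str, status: str = "green", timeout: str = "30s") -> bool:
    """Block until the cluster reaches `status`; return False if the wait expired."""
    try:
        with urllib.request.urlopen(health_url(base_url, status, timeout)) as resp:
            body = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 408:  # assumed: the server reports an expired wait this way
            return False
        raise
    return satisfies(body["status"], status)
```

A test or script would call something like `wait_for_status("http://localhost:9200", "yellow")` right after creating an index and before indexing the first document, instead of sleeping for a fixed interval.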