
Commit 2538cb0

Edited 510_Deployment/20_hardware.asciidoc with Atlas code editor
1 parent 5d90900 commit 2538cb0

File tree

1 file changed: +40 -41 lines changed


510_Deployment/20_hardware.asciidoc

+40 -41
@@ -2,117 +2,116 @@
 === Hardware
 
 If you've been following the normal development path, you've probably been playing((("deployment", "hardware")))((("hardware")))
-with Elasticsearch on your laptop, or a small cluster of machines laying around.
+with Elasticsearch on your laptop or on a small cluster of machines laying around.
 But when it comes time to deploy Elasticsearch to production, there are a few
 recommendations that you should consider. Nothing is a hard-and-fast rule;
-Elasticsearch is used for a wide range of tasks and on bewildering array of
-machines. But they provide good starting points based on our experience with
-production clusters
+Elasticsearch is used for a wide range of tasks and on a bewildering array of
+machines. But these recommendations provide good starting points based on our experience with
+production clusters.
 
 ==== Memory
 
 If there is one resource that you will run out of first, it will likely be memory.((("hardware", "memory")))((("memory")))
 Sorting and aggregations can both be memory hungry, so enough heap space to
-accommodate these are important. Even when the heap is comparatively small,
-extra memory can be given to the OS file system cache. Because many data structures
+accommodate these is important. Even when the heap is comparatively small,
+extra memory can be given to the OS filesystem cache. Because many data structures
 used by Lucene are disk-based formats, Elasticsearch leverages the OS cache to
 great effect.
 
-A machine with 64gb of RAM is the ideal sweet-spot, but 32gb and 16gb machines
-are also very common. Less than 8gb tends to be counterproductive (you end up
-needing many, many small machines) and greater than 64gb has problems which we will
-discuss in <<heap-sizing>>
+A machine with 64GB of RAM is the ideal sweet spot, but 32GB and 16GB machines
+are also common. Less than 8GB tends to be counterproductive (you end up
+needing many, many small machines), and greater than 64GB has problems that we will
+discuss in <<heap-sizing>>.
 
 ==== CPUs
 
 Most Elasticsearch deployments tend to be rather light on CPU requirements. As
 such,((("CPUs (central processing units)")))((("hardware", "CPUs"))) the exact processor setup matters less than the other resources. You should
-choose a modern processor with multiple cores. Common clusters utilize 2-8
+choose a modern processor with multiple cores. Common clusters utilize two to eight
 core machines.
 
-If you need to choose between faster CPUs or more cores...choose more cores. The
+If you need to choose between faster CPUs or more cores, choose more cores. The
 extra concurrency that multiple cores offers will far outweigh a slightly faster
-clock-speed.
+clock speed.
 
 ==== Disks
 
 Disks are important for all clusters,((("disks")))((("hardware", "disks"))) and doubly so for indexing-heavy clusters
 (such as those that ingest log data). Disks are the slowest subsystem in a server,
-which means that write-heavy clusters can easily saturate their disks which in
-turn becomes the bottleneck of the cluster.
+which means that write-heavy clusters can easily saturate their disks, which in
+turn become the bottleneck of the cluster.
 
 If you can afford SSDs, they are by far superior to any spinning media. SSD-backed
 nodes see boosts in both query and indexing performance. If you can afford it,
 SSDs are the way to go.
 
-.Check your IO Scheduler
+.Check Your I/O Scheduler
 ****
-If you are using SSDs, make sure your OS I/O Scheduler is((("I/O scheduler"))) configured correctly.
-When you write data to disk, the I/O Scheduler decides when that data is
+If you are using SSDs, make sure your OS I/O scheduler is((("I/O scheduler"))) configured correctly.
+When you write data to disk, the I/O scheduler decides when that data is
 _actually_ sent to the disk. The default under most *nix distributions is a
 scheduler called `cfq` (Completely Fair Queuing).
 
-This scheduler allocates "time slices" to each process, and then optimizes the
+This scheduler allocates _time slices_ to each process, and then optimizes the
 delivery of these various queues to the disk. It is optimized for spinning media:
 the nature of rotating platters means it is more efficient to write data to disk
 based on physical layout.
 
-This is very inefficient for SSD, however, since there are no spinning platters
+This is inefficient for SSD, however, since there are no spinning platters
 involved. Instead, `deadline` or `noop` should be used instead. The deadline
-scheduler optimizes based on how long writes have been pending, while noop
+scheduler optimizes based on how long writes have been pending, while `noop`
 is just a simple FIFO queue.
 
-This simple change can have dramatic impacts. We've seen a 500x improvement
+This simple change can have dramatic impacts. We've seen a 500-fold improvement
 to write throughput just by using the correct scheduler.
 ****
 
-If you use spinning media, try to obtain the fastest disks possible (high
-performance server disks 15k RPM drives).
+If you use spinning media, try to obtain the fastest disks possible (high-performance server disks, 15k RPM drives).
 
 Using RAID 0 is an effective way to increase disk speed, for both spinning disks
 and SSD. There is no need to use mirroring or parity variants of RAID, since
-high-availability is built into Elasticsearch via replicas.
+high availability is built into Elasticsearch via replicas.
 
-Finally, avoid network-attached storages (NAS). People routinely claim their
+Finally, avoid network-attached storage (NAS). People routinely claim their
 NAS solution is faster and more reliable than local drives. Despite these claims,
-we have never seen NAS live up to their hype. NAS are often slower, display
-larger latencies with a wider deviation in average latency, and are a single
+we have never seen NAS live up to its hype. NAS is often slower, displays
+larger latencies with a wider deviation in average latency, and is a single
 point of failure.
 
 ==== Network
 
 A fast and reliable network is obviously important to performance in a distributed((("hardware", "network")))((("network")))
-system. Low latency helps assure that nodes can communicate easily, while
-high bandwidth helps shard movement and recovery. Modern datacenter networking
-(1gigE, 10gigE) is sufficient for the vast majority of clusters.
+system. Low latency helps ensure that nodes can communicate easily, while
+high bandwidth helps shard movement and recovery. Modern data-center networking
+(1GbE, 10GbE) is sufficient for the vast majority of clusters.
 
-Avoid clusters that span multiple data-centers, even if the data-centers are
+Avoid clusters that span multiple data centers, even if the data centers are
 colocated in close proximity. Definitely avoid clusters that span large geographic
 distances.
 
-Elasticsearch clusters assume that all nodes are equal...not that half the nodes
-are actually 150ms distant in another datacenter. Larger latencies tend to
+Elasticsearch clusters assume that all nodes are equal--not that half the nodes
+are actually 150ms distant in another data center. Larger latencies tend to
 exacerbate problems in distributed systems and make debugging and resolution
 more difficult.
 
-Similar to the NAS argument, everyone claims their pipe between data-centers is
-robust and low latency. This is true...until it isn't (a network failure will
-happen eventually, you can count on it). From our experience, the hassle of
-managing cross-datacenter clusters is simply not worth the cost.
+Similar to the NAS argument, everyone claims that their pipe between data centers is
+robust and low latency. This is true--until it isn't (a network failure will
+happen eventually; you can count on it). From our experience, the hassle of
+managing cross-data center clusters is simply not worth the cost.
 
 ==== General Considerations
 
-It is possible nowadays to obtain truly enormous machines.((("hardware", "general considerations"))) Hundreds of gigabytes
+It is possible nowadays to obtain truly enormous machines:((("hardware", "general considerations"))) hundreds of gigabytes
 of RAM with dozens of CPU cores. Conversely, it is also possible to spin up
 thousands of small virtual machines in cloud platforms such as EC2. Which
 approach is best?
 
-In general, it is better to prefer "medium" to "large" boxes. Avoid small machines
+In general, it is better to prefer medium-to-large boxes. Avoid small machines,
 because you don't want to manage a cluster with a thousand nodes, and the overhead
 of simply running Elasticsearch is more apparent on such small boxes.
 
 At the same time, avoid the truly enormous machines. They often lead to imbalanced
-resource usage (e.g. all the memory is being used, but none of the CPU) and can
+resource usage (for example, all the memory is being used, but none of the CPU) and can
 add logistical complexity if you have to run multiple nodes per machine.
 