Commit aae4e81 (parent: 56b3816), committed Mar 11, 2020

Examples: Add example and benchmarks for the BulkIndexer helper

This patch adds an example of using the BulkIndexer helper in _examples/bulk/default.go, as well as end-to-end benchmarks in _examples/bulk/benchmarks. Related: #137

16 files changed (+2,447 −25 lines)
 

_examples/bulk/.gitignore (+2)

@@ -0,0 +1,2 @@
+go.sum
+*_easyjson.go

_examples/bulk/Makefile (+18 −3)

@@ -1,6 +1,21 @@
 GO_TEST_CMD = $(if $(shell which richgo),richgo test,go test)
 
-test: ## Run tests
-	go run bulk.go
+test: test-default test-indexer
 
-.PHONY: test
+test-default:
+	go run default.go
+
+test-indexer:
+	go run indexer.go
+
+test-benchmarks: clean setup
+	cd benchmarks && go run benchmarks.go
+
+setup:
+	@go get -u github.com/mailru/easyjson/...
+	cd benchmarks && go generate ./model
+
+clean:
+	@rm -f benchmarks/model/*_easyjson.go
+
+.PHONY: test test-default test-indexer test-benchmarks setup clean

_examples/bulk/README.md (+39 −11)

@@ -1,8 +1,8 @@
 # Example: Bulk Indexing
 
-## `bulk.go`
+## `default.go`
 
-The [`bulk.go`](bulk.go) example demonstrates how to properly operate Elasticsearch's
+The [`default.go`](default.go) example demonstrates how to properly operate Elasticsearch's
 [Bulk API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html).
 
 The example intentionally doesn't use any abstractions or helper functions, to
@@ -17,13 +17,41 @@ demonstrate the low-level mechanics of working with the Bulk API:
 * printing a report.
 
 ```bash
-go run bulk.go -count=100000 -batch=25000
-
-# > Generated 100000 articles
-# > Batch 1 of 4
-# > Batch 2 of 4
-# > Batch 3 of 4
-# > Batch 4 of 4
-# ================================================================================
-# Sucessfuly indexed [100000] documents in 8.02s (12469 docs/sec)
+go run default.go -count=100000 -batch=25000
+
+# Bulk: documents [100,000] batch size [25,000]
+# ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
+# → Generated 100,000 articles
+# → Sending batch [1/4] [2/4] [3/4] [4/4]
+# ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
+# Successfully indexed [100,000] documents in 3.423s (29,214 docs/sec)
 ```
+
+## `indexer.go`
+
+The [`indexer.go`](indexer.go) example demonstrates how to use the [`esutil.BulkIndexer`](../esutil/bulk_indexer.go) helper for efficient indexing in parallel.
+
+```bash
+go run indexer.go -count=100000 -flush=1000000
+
+# BulkIndexer: documents [100,000] workers [8] flush [1.0 MB]
+# ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
+# → Generated 100,000 articles
+# ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
+# Successfully indexed [100,000] documents in 1.909s (52,383 docs/sec)
+```
+
+The helper allows you to `Add()` bulk indexer items; it flushes each batch based on the configured threshold.
+
+```golang
+indexer, _ := esutil.NewBulkIndexer(esutil.BulkIndexerConfig{})
+indexer.Add(
+	context.Background(),
+	esutil.BulkIndexerItem{
+		Action: "index",
+		Body:   strings.NewReader(`{"title":"Test"}`),
+	})
+indexer.Close(context.Background())
+```
+
+Please refer to the [`benchmarks`](benchmarks) folder for performance tests with different types of payload.

_examples/bulk/benchmarks/README.md (+124)

@@ -0,0 +1,124 @@
+# Bulk Indexer Benchmarks
+
+The [`benchmarks.go`](benchmarks.go) file executes end-to-end benchmarks for `esutil.NewBulkIndexer`. It allows you to configure the indexer parameters, index settings, and number of runs. See `go run benchmarks.go --help` for an overview of the configuration options:
+
+```
+go run benchmarks.go --help
+  -count int
+      Number of documents to generate (default 100000)
+  -dataset string
+      Dataset to use for indexing (default "small")
+  -debug
+      Enable logging output
+  -easyjson
+      Use mailru/easyjson for JSON decoding
+  -fasthttp
+      Use valyala/fasthttp for HTTP transport
+  -flush value
+      Flush threshold in bytes (default 3MB)
+  -index string
+      Index name (default "test-bulk-benchmarks")
+  -mockserver
+      Measure added, not flushed items
+  -replicas int
+      Number of index replicas (default 0)
+  -runs int
+      Number of runs (default 10)
+  -shards int
+      Number of index shards (default 3)
+  -wait duration
+      Wait duration between runs (default 1s)
+  -warmup int
+      Number of warmup runs (default 3)
+```
+
+Before running the benchmarks, install `easyjson` and generate the auxiliary files:
+
+```
+go mod download
+go get -u github.com/mailru/easyjson/...
+grep '~/go/bin' ~/.profile || echo 'export PATH=$PATH:~/go/bin' >> ~/.profile && source ~/.profile
+go generate -v ./model
+```
+
+## Small Document
+
+The [`small`](data/small/document.json) dataset uses a small document (126B).
+
+```
+ELASTICSEARCH_URL=http://server:9200 go run benchmarks.go --dataset=small --count=1_000_000 --flush=2MB --shards=5 --replicas=0 --fasthttp=true --easyjson=true
+small: run [10x] warmup [3x] shards [5] replicas [0] workers [8] flush [2.0 MB] wait [1s] fasthttp easyjson
+▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
+  1) add=1M flush=1M fail=0 reqs=52 dur=3.58s 279,173 docs/sec
+  2) add=1M flush=1M fail=0 reqs=52 dur=3.52s 284,090 docs/sec
+  3) add=1M flush=1M fail=0 reqs=52 dur=3.45s 289,351 docs/sec
+  4) add=1M flush=1M fail=0 reqs=52 dur=3.49s 286,123 docs/sec
+  5) add=1M flush=1M fail=0 reqs=52 dur=3.47s 287,852 docs/sec
+  6) add=1M flush=1M fail=0 reqs=52 dur=3.47s 288,184 docs/sec
+  7) add=1M flush=1M fail=0 reqs=52 dur=3.54s 282,246 docs/sec
+  8) add=1M flush=1M fail=0 reqs=52 dur=3.47s 288,101 docs/sec
+  9) add=1M flush=1M fail=0 reqs=52 dur=3.54s 282,485 docs/sec
+ 10) add=1M flush=1M fail=0 reqs=52 dur=3.46s 288,350 docs/sec
+▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
+docs/sec: min [279,173] max [289,351] mean [286,987]
+```
+
+## HTTP Log Event
+
+The [`httplog`](data/httplog/document.json) dataset uses a bigger document (2.5K), corresponding to a log event gathered by [Filebeat](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-module-nginx.html) from Nginx.
+
+```
+ELASTICSEARCH_URL=http://server:9200 go run benchmarks.go --dataset=httplog --count=1_000_000 --flush=3MB --shards=5 --replicas=0 --fasthttp=true --easyjson=true
+httplog: run [10x] warmup [3x] shards [5] replicas [0] workers [8] flush [3.0 MB] wait [1s] fasthttp easyjson
+▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
+  1) add=1M flush=1M fail=0 reqs=649 dur=19.93s 50,165 docs/sec
+  2) add=1M flush=1M fail=0 reqs=649 dur=18.84s 53,072 docs/sec
+  3) add=1M flush=1M fail=0 reqs=649 dur=19.13s 52,249 docs/sec
+  4) add=1M flush=1M fail=0 reqs=649 dur=19.26s 51,912 docs/sec
+  5) add=1M flush=1M fail=0 reqs=649 dur=18.98s 52,662 docs/sec
+  6) add=1M flush=1M fail=0 reqs=649 dur=19.21s 52,056 docs/sec
+  7) add=1M flush=1M fail=0 reqs=649 dur=18.91s 52,865 docs/sec
+  8) add=1M flush=1M fail=0 reqs=649 dur=19.25s 51,934 docs/sec
+  9) add=1M flush=1M fail=0 reqs=649 dur=19.44s 51,440 docs/sec
+ 10) add=1M flush=1M fail=0 reqs=649 dur=19.24s 51,966 docs/sec
+▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
+docs/sec: min [50,165] max [53,072] mean [52,011]
+```
+
+## Mock Server
+
+The `--mockserver` flag allows you to run the benchmark against a "mock server" — in this case Nginx — to understand the theoretical performance of the client, without the overhead of a real Elasticsearch cluster.
+
+```
+ELASTICSEARCH_URL=http://server:8000 go run benchmarks.go --dataset=small --count=1_000_000 --flush=2MB --warmup=0 --mockserver
+small: run [10x] warmup [0x] shards [3] replicas [0] workers [8] flush [2.0 MB] wait [1s]
+▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
+  1) add=1M flush=0 fail=0 reqs=56 dur=810ms 1,222,493 docs/sec
+  2) add=1M flush=0 fail=0 reqs=56 dur=810ms 1,230,012 docs/sec
+  3) add=1M flush=0 fail=0 reqs=56 dur=790ms 1,251,564 docs/sec
+  4) add=1M flush=0 fail=0 reqs=56 dur=840ms 1,187,648 docs/sec
+  5) add=1M flush=0 fail=0 reqs=56 dur=800ms 1,237,623 docs/sec
+  6) add=1M flush=0 fail=0 reqs=56 dur=800ms 1,237,623 docs/sec
+  7) add=1M flush=0 fail=0 reqs=56 dur=800ms 1,240,694 docs/sec
+  8) add=1M flush=0 fail=0 reqs=56 dur=820ms 1,216,545 docs/sec
+  9) add=1M flush=0 fail=0 reqs=56 dur=790ms 1,253,132 docs/sec
+ 10) add=1M flush=0 fail=0 reqs=56 dur=810ms 1,223,990 docs/sec
+▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
+docs/sec: min [1,187,648] max [1,253,132] mean [1,233,818]
+```
+
+## Environment
+
+Please note that these results are only illustrative; real performance depends on many factors:
+the size and structure of your data, the index settings and mappings, the cluster setup, and the hardware specification.
+
+The benchmarks have been run in the following environment:
+
+* OS: Ubuntu 18.04.4 LTS (5.0.0-1031-gcp)
+* Client: A `n2-standard-8` [GCP instance](https://cloud.google.com/compute/docs/machine-types#n2_machine_types) (8 vCPUs/32GB RAM)
+* Server: A `n2-standard-16` [GCP instance](https://cloud.google.com/compute/docs/machine-types#n2_machine_types) (16 vCPUs/64GB RAM)
+* Disk: A [local SSD](https://cloud.google.com/compute/docs/disks#localssds) formatted as `ext4` on NVMe interface for Elasticsearch data
+* A single-node Elasticsearch cluster, `7.6.0`, [default distribution](https://www.elastic.co/downloads/elasticsearch), installed from a TAR, with 4GB locked for heap
+* Nginx 1.17.8 with [`nginx.conf`](etc/nginx.conf)
