# Bulk Indexer Benchmarks

The [`benchmarks.go`](benchmarks.go) file runs end-to-end benchmarks for `esutil.NewBulkIndexer`. It lets you configure the indexer parameters, the index settings, and the number of runs. See `go run benchmarks.go --help` for an overview of the configuration options:

```
go run benchmarks.go --help
  -count int
    	Number of documents to generate (default 100000)
  -dataset string
    	Dataset to use for indexing (default "small")
  -debug
    	Enable logging output
  -easyjson
    	Use mailru/easyjson for JSON decoding
  -fasthttp
    	Use valyala/fasthttp for HTTP transport
  -flush value
    	Flush threshold in bytes (default 3MB)
  -index string
    	Index name (default "test-bulk-benchmarks")
  -mockserver
    	Measure added, not flushed items
  -replicas int
    	Number of index replicas (default 0)
  -runs int
    	Number of runs (default 10)
  -shards int
    	Number of index shards (default 3)
  -wait duration
    	Wait duration between runs (default 1s)
  -warmup int
    	Number of warmup runs (default 3)
  -workers int
    	Number of indexer workers (default 4)
```
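
For orientation, the flags above map more or less directly onto the fields of `esutil.BulkIndexerConfig`. The sketch below illustrates that mapping under the assumption of the `go-elasticsearch/v7` module path (matching the Elasticsearch 7.6.0 setup described at the end of this document); it is not the actual wiring in `benchmarks.go`:

```go
package main

import (
	"context"
	"log"
	"strings"

	"github.com/elastic/go-elasticsearch/v7"
	"github.com/elastic/go-elasticsearch/v7/esutil"
)

func main() {
	// NewDefaultClient honours the ELASTICSEARCH_URL environment variable.
	es, err := elasticsearch.NewDefaultClient()
	if err != nil {
		log.Fatalf("Error creating the client: %s", err)
	}

	// Roughly what --index, --workers and --flush configure in benchmarks.go.
	bi, err := esutil.NewBulkIndexer(esutil.BulkIndexerConfig{
		Client:     es,
		Index:      "test-bulk-benchmarks",
		NumWorkers: 4,               // --workers
		FlushBytes: 3 * 1024 * 1024, // --flush
	})
	if err != nil {
		log.Fatalf("Error creating the indexer: %s", err)
	}

	// Add a single document; the benchmark adds --count generated documents.
	err = bi.Add(context.Background(), esutil.BulkIndexerItem{
		Action: "index",
		Body:   strings.NewReader(`{"title":"hello"}`),
	})
	if err != nil {
		log.Fatalf("Error adding an item: %s", err)
	}

	// Close flushes any buffered items and waits for all workers to finish.
	if err := bi.Close(context.Background()); err != nil {
		log.Fatalf("Error closing the indexer: %s", err)
	}

	// The per-run lines in the reports below come from these counters.
	stats := bi.Stats()
	log.Printf("added=%d flushed=%d failed=%d requests=%d",
		stats.NumAdded, stats.NumFlushed, stats.NumFailed, stats.NumRequests)
}
```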

Before running the benchmarks, install `easyjson` and generate the auxiliary files:

```
go mod download
go get -u github.com/mailru/easyjson/...
grep '~/go/bin' ~/.profile || echo 'export PATH=$PATH:~/go/bin' >> ~/.profile && source ~/.profile
go generate -v ./model
```
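
The `go generate -v ./model` step relies on a `go:generate` directive inside the `model` package. The snippet below only illustrates what such an easyjson directive typically looks like; the concrete file and type names in `./model` are assumptions, not copied from that package:

```go
// Illustrative only: the actual ./model package may use different file
// and type names.
package model

//go:generate easyjson -all document.go

// Document is an example payload type. Running `go generate` invokes
// easyjson, which writes document_easyjson.go containing MarshalEasyJSON
// and UnmarshalEasyJSON methods for the annotated types.
type Document struct {
	Title string `json:"title"`
}
```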

## Small Document

The [`small`](data/small/document.json) dataset uses a small document (126B).

```
ELASTICSEARCH_URL=http://server:9200 go run benchmarks.go --dataset=small --count=1_000_000 --flush=2MB --shards=5 --replicas=0 --fasthttp=true --easyjson=true
small: run [10x] warmup [3x] shards [5] replicas [0] workers [8] flush [2.0 MB] wait [1s] fasthttp easyjson
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
   1) add=1M flush=1M fail=0 reqs=52 dur=3.58s 279,173 docs/sec
   2) add=1M flush=1M fail=0 reqs=52 dur=3.52s 284,090 docs/sec
   3) add=1M flush=1M fail=0 reqs=52 dur=3.45s 289,351 docs/sec
   4) add=1M flush=1M fail=0 reqs=52 dur=3.49s 286,123 docs/sec
   5) add=1M flush=1M fail=0 reqs=52 dur=3.47s 287,852 docs/sec
   6) add=1M flush=1M fail=0 reqs=52 dur=3.47s 288,184 docs/sec
   7) add=1M flush=1M fail=0 reqs=52 dur=3.54s 282,246 docs/sec
   8) add=1M flush=1M fail=0 reqs=52 dur=3.47s 288,101 docs/sec
   9) add=1M flush=1M fail=0 reqs=52 dur=3.54s 282,485 docs/sec
  10) add=1M flush=1M fail=0 reqs=52 dur=3.46s 288,350 docs/sec
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
docs/sec: min [279,173] max [289,351] mean [286,987]
```

## HTTP Log Event

The [`httplog`](data/httplog/document.json) dataset uses a bigger document (2.5K), corresponding to a log event gathered by [Filebeat](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-module-nginx.html) from Nginx.

```
ELASTICSEARCH_URL=http://server:9200 go run benchmarks.go --dataset=httplog --count=1_000_000 --flush=3MB --shards=5 --replicas=0 --fasthttp=true --easyjson=true
httplog: run [10x] warmup [3x] shards [5] replicas [0] workers [8] flush [3.0 MB] wait [1s] fasthttp easyjson
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
   1) add=1M flush=1M fail=0 reqs=649 dur=19.93s 50,165 docs/sec
   2) add=1M flush=1M fail=0 reqs=649 dur=18.84s 53,072 docs/sec
   3) add=1M flush=1M fail=0 reqs=649 dur=19.13s 52,249 docs/sec
   4) add=1M flush=1M fail=0 reqs=649 dur=19.26s 51,912 docs/sec
   5) add=1M flush=1M fail=0 reqs=649 dur=18.98s 52,662 docs/sec
   6) add=1M flush=1M fail=0 reqs=649 dur=19.21s 52,056 docs/sec
   7) add=1M flush=1M fail=0 reqs=649 dur=18.91s 52,865 docs/sec
   8) add=1M flush=1M fail=0 reqs=649 dur=19.25s 51,934 docs/sec
   9) add=1M flush=1M fail=0 reqs=649 dur=19.44s 51,440 docs/sec
  10) add=1M flush=1M fail=0 reqs=649 dur=19.24s 51,966 docs/sec
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
docs/sec: min [50,165] max [53,072] mean [52,011]
```

## Mock Server

The `--mockserver` flag runs the benchmark against a "mock server", in this case Nginx, to measure the theoretical performance of the client without the overhead of a real Elasticsearch cluster.

```
ELASTICSEARCH_URL=http://server:8000 go run benchmarks.go --dataset=small --count=1_000_000 --flush=2MB --warmup=0 --mockserver
small: run [10x] warmup [0x] shards [3] replicas [0] workers [8] flush [2.0 MB] wait [1s]
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
   1) add=1M flush=0 fail=0 reqs=56 dur=810ms 1,222,493 docs/sec
   2) add=1M flush=0 fail=0 reqs=56 dur=810ms 1,230,012 docs/sec
   3) add=1M flush=0 fail=0 reqs=56 dur=790ms 1,251,564 docs/sec
   4) add=1M flush=0 fail=0 reqs=56 dur=840ms 1,187,648 docs/sec
   5) add=1M flush=0 fail=0 reqs=56 dur=800ms 1,237,623 docs/sec
   6) add=1M flush=0 fail=0 reqs=56 dur=800ms 1,237,623 docs/sec
   7) add=1M flush=0 fail=0 reqs=56 dur=800ms 1,240,694 docs/sec
   8) add=1M flush=0 fail=0 reqs=56 dur=820ms 1,216,545 docs/sec
   9) add=1M flush=0 fail=0 reqs=56 dur=790ms 1,253,132 docs/sec
  10) add=1M flush=0 fail=0 reqs=56 dur=810ms 1,223,990 docs/sec
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
docs/sec: min [1,187,648] max [1,253,132] mean [1,233,818]
```
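
Because the mock server never indexes anything (note `flush=0` above, which is why `--mockserver` counts added rather than flushed items), any endpoint that accepts the bulk payload and returns a success response is sufficient. The Nginx configuration used for these runs is in [`etc/nginx.conf`](etc/nginx.conf); purely as an illustration of the idea, a rough Go stand-in might look like this:

```go
// A minimal stand-in for the Nginx mock server: it drains the bulk request
// body and replies with a canned success response, so that client-side
// throughput can be measured without any real indexing work.
package main

import (
	"io"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Read and discard the request body, as a real server would.
		io.Copy(io.Discard, r.Body)

		w.Header().Set("Content-Type", "application/json")
		// A minimal bulk response with no per-item results and no errors.
		w.Write([]byte(`{"took":1,"errors":false,"items":[]}`))
	})
	log.Fatal(http.ListenAndServe(":8000", nil))
}
```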

## Environment

Please note that these results are only illustrative; real performance depends on many factors,
such as the size and structure of your data, the index settings and mappings, the cluster setup, and the hardware specification.

The benchmarks were run in the following environment:

* OS: Ubuntu 18.04.4 LTS (5.0.0-1031-gcp)
* Client: A `n2-standard-8` [GCP instance](https://cloud.google.com/compute/docs/machine-types#n2_machine_types) (8 vCPUs/32GB RAM)
* Server: A `n2-standard-16` [GCP instance](https://cloud.google.com/compute/docs/machine-types#n2_machine_types) (16 vCPUs/64GB RAM)
* Disk: A [local SSD](https://cloud.google.com/compute/docs/disks#localssds) formatted as `ext4`, attached over the NVMe interface, used for the Elasticsearch data
* A single-node Elasticsearch `7.6.0` cluster, [default distribution](https://www.elastic.co/downloads/elasticsearch), installed from a TAR archive, with 4GB locked for the heap
* Nginx 1.17.8 with the configuration in [`nginx.conf`](etc/nginx.conf)