## Connection pooling

### Basics

The driver communicates with Cassandra over TCP, using the Cassandra binary protocol. This protocol
is asynchronous, which allows each TCP connection to handle multiple simultaneous requests:

* when a query gets executed, a *stream id* gets assigned to it. It is a unique identifier on the
  current connection;
* the driver writes a request containing the stream id and the query on the connection, and then
  proceeds without waiting for the response (if you're using the asynchronous API, this is when the
  driver will send you back a `java.util.concurrent.CompletionStage`). Once the request has been
  written to the connection, we say that it is *in flight*;
* at some point, Cassandra will send back a response on the connection. This response also contains
  the stream id, which allows the driver to trigger a callback that will complete the corresponding
  query (this is the point where your `CompletionStage` will get completed).

You don't need to manage connections yourself. You simply interact with a [CqlSession] object, which
takes care of it.
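
To make this concrete, here is a minimal sketch of what that flow looks like with the asynchronous
API (the contact point, datacenter name and query are placeholders, not values from this manual):

```java
import java.net.InetSocketAddress;
import java.util.concurrent.CompletionStage;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.AsyncResultSet;

public class AsyncExample {
  public static void main(String[] args) {
    try (CqlSession session =
        CqlSession.builder()
            .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
            .withLocalDatacenter("datacenter1")
            .build()) {

      // The call returns immediately: the request is now in flight on one of the pooled
      // connections, identified by a stream id.
      CompletionStage<AsyncResultSet> stage =
          session.executeAsync("SELECT release_version FROM system.local");

      // The stage completes when the response carrying the matching stream id comes back.
      stage
          .thenAccept(rs -> System.out.println(rs.one().getString("release_version")))
          .toCompletableFuture()
          .join();
    }
  }
}
```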

**For a given session, there is one connection pool per connected node** (a node is connected when
it is up and not ignored by the [load balancing policy](../load_balancing/)).

The number of connections per pool is configurable (this will be described in the next section).
There are up to 32768 stream ids per connection.

```ditaa
+-------+1   n+----+1   n+----------+1  32K+-------+
|Session+-----+Pool+-----+Connection+------+Request|
+-------+     +----+     +----------+      +-------+
```

### Configuration

Pool sizes are defined in the `connection` section of the [configuration](../configuration/). Here
are the relevant options with their default values:

```
datastax-java-driver.connection {
  max-requests-per-connection = 1024
  pool {
    local.size = 1
    remote.size = 1
  }
}
```
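
If you want to check which values are actually in effect at runtime, they can be read back through
the driver's config API. This is just a sketch; the helper class and method names are made up for
illustration:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverExecutionProfile;

public class PoolConfigInspector {

  /** Prints the effective pooling options for the session's default profile. */
  static void printPoolOptions(CqlSession session) {
    DriverExecutionProfile profile =
        session.getContext().getConfig().getDefaultProfile();

    int localSize = profile.getInt(DefaultDriverOption.CONNECTION_POOL_LOCAL_SIZE);
    int remoteSize = profile.getInt(DefaultDriverOption.CONNECTION_POOL_REMOTE_SIZE);
    int maxRequests = profile.getInt(DefaultDriverOption.CONNECTION_MAX_REQUESTS);

    System.out.printf(
        "pool.local.size=%d, pool.remote.size=%d, max-requests-per-connection=%d%n",
        localSize, remoteSize, maxRequests);
  }
}
```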

Unlike previous versions of the driver, pools do not resize dynamically. However, you can adjust the
options at runtime; the driver will detect the change and apply it.

#### Heartbeat

If connections stay idle for too long, they might be dropped by intermediate network devices
(routers, firewalls...). Normally, TCP keepalive should take care of this; but tweaking low-level
keepalive settings might be impractical in some environments.

The driver provides application-side keepalive in the form of a connection heartbeat: when a
connection has not received any incoming reads for a given amount of time, the driver will simulate
activity by writing a dummy request to it. If that request fails, the connection is trashed and
replaced.

This feature is enabled by default. Here are the default values in the configuration:

```
datastax-java-driver.connection {
  heartbeat {
    interval = 30 seconds

    # How long the driver waits for the response to a heartbeat. If this timeout fires, the
    # heartbeat is considered failed.
    timeout = 500 milliseconds
  }
}
```

Both options can be changed at runtime; the new values will be used for connections created after
the change.
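
As an alternative to the configuration file, these options can also be set programmatically when the
session is created. The sketch below assumes a driver version that provides
`DriverConfigLoader.programmaticBuilder()` (it is not available in the earliest 4.x releases); the
values shown simply repeat the defaults:

```java
import java.time.Duration;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;

public class HeartbeatConfigExample {
  public static void main(String[] args) {
    // Assumption: DriverConfigLoader.programmaticBuilder() exists in your driver version.
    DriverConfigLoader loader =
        DriverConfigLoader.programmaticBuilder()
            .withDuration(DefaultDriverOption.HEARTBEAT_INTERVAL, Duration.ofSeconds(30))
            .withDuration(DefaultDriverOption.HEARTBEAT_TIMEOUT, Duration.ofMillis(500))
            .build();

    // Contact points and local datacenter are omitted for brevity; add them as you normally would.
    try (CqlSession session = CqlSession.builder().withConfigLoader(loader).build()) {
      // ... use the session as usual
    }
  }
}
```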

### Monitoring

The driver exposes node-level [metrics](../metrics/) to monitor your pools (note that all metrics
are disabled by default; you'll need to change your configuration to enable them):

```
datastax-java-driver {
  metrics.node.enabled = [
    # The number of connections open to this node for regular requests (exposed as a
    # Gauge<Integer>).
    #
    # This includes the control connection (which uses at most one extra connection to a random
    # node in the cluster).
    pool.open-connections,

    # The number of stream ids available on the connections to this node (exposed as a
    # Gauge<Integer>).
    #
    # Stream ids are used to multiplex requests on each connection, so this is an indication of
    # how many more requests the node could handle concurrently before becoming saturated (note
    # that this is a driver-side only consideration, there might be other limitations on the
    # server that prevent reaching that theoretical limit).
    pool.available-streams,

    # The number of requests currently executing on the connections to this node (exposed as a
    # Gauge<Integer>). This includes orphaned streams.
    pool.in-flight,

    # The number of "orphaned" stream ids on the connections to this node (exposed as a
    # Gauge<Integer>).
    #
    # See the description of the connection.max-orphan-requests option for more details.
    pool.orphaned-streams,
  ]
}
```

In particular, it's a good idea to keep an eye on these two metrics (see the sketch after this list
for one way to read them programmatically):

* `pool.open-connections`: if this doesn't match your configured pool size, something is preventing
  connections from opening (either configuration or network issues, or a server-side limitation --
  see [CASSANDRA-8086]);
* `pool.available-streams`: if this is often close to 0, it's a sign that the pool is getting
  saturated. Maybe `max-requests-per-connection` is too low, or more connections should be added.
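
Below is a minimal sketch of how these gauges could be read in code, for instance to feed a custom
health check. It assumes the metrics above are enabled and that the default Dropwizard metrics
backend is in use; the class and method names are made up for illustration:

```java
import com.codahale.metrics.Gauge;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.metadata.Node;
import com.datastax.oss.driver.api.core.metrics.DefaultNodeMetric;
import com.datastax.oss.driver.api.core.metrics.Metrics;

public class PoolHealthCheck {

  /** Logs open connections and available stream ids for every node the session knows about. */
  static void logPoolMetrics(CqlSession session) {
    Metrics metrics =
        session.getMetrics().orElseThrow(() -> new IllegalStateException("Metrics are disabled"));

    for (Node node : session.getMetadata().getNodes().values()) {
      metrics
          .getNodeMetric(node, DefaultNodeMetric.POOL_OPEN_CONNECTIONS)
          .ifPresent(m -> System.out.printf(
              "%s open-connections=%s%n", node, ((Gauge<?>) m).getValue()));
      metrics
          .getNodeMetric(node, DefaultNodeMetric.POOL_AVAILABLE_STREAMS)
          .ifPresent(m -> System.out.printf(
              "%s available-streams=%s%n", node, ((Gauge<?>) m).getValue()));
    }
  }
}
```

In practice, you would more likely export the whole registry (available through
`Metrics.getRegistry()`) to your monitoring system rather than polling individual gauges.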

### Tuning

The driver defaults should be good for most scenarios.

In our experience, raising `max-requests-per-connection` above 1024 does not bring any significant
improvement: the server is only going to service so many requests at a time anyway, so additional
requests are just going to pile up.

Similarly, 1 connection per node is generally sufficient. However, it might become a bottleneck in
very high performance scenarios: all I/O for a connection happens on the same thread, so it's
possible for that thread to max out its CPU core. In our benchmarks, this happened with a
single-node cluster and a high throughput (approximately 80K requests / second / connection).

It's unlikely that you'll run into this issue: in most real-world deployments, the driver connects
to more than one node, so the load will spread across more I/O threads. However, if you suspect that
you are experiencing it, here's what to look out for:

* the driver throughput plateaus but the process does not appear to max out any system resource (in
  particular, overall CPU usage is well below 100%);
* one of the driver's I/O threads maxes out its CPU core. You can see that with a profiler, or
  OS-level tools like `pidstat -tu` on Linux. With the default configuration, I/O threads are called
  `<session_name>-io-<n>`.

Try adding more connections per node. Thanks to the driver's hot-reload mechanism, you can do that
at runtime and see the effects immediately.
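
For example, after raising `pool.local.size` in the configuration file, you can verify that the
extra connections were actually opened once the reload has been picked up. This is only a sketch;
the helper and the expected size are placeholders:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.metadata.Node;

public class PoolSizeCheck {

  /** Returns true if every node currently has at least the expected number of open connections. */
  static boolean poolsReachedSize(CqlSession session, int expectedSize) {
    for (Node node : session.getMetadata().getNodes().values()) {
      // getOpenConnections() also counts the control connection on the node that hosts it.
      if (node.getOpenConnections() < expectedSize) {
        return false;
      }
    }
    return true;
  }
}
```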

[CqlSession]: http://docs.datastax.com/en/drivers/java/4.0/com/datastax/oss/driver/api/core/CqlSession.html
[CASSANDRA-8086]: https://issues.apache.org/jira/browse/CASSANDRA-8086