## Connection pooling

### Basics

The driver communicates with Cassandra over TCP, using the Cassandra binary protocol. This protocol
is asynchronous, which allows each TCP connection to handle multiple simultaneous requests:

* when a query gets executed, a *stream id* gets assigned to it. It is a unique identifier on the
  current connection;
* the driver writes a request containing the stream id and the query on the connection, and then
  proceeds without waiting for the response (if you're using the asynchronous API, this is when the
  driver will send you back a `java.util.concurrent.CompletionStage`). Once the request has been
  written to the connection, we say that it is *in flight*;
* at some point, Cassandra will send back a response on the connection. This response also contains
  the stream id, which allows the driver to trigger a callback that will complete the corresponding
  query (this is the point where your `CompletionStage` will get completed).

You don't need to manage connections yourself. You simply interact with a [CqlSession] object, which
takes care of it.
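
To make this concrete, here is a minimal sketch of what that flow looks like with the asynchronous
API (the contact point, datacenter name and query are placeholders, not values from this manual):

```java
import java.net.InetSocketAddress;
import java.util.concurrent.CompletionStage;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.AsyncResultSet;

public class AsyncExample {
  public static void main(String[] args) {
    try (CqlSession session =
        CqlSession.builder()
            .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
            .withLocalDatacenter("datacenter1")
            .build()) {

      // The call returns immediately: the request is now in flight on one of the pooled
      // connections, identified by a stream id.
      CompletionStage<AsyncResultSet> stage =
          session.executeAsync("SELECT release_version FROM system.local");

      // The stage completes when the response carrying the matching stream id comes back.
      stage
          .thenAccept(rs -> System.out.println(rs.one().getString("release_version")))
          .toCompletableFuture()
          .join();
    }
  }
}
```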

**For a given session, there is one connection pool per connected node** (a node is connected when
it is up and not ignored by the [load balancing policy](../load_balancing/)).

The number of connections per pool is configurable (this will be described in the next section).
There are up to 32768 stream ids per connection.

```ditaa
+-------+1   n+----+1   n+----------+1  32K+-------+
|Session+-----+Pool+-----+Connection+------+Request|
+-------+     +----+     +----------+      +-------+
```

### Configuration

Pool sizes are defined in the `connection` section of the [configuration](../configuration/). Here
are the relevant options with their default values:

```
datastax-java-driver.connection {
  max-requests-per-connection = 1024
  pool {
    local.size = 1
    remote.size = 1
  }
}
```
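
If you want to check which values are actually in effect at runtime, they can be read back through
the driver's config API. This is just a sketch; the helper class and method names are made up for
illustration:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverExecutionProfile;

public class PoolConfigInspector {

  /** Prints the effective pooling options for the session's default profile. */
  static void printPoolOptions(CqlSession session) {
    DriverExecutionProfile profile =
        session.getContext().getConfig().getDefaultProfile();

    int localSize = profile.getInt(DefaultDriverOption.CONNECTION_POOL_LOCAL_SIZE);
    int remoteSize = profile.getInt(DefaultDriverOption.CONNECTION_POOL_REMOTE_SIZE);
    int maxRequests = profile.getInt(DefaultDriverOption.CONNECTION_MAX_REQUESTS);

    System.out.printf(
        "pool.local.size=%d, pool.remote.size=%d, max-requests-per-connection=%d%n",
        localSize, remoteSize, maxRequests);
  }
}
```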

Unlike previous versions of the driver, pools do not resize dynamically. However, you can adjust the
options at runtime; the driver will detect the change and apply it.

#### Heartbeat

If connections stay idle for too long, they might be dropped by intermediate network devices
(routers, firewalls...). Normally, TCP keepalive should take care of this; but tweaking low-level
keepalive settings might be impractical in some environments.

The driver provides application-side keepalive in the form of a connection heartbeat: when a
connection has not received any incoming reads for a given amount of time, the driver will simulate
activity by writing a dummy request to it. If that request fails, the connection is trashed and
replaced.

This feature is enabled by default. Here are the default values in the configuration:

```
datastax-java-driver.connection {
  heartbeat {
    interval = 30 seconds

    # How long the driver waits for the response to a heartbeat. If this timeout fires, the
    # heartbeat is considered failed.
    timeout = 500 milliseconds
  }
}
```

Both options can be changed at runtime; the new values will be used for connections created after
the change.
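
As an alternative to the configuration file, these options can also be set programmatically when the
session is created. The sketch below assumes a driver version that provides
`DriverConfigLoader.programmaticBuilder()` (it is not available in the earliest 4.x releases); the
values shown simply repeat the defaults:

```java
import java.time.Duration;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;

public class HeartbeatConfigExample {
  public static void main(String[] args) {
    // Assumption: DriverConfigLoader.programmaticBuilder() exists in your driver version.
    DriverConfigLoader loader =
        DriverConfigLoader.programmaticBuilder()
            .withDuration(DefaultDriverOption.HEARTBEAT_INTERVAL, Duration.ofSeconds(30))
            .withDuration(DefaultDriverOption.HEARTBEAT_TIMEOUT, Duration.ofMillis(500))
            .build();

    // Contact points and local datacenter are omitted for brevity; add them as you normally would.
    try (CqlSession session = CqlSession.builder().withConfigLoader(loader).build()) {
      // ... use the session as usual
    }
  }
}
```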

### Monitoring

The driver exposes node-level [metrics](../metrics/) to monitor your pools (note that all metrics
are disabled by default; you'll need to change your configuration to enable them):

```
datastax-java-driver {
  metrics.node.enabled = [
    # The number of connections open to this node for regular requests (exposed as a
    # Gauge<Integer>).
    #
    # This includes the control connection (which uses at most one extra connection to a random
    # node in the cluster).
    pool.open-connections,

    # The number of stream ids available on the connections to this node (exposed as a
    # Gauge<Integer>).
    #
    # Stream ids are used to multiplex requests on each connection, so this is an indication of
    # how many more requests the node could handle concurrently before becoming saturated (note
    # that this is a driver-side only consideration, there might be other limitations on the
    # server that prevent reaching that theoretical limit).
    pool.available-streams,

    # The number of requests currently executing on the connections to this node (exposed as a
    # Gauge<Integer>). This includes orphaned streams.
    pool.in-flight,

    # The number of "orphaned" stream ids on the connections to this node (exposed as a
    # Gauge<Integer>).
    #
    # See the description of the connection.max-orphan-requests option for more details.
    pool.orphaned-streams,
  ]
}
```

In particular, it's a good idea to keep an eye on these two metrics (see the sketch after this list
for one way to read them programmatically):

* `pool.open-connections`: if this doesn't match your configured pool size, something is preventing
  connections from opening (either configuration or network issues, or a server-side limitation --
  see [CASSANDRA-8086]);
* `pool.available-streams`: if this is often close to 0, it's a sign that the pool is getting
  saturated. Maybe `max-requests-per-connection` is too low, or more connections should be added.
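
Below is a minimal sketch of how these gauges could be read in code, for instance to feed a custom
health check. It assumes the metrics above are enabled and that the default Dropwizard metrics
backend is in use; the class and method names are made up for illustration:

```java
import com.codahale.metrics.Gauge;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.metadata.Node;
import com.datastax.oss.driver.api.core.metrics.DefaultNodeMetric;
import com.datastax.oss.driver.api.core.metrics.Metrics;

public class PoolHealthCheck {

  /** Logs open connections and available stream ids for every node the session knows about. */
  static void logPoolMetrics(CqlSession session) {
    Metrics metrics =
        session.getMetrics().orElseThrow(() -> new IllegalStateException("Metrics are disabled"));

    for (Node node : session.getMetadata().getNodes().values()) {
      metrics
          .getNodeMetric(node, DefaultNodeMetric.POOL_OPEN_CONNECTIONS)
          .ifPresent(m -> System.out.printf(
              "%s open-connections=%s%n", node, ((Gauge<?>) m).getValue()));
      metrics
          .getNodeMetric(node, DefaultNodeMetric.POOL_AVAILABLE_STREAMS)
          .ifPresent(m -> System.out.printf(
              "%s available-streams=%s%n", node, ((Gauge<?>) m).getValue()));
    }
  }
}
```

In practice, you would more likely export the whole registry (available through
`Metrics.getRegistry()`) to your monitoring system rather than polling individual gauges.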

### Tuning

The driver defaults should be good for most scenarios.

In our experience, raising `max-requests-per-connection` above 1024 does not bring any significant
improvement: the server is only going to service so many requests at a time anyway, so additional
requests are just going to pile up.

Similarly, 1 connection per node is generally sufficient. However, it might become a bottleneck in
very high performance scenarios: all I/O for a connection happens on the same thread, so it's
possible for that thread to max out its CPU core. In our benchmarks, this happened with a
single-node cluster and a high throughput (approximately 80K requests / second / connection).

It's unlikely that you'll run into this issue: in most real-world deployments, the driver connects
to more than one node, so the load will spread across more I/O threads. However, if you suspect that
you are experiencing it, here's what to look out for:

* the driver throughput plateaus but the process does not appear to max out any system resource (in
  particular, overall CPU usage is well below 100%);
* one of the driver's I/O threads maxes out its CPU core. You can see that with a profiler, or
  OS-level tools like `pidstat -tu` on Linux. With the default configuration, I/O threads are called
  `<session_name>-io-<n>`.

Try adding more connections per node. Thanks to the driver's hot-reload mechanism, you can do that
at runtime and see the effects immediately.
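
For example, after raising `pool.local.size` in the configuration file, you can verify that the
extra connections were actually opened once the reload has been picked up. This is only a sketch;
the helper and the expected size are placeholders:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.metadata.Node;

public class PoolSizeCheck {

  /** Returns true if every node currently has at least the expected number of open connections. */
  static boolean poolsReachedSize(CqlSession session, int expectedSize) {
    for (Node node : session.getMetadata().getNodes().values()) {
      // getOpenConnections() also counts the control connection on the node that hosts it.
      if (node.getOpenConnections() < expectedSize) {
        return false;
      }
    }
    return true;
  }
}
```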

[CqlSession]: http://docs.datastax.com/en/drivers/java/4.0/com/datastax/oss/driver/api/core/CqlSession.html
[CASSANDRA-8086]: https://issues.apache.org/jira/browse/CASSANDRA-8086