Skip to content

Conversation

@fabianfett
Copy link
Collaborator

@fabianfett fabianfett commented May 1, 2021

In #135 a connection state machine was introduced. To keep this simple at first, the state machine operates on single PostgresBackendMessages. This is very inefficient for queries that result in a lot of rows. This PR improves query performance by badging Postgres data row messages at the Decoder and operating on those batches from there on.

Open ends:

  • Two open warnings

Comment on lines 94 to 97
case forwardRow([PSQLData], to: EventLoopPromise<StateMachineStreamNextResult>)
case forwardCommandComplete(CircularBuffer<[PSQLData]>, commandTag: String, to: EventLoopPromise<StateMachineStreamNextResult>)
case forwardStreamError(PSQLError, to: EventLoopPromise<StateMachineStreamNextResult>, cleanupContext: CleanUpContext?)
// actions if query has not asked for next row but are pushing the final bytes to it
case forwardStreamErrorToCurrentQuery(PSQLError, read: Bool, cleanupContext: CleanUpContext?)
case forwardStreamCompletedToCurrentQuery(CircularBuffer<[PSQLData]>, commandTag: String, read: Bool)
case forwardRows(CircularBuffer<PSQLBackendMessage.DataRow>)
case forwardStreamComplete(CircularBuffer<PSQLBackendMessage.DataRow>, commandTag: String, read: Bool)
case forwardStreamError(PSQLError, read: Bool, cleanupContext: CleanUpContext?)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduction in Actions 🥳

let promise = self.eventLoop.makePromise(of: Row?.self)
self.downstreamState = .waitingForNext(promise)
self.upstreamState = .streaming(buffer: buffer, dataSource: dataSource)
dataSource.request()
Copy link
Collaborator Author

@fabianfett fabianfett May 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weissi This is the invocation, where the PSQLRowStream asks for more rows. We could change this to:

context.channel.triggerUserOutboundEvent(PSQLRowStreamEvent.requestMoreRows, promise: nil)

@fabianfett fabianfett requested a review from gwynne May 1, 2021 09:31
@fabianfett fabianfett force-pushed the ff-batch-performance branch from d8b1e1b to 468a648 Compare May 2, 2021 12:32
@fabianfett
Copy link
Collaborator Author

fabianfett commented May 2, 2021

Adding some performance numbers... (Swift main on focal in docker)

Running a query SELECT * FROM "large_table" that returns 1million rows with three number each:

1.4.4 release (before state machine):

root@3bcc272414db:/vapor/postgres-perf# time .build/release/PostgresPerf
2021-05-02T13:29:45+0000 info psql : PostgresQueryMetadata(command: "SELECT", oid: nil, rows: Optional(1000000))
2021-05-02T13:29:45+0000 info psql : took: 3.9132070541381836 seconds

real	0m4.455s
user	0m1.970s
sys	0m0.376s

main branch with state machine (perf regression):

root@3bcc272414db:/vapor/postgres-perf# time .build/release/PostgresPerf
2021-05-02T13:12:26+0000 info psql : PostgresQueryMetadata(command: "SELECT", oid: nil, rows: Optional(1000000))
2021-05-02T13:12:26+0000 info psql : query took: 5.985878944396973 seconds

real	0m6.256s
user	0m5.955s
sys	0m0.135s

this branch:

root@3bcc272414db:/vapor/postgres-perf# time .build/release/PostgresPerf
2021-05-02T13:13:23+0000 info psql : PostgresQueryMetadata(command: "SELECT", oid: nil, rows: Optional(1000000))
2021-05-02T13:13:23+0000 info psql : query took: 3.5723689794540405 seconds

real	0m4.003s
user	0m1.110s
sys	0m0.435s

While the speed bump is nice. Even more important: The query doesn't use 100% cpu anymore.

@fabianfett fabianfett force-pushed the ff-batch-performance branch from 468a648 to 0840cbb Compare May 2, 2021 13:35
@fabianfett fabianfett force-pushed the ff-batch-performance branch from 0840cbb to e20f441 Compare May 5, 2021 13:55
@jdmcd
Copy link
Member

jdmcd commented May 5, 2021

This is really enticing, any chance we can port this easily to MySQL as well? 👀

@fabianfett
Copy link
Collaborator Author

@jdmcd Since this pr builds on top of #135, I'm afraid some groundwork would need to be done to have this land.

@jdmcd
Copy link
Member

jdmcd commented May 5, 2021

Ah, I see

@fabianfett fabianfett marked this pull request as draft May 6, 2021 12:24
@fabianfett fabianfett force-pushed the ff-batch-performance branch 3 times, most recently from 2fe77c2 to 65200ab Compare May 10, 2021 19:03
@gwynne
Copy link
Member

gwynne commented Aug 16, 2021

This is really enticing, any chance we can port this easily to MySQL as well? 👀

@jdmcd FWIW this is something I've already got on my list to look into for MySQL 🙂

@fabianfett fabianfett force-pushed the ff-batch-performance branch from 65200ab to bdd2ce4 Compare August 16, 2021 18:36
@fabianfett
Copy link
Collaborator Author

Just applied another fix. Will push shortly. On Swift 5.5 nightly:

Read 1m rows into an array
main        : 5.354 sec
this branch : 1.790 sec

Consume 1m rows on el
main        : 5.176 sec
this branch : 1.475 sec

@fabianfett fabianfett force-pushed the ff-batch-performance branch 5 times, most recently from f399384 to a693d81 Compare August 20, 2021 21:52
@fabianfett fabianfett force-pushed the ff-batch-performance branch from a693d81 to ab4d9e6 Compare August 23, 2021 07:38
@fabianfett fabianfett marked this pull request as ready for review August 23, 2021 08:05
Co-authored-by: Gwynne Raskind <gwynne@darkrainfall.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants