async_hooks: use typed array stack as fast path #17780
async_hooks: use typed array stack as fast path #17780addaleax wants to merge 4 commits intonodejs:masterfrom
Conversation
- Communicate the current async stack length through a
typed array field rather than a native binding method
- Add a new fixed-size `async_ids_fast_stack` typed array
that contains the async ID stack up to a fixed limit.
This increases performance noticeably, since most of the time
the async ID stack will not be more than a handful of
levels deep.
- Make the JS `pushAsyncIds()` and `popAsyncIds()` functions
do the same thing as the native ones if the fast path
is applicable.
Benchmarks:
$ ./node benchmark/compare.js --new ./node --old ./node-master --runs 10 --filter next-tick process | Rscript benchmark/compare.R
[00:03:25|% 100| 6/6 files | 20/20 runs | 1/1 configs]: Done
improvement confidence p.value
process/next-tick-breadth-args.js millions=4 19.72 % *** 3.013913e-06
process/next-tick-breadth.js millions=4 27.33 % *** 5.847983e-11
process/next-tick-depth-args.js millions=12 40.08 % *** 1.237127e-13
process/next-tick-depth.js millions=12 77.27 % *** 1.413290e-11
process/next-tick-exec-args.js millions=5 13.58 % *** 1.245180e-07
process/next-tick-exec.js millions=5 16.80 % *** 2.961386e-07
2ce172b to
6bed72a
Compare
|
I think we should just remove the slow path. Having more than 32 nested |
|
@AndreasMadsen I know that @trevnorris made the switch from using only a typed array to a I agree that the slow path is not likely to be taken but I don’t see why we shouldn’t support it just because of that |
It is code complexity for an edge case that is unlikely to ever happen. |
|
Possibly naive question: can't you grow the typed array when it's full? Seems simpler than a separate data structure for the fallback case. |
|
@bnoordhuis You mean, replace by a larger typed array? Yeah, I guess you could. :) @AndreasMadsen Would you be okay with that approach? |
|
@addaleax Yes. I was thinking the same. |
|
@AndreasMadsen @bnoordhuis I’ve switched this to a resizable typed array … it doesn’t seem to affect the perf improvements here. :) I’m totally fine with this approach, but I have to admit I don’t see a huge benefit over the previous code… |
bnoordhuis
left a comment
There was a problem hiding this comment.
Left some comments that you are welcome to ignore if I'm completely off-mark. :-)
src/env.h
Outdated
| // Keep track of the environment copy itself. | ||
| Environment* env_; | ||
| // Stores the ids of the current execution context stack. | ||
| std::stack<async_context> async_ids_stack_; |
There was a problem hiding this comment.
You can remove the #include <stack> now, I think?
| async_hook_fields[kStackLength]++; | ||
| async_id_fields[kExecutionAsyncId] = asyncId; | ||
| async_id_fields[kTriggerAsyncId] = triggerAsyncId; | ||
| } |
There was a problem hiding this comment.
I'd take a slightly different approach here. Growing the stack can be done in JS: allocate a new typed array, copy over the elements, then call into C++ to update the AliasedBuffer's reference. That probably also obviates the need to overload operator=(AliasedBuffer&&).
If you wrote it this way because of how PromiseHook() calls env->async_hooks()->push_async_ids(), perhaps you can work around that by ensuring there are always a few free slots? I think that should work although admittedly I didn't look too closely.
There was a problem hiding this comment.
@bnoordhuis It’s not just PromiseHook(), MakeCallback() also does that – I think for your suggestion to work in a generic fashion you’d have to call that growing function from C++, which kind of just moves the problem around
lib/internal/async_hooks.js
Outdated
| // This is the equivalent of the native push_async_ids() call. | ||
| function pushAsyncIds(asyncId, triggerAsyncId) { | ||
| const stackLength = async_hook_fields[kStackLength]; | ||
| if (stackLength >= async_hook_fields[kStackCapacity]) |
There was a problem hiding this comment.
Is this needed? You can just check async_ids_stack.length, right?
- Communicate the current async stack length through a
typed array field rather than a native binding method
- Add a new fixed-size `async_ids_fast_stack` typed array
that contains the async ID stack up to a fixed limit.
This increases performance noticeably, since most of the time
the async ID stack will not be more than a handful of
levels deep.
- Make the JS `pushAsyncIds()` and `popAsyncIds()` functions
do the same thing as the native ones if the fast path
is applicable.
Benchmarks:
$ ./node benchmark/compare.js --new ./node --old ./node-master --runs 10 --filter next-tick process | Rscript benchmark/compare.R
[00:03:25|% 100| 6/6 files | 20/20 runs | 1/1 configs]: Done
improvement confidence p.value
process/next-tick-breadth-args.js millions=4 19.72 % *** 3.013913e-06
process/next-tick-breadth.js millions=4 27.33 % *** 5.847983e-11
process/next-tick-depth-args.js millions=12 40.08 % *** 1.237127e-13
process/next-tick-depth.js millions=12 77.27 % *** 1.413290e-11
process/next-tick-exec-args.js millions=5 13.58 % *** 1.245180e-07
process/next-tick-exec.js millions=5 16.80 % *** 2.961386e-07
PR-URL: #17780
Reviewed-By: James M Snell <jasnell@gmail.com>
|
Landed in 83e5215 |
|
This does not land cleanly on v9.x, could someone please manually backport |
|
This should apply cleanly or at least almost cleanly after #18079 lands |
- Communicate the current async stack length through a
typed array field rather than a native binding method
- Add a new fixed-size `async_ids_fast_stack` typed array
that contains the async ID stack up to a fixed limit.
This increases performance noticeably, since most of the time
the async ID stack will not be more than a handful of
levels deep.
- Make the JS `pushAsyncIds()` and `popAsyncIds()` functions
do the same thing as the native ones if the fast path
is applicable.
Benchmarks:
$ ./node benchmark/compare.js --new ./node --old ./node-master --runs 10 --filter next-tick process | Rscript benchmark/compare.R
[00:03:25|% 100| 6/6 files | 20/20 runs | 1/1 configs]: Done
improvement confidence p.value
process/next-tick-breadth-args.js millions=4 19.72 % *** 3.013913e-06
process/next-tick-breadth.js millions=4 27.33 % *** 5.847983e-11
process/next-tick-depth-args.js millions=12 40.08 % *** 1.237127e-13
process/next-tick-depth.js millions=12 77.27 % *** 1.413290e-11
process/next-tick-exec-args.js millions=5 13.58 % *** 1.245180e-07
process/next-tick-exec.js millions=5 16.80 % *** 2.961386e-07
PR-URL: nodejs#17780
Backport-PR-URL: nodejs#18179
Reviewed-By: James M Snell <jasnell@gmail.com>
- Communicate the current async stack length through a
typed array field rather than a native binding method
- Add a new fixed-size `async_ids_fast_stack` typed array
that contains the async ID stack up to a fixed limit.
This increases performance noticeably, since most of the time
the async ID stack will not be more than a handful of
levels deep.
- Make the JS `pushAsyncIds()` and `popAsyncIds()` functions
do the same thing as the native ones if the fast path
is applicable.
Benchmarks:
$ ./node benchmark/compare.js --new ./node --old ./node-master --runs 10 --filter next-tick process | Rscript benchmark/compare.R
[00:03:25|% 100| 6/6 files | 20/20 runs | 1/1 configs]: Done
improvement confidence p.value
process/next-tick-breadth-args.js millions=4 19.72 % *** 3.013913e-06
process/next-tick-breadth.js millions=4 27.33 % *** 5.847983e-11
process/next-tick-depth-args.js millions=12 40.08 % *** 1.237127e-13
process/next-tick-depth.js millions=12 77.27 % *** 1.413290e-11
process/next-tick-exec-args.js millions=5 13.58 % *** 1.245180e-07
process/next-tick-exec.js millions=5 16.80 % *** 2.961386e-07
Backport-PR-URL: #18179
PR-URL: #17780
Reviewed-By: James M Snell <jasnell@gmail.com>
|
hm, it still isn't cherry-picking cleanly. Anyone have time to backport? Thanks! |
- Communicate the current async stack length through a
typed array field rather than a native binding method
- Add a new fixed-size `async_ids_fast_stack` typed array
that contains the async ID stack up to a fixed limit.
This increases performance noticeably, since most of the time
the async ID stack will not be more than a handful of
levels deep.
- Make the JS `pushAsyncIds()` and `popAsyncIds()` functions
do the same thing as the native ones if the fast path
is applicable.
Benchmarks:
$ ./node benchmark/compare.js --new ./node --old ./node-master --runs 10 --filter next-tick process | Rscript benchmark/compare.R
[00:03:25|% 100| 6/6 files | 20/20 runs | 1/1 configs]: Done
improvement confidence p.value
process/next-tick-breadth-args.js millions=4 19.72 % *** 3.013913e-06
process/next-tick-breadth.js millions=4 27.33 % *** 5.847983e-11
process/next-tick-depth-args.js millions=12 40.08 % *** 1.237127e-13
process/next-tick-depth.js millions=12 77.27 % *** 1.413290e-11
process/next-tick-exec-args.js millions=5 13.58 % *** 1.245180e-07
process/next-tick-exec.js millions=5 16.80 % *** 2.961386e-07
PR-URL: nodejs#17780
Reviewed-By: James M Snell <jasnell@gmail.com>
- Communicate the current async stack length through a
typed array field rather than a native binding method
- Add a new fixed-size `async_ids_fast_stack` typed array
that contains the async ID stack up to a fixed limit.
This increases performance noticeably, since most of the time
the async ID stack will not be more than a handful of
levels deep.
- Make the JS `pushAsyncIds()` and `popAsyncIds()` functions
do the same thing as the native ones if the fast path
is applicable.
Benchmarks:
$ ./node benchmark/compare.js --new ./node --old ./node-master --runs 10 --filter next-tick process | Rscript benchmark/compare.R
[00:03:25|% 100| 6/6 files | 20/20 runs | 1/1 configs]: Done
improvement confidence p.value
process/next-tick-breadth-args.js millions=4 19.72 % *** 3.013913e-06
process/next-tick-breadth.js millions=4 27.33 % *** 5.847983e-11
process/next-tick-depth-args.js millions=12 40.08 % *** 1.237127e-13
process/next-tick-depth.js millions=12 77.27 % *** 1.413290e-11
process/next-tick-exec-args.js millions=5 13.58 % *** 1.245180e-07
process/next-tick-exec.js millions=5 16.80 % *** 2.961386e-07
Backport-PR-URL: #18290
PR-URL: #17780
Reviewed-By: James M Snell <jasnell@gmail.com>
Notable changes: * deps: * update V8 to 6.2.414.46 (Michaël Zasso) [#16413](#16413) * revert ABI breaking changes in V8 6.2 (Anna Henningsen) [#16413](#16413) * upgrade libuv to 1.19.1 (cjihrig) [#18260](#18260) * re land npm 5.6.0 (Myles Borins) [#18625](#18625) * ICU 60 bump (Steven R. Loomis) [#16876](#16876) * crypto: * Support both OpenSSL 1.1.0 and 1.0.2 (David Benjamin) [#16130](#16130) * warn on invalid authentication tag length (Tobias Nießen) [#17566](#17566) * async_hooks: * update defaultTriggerAsyncIdScope for perf (Anatoli Papirovski) [#18004](#18004) * use typed array stack as fast path (Anna Henningsen) [#17780](#17780) * use scope for defaultTriggerAsyncId (Andreas Madsen) [#17273](#17273) * separate missing from default context (Andreas Madsen) [#17273](#17273) * rename initTriggerId (Andreas Madsen) [#17273](#17273) * deprecate undocumented API (Andreas Madsen) [#16972](#16972) * add destroy event for gced AsyncResources (Sebastian Mayr) [#16998](#16998) * add trace events to async_hooks (Andreas Madsen) [#15538](#15538) * set HTTPParser trigger to socket (Andreas Madsen) [#18003](#18003) * add provider types for net server (Andreas Madsen) [#17157](#17157) * n-api: * add helper for addons to get the event loop (Anna Henningsen) [#17109](#17109) * cli: * add --stack-trace-limit to NODE_OPTIONS (Anna Henningsen) [#16495](#16495) * console: * add support for console.debug (Benjamin Zaslavsky) [#17033](#17033) * module: * add builtinModules (Jon Moss) [#16386](#16386) * replace default paths in require.resolve() (cjihrig) [#17113](#17113) * src: * add helper for addons to get the event loop (Anna Henningsen) [#17109](#17109) * add process.ppid (cjihrig) [#16839](#16839) * http: * support generic `Duplex` streams (Anna Henningsen) [#16267](#16267) * add rawPacket in err of `clientError` event (XadillaX) [#17672](#17672) * better support for IPv6 addresses (Mattias Holmlund) [#14772](#14772) * net: * remove ADDRCONFIG DNS hint on Windows (Bartosz Sosnowski) [#17662](#17662) * process: * fix reading zero-length env vars on win32 (Anna Henningsen) [#18463](#18463) * tls: * unconsume stream on destroy (Anna Henningsen) [#17478](#17478) * process: * improve unhandled rejection message (Madara Uchiha) [#17158](#17158) * stream: * remove usage of *State.highWaterMark (Calvin Metcalf) [#12860](#12860) * trace_events: * add executionAsyncId to init events (Andreas Madsen) [#17196](#17196) PR-URL: #18336
Notable changes: * deps: * update V8 to 6.2.414.46 (Michaël Zasso) [#16413](#16413) * revert ABI breaking changes in V8 6.2 (Anna Henningsen) [#16413](#16413) * upgrade libuv to 1.19.1 (cjihrig) [#18260](#18260) * re land npm 5.6.0 (Myles Borins) [#18625](#18625) * ICU 60 bump (Steven R. Loomis) [#16876](#16876) * crypto: * Support both OpenSSL 1.1.0 and 1.0.2 (David Benjamin) [#16130](#16130) * warn on invalid authentication tag length (Tobias Nießen) [#17566](#17566) * async_hooks: * update defaultTriggerAsyncIdScope for perf (Anatoli Papirovski) [#18004](#18004) * use typed array stack as fast path (Anna Henningsen) [#17780](#17780) * use scope for defaultTriggerAsyncId (Andreas Madsen) [#17273](#17273) * separate missing from default context (Andreas Madsen) [#17273](#17273) * rename initTriggerId (Andreas Madsen) [#17273](#17273) * deprecate undocumented API (Andreas Madsen) [#16972](#16972) * add destroy event for gced AsyncResources (Sebastian Mayr) [#16998](#16998) * add trace events to async_hooks (Andreas Madsen) [#15538](#15538) * set HTTPParser trigger to socket (Andreas Madsen) [#18003](#18003) * add provider types for net server (Andreas Madsen) [#17157](#17157) * n-api: * add helper for addons to get the event loop (Anna Henningsen) [#17109](#17109) * cli: * add --stack-trace-limit to NODE_OPTIONS (Anna Henningsen) [#16495](#16495) * console: * add support for console.debug (Benjamin Zaslavsky) [#17033](#17033) * module: * add builtinModules (Jon Moss) [#16386](#16386) * replace default paths in require.resolve() (cjihrig) [#17113](#17113) * src: * add helper for addons to get the event loop (Anna Henningsen) [#17109](#17109) * add process.ppid (cjihrig) [#16839](#16839) * http: * support generic `Duplex` streams (Anna Henningsen) [#16267](#16267) * add rawPacket in err of `clientError` event (XadillaX) [#17672](#17672) * better support for IPv6 addresses (Mattias Holmlund) [#14772](#14772) * net: * remove ADDRCONFIG DNS hint on Windows (Bartosz Sosnowski) [#17662](#17662) * process: * fix reading zero-length env vars on win32 (Anna Henningsen) [#18463](#18463) * tls: * unconsume stream on destroy (Anna Henningsen) [#17478](#17478) * process: * improve unhandled rejection message (Madara Uchiha) [#17158](#17158) * stream: * remove usage of *State.highWaterMark (Calvin Metcalf) [#12860](#12860) * trace_events: * add executionAsyncId to init events (Andreas Madsen) [#17196](#17196) PR-URL: #18336
Notable changes: * deps: * update V8 to 6.2.414.46 (Michaël Zasso) [nodejs#16413](nodejs#16413) * revert ABI breaking changes in V8 6.2 (Anna Henningsen) [nodejs#16413](nodejs#16413) * upgrade libuv to 1.19.1 (cjihrig) [nodejs#18260](nodejs#18260) * re land npm 5.6.0 (Myles Borins) [nodejs#18625](nodejs#18625) * ICU 60 bump (Steven R. Loomis) [nodejs#16876](nodejs#16876) * crypto: * Support both OpenSSL 1.1.0 and 1.0.2 (David Benjamin) [nodejs#16130](nodejs#16130) * warn on invalid authentication tag length (Tobias Nießen) [nodejs#17566](nodejs#17566) * async_hooks: * update defaultTriggerAsyncIdScope for perf (Anatoli Papirovski) [nodejs#18004](nodejs#18004) * use typed array stack as fast path (Anna Henningsen) [nodejs#17780](nodejs#17780) * use scope for defaultTriggerAsyncId (Andreas Madsen) [nodejs#17273](nodejs#17273) * separate missing from default context (Andreas Madsen) [nodejs#17273](nodejs#17273) * rename initTriggerId (Andreas Madsen) [nodejs#17273](nodejs#17273) * deprecate undocumented API (Andreas Madsen) [nodejs#16972](nodejs#16972) * add destroy event for gced AsyncResources (Sebastian Mayr) [nodejs#16998](nodejs#16998) * add trace events to async_hooks (Andreas Madsen) [nodejs#15538](nodejs#15538) * set HTTPParser trigger to socket (Andreas Madsen) [nodejs#18003](nodejs#18003) * add provider types for net server (Andreas Madsen) [nodejs#17157](nodejs#17157) * n-api: * add helper for addons to get the event loop (Anna Henningsen) [nodejs#17109](nodejs#17109) * cli: * add --stack-trace-limit to NODE_OPTIONS (Anna Henningsen) [nodejs#16495](nodejs#16495) * console: * add support for console.debug (Benjamin Zaslavsky) [nodejs#17033](nodejs#17033) * module: * add builtinModules (Jon Moss) [nodejs#16386](nodejs#16386) * replace default paths in require.resolve() (cjihrig) [nodejs#17113](nodejs#17113) * src: * add helper for addons to get the event loop (Anna Henningsen) [nodejs#17109](nodejs#17109) * add process.ppid (cjihrig) [nodejs#16839](nodejs#16839) * http: * support generic `Duplex` streams (Anna Henningsen) [nodejs#16267](nodejs#16267) * add rawPacket in err of `clientError` event (XadillaX) [nodejs#17672](nodejs#17672) * better support for IPv6 addresses (Mattias Holmlund) [nodejs#14772](nodejs#14772) * net: * remove ADDRCONFIG DNS hint on Windows (Bartosz Sosnowski) [nodejs#17662](nodejs#17662) * process: * fix reading zero-length env vars on win32 (Anna Henningsen) [nodejs#18463](nodejs#18463) * tls: * unconsume stream on destroy (Anna Henningsen) [nodejs#17478](nodejs#17478) * process: * improve unhandled rejection message (Madara Uchiha) [nodejs#17158](nodejs#17158) * stream: * remove usage of *State.highWaterMark (Calvin Metcalf) [nodejs#12860](nodejs#12860) * trace_events: * add executionAsyncId to init events (Andreas Madsen) [nodejs#17196](nodejs#17196) PR-URL: nodejs#18336
typed array field rather than a native binding method
async_ids_fast_stacktyped arraythat contains the async ID stack up to a fixed limit.
This increases performance noticeably, since most of the time
the async ID stack will not be more than a handful of
levels deep.
pushAsyncIds()andpopAsyncIds()functionsdo the same thing as the native ones if the fast path
is applicable.
Benchmarks:
Checklist
make -j4 test(UNIX), orvcbuild test(Windows) passesAffected core subsystem(s)
async_hooks