buffers: speed up swap16/32, add swap64#7157
Conversation
benchmark/buffers/buffer-swap.js
Outdated
There was a problem hiding this comment.
Is commenting these out intentional?
There was a problem hiding this comment.
Yep, see comment a few lines down. The commented-out ones are only for picking the crossover point between the JS and C++ implementations, and make the benchmarks super slow.
|
/cc @nodejs/buffer |
src/node_buffer.cc
Outdated
There was a problem hiding this comment.
Could you wrap the xes in the macros in parentheses, like (x)? I don’t think it makes a difference right now, it’s just a good practice to do that.
doc/api/buffer.md
Outdated
There was a problem hiding this comment.
Using added: REPLACEME will lead the person doing the release that contains this feature to fill in the correct version.
81556e8 to
a802e79
Compare
|
Made all changes suggested so far, thanks. (@trevnorris I also added |
benchmark/buffers/buffer-swap.js
Outdated
There was a problem hiding this comment.
Not sure what the TODO would indicate. The JS methods are copy/pastes of what's in lib/buffer.js and are only for evaluating where to set the crossover point between JS and native implementations. LMK if you had something else in mind...
|
LGTM with a couple nits. Would like to make sure @trevnorris, @addaleax and @bnoordhuis are happy with it also |
a802e79 to
155b3da
Compare
src/node_buffer.cc
Outdated
There was a problem hiding this comment.
nit: check alignment of \ at the end.
ad6f206 to
adb07aa
Compare
|
(Safe impls and tests for unaligned buffers added. Pending new benchmarks for aligned v. unaligned, and benchmarking |
lib/buffer.js
Outdated
There was a problem hiding this comment.
missed this on the original swap pr. no need to use .apply(). just pass this as an argument and access it via args[0] in c++. rest of the code does this.
|
Mind applying this patch to the PR? diff --git a/benchmark/buffers/buffer-swap.js b/benchmark/buffers/buffer-swap.js
index 0be334d..dfc2158 100644
--- a/benchmark/buffers/buffer-swap.js
+++ b/benchmark/buffers/buffer-swap.js
@@ -1,8 +1,10 @@
'use strict';
const common = require('../common.js');
+const v8 = require('v8');
const bench = common.createBenchmark(main, {
+ aligned: ['true'],
method: ['swap16', 'swap32', 'swap64'/*, 'htons', 'htonl', 'htonll'*/],
len: [8, 64, 128, 256, 512, 768, 1024, 1536, 2056, 4096, 8192],
n: [5e7]
@@ -54,24 +56,35 @@ Buffer.prototype.htonll = function htonl() {
return this;
};
-function createBuffer(len) {
+function createBuffer(len, aligned) {
+ len += aligned ? 0 : 1;
const buf = Buffer.allocUnsafe(len);
for (var i = 1; i <= len; i++)
buf[i - 1] = i;
- return buf;
+ return aligned ? buf : buf.slice(1);
}
-function bufferSwap(n, buf, method) {
- for (var i = 1; i <= n; i++)
- buf[method]();
+function genMethod(method) {
+ const fnString =
+ 'return function ' + method + '(n, buf) {' +
+ ' for (var i = 0; i <= n; i++)' +
+ ' buf.' + method + '();' +
+ '}';
+ return (new Function(fnString))();
}
function main(conf) {
const method = conf.method;
const len = conf.len | 0;
const n = conf.n | 0;
- const buf = createBuffer(len);
+ const aligned = conf.aligned | 'true';
+ const buf = createBuffer(len, aligned === 'true');
+ const bufferSwap = genMethod(method);
+
+ v8.setFlagsFromString('--allow_natives_syntax');
+ eval('%OptimizeFunctionOnNextCall(bufferSwap)');
+
bench.start();
- bufferSwap(n, buf, method);
+ bufferSwap(n, buf);
bench.end(n);
}First thing is the optimization of the actual function call. Here's the before and after: Second thing is adding |
adb07aa to
f5e2215
Compare
|
Done, thanks. Somehow I'm no longer seeing the performance gain for aligned buffers... 6.2.0 and this are comparable. 😕 Need to dig to see what's going on. |
f5e2215 to
3ca2712
Compare
|
I get the following lint errors: Looking at the performance part now. |
3ca2712 to
c4cf316
Compare
|
FYI, updating cpplint. it's requiring the wrong header to be used. |
|
|
|
@zbjornson |
* Speed up buffer.swap16 and swap32 by using builtins. Up to ~6x gain. Drop transition point between JS and C++ implementations accordingly. Amount of performance improvement not only depends on buffer size but also memory alignment. * Fix tests: C++ impl tests were testing 0-filled buffers so were always passing. * Add similar buffer.swap64 method. * Make buffer-swap benchmark mirror JS impl. doc/api/buffer.markdown has an entry of "added: REPLACEME" that should be changed to the correct release number before tagged. Because node is currently using a very old version of cpplint.py it doesn't know that std::swap() has moved from <algorithm> to <utility> in c++11. So until cpplint.py is updated simply NOLINT the line. Technically it should be NOLINT(build/include_what_you_use), but that puts the line over 80 characters causing another lint error. PR-URL: #7157 Reviewed-By: Trevor Norris <trev.norris@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
|
Thanks much for the hard work. Landed in a1059af with minor adjustments and lint fixes. |
|
This is |
|
Yep. Semver minor
|
|
Does not cleanly apply on v6. I think this depends on #7082? (or at least I think the conflict comes from there) |
The conflict comes from there, although it’s only because of neighbouring lines being changed. There should be no logical dependency here; if it helps, I can do a quick backport PR. |
|
@addaleax If you can post what the patch should look like on v6 I can probably do the resolution. |
|
Or a PR may be easier, idk. Either or. |
* Speed up buffer.swap16 and swap32 by using builtins. Up to ~6x gain. Drop transition point between JS and C++ implementations accordingly. Amount of performance improvement not only depends on buffer size but also memory alignment. * Fix tests: C++ impl tests were testing 0-filled buffers so were always passing. * Add similar buffer.swap64 method. * Make buffer-swap benchmark mirror JS impl. doc/api/buffer.markdown has an entry of "added: REPLACEME" that should be changed to the correct release number before tagged. Because node is currently using a very old version of cpplint.py it doesn't know that std::swap() has moved from <algorithm> to <utility> in c++11. So until cpplint.py is updated simply NOLINT the line. Technically it should be NOLINT(build/include_what_you_use), but that puts the line over 80 characters causing another lint error. PR-URL: nodejs#7157 Reviewed-By: Trevor Norris <trev.norris@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
* Speed up buffer.swap16 and swap32 by using builtins. Up to ~6x gain. Drop transition point between JS and C++ implementations accordingly. Amount of performance improvement not only depends on buffer size but also memory alignment. * Fix tests: C++ impl tests were testing 0-filled buffers so were always passing. * Add similar buffer.swap64 method. * Make buffer-swap benchmark mirror JS impl. doc/api/buffer.markdown has an entry of "added: REPLACEME" that should be changed to the correct release number before tagged. Because node is currently using a very old version of cpplint.py it doesn't know that std::swap() has moved from <algorithm> to <utility> in c++11. So until cpplint.py is updated simply NOLINT the line. Technically it should be NOLINT(build/include_what_you_use), but that puts the line over 80 characters causing another lint error. PR-URL: #7157 Reviewed-By: Trevor Norris <trev.norris@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Backport-URL: #7546
Notable changes: * buffer: Added `buffer.swap64()` to compliment `swap16()` & `swap32()`. (Zach Bjornson) #7157 * build: New `configure` options have been added for building Node.js as a shared library. (Stefan Budeanu) #6994 - The options are: `--shared`, `--without-v8-platform` & `--without-bundled-v8`. * crypto: Root certificates have been updated. (Ben Noordhuis) #7363 * debugger: The server address is now configurable via `--debug=<address>:<port>`. (Ben Noordhuis) #3316 * npm: Upgraded npm to v3.10.3 (Kat Marchán) #7515 & (Rebecca Turner) #7410 * readline: Added the `prompt` option to the readline constructor. (Evan Lucas) #7125 * repl / vm: `sigint`/`ctrl+c` will now break out of infinite loops without stopping the Node.js instance. (Anna Henningsen) #6635 * src: - Added a `node::FreeEnvironment` public C++ API. (Cheng Zhao) #3098 - Refactored `require('constants')`, constants are now available directly from their respective modules. (James M Snell) #6534 * stream: Improved `readable.read()` performance by up to 70%. (Brian White) #7077 * timers: `setImmediate()` is now up to 150% faster in some situations. (Andras) #6436 * util: Added a `breakLength` option to `util.inspect()` to control how objects are formatted across lines. (cjihrig) #7499 * v8-inspector: Experimental support has been added for debugging Node.js over the inspector protocol. (Ali Ijaz Sheikh) #6792 - *Note: This feature is experimental, and it could be altered or removed.* - You can try this feature by running Node.js with the `--inspect` flag. Refs: #7441 PR-URL: #7550
Notable changes: * buffer: Added `buffer.swap64()` to compliment `swap16()` & `swap32()`. (Zach Bjornson) #7157 * build: New `configure` options have been added for building Node.js as a shared library. (Stefan Budeanu) #6994 - The options are: `--shared`, `--without-v8-platform` & `--without-bundled-v8`. * crypto: Root certificates have been updated. (Ben Noordhuis) #7363 * debugger: The server address is now configurable via `--debug=<address>:<port>`. (Ben Noordhuis) #3316 * npm: Upgraded npm to v3.10.3 (Kat Marchán) #7515 & (Rebecca Turner) #7410 * readline: Added the `prompt` option to the readline constructor. (Evan Lucas) #7125 * repl / vm: `sigint`/`ctrl+c` will now break out of infinite loops without stopping the Node.js instance. (Anna Henningsen) #6635 * src: - Added a `node::FreeEnvironment` public C++ API. (Cheng Zhao) #3098 - Refactored `require('constants')`, constants are now available directly from their respective modules. (James M Snell) #6534 * stream: Improved `readable.read()` performance by up to 70%. (Brian White) #7077 * timers: `setImmediate()` is now up to 150% faster in some situations. (Andras) #6436 * util: Added a `breakLength` option to `util.inspect()` to control how objects are formatted across lines. (cjihrig) #7499 * v8-inspector: Experimental support has been added for debugging Node.js over the inspector protocol. (Ali Ijaz Sheikh) #6792 - *Note: This feature is experimental, and it could be altered or removed.* - You can try this feature by running Node.js with the `--inspect` flag. Refs: #7441 PR-URL: #7550
Notable changes: * buffer: Added `buffer.swap64()` to compliment `swap16()` & `swap32()`. (Zach Bjornson) #7157 * build: New `configure` options have been added for building Node.js as a shared library. (Stefan Budeanu) #6994 - The options are: `--shared`, `--without-v8-platform` & `--without-bundled-v8`. * crypto: Root certificates have been updated. (Ben Noordhuis) #7363 * debugger: The server address is now configurable via `--debug=<address>:<port>`. (Ben Noordhuis) #3316 * npm: Upgraded npm to v3.10.3 (Kat Marchán) #7515 & (Rebecca Turner) #7410 * readline: Added the `prompt` option to the readline constructor. (Evan Lucas) #7125 * repl / vm: `sigint`/`ctrl+c` will now break out of infinite loops without stopping the Node.js instance. (Anna Henningsen) #6635 * src: - Added a `node::FreeEnvironment` public C++ API. (Cheng Zhao) #3098 - Refactored `require('constants')`, constants are now available directly from their respective modules. (James M Snell) #6534 * stream: Improved `readable.read()` performance by up to 70%. (Brian White) #7077 * timers: `setImmediate()` is now up to 150% faster in some situations. (Andras) #6436 * util: Added a `breakLength` option to `util.inspect()` to control how objects are formatted across lines. (cjihrig) #7499 * v8-inspector: Experimental support has been added for debugging Node.js over the inspector protocol. (Ali Ijaz Sheikh) #6792 - *Note: This feature is experimental, and it could be altered or removed.* - You can try this feature by running Node.js with the `--inspect` flag. Refs: #7441 PR-URL: #7550
### Notable changes * **buffer**: Added `buffer.swap64()` to compliment `swap16()` & `swap32()`. (Zach Bjornson) [#7157](nodejs/node#7157) * **build**: New `configure` options have been added for building Node.js as a shared library. (Stefan Budeanu) [#6994](nodejs/node#6994) - The options are: `--shared`, `--without-v8-platform` & `--without-bundled-v8`. * **crypto**: Root certificates have been updated. (Ben Noordhuis) [#7363](nodejs/node#7363) * **debugger**: The server address is now configurable via `--debug=<address>:<port>`. (Ben Noordhuis) [#3316](nodejs/node#3316) * **npm**: Upgraded npm to v3.10.3 (Kat Marchán) [#7515](nodejs/node#7515) & (Rebecca Turner) [#7410](nodejs/node#7410) * **readline**: Added the `prompt` option to the readline constructor. (Evan Lucas) [#7125](nodejs/node#7125) * **repl / vm**: `sigint`/`ctrl+c` will now break out of infinite loops without stopping the Node.js instance. (Anna Henningsen) [#6635](nodejs/node#6635) * **src**: - Added a `node::FreeEnvironment` public C++ API. (Cheng Zhao) [#3098](nodejs/node#3098) - Refactored `require('constants')`, constants are now available directly from their respective modules. (James M Snell) [#6534](nodejs/node#6534) * **stream**: Improved `readable.read()` performance by up to 70%. (Brian White) [#7077](nodejs/node#7077) * **timers**: `setImmediate()` is now up to 150% faster in some situations. (Andras) [#6436](nodejs/node#6436) * **util**: Added a `breakLength` option to `util.inspect()` to control how objects are formatted across lines. (cjihrig) [#7499](nodejs/node#7499) * **v8-inspector**: Experimental support has been added for debugging Node.js over the inspector protocol. (Ali Ijaz Sheikh) [#6792](nodejs/node#6792) - **Note: This feature is _experimental_, and it could be altered or removed.** - You can try this feature by running Node.js with the `--inspect` flag.
Said builtins are not supported by older versions of apple-gcc, breaking the build on OS X 10.8. Fixes: nodejs#7618 Refs: nodejs#4290 Refs: nodejs#7157
Checklist
Affected core subsystem(s)
Buffer (swap16 and swap32 methods)
Description of change
Float64Arrays, which I do routinely.)Tested with gcc/g++ 4.8.3, clang 3.4 and MSVC 2015 (i.e. something like #4284 shouldn't happen here). Also tested the macro definitions for the
#elseand__linux__cases on Ubuntu. I don't know if there's a non-gcc, non-clang linux compiler we need to support; if not, then the__linux__case could be removed.If/when this goes in, I plan to try using the new macros to speed up the buffer.read/write* methods.
Benchmark Results
16 bit types
32 bit types
64 bit types