-
Notifications
You must be signed in to change notification settings - Fork 7.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hash: Add SHA-NI implementation of SHA-256 #15152
Conversation
@@ -33,7 +33,7 @@ else | |||
PHP_HASH_CFLAGS="$PHP_HASH_CFLAGS -I@ext_srcdir@/$SHA3_DIR -DKeccakP200_excluded -DKeccakP400_excluded -DKeccakP800_excluded -DZEND_ENABLE_STATIC_TSRMLS_CACHE=1" | |||
fi | |||
|
|||
EXT_HASH_SOURCES="hash.c hash_md.c hash_sha.c hash_ripemd.c hash_haval.c \ | |||
EXT_HASH_SOURCES="hash.c hash_md.c hash_sha.c hash_sha_sse2.c hash_sha_ni.c hash_ripemd.c hash_haval.c \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have support for this on Windows, too; maybe just try to add these files to ext/hash/config.w32 and see what CI makes of it. At least SSE2 shouldn't be an issue on Windows, since the build system defaults to this for many years (CI even requires AVX2 instructions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem are the static restrict
qualifiers when some array arguments are declared. Removing these makes the build pass. I've pushed a quick fix, but according to https://stackoverflow.com/questions/53863084/what-optimization-benefit-does-pointerrestrict-static-1-bring-when-declare-a this construct is supported by C, but not by C++, so the #ifdef
would need to be fixed. I'm not sure about that, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you been able to verify if the execution on Windows has become faster / uses the SSE / SHA-NI implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A default build runs the SSE2 code branch. An AVX2 build still does for me (not sure if I have support for Intel SHA instructions). I'll check that, but first a couple of Windows build issues need to be resolved (e.g. HAVE_IMMINTRISIC_H
is never defined, although that header file is apparently available (Windows SDK 10.0.20348.0).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, that whole AVX detection is apparently pretty much messed up in php-src. Prior to this PR, HAVE_IMMINTRISIC_H
is only used once in zend_portability.h, to conditionally define PHP_HAVE_AVX2
(but not for MSVC), which is then used to conditionally define ZEND_INTRIN_AVX2_RESOLVER
(which is always done on Windows except for ARM64).
If immintrin.h is included, it is guarded by __SSE2__
, __SSE3__
, __AVX__
, __AVX2__
, ZEND_INTRIN_AVX2_NATIVE
, or ZEND_INTRIN_AVX2_RESOLVER
. Not sure how to generally solve this, but maybe this PR should not use HAVE_IMMINTRIN_H
header at all. although it would be possible to define that symbol on Windows (probably unconditionally).
And there is another issue, namely in hash_sha_ni.c and php_hash_sha.h we have #if (defined(__i386__) || defined(__x86_64__)) && defined(HAVE_IMMINTRIN_H)
. but these are never defined by MSVC. Not sure if we have some portable symbols for that; otherwise __WIN32
and __WIN64
could be used (although the latter is also defined for ARM64 which is probably an issue).
Anyway, I've just skipped these checks for a test build, verified that SHA256_Transform_shani()
is actually called, and ext/hash/tests/sha256.phpt passed. \o/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, that whole AVX detection is apparently pretty much messed up in php-src.
Yeah, that's what I meant by the open task item in the original issue: I have no idea what I was doing here and the existing source files where the intrinsics are used didn't help either, because they are preprocessor hell.
So I just did what looked reasonable and worked for me, hoping for someone knowledgeable to advice.
@cperciva Thank you for your review, clarification and approval regarding the use of the SSE2 version. I've made the adjustments and also adjusted the SSE2 comment regarding cmb's modification to make it compile on Windows.
Yes, my fault. I didn't mention that in the email on Friday, because I've only decided today to include the SSE2 version, after realizing it was simple enough to include with the existing build system and given it provided a measurable improvement compared to the pure-C variant in my tests. I can't test it myself and thus didn't include it, but if we wanted to also add your ARM implementation in the future, how would the copyright note need to look like? |
Same as the SSE2 version; "Copyright 2021 Tarsnap Backup Inc." with the same BSD license header. That was also written by Graham. |
Wouldn't be better to do the same as Python and rather use OpenSSL implementation (which is already ASM optimized and doesn't require any extra maintenance) and use our own implementation only if OpenSSL is not available (not needing NI)? |
As far as these build system files are concerned no worries, all looks good here. I'll just probably sort them also for the ext/hash extension so the diffs in the future are simpler to handle :D. About the implementation itself, I have no idea if there's anything wrong here. I think it looks good so far. Yes, if this can be added to PHP-8.4 that would be really nice add-on. 👍 Considering that we've slightly missed the time frame for the BLAKE 3 hash in the 13194 PR by @divinity76. |
@bukka I don't see how this would cause any extra maintenance after the initial implementation / getting the feature detection to work correctly [1]. I'm intentionally using an existing implementation instead of writing my own, the CPU does the heavy lifting and for a cryptographic hash it's really immediately obvious when the algorithm is incorrectly implemented. I don't believe that using OpenSSL, which is optional for PHP, would result in any simpler implementation, given that we can't rely on it being there. This PR keeps everything reasonably self-contained. [1] The latter of which is something we need to understand anyways. |
@TimWolla The problem with shipping pre-existing implementations is that any security issues in these implementations become automatically security issues in PHP potentially impacting many users. That fully showed in Keccak ref implementation which needed following fix 248f647 . Python was impacted too but the amount of impacted users was minimal because they prefere OpenSSL implementation if avaialable (which is most installs and in PHP case it would be even bigger majority). OpenSSL was not impacted by this issue. So this is more about security and the fact that OpenSSL gets bigger review of the provided algorithms. |
const uint8_t block[PHP_STATIC_RESTRICT 64], uint32_t W[PHP_STATIC_RESTRICT 64], | ||
uint32_t S[PHP_STATIC_RESTRICT 8]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CS: spaces - I realise that this is take from elsewhere but I guess you converted the rest to tabs so makes sense to do this one as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made the changes to the header file, given it is PHP-specific.
In this case, the tabs and spaces appear as-is in the original and adjusting just some of the style needlessly adds changes without any value-add:
In this case the implementation just switches out the SHA256Transform function, which works with fixed-size buffers and doesn't use user-controlled inputs for control flow. This rules out the most common class of vulnerability. I'm positive that this implementation is much simpler and much easier to review compared to an implementation that dynamically uses the OpenSSL functionality when the OpenSSL extension is loaded. |
While I think that @bukka makes good arguments (especially regarding security), there could be a problem with the official Windows builds. ext/hash is statically built in, and so far we refrained from adding dependencies to OpenSSL from the core php8.dll (see #5129 (comment)). So in practice, Windows users wouldn't benefit from performance improvements. An option might be to add this implementation to winlibs, but then it might get even less attention. |
Ok fair enough. I'm cool with this ;) |
Shouldn't this be included in https://github.com/php/php-src/blob/master/README.REDIST.BINS ? |
If we are going to accept this PR, I can finish up the Windows integration (and do some performance tests), but I would like some clarification about #15152 (comment) and particularly about #15152 (comment). The former is only about the |
Thank you, added a reference to that file. |
Upstream made a small formatting fix after I reported it (Tarsnap/libcperciva@661752a). I've just synced the implementation and used the opportunity to clean up the commit history by squashing the commits and rebasing on the latest master. From my side this just leaves the Windows integration / the feature detection handling, which cmb mentioned. Unfortunately I can't advice there. |
I've just pushed a commit to replace the remaining uses of
|
@cmb69 Thank you. My understanding is that with the current state of this PR, the SSE2 implementation would be functional for Windows, but the SHA-NI implementation not? Is there anything else you'd like to do for now or do you believe that this PR is in a mergeable state and anything else could be adjusted in a follow-up? I believe it would be nice being able to include it in Beta 1 to give it some exposure. |
On Windows, you will usually get the SSE2 implementation (official builds already rely on SSE2; only custom builds could switch that off). And it seems to be impossible get the SHA-NI implementation (unless changing the source code) even if you do AVX2 builds (what we're doing in CI). Some quick performance tests (modified ext/hash/bench.php to only use "sha256") show a slight improvement for SSE2 (~4.2s), but a huge improvement for SHA-NI (~0.7s), compared to the unoptimized implementation (~4.8s). And with the SHA-NI implementation, "sha256" is way faster than "sha1" or even "md5" what appears to be a very good thing.
Definitely! Even as is, I think many users will profit from it, and with the official Windows builds SHA-NI wouldn't be supported anyway (we may consider to distribute AVX2 builds), and we can still catch up on fixing the AVX/AVX2 detection somewhat later. |
Yes, that was one of the goals of this patch: Give users more reasons to use a more secure hashing algorithm. I take your comment as a confirmation that you are happy with the PR / approve of it then? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take your comment as a confirmation that you are happy with the PR / approve of it then?
I can't really comment on the actual implementation, but assume that libcperciva got it right, so the only show-stopper would be the amount of code being used, and when I compare that with xxhash.h, I don't think anybody will object to merge this PR.
Note that I just filed #15292. Maybe the AVX detection works on Windows when this is properly fixed, but that can be checked after merging this.
Thanks for your work, and thanks to @cperciva for giving permission to use this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't check the implementation of the library code, but mostly looks good.
I only have one remark to add.
Implementation taken from Tarsnap/libcperciva@661752a. Co-authored-by: Christoph M. Becker <cmbecker69@gmx.de> Co-authored-by: Niels Dossche <7771979+nielsdos@users.noreply.github.com>
Implementation taken from Tarsnap/libcperciva@661752a. Co-authored-by: Christoph M. Becker <cmbecker69@gmx.de>
I've added both of your names to the NEWS entry: 6f4bc0a |
Not only MSVC doesn't support this construct, but apparently it is generally not supported by C++ compilers. Closes GH-15745.
This PR adds a SHA-NI implementation for SHA-256, greatly improving its performance (2× to 5×). It also adds a SSE2 implementation for CPUs that do not support SHA-NI, which improves the performance, but in a much less competitive fashion.
Both implementations are taken from Tarsnap/libcperciva. Before creating the implementation, I have reached out to the author Dr. Colin Percival (@cperciva) regarding the inclusion in PHP and how to give proper attribution. Before merging this PR, I'll reach out once more to get an official approval that the license comments look correct.
Open Tasks
Benchmarks
Test script:
Running on a Intel(R) Core(TM) i7-1365U PHP configured as
./configure --enable-zend-test --enable-option-checking=fatal --enable-phpdbg --enable-fpm
, using the SHA-NI implementation with the CPU support check.If you want to test this yourself, you can check whether your CPU supports SHA-NI by using:
Before
Full Execution
After
Full Execution
Direct Comparison
Full Execution