Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hash: Add SHA-NI implementation of SHA-256 #15152

Merged
merged 4 commits into from
Aug 8, 2024
Merged

Conversation

TimWolla
Copy link
Member

@TimWolla TimWolla commented Jul 29, 2024

This PR adds a SHA-NI implementation for SHA-256, greatly improving its performance (2× to 5×). It also adds a SSE2 implementation for CPUs that do not support SHA-NI, which improves the performance, but in a much less competitive fashion.

Both implementations are taken from Tarsnap/libcperciva. Before creating the implementation, I have reached out to the author Dr. Colin Percival (@cperciva) regarding the inclusion in PHP and how to give proper attribution. Before merging this PR, I'll reach out once more to get an official approval that the license comments look correct.

Open Tasks

  • I'm not at all sure if the way I've integrated this with the build process is any good. If someone more experienced with that could give me some helpful pointers, I'd appreciate that. @petk perhaps?

Benchmarks

Test script:

<?php

$algorithm = $_SERVER['argv'][1];
$data = str_repeat('x', $_SERVER['argv'][2]);

for ($i = 0; $i < $_SERVER['argv'][3]; $i++) {
    hash($algorithm, $data);
}

Running on a Intel(R) Core(TM) i7-1365U PHP configured as ./configure --enable-zend-test --enable-option-checking=fatal --enable-phpdbg --enable-fpm, using the SHA-NI implementation with the CPU support check.

If you want to test this yourself, you can check whether your CPU supports SHA-NI by using:

cat /proc/cpuinfo |grep sha_ni

Before

Summary
  /tmp/master test.php md5 16 999999 ran
    1.15 ± 0.04 times faster than /tmp/master test.php sha1 16 999999
    2.09 ± 0.04 times faster than /tmp/master test.php sha256 16 999999
    2.68 ± 0.06 times faster than /tmp/master test.php sha512 16 999999

Summary
  /tmp/master test.php md5 100 499999 ran
    1.10 ± 0.05 times faster than /tmp/master test.php sha1 100 499999
    1.71 ± 0.05 times faster than /tmp/master test.php sha512 100 499999
    2.21 ± 0.07 times faster than /tmp/master test.php sha256 100 499999

Summary
  /tmp/master test.php sha1 1024 99999 ran
    1.04 ± 0.04 times faster than /tmp/master test.php md5 1024 99999
    1.85 ± 0.07 times faster than /tmp/master test.php sha512 1024 99999
    2.55 ± 0.09 times faster than /tmp/master test.php sha256 1024 99999

Summary
  /tmp/master test.php sha1 10240 9999 ran
    1.07 ± 0.02 times faster than /tmp/master test.php md5 10240 9999
    1.78 ± 0.04 times faster than /tmp/master test.php sha512 10240 9999
    2.65 ± 0.05 times faster than /tmp/master test.php sha256 10240 9999
Full Execution
$ hyperfine -L algorithm md5,sha1,sha256,sha512 '/tmp/master test.php {algorithm} 16 999999'
  hyperfine -L algorithm md5,sha1,sha256,sha512 '/tmp/master test.php {algorithm} 100 499999'
  hyperfine -L algorithm md5,sha1,sha256,sha512 '/tmp/master test.php {algorithm} 1024 99999'
  hyperfine -L algorithm md5,sha1,sha256,sha512 '/tmp/master test.php {algorithm} 10240 9999'
Benchmark 1: /tmp/master test.php md5 16 999999
  Time (mean ± σ):     162.2 ms ±   2.9 ms    [User: 159.2 ms, System: 2.9 ms]
  Range (min … max):   160.6 ms … 172.8 ms    18 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: /tmp/master test.php sha1 16 999999
  Time (mean ± σ):     187.3 ms ±   5.9 ms    [User: 185.4 ms, System: 1.8 ms]
  Range (min … max):   183.3 ms … 206.9 ms    16 runs
 
Benchmark 3: /tmp/master test.php sha256 16 999999
  Time (mean ± σ):     338.7 ms ±   2.5 ms    [User: 336.3 ms, System: 2.0 ms]
  Range (min … max):   335.7 ms … 343.1 ms    10 runs
 
Benchmark 4: /tmp/master test.php sha512 16 999999
  Time (mean ± σ):     434.1 ms ±   4.8 ms    [User: 430.9 ms, System: 2.9 ms]
  Range (min … max):   429.5 ms … 444.3 ms    10 runs
 
Summary
  /tmp/master test.php md5 16 999999 ran
    1.15 ± 0.04 times faster than /tmp/master test.php sha1 16 999999
    2.09 ± 0.04 times faster than /tmp/master test.php sha256 16 999999
    2.68 ± 0.06 times faster than /tmp/master test.php sha512 16 999999
Benchmark 1: /tmp/master test.php md5 100 499999
  Time (mean ± σ):     128.5 ms ±   3.0 ms    [User: 126.1 ms, System: 2.1 ms]
  Range (min … max):   126.6 ms … 137.9 ms    21 runs
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (137.2 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
 
Benchmark 2: /tmp/master test.php sha1 100 499999
  Time (mean ± σ):     141.1 ms ±   4.8 ms    [User: 138.9 ms, System: 1.9 ms]
  Range (min … max):   138.8 ms … 161.0 ms    21 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 3: /tmp/master test.php sha256 100 499999
  Time (mean ± σ):     284.5 ms ±   5.0 ms    [User: 282.1 ms, System: 2.1 ms]
  Range (min … max):   282.0 ms … 298.5 ms    10 runs
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (298.5 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
 
Benchmark 4: /tmp/master test.php sha512 100 499999
  Time (mean ± σ):     220.4 ms ±   4.5 ms    [User: 217.6 ms, System: 2.5 ms]
  Range (min … max):   217.2 ms … 232.7 ms    13 runs
 
Summary
  /tmp/master test.php md5 100 499999 ran
    1.10 ± 0.05 times faster than /tmp/master test.php sha1 100 499999
    1.71 ± 0.05 times faster than /tmp/master test.php sha512 100 499999
    2.21 ± 0.07 times faster than /tmp/master test.php sha256 100 499999
Benchmark 1: /tmp/master test.php md5 1024 99999
  Time (mean ± σ):     159.0 ms ±   3.0 ms    [User: 156.6 ms, System: 2.1 ms]
  Range (min … max):   157.1 ms … 167.2 ms    18 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: /tmp/master test.php sha1 1024 99999
  Time (mean ± σ):     152.1 ms ±   4.8 ms    [User: 149.7 ms, System: 2.2 ms]
  Range (min … max):   148.6 ms … 170.2 ms    19 runs
 
Benchmark 3: /tmp/master test.php sha256 1024 99999
  Time (mean ± σ):     388.1 ms ±   5.6 ms    [User: 384.7 ms, System: 3.2 ms]
  Range (min … max):   384.4 ms … 402.8 ms    10 runs
 
Benchmark 4: /tmp/master test.php sha512 1024 99999
  Time (mean ± σ):     281.2 ms ±   5.3 ms    [User: 278.1 ms, System: 2.8 ms]
  Range (min … max):   277.7 ms … 294.2 ms    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  /tmp/master test.php sha1 1024 99999 ran
    1.04 ± 0.04 times faster than /tmp/master test.php md5 1024 99999
    1.85 ± 0.07 times faster than /tmp/master test.php sha512 1024 99999
    2.55 ± 0.09 times faster than /tmp/master test.php sha256 1024 99999
Benchmark 1: /tmp/master test.php md5 10240 9999
  Time (mean ± σ):     142.6 ms ±   1.3 ms    [User: 140.0 ms, System: 2.4 ms]
  Range (min … max):   141.9 ms … 148.2 ms    21 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: /tmp/master test.php sha1 10240 9999
  Time (mean ± σ):     133.9 ms ±   2.6 ms    [User: 131.4 ms, System: 2.2 ms]
  Range (min … max):   131.4 ms … 144.4 ms    22 runs
 
Benchmark 3: /tmp/master test.php sha256 10240 9999
  Time (mean ± σ):     354.3 ms ±   1.3 ms    [User: 352.5 ms, System: 1.7 ms]
  Range (min … max):   352.8 ms … 356.5 ms    10 runs
 
Benchmark 4: /tmp/master test.php sha512 10240 9999
  Time (mean ± σ):     238.0 ms ±   1.2 ms    [User: 235.1 ms, System: 2.5 ms]
  Range (min … max):   237.3 ms … 241.5 ms    12 runs
 
Summary
  /tmp/master test.php sha1 10240 9999 ran
    1.07 ± 0.02 times faster than /tmp/master test.php md5 10240 9999
    1.78 ± 0.04 times faster than /tmp/master test.php sha512 10240 9999
    2.65 ± 0.05 times faster than /tmp/master test.php sha256 10240 9999

After

Summary
  sapi/cli/php test.php sha256 16 999999 ran
    1.15 ± 0.04 times faster than sapi/cli/php test.php md5 16 999999
    1.30 ± 0.07 times faster than sapi/cli/php test.php sha1 16 999999
    2.96 ± 0.05 times faster than sapi/cli/php test.php sha512 16 999999

Summary
  sapi/cli/php test.php sha256 100 499999 ran
    1.31 ± 0.03 times faster than sapi/cli/php test.php md5 100 499999
    1.45 ± 0.12 times faster than sapi/cli/php test.php sha1 100 499999
    2.24 ± 0.06 times faster than sapi/cli/php test.php sha512 100 499999

Summary
  sapi/cli/php test.php sha256 1024 99999 ran
    1.88 ± 0.06 times faster than sapi/cli/php test.php sha1 1024 99999
    1.97 ± 0.06 times faster than sapi/cli/php test.php md5 1024 99999
    3.49 ± 0.10 times faster than sapi/cli/php test.php sha512 1024 99999

Summary
  sapi/cli/php test.php sha256 10240 9999 ran
    1.98 ± 0.05 times faster than sapi/cli/php test.php sha1 10240 9999
    2.11 ± 0.05 times faster than sapi/cli/php test.php md5 10240 9999
    3.52 ± 0.08 times faster than sapi/cli/php test.php sha512 10240 9999
Full Execution
$ hyperfine -L algorithm md5,sha1,sha256,sha512 'sapi/cli/php test.php {algorithm} 16 999999'
  hyperfine -L algorithm md5,sha1,sha256,sha512 'sapi/cli/php test.php {algorithm} 100 499999'
  hyperfine -L algorithm md5,sha1,sha256,sha512 'sapi/cli/php test.php {algorithm} 1024 99999'
  hyperfine -L algorithm md5,sha1,sha256,sha512 'sapi/cli/php test.php {algorithm} 10240 9999'
Benchmark 1: sapi/cli/php test.php md5 16 999999
  Time (mean ± σ):     169.0 ms ±   4.9 ms    [User: 167.3 ms, System: 1.5 ms]
  Range (min … max):   165.3 ms … 181.8 ms    17 runs
 
Benchmark 2: sapi/cli/php test.php sha1 16 999999
  Time (mean ± σ):     191.1 ms ±  10.4 ms    [User: 188.8 ms, System: 1.9 ms]
  Range (min … max):   184.0 ms … 221.3 ms    15 runs
 
Benchmark 3: sapi/cli/php test.php sha256 16 999999
  Time (mean ± σ):     146.6 ms ±   2.6 ms    [User: 144.4 ms, System: 2.0 ms]
  Range (min … max):   143.7 ms … 156.6 ms    20 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 4: sapi/cli/php test.php sha512 16 999999
  Time (mean ± σ):     434.0 ms ±   2.0 ms    [User: 432.6 ms, System: 1.3 ms]
  Range (min … max):   431.7 ms … 437.9 ms    10 runs
 
Summary
  sapi/cli/php test.php sha256 16 999999 ran
    1.15 ± 0.04 times faster than sapi/cli/php test.php md5 16 999999
    1.30 ± 0.07 times faster than sapi/cli/php test.php sha1 16 999999
    2.96 ± 0.05 times faster than sapi/cli/php test.php sha512 16 999999
Benchmark 1: sapi/cli/php test.php md5 100 499999
  Time (mean ± σ):     130.0 ms ±   3.1 ms    [User: 127.7 ms, System: 2.2 ms]
  Range (min … max):   127.6 ms … 138.3 ms    23 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: sapi/cli/php test.php sha1 100 499999
  Time (mean ± σ):     142.9 ms ±  11.9 ms    [User: 140.8 ms, System: 2.0 ms]
  Range (min … max):   138.8 ms … 191.9 ms    21 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 3: sapi/cli/php test.php sha256 100 499999
  Time (mean ± σ):      98.8 ms ±   1.1 ms    [User: 96.3 ms, System: 2.1 ms]
  Range (min … max):    97.7 ms … 101.8 ms    30 runs
 
Benchmark 4: sapi/cli/php test.php sha512 100 499999
  Time (mean ± σ):     221.7 ms ±   5.5 ms    [User: 219.2 ms, System: 2.2 ms]
  Range (min … max):   218.9 ms … 239.6 ms    13 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  sapi/cli/php test.php sha256 100 499999 ran
    1.31 ± 0.03 times faster than sapi/cli/php test.php md5 100 499999
    1.45 ± 0.12 times faster than sapi/cli/php test.php sha1 100 499999
    2.24 ± 0.06 times faster than sapi/cli/php test.php sha512 100 499999
Benchmark 1: sapi/cli/php test.php md5 1024 99999
  Time (mean ± σ):     159.3 ms ±   3.6 ms    [User: 156.5 ms, System: 2.6 ms]
  Range (min … max):   156.9 ms … 169.6 ms    19 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: sapi/cli/php test.php sha1 1024 99999
  Time (mean ± σ):     152.3 ms ±   3.7 ms    [User: 149.8 ms, System: 2.0 ms]
  Range (min … max):   148.5 ms … 163.4 ms    19 runs
 
Benchmark 3: sapi/cli/php test.php sha256 1024 99999
  Time (mean ± σ):      80.8 ms ±   1.5 ms    [User: 78.5 ms, System: 2.1 ms]
  Range (min … max):    79.7 ms …  86.8 ms    37 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 4: sapi/cli/php test.php sha512 1024 99999
  Time (mean ± σ):     281.8 ms ±   5.8 ms    [User: 278.2 ms, System: 3.3 ms]
  Range (min … max):   278.3 ms … 296.7 ms    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  sapi/cli/php test.php sha256 1024 99999 ran
    1.88 ± 0.06 times faster than sapi/cli/php test.php sha1 1024 99999
    1.97 ± 0.06 times faster than sapi/cli/php test.php md5 1024 99999
    3.49 ± 0.10 times faster than sapi/cli/php test.php sha512 1024 99999
Benchmark 1: sapi/cli/php test.php md5 10240 9999
  Time (mean ± σ):     142.9 ms ±   1.7 ms    [User: 139.1 ms, System: 3.4 ms]
  Range (min … max):   141.9 ms … 147.8 ms    20 runs
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (147.6 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
 
Benchmark 2: sapi/cli/php test.php sha1 10240 9999
  Time (mean ± σ):     133.8 ms ±   1.9 ms    [User: 131.0 ms, System: 2.4 ms]
  Range (min … max):   130.5 ms … 140.5 ms    22 runs
 
Benchmark 3: sapi/cli/php test.php sha256 10240 9999
  Time (mean ± σ):      67.6 ms ±   1.5 ms    [User: 65.0 ms, System: 2.5 ms]
  Range (min … max):    66.5 ms …  74.0 ms    44 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 4: sapi/cli/php test.php sha512 10240 9999
  Time (mean ± σ):     237.7 ms ±   1.3 ms    [User: 233.7 ms, System: 3.8 ms]
  Range (min … max):   236.3 ms … 241.1 ms    12 runs
 
Summary
  sapi/cli/php test.php sha256 10240 9999 ran
    1.98 ± 0.05 times faster than sapi/cli/php test.php sha1 10240 9999
    2.11 ± 0.05 times faster than sapi/cli/php test.php md5 10240 9999
    3.52 ± 0.08 times faster than sapi/cli/php test.php sha512 10240 9999

Direct Comparison

Summary
  sapi/cli/php test.php sha256 16 999999 ran
    2.32 ± 0.05 times faster than /tmp/master test.php sha256 16 999999

Summary
  sapi/cli/php test.php sha256 100 499999 ran
    2.86 ± 0.05 times faster than /tmp/master test.php sha256 100 499999

Summary
  sapi/cli/php test.php sha256 1024 99999 ran
    4.83 ± 0.17 times faster than /tmp/master test.php sha256 1024 99999

Summary
  sapi/cli/php test.php sha256 10240 9999 ran
    5.27 ± 0.24 times faster than /tmp/master test.php sha256 10240 9999
Full Execution
$ hyperfine 'sapi/cli/php test.php sha256 16 999999' '/tmp/master test.php sha256 16 999999'
  hyperfine 'sapi/cli/php test.php sha256 100 499999' '/tmp/master test.php sha256 100 499999'
  hyperfine 'sapi/cli/php test.php sha256 1024 99999' '/tmp/master test.php sha256 1024 99999'
  hyperfine 'sapi/cli/php test.php sha256 10240 9999' '/tmp/master test.php sha256 10240 9999'
Benchmark 1: sapi/cli/php test.php sha256 16 999999
  Time (mean ± σ):     147.8 ms ±   1.7 ms    [User: 145.5 ms, System: 2.1 ms]
  Range (min … max):   145.8 ms … 151.9 ms    19 runs
 
Benchmark 2: /tmp/master test.php sha256 16 999999
  Time (mean ± σ):     342.7 ms ±   6.6 ms    [User: 340.4 ms, System: 2.1 ms]
  Range (min … max):   335.7 ms … 354.7 ms    10 runs
 
Summary
  sapi/cli/php test.php sha256 16 999999 ran
    2.32 ± 0.05 times faster than /tmp/master test.php sha256 16 999999
Benchmark 1: sapi/cli/php test.php sha256 100 499999
  Time (mean ± σ):      99.1 ms ±   1.7 ms    [User: 96.6 ms, System: 2.3 ms]
  Range (min … max):    97.5 ms … 104.2 ms    30 runs
 
Benchmark 2: /tmp/master test.php sha256 100 499999
  Time (mean ± σ):     283.6 ms ±   2.1 ms    [User: 281.5 ms, System: 2.0 ms]
  Range (min … max):   281.6 ms … 287.7 ms    10 runs
 
Summary
  sapi/cli/php test.php sha256 100 499999 ran
    2.86 ± 0.05 times faster than /tmp/master test.php sha256 100 499999
Benchmark 1: sapi/cli/php test.php sha256 1024 99999
  Time (mean ± σ):      81.1 ms ±   2.4 ms    [User: 78.3 ms, System: 2.6 ms]
  Range (min … max):    79.7 ms …  92.5 ms    37 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: /tmp/master test.php sha256 1024 99999
  Time (mean ± σ):     391.6 ms ±   8.0 ms    [User: 388.9 ms, System: 2.5 ms]
  Range (min … max):   384.8 ms … 408.8 ms    10 runs
 
Summary
  sapi/cli/php test.php sha256 1024 99999 ran
    4.83 ± 0.17 times faster than /tmp/master test.php sha256 1024 99999
Benchmark 1: sapi/cli/php test.php sha256 10240 9999
  Time (mean ± σ):      68.4 ms ±   2.1 ms    [User: 65.5 ms, System: 2.5 ms]
  Range (min … max):    67.0 ms …  76.2 ms    44 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: /tmp/master test.php sha256 10240 9999
  Time (mean ± σ):     360.7 ms ±  11.9 ms    [User: 357.5 ms, System: 2.9 ms]
  Range (min … max):   352.9 ms … 386.8 ms    10 runs
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (386.8 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
 
Summary
  sapi/cli/php test.php sha256 10240 9999 ran
    5.27 ± 0.24 times faster than /tmp/master test.php sha256 10240 9999

@@ -33,7 +33,7 @@ else
PHP_HASH_CFLAGS="$PHP_HASH_CFLAGS -I@ext_srcdir@/$SHA3_DIR -DKeccakP200_excluded -DKeccakP400_excluded -DKeccakP800_excluded -DZEND_ENABLE_STATIC_TSRMLS_CACHE=1"
fi

EXT_HASH_SOURCES="hash.c hash_md.c hash_sha.c hash_ripemd.c hash_haval.c \
EXT_HASH_SOURCES="hash.c hash_md.c hash_sha.c hash_sha_sse2.c hash_sha_ni.c hash_ripemd.c hash_haval.c \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have support for this on Windows, too; maybe just try to add these files to ext/hash/config.w32 and see what CI makes of it. At least SSE2 shouldn't be an issue on Windows, since the build system defaults to this for many years (CI even requires AVX2 instructions).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made the suggested change and pushed. I don't use Windows myself, though. Please check yourself and push any necessary fixes into my branch (pushing is enabled for maintainers):

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem are the static restrict qualifiers when some array arguments are declared. Removing these makes the build pass. I've pushed a quick fix, but according to https://stackoverflow.com/questions/53863084/what-optimization-benefit-does-pointerrestrict-static-1-bring-when-declare-a this construct is supported by C, but not by C++, so the #ifdef would need to be fixed. I'm not sure about that, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you been able to verify if the execution on Windows has become faster / uses the SSE / SHA-NI implementation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A default build runs the SSE2 code branch. An AVX2 build still does for me (not sure if I have support for Intel SHA instructions). I'll check that, but first a couple of Windows build issues need to be resolved (e.g. HAVE_IMMINTRISIC_H is never defined, although that header file is apparently available (Windows SDK 10.0.20348.0).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, that whole AVX detection is apparently pretty much messed up in php-src. Prior to this PR, HAVE_IMMINTRISIC_H is only used once in zend_portability.h, to conditionally define PHP_HAVE_AVX2 (but not for MSVC), which is then used to conditionally define ZEND_INTRIN_AVX2_RESOLVER (which is always done on Windows except for ARM64).

If immintrin.h is included, it is guarded by __SSE2__, __SSE3__, __AVX__, __AVX2__, ZEND_INTRIN_AVX2_NATIVE, or ZEND_INTRIN_AVX2_RESOLVER. Not sure how to generally solve this, but maybe this PR should not use HAVE_IMMINTRIN_H header at all. although it would be possible to define that symbol on Windows (probably unconditionally).

And there is another issue, namely in hash_sha_ni.c and php_hash_sha.h we have #if (defined(__i386__) || defined(__x86_64__)) && defined(HAVE_IMMINTRIN_H). but these are never defined by MSVC. Not sure if we have some portable symbols for that; otherwise __WIN32 and __WIN64 could be used (although the latter is also defined for ARM64 which is probably an issue).

Anyway, I've just skipped these checks for a test build, verified that SHA256_Transform_shani() is actually called, and ext/hash/tests/sha256.phpt passed. \o/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, that whole AVX detection is apparently pretty much messed up in php-src.

Yeah, that's what I meant by the open task item in the original issue: I have no idea what I was doing here and the existing source files where the intrinsics are used didn't help either, because they are preprocessor hell.

So I just did what looked reasonable and worked for me, hoping for someone knowledgeable to advice.

@cperciva
Copy link

@TimWolla I didn't realize you were taking the sse2 version as well as the shani version. No objection and it's still owned by Tarsnap Backup Inc. but if you're keeping track of authors anywhere the sse2 version was written by @gperciva.

@TimWolla
Copy link
Member Author

@cperciva Thank you for your review, clarification and approval regarding the use of the SSE2 version. I've made the adjustments and also adjusted the SSE2 comment regarding cmb's modification to make it compile on Windows.

I didn't realize you were taking the sse2 version as well as the shani version.

Yes, my fault. I didn't mention that in the email on Friday, because I've only decided today to include the SSE2 version, after realizing it was simple enough to include with the existing build system and given it provided a measurable improvement compared to the pure-C variant in my tests.


I can't test it myself and thus didn't include it, but if we wanted to also add your ARM implementation in the future, how would the copyright note need to look like?

@cperciva
Copy link

I can't test it myself and thus didn't include it, but if we wanted to also add your ARM implementation in the future, how would the copyright note need to look like?

Same as the SSE2 version; "Copyright 2021 Tarsnap Backup Inc." with the same BSD license header. That was also written by Graham.

@bukka
Copy link
Member

bukka commented Jul 29, 2024

Wouldn't be better to do the same as Python and rather use OpenSSL implementation (which is already ASM optimized and doesn't require any extra maintenance) and use our own implementation only if OpenSSL is not available (not needing NI)?

@petk
Copy link
Member

petk commented Jul 29, 2024

I'm not at all sure if the way I've integrated this with the build process is any good. If someone more experienced with that could give me some helpful pointers, I'd appreciate that. @petk perhaps?

As far as these build system files are concerned no worries, all looks good here. I'll just probably sort them also for the ext/hash extension so the diffs in the future are simpler to handle :D. About the implementation itself, I have no idea if there's anything wrong here. I think it looks good so far. Yes, if this can be added to PHP-8.4 that would be really nice add-on. 👍 Considering that we've slightly missed the time frame for the BLAKE 3 hash in the 13194 PR by @divinity76.

@TimWolla
Copy link
Member Author

@bukka I don't see how this would cause any extra maintenance after the initial implementation / getting the feature detection to work correctly [1]. I'm intentionally using an existing implementation instead of writing my own, the CPU does the heavy lifting and for a cryptographic hash it's really immediately obvious when the algorithm is incorrectly implemented.

I don't believe that using OpenSSL, which is optional for PHP, would result in any simpler implementation, given that we can't rely on it being there. This PR keeps everything reasonably self-contained.

[1] The latter of which is something we need to understand anyways.

@bukka
Copy link
Member

bukka commented Jul 30, 2024

@TimWolla The problem with shipping pre-existing implementations is that any security issues in these implementations become automatically security issues in PHP potentially impacting many users. That fully showed in Keccak ref implementation which needed following fix 248f647 . Python was impacted too but the amount of impacted users was minimal because they prefere OpenSSL implementation if avaialable (which is most installs and in PHP case it would be even bigger majority). OpenSSL was not impacted by this issue. So this is more about security and the fact that OpenSSL gets bigger review of the provided algorithms.

Comment on lines +202 to +203
const uint8_t block[PHP_STATIC_RESTRICT 64], uint32_t W[PHP_STATIC_RESTRICT 64],
uint32_t S[PHP_STATIC_RESTRICT 8])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CS: spaces - I realise that this is take from elsewhere but I guess you converted the rest to tabs so makes sense to do this one as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made the changes to the header file, given it is PHP-specific.

In this case, the tabs and spaces appear as-is in the original and adjusting just some of the style needlessly adds changes without any value-add:

https://github.com/Tarsnap/libcperciva/blob/af0ca13747e3a1250487bfe534c786dffa32cd54/alg/sha256_sse2.c#L182-L185

@TimWolla
Copy link
Member Author

The problem with shipping pre-existing implementations is that any security issues in these implementations become automatically security issues in PHP potentially impacting many users.

In this case the implementation just switches out the SHA256Transform function, which works with fixed-size buffers and doesn't use user-controlled inputs for control flow. This rules out the most common class of vulnerability.

I'm positive that this implementation is much simpler and much easier to review compared to an implementation that dynamically uses the OpenSSL functionality when the OpenSSL extension is loaded.

@cmb69
Copy link
Member

cmb69 commented Jul 30, 2024

While I think that @bukka makes good arguments (especially regarding security), there could be a problem with the official Windows builds. ext/hash is statically built in, and so far we refrained from adding dependencies to OpenSSL from the core php8.dll (see #5129 (comment)). So in practice, Windows users wouldn't benefit from performance improvements. An option might be to add this implementation to winlibs, but then it might get even less attention.

@bukka
Copy link
Member

bukka commented Jul 30, 2024

Ok fair enough. I'm cool with this ;)

@nielsdos
Copy link
Member

Shouldn't this be included in https://github.com/php/php-src/blob/master/README.REDIST.BINS ?

@cmb69
Copy link
Member

cmb69 commented Jul 31, 2024

If we are going to accept this PR, I can finish up the Windows integration (and do some performance tests), but I would like some clarification about #15152 (comment) and particularly about #15152 (comment). The former is only about the #ifdef PHP_WIN32 which might be plain wrong (possibly it should be something like #ifdef __cplusplus); the latter is likely more complex a subject.

@TimWolla
Copy link
Member Author

TimWolla commented Aug 1, 2024

Shouldn't this be included in https://github.com/php/php-src/blob/master/README.REDIST.BINS ?

Thank you, added a reference to that file.

@TimWolla
Copy link
Member Author

TimWolla commented Aug 1, 2024

Upstream made a small formatting fix after I reported it (Tarsnap/libcperciva@661752a). I've just synced the implementation and used the opportunity to clean up the commit history by squashing the commits and rebasing on the latest master.

From my side this just leaves the Windows integration / the feature detection handling, which cmb mentioned. Unfortunately I can't advice there.

@cmb69
Copy link
Member

cmb69 commented Aug 7, 2024

I've just pushed a commit to replace the remaining uses of static restrict with the PHP_STATIC_RESTRICT macro.

static restrict is definitely not supported by latest MSVC (at least not in this context) even with /std:c11. The #ifdef PHP_WIN32 is likely too strict, since at least in theory it should still possible to build with clang on Windows (although I did not try that for years, and #11313 has been closed without further notice), but we can still fix the condition later on, if necessary.

@TimWolla
Copy link
Member Author

TimWolla commented Aug 8, 2024

@cmb69 Thank you. My understanding is that with the current state of this PR, the SSE2 implementation would be functional for Windows, but the SHA-NI implementation not?

Is there anything else you'd like to do for now or do you believe that this PR is in a mergeable state and anything else could be adjusted in a follow-up?

I believe it would be nice being able to include it in Beta 1 to give it some exposure.

@cmb69
Copy link
Member

cmb69 commented Aug 8, 2024

On Windows, you will usually get the SSE2 implementation (official builds already rely on SSE2; only custom builds could switch that off). And it seems to be impossible get the SHA-NI implementation (unless changing the source code) even if you do AVX2 builds (what we're doing in CI).

Some quick performance tests (modified ext/hash/bench.php to only use "sha256") show a slight improvement for SSE2 (~4.2s), but a huge improvement for SHA-NI (~0.7s), compared to the unoptimized implementation (~4.8s). And with the SHA-NI implementation, "sha256" is way faster than "sha1" or even "md5" what appears to be a very good thing.

I believe it would be nice being able to include it in Beta 1 to give it some exposure.

Definitely! Even as is, I think many users will profit from it, and with the official Windows builds SHA-NI wouldn't be supported anyway (we may consider to distribute AVX2 builds), and we can still catch up on fixing the AVX/AVX2 detection somewhat later.

@TimWolla
Copy link
Member Author

TimWolla commented Aug 8, 2024

And with the SHA-NI implementation, "sha256" is way faster than "sha1" or even "md5" what appears to be a very good thing.

Yes, that was one of the goals of this patch: Give users more reasons to use a more secure hashing algorithm.


I take your comment as a confirmation that you are happy with the PR / approve of it then?

Copy link
Member

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take your comment as a confirmation that you are happy with the PR / approve of it then?

I can't really comment on the actual implementation, but assume that libcperciva got it right, so the only show-stopper would be the amount of code being used, and when I compare that with xxhash.h, I don't think anybody will object to merge this PR.

Note that I just filed #15292. Maybe the AVX detection works on Windows when this is properly fixed, but that can be checked after merging this.

Thanks for your work, and thanks to @cperciva for giving permission to use this code.

Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't check the implementation of the library code, but mostly looks good.
I only have one remark to add.

TimWolla and others added 4 commits August 8, 2024 21:31
Implementation taken from
Tarsnap/libcperciva@661752a.

Co-authored-by: Christoph M. Becker <cmbecker69@gmx.de>
Co-authored-by: Niels Dossche <7771979+nielsdos@users.noreply.github.com>
Implementation taken from
Tarsnap/libcperciva@661752a.

Co-authored-by: Christoph M. Becker <cmbecker69@gmx.de>
@TimWolla
Copy link
Member Author

TimWolla commented Aug 8, 2024

but if you're keeping track of authors anywhere

I've added both of your names to the NEWS entry: 6f4bc0a

@TimWolla TimWolla merged commit 6eca783 into php:master Aug 8, 2024
9 of 11 checks passed
@TimWolla TimWolla deleted the sha-ni branch August 8, 2024 20:19
cmb69 referenced this pull request Sep 9, 2024
Not only MSVC doesn't support this construct, but apparently it is
generally not supported by C++ compilers.

Closes GH-15745.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants