@TysonAndre

Previously, PHP's unserialize() unconditionally converted every packed array to
an associative (hashed) array. This worked around arrays such as [0 => $v1, 'key' => $v2],
which start out as a packed array but become an associative array partway through,
causing the raw zval pointer to the value for $v1 to change.

  • This can only happen when there are at least two elements, though, so arrays of size 0 or 1 can stay packed.
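
To make the problem case concrete, here is a minimal userland sketch. It cannot observe zval pointers directly, so it only shows the shape of array involved and the reference restoration that presumably relies on such pointers staying valid. All variable names are illustrative and not taken from the patch.

```php
<?php
// Sketch only: names are hypothetical, chosen for illustration.

// An array whose first key is an integer and whose later keys are strings.
// Per the description above, such an array can start out packed and then
// become associative once a string key is reached.
$shared = 'v1';
$original = [0 => &$shared, 'key' => 'v2', 'alias' => &$shared];

$restored = unserialize(serialize($original));

// The keys survive the round trip in their original order.
var_dump(array_keys($restored) === [0, 'key', 'alias']);

// The reference between element 0 and 'alias' is restored as well,
// so writing through one is visible through the other.
$restored[0] = 'changed';
var_dump($restored['alias'] === 'changed');

// A one-element list has no later key that could force a conversion.
$single = unserialize(serialize([42]));
var_dump(array_is_list($single));
```

The observable results are the same before and after the change; only the internal representation of the small arrays differs.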

Additionally, this reduces memory usage when __unserialize is called with arrays of
size 0 or 1
(e.g. for user-defined implementations that store the passed-in array as a property).

I looked into repacking arrays of size 2+ after the unserializer no longer needed them, but ran into these issues:

  • (Probably) Worse cache locality of resulting mix of objects/arrays
  • Potential unsafety if unserialization handlers had access to the arrays
  • Longer time to unserialize overall, which would be worse for short-lived arrays

Benchmark

Before (PHP 8.2)

          bench_single_element(1000000): memory=417943120 bytes, create time 0.203, read time 0.043, free time 0.036, total time 0.282 result 499999500000
       bench_magic_unserialize(1000000): memory=482331728 bytes, create time 0.417, read time 0.114, free time 0.067, total time 0.598 result 499999500000

After (PHP 8.2)

          bench_single_element(1000000): memory=257943120 bytes, create time 0.156, read time 0.024, free time 0.027, total time 0.208 result 499999500000
       bench_magic_unserialize(1000000): memory=322331728 bytes, create time 0.368, read time 0.082, free time 0.047, total time 0.497 result 499999500000
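
As a quick check on what these figures imply, the following sketch computes the memory saved per unserialized array. The numbers are copied from the output above; the variable names are hypothetical.

```php
<?php
// Memory figures (in bytes) copied from the "Before" and "After" runs.
$n = 1000000;
$before = [
    'bench_single_element'    => 417943120,
    'bench_magic_unserialize' => 482331728,
];
$after = [
    'bench_single_element'    => 257943120,
    'bench_magic_unserialize' => 322331728,
];

$savedPerArray = [];
foreach ($before as $name => $bytesBefore) {
    $saved = $bytesBefore - $after[$name];
    // Each benchmark unserializes $n single-element arrays.
    $savedPerArray[$name] = intdiv($saved, $n);
    printf("%s: %d bytes saved in total, %d bytes per array\n", $name, $saved, $savedPerArray[$name]);
}
// Both benchmarks come out at 160 bytes saved per array.
```

Both runs show the same per-array saving, which is consistent with the difference coming from the single-element arrays themselves rather than from the wrapping objects.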

Benchmark script

<?php
function bench_single_element(int $N) {
    $values = [];
    for ($i = 0; $i < $N; $i++) {
        $values[] = [$i];
    }
    $ser = serialize($values);
    $start_time = hrtime(true)/1e9;
    $start_memory = memory_get_usage();
    $copy = unserialize($ser);
    $end_time = hrtime(true)/1e9;
    $end_memory = memory_get_usage();

    $total = 0;
    foreach ($copy as [$v]) {
        $total += $v;
    }
    $read_time = hrtime(true)/1e9;
    unset($copy);

    $free_time = hrtime(true)/1e9;
    $free_memory = memory_get_usage();

    printf("%30s(%d): memory=%d bytes, create time %.3f, read time %.3f, free time %.3f, total time %.3f result %d\n", __FUNCTION__, $N, $end_memory - $start_memory, $end_time-$start_time, $read_time-$end_time, $free_time - $read_time, $free_time - $start_time, $total);
}

class ListWrapper {
    public function __construct(public array $list) { }
    public function __serialize(): array { return $this->list; }
    public function __unserialize(array $list) { $this->list = $list; }
}

function bench_magic_unserialize(int $N) {
    $values = [];
    for ($i = 0; $i < $N; $i++) {
        $values[] = new ListWrapper([$i]);
    }
    $ser = serialize($values);
    $start_time = hrtime(true)/1e9;
    $start_memory = memory_get_usage();
    $copy = unserialize($ser);
    $end_time = hrtime(true)/1e9;
    $end_memory = memory_get_usage();

    $total = 0;
    foreach ($copy as $v) {
        $total += $v->list[0];
    }
    $read_time = hrtime(true)/1e9;
    unset($copy);

    $free_time = hrtime(true)/1e9;
    $free_memory = memory_get_usage();

    printf("%30s(%d): memory=%d bytes, create time %.3f, read time %.3f, free time %.3f, total time %.3f result %d\n", __FUNCTION__, $N, $end_memory - $start_memory, $end_time-$start_time, $read_time-$end_time, $free_time - $read_time, $free_time - $start_time, $total);
}
ini_set('memory_limit', '2G');
$N = 1000000;
bench_single_element($N);
bench_magic_unserialize($N);
