test: make timers-blocking-callback more reliable#14831
test: make timers-blocking-callback more reliable#14831Trott wants to merge 5 commits intonodejs:masterfrom
Conversation
test-timers-blocking-callback may fail erroneously on resource-constrained machines due to the timing nature of the test. There is likely no way around the timing issue. This change tries to decrease the probability of the test failing erroneously by having it retry a small number of times on failure. Tested on 0.10.38 (which has a bug that this test was written for) and (modifying the test slightly to remove ES6 stuff) the test still seems to fail 100% of the time there, which is what we want/expect. Fixes: nodejs#14792
|
/cc @aqrln @misterdjules |
|
For anyone else wanting to verify this still fails 100% in Node.js 0.10.38 and passes 100% in Node.js 0.10.39, here's the version of this test that I used with ES6-isms removed: Details'use strict';
/*
* This is a regression test for
* https://github.com/nodejs/node-v0.x-archive/issues/15447 and
* and https://github.com/nodejs/node-v0.x-archive/issues/9333.
*
* When a timer is added in another timer's callback, its underlying timer
* handle was started with a timeout that was actually incorrect.
*
* The reason was that the value that represents the current time was not
* updated between the time the original callback was called and the time
* the added timer was processed by timers.listOnTimeout. That led the
* logic in timers.listOnTimeout to do an incorrect computation that made
* the added timer fire with a timeout of scheduledTimeout +
* timeSpentInCallback.
*
* This test makes sure that a timer added by another timer's callback
* fires with the expected timeout.
*
* It makes sure that it works when the timers list for a given timeout is
* empty (see testAddingTimerToEmptyTimersList) and when the timers list
* is not empty (see testAddingTimerToNonEmptyTimersList).
*/
var assert = require('assert');
var Timer = process.binding('timer_wrap').Timer;
var TIMEOUT = 100;
var nbBlockingCallbackCalls;
var latestDelay;
var timeCallbackScheduled;
// These tests are somewhat probablistic so they may fail even when the bug is
// not present. However, they fail 100% of the time when the bug *is* present,
// so to increase reliability, allow for a small number of retries. (Keep it
// small because as currently written, one failure could result in multiple
// simultaneous retries of the test. Don't want to timer-bomb ourselves.
// Observed failures are infrequent anyway, so only a small number of retries
// is hopefully more than sufficient.)
var retries = 2;
function busyLoop(time) {
var startTime = Timer.now();
var stopTime = startTime + time;
while (Timer.now() < stopTime) {}
};
function initTest() {
nbBlockingCallbackCalls = 0;
latestDelay = 0;
timeCallbackScheduled = 0;
}
function blockingCallback(retry, callback) {
++nbBlockingCallbackCalls;
if (nbBlockingCallbackCalls > 1) {
latestDelay = Timer.now() - timeCallbackScheduled;
// Even if timers can fire later than when they've been scheduled
// to fire, they shouldn't generally be more than 100% late in this case.
// But they are guaranteed to be at least 100ms late given the bug in
// https://github.com/nodejs/node-v0.x-archive/issues/15447 and
// https://github.com/nodejs/node-v0.x-archive/issues/9333.
if (latestDelay > TIMEOUT * 2) {
if (retries > 0) {
retries--;
return retry(callback);
}
assert.fail('timeout delayed by more than 100ms (${latestDelay}ms)');
}
if (callback)
return callback();
} else {
// block by busy-looping to trigger the issue
busyLoop(TIMEOUT);
timeCallbackScheduled = Timer.now();
setTimeout(blockingCallback.bind(null, retry, callback), TIMEOUT);
}
}
function testAddingTimerToEmptyTimersList(callback) {
initTest();
// Call setTimeout just once to make sure the timers list is
// empty when blockingCallback is called.
setTimeout(
blockingCallback.bind(null, testAddingTimerToEmptyTimersList, callback),
TIMEOUT
);
}
function testAddingTimerToNonEmptyTimersList() {
initTest();
// Call setTimeout twice with the same timeout to make
// sure the timers list is not empty when blockingCallback is called.
setTimeout(
blockingCallback.bind(null, testAddingTimerToNonEmptyTimersList),
TIMEOUT
);
setTimeout(
blockingCallback.bind(null, testAddingTimerToNonEmptyTimersList),
TIMEOUT
);
}
// Run the test for the empty timers list case, and then for the non-empty
// timers list one.
testAddingTimerToEmptyTimersList(
testAddingTimerToNonEmptyTimersList
); |
| retries--; | ||
| return retry(callback); | ||
| } | ||
| assert.fail(`timeout delayed by more than 100ms (${latestDelay}ms)`); |
There was a problem hiding this comment.
Maybe
assert.fail(`timeout delayed by more than 100% (${latestDelay}ms)`);There was a problem hiding this comment.
I'm pretty sure the bug that this is a regression test for was one that delayed callbacks by 100ms, not 100%. But I can't find the info. Maybe @misterdjules knows if I'm mistaken about that or not.
(It does suggest that the TIMEOUT * 2 line might be better as TIMEOUT + 100. And maybe this needs a comment explaining that TIMEOUT needs to be short.)
There was a problem hiding this comment.
I thought TIMEOUT * 2 at first, but then read L57, also the test is (latestDelay > TIMEOUT * 2) implies 100%
There was a problem hiding this comment.
Nope, I'm wrong, it really should be 100%. Will change, thanks.
| TIMEOUT | ||
| ); | ||
| setTimeout( | ||
| blockingCallback.bind(null, testAddingTimerToNonEmptyTimersList), |
There was a problem hiding this comment.
CR: could you add a common.mustCall() to assert the async task actually runs
There was a problem hiding this comment.
That's not a callback. That's a "use this as a retry function if the test fails". So it almost never gets called.
EDIT: In the above comment, I assume you mean a common.mustCall() around testAddingTimerToNonEmptyTimersList. If you meant around blockingCallback.bind(...), I'd rather not. setTimeout() holds the event loop open (and thus the process running) until the callback is called. If it fails to run, that's a The Whole World Is Broken And On Fire kind of problem and I don't think we need to flag it in this test. I don't think we tend to wrap setTimeout() stuff in common.mustCall() generally and I wouldn't want to create an impression that moving in that direction is desirable.
There was a problem hiding this comment.
blockingCallback takes a 2nd arg callback
There was a problem hiding this comment.
Agree that it's a "Whole World Is Broken And On Fire" sitch, but adding it might help readability in that it will flag that this should be the end of the test.
|
probably cc @misterdjules tbh |
| let latestDelay; | ||
| let timeCallbackScheduled; | ||
|
|
||
| // These tests are somewhat probablistic so they may fail even when the bug is |
There was a problem hiding this comment.
nit: somehow the term "probabilistic" seems a bit vague, and doesn't seem to add to the rest of the sentence ("they may fail even when the bug is not present"), so I'd be inclined to remove that term. Not a big deal though.
There was a problem hiding this comment.
@misterdjules I updated the comment to remove the term "probablistic" and to hopefully add a bit more clarity about what the problem is.
| // https://github.com/nodejs/node-v0.x-archive/issues/9333.. | ||
| assert(latestDelay < TIMEOUT * 2); | ||
| // https://github.com/nodejs/node-v0.x-archive/issues/9333. | ||
| if (latestDelay > TIMEOUT * 2) { |
| // not present. However, they fail 100% of the time when the bug *is* present, | ||
| // so to increase reliability, allow for a small number of retries. (Keep it | ||
| // small because as currently written, one failure could result in multiple | ||
| // simultaneous retries of the test. Don't want to timer-bomb ourselves. |
There was a problem hiding this comment.
as currently written, one failure could result in multiple simultaneous retries of the test
Is there a way to rewrite that part of the test so that's not the case anymore?
There was a problem hiding this comment.
I think I've fixed that now. Will run another stress test, but in the meantime, PTAL. Thanks!
|
@refack Updated the message per your nit. LGTY? |
|
FreeBSD CI failure is unrelated. CI is good. @refack Is this good by you? |
|
Landed in c473d80 |
test-timers-blocking-callback may fail erroneously on resource-constrained machines due to the timing nature of the test. There is likely no way around the timing issue. This change tries to decrease the probability of the test failing erroneously by having it retry a small number of times on failure. Tested on 0.10.38 (which has a bug that this test was written for) and (modifying the test slightly to remove ES6 stuff) the test still seems to fail 100% of the time there, which is what we want/expect. PR-URL: #14831 Fixes: #14792 Reviewed-By: Refael Ackermann <refack@gmail.com> Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
test-timers-blocking-callback may fail erroneously on resource-constrained machines due to the timing nature of the test. There is likely no way around the timing issue. This change tries to decrease the probability of the test failing erroneously by having it retry a small number of times on failure. Tested on 0.10.38 (which has a bug that this test was written for) and (modifying the test slightly to remove ES6 stuff) the test still seems to fail 100% of the time there, which is what we want/expect. PR-URL: #14831 Fixes: #14792 Reviewed-By: Refael Ackermann <refack@gmail.com> Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
test-timers-blocking-callback may fail erroneously on resource-constrained machines due to the timing nature of the test. There is likely no way around the timing issue. This change tries to decrease the probability of the test failing erroneously by having it retry a small number of times on failure. Tested on 0.10.38 (which has a bug that this test was written for) and (modifying the test slightly to remove ES6 stuff) the test still seems to fail 100% of the time there, which is what we want/expect. PR-URL: #14831 Fixes: #14792 Reviewed-By: Refael Ackermann <refack@gmail.com> Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
test-timers-blocking-callback may fail erroneously on resource-constrained machines due to the timing nature of the test. There is likely no way around the timing issue. This change tries to decrease the probability of the test failing erroneously by having it retry a small number of times on failure. Tested on 0.10.38 (which has a bug that this test was written for) and (modifying the test slightly to remove ES6 stuff) the test still seems to fail 100% of the time there, which is what we want/expect. PR-URL: #14831 Fixes: #14792 Reviewed-By: Refael Ackermann <refack@gmail.com> Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
test-timers-blocking-callback may fail erroneously on
resource-constrained machines due to the timing nature of the test.
There is likely no way around the timing issue. This change tries to
decrease the probability of the test failing erroneously by having it
retry a small number of times on failure.
Tested on 0.10.38 (which has a bug that this test was written for) and
(modifying the test slightly to remove ES6 stuff) the test still seems
to fail 100% of the time there, which is what we want/expect.
Fixes: #14792
Checklist
make -j4 test(UNIX), orvcbuild test(Windows) passesAffected core subsystem(s)
test timers