Skip to content

Commit bae45d9

Browse files
committed
scal benchmark: eliminate y, move init/timing out of loop
Removing y avoids cache effects (if y is the size of the L1 cache, the main array x is removed from it). Moving init and timing out of the loop makes the scal benchmark behave like the gemm benchmark, and allows higher accuracy for smaller test cases since the loop overhead is much smaller than the timing overhead. Example: OPENBLAS_LOOPS=10000 ./dscal.goto 1024 8192 1024 on AMD Zen2 (7532) with 32k (4k doubles) L1 cache per core. Before From : 1024 To : 8192 Step = 1024 Inc_x = 1 Inc_y = 1 Loops = 10000 SIZE Flops 1024 : 5627.08 MFlops 0.000000 sec 2048 : 5907.34 MFlops 0.000000 sec 3072 : 5553.30 MFlops 0.000001 sec 4096 : 5446.38 MFlops 0.000001 sec 5120 : 5504.61 MFlops 0.000001 sec 6144 : 5501.80 MFlops 0.000001 sec 7168 : 5547.43 MFlops 0.000001 sec 8192 : 5548.46 MFlops 0.000001 sec After From : 1024 To : 8192 Step = 1024 Inc_x = 1 Inc_y = 1 Loops = 10000 SIZE Flops 1024 : 6310.28 MFlops 0.000000 sec 2048 : 6396.29 MFlops 0.000000 sec 3072 : 6439.14 MFlops 0.000000 sec 4096 : 6327.14 MFlops 0.000001 sec 5120 : 5628.24 MFlops 0.000001 sec 6144 : 5616.41 MFlops 0.000001 sec 7168 : 5553.13 MFlops 0.000001 sec 8192 : 5600.88 MFlops 0.000001 sec We can see the L1->L2 switchover point is now where it should be, and the number of flops for L1 is more accurate.
1 parent e9a911f commit bae45d9

File tree

1 file changed

+9
-23
lines changed

1 file changed

+9
-23
lines changed

benchmark/scal.c

+9-23
Original file line numberDiff line numberDiff line change
@@ -45,7 +45,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
4545

4646
int main(int argc, char *argv[]){
4747

48-
FLOAT *x, *y;
48+
FLOAT *x;
4949
FLOAT alpha[2] = { 2.0, 2.0 };
5050
blasint m, i;
5151
blasint inc_x=1,inc_y=1;
@@ -74,10 +74,6 @@ int main(int argc, char *argv[]){
7474
fprintf(stderr,"Out of Memory!!\n");exit(1);
7575
}
7676

77-
if (( y = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_y) * COMPSIZE)) == NULL){
78-
fprintf(stderr,"Out of Memory!!\n");exit(1);
79-
}
80-
8177
#ifdef __linux
8278
srandom(getpid());
8379
#endif
@@ -91,30 +87,20 @@ int main(int argc, char *argv[]){
9187

9288
fprintf(stderr, " %6d : ", (int)m);
9389

90+
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
91+
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
92+
}
9493

94+
begin();
9595
for (l=0; l<loops; l++)
9696
{
97-
98-
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
99-
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
100-
}
101-
102-
for(i = 0; i < m * COMPSIZE * abs(inc_y); i++){
103-
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
104-
}
105-
begin();
106-
10797
SCAL (&m, alpha, x, &inc_x);
98+
}
99+
end();
108100

109-
end();
110-
111-
time1 = getsec();
112-
113-
timeg += time1;
114-
115-
}
101+
time1 = getsec();
116102

117-
timeg /= loops;
103+
timeg = time1 / loops;
118104

119105
#ifdef COMPLEX
120106
fprintf(stderr, " %10.2f MFlops %10.6f sec\n", 6. * (double)m / timeg * 1.e-6, timeg);

0 commit comments

Comments
 (0)