Order of reference can make a major difference
in execution time. Vary right index fastest.
This program can be accessed at:
ftp://lt.tucson.az.us/pub/c/cache.c.
/* Louis Taber
*
* Feb 10, 2001
*
* Program to demonstrate the effect
* of the cache memory system
* and array addressing order.
*/
#include <stdio.h>
#include <time.h>
#define LINE 512
#define MAX_ARRAY_SIZE 64*1024*1024 /* 64 Mbytes */
#define MAX_ROW ( MAX_ARRAY_SIZE / LINE )
char array[MAX_ROW][LINE]; /* Test array */
int i,j; /* Index values */
int size, row; /* Array indexes */
int sum, sum1, sum2;
clock_t start, filled_time, first_fetch, second_fetch;
float f_t, t_1, t_2, ratio;
int
main()
{
printf("Execution Times LINE= %d \n\n", LINE);
printf(" Array Fill Row Column C/R\n");
printf(" Size Time Fetch Fetch Ratio\n");
size=MAX_ARRAY_SIZE;
while( size >= 32 * 1024 )
{
row=size/LINE;
sum=sum1=sum2=0;
start= clock();
/* Fill array */
for( i=0; i<row; i++ )
for( j=0; j<LINE; j++)
sum += (int)(array[i][j] = (char)((i+j)%127) );
filled_time = clock();
/* Access all of array right index fastest */
for( i=0; i<row; i++ )
for( j=0; j<LINE; j++)
sum1 += (int)array[i][j];
first_fetch = clock();
/* Access all of array left index fastest */
for( j=0; j<LINE; j++)
for( i=0; i<row; i++ )
sum2 += (int)array[i][j];
second_fetch = clock();
f_t = (float)(filled_time-start)/CLOCKS_PER_SEC;
t_1 = (float)(first_fetch-filled_time)/CLOCKS_PER_SEC;
t_2 = (float)(second_fetch-first_fetch)/CLOCKS_PER_SEC;
ratio= t_2/t_1;
printf("%10d %6.2f %6.2f %6.2f %6.2f\n",
size, f_t,t_1,t_2, ratio);
/* check to see of the sums are all the same */
if( sum != sum1 || sum != sum2 )
{
printf(" Sums do not match\n");
printf(" %d %d %d \n", sum, sum1, sum2);
}
size >>=1; /* Cut size of array in half. */
}
return 0;
}
Intel Pentium, Dual 100MHz, 128 MBytes
Execution Times LINE= 512
Array Fill Row Column C/R
Size Time Fetch Fetch Ratio
67108864 55.32(1) 14.03(1) 30.68 2.19(3)
33554432 27.09 7.02 15.33 2.18
16777216 13.67 3.51 7.67 2.19
8388608 6.94 1.76 3.82 2.17
4194304 3.59 0.88 1.92 2.18
2097152 1.91 0.44 0.95 2.16
1048576 1.06 0.22 0.48 2.18
524288 0.64 0.11 0.21 1.91
262144 0.45 0.06 0.09 1.50
131072 0.34 0.02 0.04 2.00
65536 0.20 0.01 0.12 12.00(4)
32768 0.10 0.01 0.09 9.00
Intel Pentium II, 233 MHz, 386 Mbytes (gort)
Execution Times LINE= 512
Array Fill Row Column C/R
Size Time Fetch Fetch Ratio
67108864 13.31 3.27 11.50 3.52
33554432 6.34 1.64 5.73 3.49
16777216 3.17 0.82 2.84 3.46
8388608 1.59 0.41 1.41 3.44
4194304 0.80 0.20 0.71 3.55
2097152 0.40 0.10 0.36 3.60
1048576 0.20 0.05 0.17 3.40
524288 0.10 0.02 0.07 3.50
262144 0.05 0.01 0.02 2.00
131072 0.03 0.00 0.01 inf
65536 0.01 0.00 0.01 inf
32768 0.01 0.00 0.00 nan
AMD K6-2, 500MHz, 128MBytes (lt)
Execution Times LINE= 512
Array Fill Row Column C/R
Size Time Fetch Fetch Ratio
67108864 7.56 1.97 19.54 9.92(3)
33554432 3.34 0.98 9.77 9.97
16777216 1.67 0.50 4.87 9.74
8388608 0.84 0.24 2.44 10.17
4194304 0.42 0.12 1.22 10.17
2097152 0.21 0.06 0.60 10.00
1048576 0.11 0.03 0.28 9.33
524288 0.05 0.01 0.12 12.00
262144 0.02 0.01 0.05 5.00
131072 0.01 0.01 0.02 2.00
65536 0.01 0.00 0.01 inf
32768 0.00 0.00 0.00 nan
AMD K6-2, 233MHz, 64MBytes (amd)
Execution Times LINE= 512
Array Fill Row Column C/R
Size Time Fetch Fetch Ratio
67108864 17.46 3.29 355.67 108.11(5)
33554432 7.57 2.50 16.49 6.60
16777216 3.94 1.30 8.24 6.34
8388608 2.00 0.65 4.11 6.32
4194304 0.98 0.33 2.04 6.18
2097152 0.49 0.16 1.01 6.31
1048576 0.24 0.08 0.48 6.00
524288 0.13 0.04 0.17 4.25
262144 0.06 0.02 0.08 4.00
131072 0.03 0.01 0.03 3.00
65536 0.01 0.00 0.02 inf
32768 0.01 0.00 0.00 nan
AMD Athelon, 850MHz, 512MBytes (thunder)
Execution Times LINE= 512
Array Fill Row Column C/R
Size Time Fetch Fetch Ratio
67108864 4.91 1.05 7.52 7.16
33554432 2.39 0.52 3.76 7.23
16777216 1.19 0.26 1.88 7.23
8388608 0.60 0.13 0.94 7.23
4194304 0.29 0.07 0.46 6.57
2097152 0.15 0.04 0.22 5.50
1048576 0.08 0.02 0.10 5.00
524288 0.03 0.01 0.05 5.00
262144 0.02 0.01 0.00 0.00
131072 0.01 0.00 0.01 inf
65536 0.00 0.00 0.00 nan
32768 0.01 0.00 0.00 nan
This "benchmark"
was using the FSF gcc
compiler running Linux.
Notes on cache.c
program output:
- Fill and row fetch times are mostly related to
processor speed.
- Column fetch times are MUCH slower.
- On the AMD processor the ratios were higher.
- I can't explain this one.
- Performance is REALLY bad if you need to use
virtual memory. Note the run using a 64MBytes
array on a 64MBytes system. This time may have
wrapped. I was off at the swap meet. It wraps every
72 min. Also note that the "row" sequence did not
take much longer. Swapping does work.