From Xuan Xuan's introduction in the beginning, I think the first data is
average value of the test results and the second data is the standard
deviation.
>
So why skip the numbers for the first round test? Isn't that what
real users see, the first round? Sure, it will be slower as code is
loaded into memory, files read from disk, etc. But the same thing
happens for users.
Also, I think the interesting 2nd number is the "standard error of the
mean", which == std deviation / sqrt(count of measurements). That is
what gives the error bars (confidence interval) on the measurement.
For example, 95% confidence limits on a measurement would be:
lower bound = mean - 1.96*standard_error
upper bound = mean + 1.96*standard_error
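
A minimal Python sketch of that calculation, in case it helps. The function name and the timing values are made up for illustration and are not from the actual test runs. Note that 1.96 assumes a reasonably large number of measurements; with only a handful of runs, a t-based multiplier would give a wider interval.

```python
import math
import statistics

def confidence_interval(samples, z=1.96):
    """Return (mean, standard_error, lower, upper) for a list of timings."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.mean(samples)
    std_dev = statistics.stdev(samples)   # sample standard deviation (n - 1)
    std_err = std_dev / math.sqrt(n)      # standard error of the mean
    return mean, std_err, mean - z * std_err, mean + z * std_err

# Hypothetical timings in seconds, invented for this example
timings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, std_err, lower, upper = confidence_interval(timings)
print(f"mean={mean:.2f} se={std_err:.2f} interval=({lower:.2f}, {upper:.2f})")
```
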
An easy "rule of thumb" is to compare the "before" and "after"
measures and see if there is overlap in the intervals.
For example:
Before interval: (1.0, 2.0)
After interval: (1.5, 2.5)
Because the intervals overlap, there might not be a significant
difference between the two.
But:
Before interval: (1.0, 2.0)
After interval: (2.5, 3.5)
In this case there is a clear difference, because the confidence
intervals do not overlap.
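
The overlap check in the two examples above can be sketched in a few lines of Python. The helper name is hypothetical; it only tests whether two (low, high) intervals share any point, which is the rule of thumb and not a formal significance test.

```python
def intervals_overlap(before, after):
    """True if two (low, high) intervals share at least one point."""
    return before[0] <= after[1] and after[0] <= before[1]

# The two cases from the examples above
print(intervals_overlap((1.0, 2.0), (1.5, 2.5)))  # True: might not be significant
print(intervals_overlap((1.0, 2.0), (2.5, 3.5)))  # False: clear difference
```
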
A t-test could also be used here, but the above approach works well in
Calc if you use the "stock 2" type chart. This has series for
high/low/close/open. So you could do something where the high and low
values are the upper and lower 95% confidence limits. This makes it
easy to tell what is important at a glance.
