This section contains hints for doing proper micro-benchmarking on FreeBSD or of
FreeBSD itself.
Not every suggestion below can be applied every time, but the more of them are used,
the better the benchmark will be at resolving small differences.
-
Disable APM and any other kind of clock fiddling (ACPI?).
-
Run in single user mode. E.g., cron(8) and other
daemons only add noise. The sshd(8) daemon can
also cause problems. If ssh access is required during the test, either disable the SSHv1 key
regeneration or kill the parent sshd daemon during the
tests.
-
Do not run ntpd(8).
-
If syslog(3) events are
generated, run syslogd(8) with an
empty /etc/syslog.conf; otherwise, do not run it.
-
Minimize disk I/O; avoid it entirely if possible.
-
Do not mount file systems that are not needed.
-
Mount /, /usr, and any other file
system read-only if possible. This removes atime updates to disk (among other things) from the I/O
picture.
-
Reinitialize the read/write test file system with newfs(8) and populate
it from a tar(1) or dump(8) file before
every run. Unmount and mount it before starting the test. This results in a consistent
file system layout. For a worldstone test this would apply to /usr/obj (just reinitialize with newfs and
mount). To get 100% reproducibility, populate the file system from a dd(1) file (e.g., dd if=myimage of=/dev/ad0s1h bs=1m).
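The reset cycle described above can be scripted so that it is identical for every run. The following is only a sketch: the function names are hypothetical, the device and mount point are simply the ones from the example above, and by default the commands are printed rather than executed, since newfs(8) and dd(1) destroy whatever is on the target device.

```python
import subprocess


def reset_commands(device, mountpoint, image=None):
    """Build the command list that resets the test file system.

    With image=None the file system is recreated with newfs(8);
    otherwise the raw image is copied onto the device with dd(1).
    """
    cmds = [["umount", mountpoint]]
    if image is not None:
        cmds.append(["dd", f"if={image}", f"of={device}", "bs=1m"])
    else:
        cmds.append(["newfs", device])
    cmds.append(["mount", device, mountpoint])
    return cmds


def reset_filesystem(device, mountpoint, image=None, dry_run=True):
    """Print the reset commands, or run them when dry_run is False."""
    for cmd in reset_commands(device, mountpoint, image):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing command, for
            # example when the file system was not mounted.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical device and mount point, for illustration only.
    reset_filesystem("/dev/ad0s1h", "/usr/obj")
```

Populating the fresh file system from a tar(1) or dump(8) file is left out of the sketch; it would go between the newfs step and the final unmount and mount.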
-
Use malloc-backed or preloaded md(4) partitions.
-
Reboot between individual iterations of the test; this gives a more consistent
state.
-
Remove all non-essential device drivers from the kernel. For instance, if USB is not
needed for the test, do not put USB in the kernel. Drivers which attach often have
timeouts ticking away.
-
Unconfigure hardware that is not in use. Detach disks with atacontrol(8) and camcontrol(8) if the
disks are not used for the test.
-
Do not configure the network unless it is being tested, or wait until after the test
has been performed to ship the results off to another computer.
If the system must be connected to a public network, watch out for spikes of broadcast
traffic. Even though it is hardly noticeable, it will take up CPU cycles. Multicast has
similar caveats.
-
Put each file system on its own disk. This minimizes jitter from head-seek
optimizations.
-
Minimize output to serial or VGA consoles. Running output into files gives less
jitter. (Serial consoles easily become a bottleneck.) Do not touch the keyboard while the
test is running; even pressing space or backspace
shows up in the numbers.
-
Make sure the test is long enough, but not too long. If the test is too short,
timestamping is a problem. If it is too long, temperature changes and drift will affect
the frequency of the quartz crystals in the computer. Rule of thumb: more than a minute,
less than an hour.
-
Try to keep the temperature as stable as possible around the machine. This affects
both quartz crystals and disk drive algorithms. To get a really stable clock, consider
stabilized clock injection. E.g., get an OCXO + PLL and inject its output into the clock circuits
instead of the motherboard xtal. Contact Poul-Henning Kamp <phk@FreeBSD.org> for more information about
this.
-
Run the test at least 3 times, but it is better to run it more than 20 times, both for
"before" and "after" code. Try to interleave if possible (i.e., do not run 20 times
before then 20 times after); this makes it possible to spot environmental effects. Do not
interleave 1:1, but 3:3; this makes it possible to spot interaction effects.
A good pattern is: bababa{bbbaaa}*. This gives a hint after
the first 1+1 runs (so it is possible to stop the test if it goes entirely the wrong
way), a standard deviation after the first 3+3 (which gives a good indication of whether it is going to
be worth a long run), and trending and interaction numbers later on.
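As an illustration, the run order for the pattern bababa{bbbaaa}* can be generated rather than typed by hand. This is a minimal sketch; the function name run_schedule and its blocks parameter are hypothetical and not part of any FreeBSD tool.

```python
def run_schedule(blocks=0):
    """Return the run order as a list of 'b' (before) and 'a' (after).

    The schedule is the fixed prefix 'bababa' followed by `blocks`
    repetitions of 'bbbaaa'.
    """
    if blocks < 0:
        raise ValueError("blocks must not be negative")
    return list("bababa" + "bbbaaa" * blocks)


if __name__ == "__main__":
    schedule = run_schedule(blocks=3)
    print("".join(schedule))
    print("before runs:", schedule.count("b"))
    print("after runs:", schedule.count("a"))
```

Each call with blocks=n yields 3 + 3n runs of each kind, so run_schedule(6) gives 21 "before" and 21 "after" runs, just over the 20 suggested above.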
-
Use /usr/src/tools/tools/ministat to see if the numbers are
significant. Consider buying "The Cartoon Guide to Statistics" (ISBN: 0062731025), highly
recommended, if you have forgotten or never learned about standard deviation and
Student's t.
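For readers who want to see what is being computed, the following sketch calculates the mean, the sample standard deviation, and a pooled-variance Student's t statistic for two sets of timings. It is an illustration only and not how ministat is implemented; the function name students_t and the sample numbers are made up. The resulting t value still has to be compared against a t table for the given degrees of freedom.

```python
import math
import statistics


def students_t(before, after):
    """Return (t, degrees_of_freedom) for two independent samples.

    Uses the classic pooled-variance form of Student's t, which
    assumes both samples have roughly the same variance. Each sample
    needs at least two values, and the pooled variance must not be 0.
    """
    n1, n2 = len(before), len(after)
    m1, m2 = statistics.mean(before), statistics.mean(after)
    v1, v2 = statistics.variance(before), statistics.variance(after)
    dof = n1 + n2 - 2
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / dof
    t = (m2 - m1) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, dof


if __name__ == "__main__":
    # Made-up run times in seconds, for illustration only.
    before = [10.0, 12.0, 11.0]
    after = [13.0, 15.0, 14.0]
    print("before: mean", statistics.mean(before),
          "stddev", statistics.stdev(before))
    print("after:  mean", statistics.mean(after),
          "stddev", statistics.stdev(after))
    t, dof = students_t(before, after)
    print("t =", round(t, 3), "with", dof, "degrees of freedom")
```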
-
Do not use background fsck(8) unless the
test is a benchmark of background fsck. Also, disable background_fsck in /etc/rc.conf unless
the benchmark is started at least 60 + "fsck runtime"
seconds after boot, as rc(8) wakes up and
checks if fsck needs to run on any file systems when background
fsck is enabled. Likewise, make sure there are no snapshots
lying around unless the benchmark is a test with snapshots.
-
If the benchmark shows unexpectedly bad performance, check for things like high interrupt
volume from an unexpected source. Some versions of ACPI have been reported to "misbehave" and generate excess
interrupts. To help diagnose odd test results, take a few snapshots of vmstat -i and look for anything unusual.
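Comparing two such snapshots by eye is error prone, so a small script can list which interrupt sources grew the most between them. This sketch assumes the usual three-column vmstat -i layout (source name, total, rate); the function names and the sample snapshots are hypothetical and the numbers are invented.

```python
def parse_vmstat_i(text):
    """Map each interrupt source to its total count."""
    totals = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 2)  # name, total, rate
        if len(parts) != 3 or not parts[1].isdigit():
            continue  # header line, blank line, or unexpected format
        name = parts[0].strip()
        if name != "Total":
            totals[name] = int(parts[1])
    return totals


def interrupt_deltas(before_text, after_text):
    """Return (source, increase) pairs, largest increase first."""
    before = parse_vmstat_i(before_text)
    after = parse_vmstat_i(after_text)
    deltas = [(name, count - before.get(name, 0))
              for name, count in after.items()]
    return sorted((d for d in deltas if d[1] > 0),
                  key=lambda d: d[1], reverse=True)


# Invented snapshots, for illustration only.
BEFORE = """interrupt                          total       rate
irq1: atkbd0                         218          0
irq9: acpi0                         1000          2
cpu0:timer                         50000        100
Total                              51218        102
"""
AFTER = """interrupt                          total       rate
irq1: atkbd0                         218          0
irq9: acpi0                        91000        150
cpu0:timer                         56000        100
Total                             147218        250
"""

if __name__ == "__main__":
    for name, delta in interrupt_deltas(BEFORE, AFTER):
        print(f"{name:20} +{delta}")
```

In the invented data, irq9: acpi0 grows far faster than the timer interrupt, which is the kind of pattern worth investigating.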
-
Be careful about optimization parameters for both kernel and userspace,
and likewise about debugging options. It is easy to let something slip through and realize later that the test
was not comparing the same thing.
-
Never benchmark with the WITNESS and INVARIANTS kernel options enabled unless the test is intended to
benchmark those features. WITNESS can cause 400%+ drops in
performance. Likewise, userspace malloc(3) parameters
default differently in -CURRENT from the way they ship in production releases.