Bullet Cache is developed primarily for FreeBSD, but I try to maintain Linux compatibility (which is actually pretty easy, as both are reasonably POSIX-friendly). As the first step, fetch the latest source archive, unpack it and run make:
$ tar xzf mdcached-1.0b2.tgz
$ cd mdcached-1.0b2
$ make
There is no "./configure" step as the source code is not (yet) complex enough to require it. The default Makefile target creates the following components (among others):
- mdcached - the server executable
- libmdcached.a - the C client static library
- libmdcached.so and libmdcached.so.1 - the C client dynamic library
- test_mdcached - the C client testing executable
- bench_mdcached and bench_mdcached_async - C client benchmarks
The default "make install" target installs the server executable, the C client libraries and the headers into the file system hierarchy rooted at the PREFIX variable. The built-in PREFIX defaults to "/usr/local" and should be changed to "/usr" on Linux systems (i.e. run "make PREFIX=/usr install").
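As a minimal sketch of the PREFIX advice above, the following script picks the prefix from the operating system name and prints the install command instead of running it. The pick_prefix helper is hypothetical and not part of the Bullet Cache sources; only the PREFIX variable and the "install" target come from the Makefile as described here.

```shell
#!/bin/sh
# Hypothetical helper: choose an install prefix from the OS name.
# /usr on Linux, the built-in default /usr/local everywhere else
# (e.g. FreeBSD).
pick_prefix() {
    case "$1" in
        Linux) echo "/usr" ;;
        *)     echo "/usr/local" ;;
    esac
}

PREFIX=$(pick_prefix "$(uname -s)")

# Only print the command, so it can be reviewed before it is run
# (installation usually needs root privileges).
echo "make PREFIX=$PREFIX install"
```

Printing the command rather than executing it keeps the sketch safe to try on any machine.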
The server executable has several command line options:
$ ./mdcached -h
Bullet Cache server version: 1, protocol version: 1
DataStore is created from 256 item buckets and 64 tag buckets
usage: ./mdcached [-h] [-d] [-v #] [-t #] [-n #] [-b #] [-f file] [-r] [-i]
-h Show this help message
-d Debug mode
-v # Set minimum log (verbosity) level to #
-t # Set number of worker threads to # (default=0)
-n # Set number of network thread to # (default=2)
-k Bind network threads to worker threads
-f file Set dump file name (for use with SIGUSR1)
-r Read cache data from the dump file (prewarm the cache)
-i # Automatically create a cache dump every # seconds
Most of these options will be discussed in later posts. For now, the most important high-level option is "-n", which configures the number of network threads (and by extension the number of CPU cores) the server will use. If the system is dedicated as a cache server, this option should be set to a number close to the number of logical cores in the system. Otherwise, it should be set to a value that leaves enough cores free for whatever else runs on the machine. For example, when benchmarking on a single machine, the thread counts of both the server and the benchmark client should slightly exceed half the available CPU cores (e.g. 5 on an 8-core system). The default value (2) is more of an example than a practical setting.
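The sizing rule above can be sketched as a small script. Both helper functions are hypothetical, and reading "slightly exceed half" as "half the cores plus one" is my assumption based on the 8-core example; getconf _NPROCESSORS_ONLN is not available on every system, hence the sysctl fallback.

```shell
#!/bin/sh
# Hypothetical helpers for choosing a value for the "-n" option.

# Number of logical cores: try getconf first, then sysctl (FreeBSD),
# and fall back to 1 if neither works.
logical_cores() {
    getconf _NPROCESSORS_ONLN 2>/dev/null ||
        sysctl -n hw.ncpu 2>/dev/null ||
        echo 1
}

# "Slightly exceed half the cores", interpreted here as cores/2 + 1
# (integer division), which gives 5 on an 8-core system.
shared_threads() {
    echo $(( $1 / 2 + 1 ))
}

cores=$(logical_cores)
echo "dedicated cache server:       ./mdcached -n $cores"
echo "server and benchmark together: ./mdcached -n $(shared_threads "$cores")"
```

The script only prints suggested command lines; treat the numbers as a starting point for your own measurements.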
Next, start the server and run the test executable (test_mdcached) to verify that it is working:
Structure size survey: mc_header(8) mc_handshake(16) mc_data_entry(20)
Connection ok (with handshake). Testing handshake speed.
1000 handshakes took 0.0 seconds: 95942.2 handshakes/s
Testing ADD operation
1000 additions took 0.0 seconds: 57527.1 additions/s
Testing GET operation
1000 retrievals took 0.0 seconds: 83493.7 additions/s
Testing ADD with TAGS operation
1000 additions took 0.0 seconds: 71952.1 additions/s
Testing TAGS retrieval
Testing TAGS deletion
Testing TSTACK operations
The test_mdcached program runs a few simple regression tests in the role of a client application connecting to the server. The exact tests vary from version to version and the output you see might differ a bit from this one, but the most important line is the last one - if it doesn't contain the "Ok." string, some of the tests have failed. If you encounter this, please contact me, as it is probably something I will want to fix.
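If you want to automate this check, a minimal sketch could look like the following. The last_line_ok function is a hypothetical wrapper; it relies only on the rule stated above (the last output line must contain "Ok.") and assumes nothing else about the test program.

```shell
#!/bin/sh
# Hypothetical wrapper: succeed only if the last line of the given
# text contains the literal string "Ok.".
last_line_ok() {
    printf '%s\n' "$1" | tail -n 1 | grep -F -q "Ok."
}

# Usage sketch (assumes the server is already running and that
# test_mdcached is in the current directory):
#   output=$(./test_mdcached)
#   if last_line_ok "$output"; then
#       echo "tests passed"
#   else
#       echo "tests FAILED"
#   fi
```

grep -F treats "Ok." as a fixed string, so the dot is not interpreted as a regular expression wildcard.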
The recommended benchmark is bench_mdcached_async, which uses asynchronous I/O together with multithreading to reduce client-side overhead while testing the server. An example run of this benchmark could be as follows:
$ ./bench_mdcached_async -t 2
Generating 30000 data records.
Generated 3199 kB (487 kB keys, 2712 kB data). Starting 10 clients in 2 threads.
Minimum data size: 10, Maximum data size: 103
Average key size: 16 bytes.
Average data size: 92 bytes.
Created thread 0x800cfe490 (including handshake) with 5 clients
Created thread 0x800cfe4a8 (including handshake) with 5 clients
Thread 0x800cfe490 ends: 2000001 events in 1091625 iterations: 1 events/iteration
Thread 0x800cfe4a8 ends: 2000001 events in 1059869 iterations: 1 events/iteration
10 clients * 400000 requests = 4000000 requests. That's 271363.1 requests/s
30000 records have been put.
Note that the "-t" argument instructs the benchmark to create the same number of threads as the server uses (2). This particular output was generated by running both the client and the server on the same system, a somewhat old 4-core desktop machine - much better results can be achieved on recent hardware.
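The summary line of the benchmark output also tells you roughly how long the run took: total requests divided by requests per second. Here is a small sketch of that arithmetic, assuming the summary line has exactly the layout shown in the example above (the elapsed_seconds helper is hypothetical and will break if the format changes between versions):

```shell
#!/bin/sh
# Hypothetical helper: derive the approximate run time in seconds from
# the benchmark summary line. Field 7 is the total request count and
# the second-to-last field is the requests/s figure.
elapsed_seconds() {
    echo "$1" | LC_ALL=C awk '{ printf "%.1f\n", $7 / $(NF-1) }'
}

summary="10 clients * 400000 requests = 4000000 requests. That's 271363.1 requests/s"
elapsed_seconds "$summary"    # prints 14.7
```

For the run above this works out to a little under 15 seconds for 4 million requests.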
The next post will introduce the C client API.