The arrow of time

Ivan Voras' blog

Writing a GEOM GATE module, part 4

To wrap up this small tutorial on writing ggate modules, I'll describe in brief how the simple implementation of ggvd works. Unfortunately, it has a major limitation in functionality due to a FreeBSD bug but it still may be good enough as an example.

Here's a list of tutorial parts:

  1. First part - Describes what GEOM and GEOM GATE are
  2. Second part - Describes some more on how GEOM and GEOM GATE work, discusses sector sizes and dissects ggatel.
  3. Third part - Shows how to use ggatel, describes the idea behind the new module which will be written in the tutorial
  4. Fourth part - Analyses ggvd, wraps up the tutorial

The useful part of this tutorial is about writing a simple "virtual drive" module which stores sectors in a key-value database. Unfortunately, while doing this, I've stumbled upon the infamous "wdrain" bug, which severely limits what can be done inside a ggate module. Basically, all IO that the module does to file systems (including NFS...) must be done with the O_DIRECT | O_FSYNC flags (as ggatel correctly does), or else there is an easy-to-trigger deadlock in the "wdrain" state, because the kernel now encounters each IO operation twice. This means that no "stock" key-value library (including the built-in BDB) will work when used in a ggate module, which is a shame.
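To make the flag requirement concrete, here is a minimal sketch of opening a backing file with those flags. The helpers backing_open_flags() and backing_open() are made-up names for illustration and are not part of ggate or ggvd. The fallback from O_FSYNC to O_SYNC is an assumption that only exists so the sketch also builds on systems without the BSD spelling.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* exposes O_DIRECT on Linux; not needed on FreeBSD */
#endif
#include <fcntl.h>

#ifndef O_FSYNC
#define O_FSYNC O_SYNC	/* assumption: O_SYNC is the closest equivalent */
#endif

/* Hypothetical helper: open flags for a file that a ggate module uses. */
static int
backing_open_flags(int readonly)
{
	int flags = readonly ? O_RDONLY : O_RDWR;

	/* Ask for uncached IO and make every write synchronous. */
	return (flags | O_DIRECT | O_FSYNC);
}

/* Hypothetical helper: returns a file descriptor, or -1 with errno set. */
static int
backing_open(const char *path, int readonly)
{
	return (open(path, backing_open_flags(readonly)));
}
```

A stock database library opens its files itself, so there is normally no place to pass flags like these in, which is why such libraries run into the problem.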

Nevertheless, this work may serve as a tutorial for writing other ggate modules, and here's how ggvd works right now, with the Kyoto Cabinet database:

1. Opening the database

The database is opened (and/or created) in the ggvd_create() function:

static void
ggvd_create(void)
{
	struct g_gate_ctl_create ggioc;
	KCDB *db = kcdbnew();

	/* Open the backing database, creating it if it does not exist. */
	if (!kcdbopen(db, path, g_gate_openflags(flags) | KCOCREATE))
		err(EXIT_FAILURE, "Cannot open %s", path);

	ggioc.gctl_version = G_GATE_VERSION;
	ggioc.gctl_unit = unit;
	/* Report mediasize itself: ggioc.gctl_mediasize is not set yet. */
	if (mediasize % sectorsize != 0)
		errx(1, "Device size is not a multiple of sector size: "
		    "%jd, %u", (intmax_t)mediasize, sectorsize);
	ggioc.gctl_mediasize = mediasize;
	ggioc.gctl_sectorsize = sectorsize;
	ggioc.gctl_timeout = timeout;
	ggioc.gctl_flags = flags;
	ggioc.gctl_maxcount = 0;
	strlcpy(ggioc.gctl_info, path, sizeof(ggioc.gctl_info));
	/* Ask the kernel to create the ggate device. */
	g_gate_ioctl(G_GATE_CMD_CREATE, &ggioc);
	if (unit == -1)
		printf("%s%u\n", G_GATE_PROVIDER_NAME, ggioc.gctl_unit);
	unit = ggioc.gctl_unit;
	ggvd_serve(db);
}

The database work is done by the kcdbopen() call, which is very straightforward. The g_gate_openflags() function translates ggate open flags into database-specific flags (KCOREADER and/or KCOWRITER).
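The body of that translation function is not shown here, so the following is only a guess at its shape. The name sketch_openflags() and all three constants are stand-ins with placeholder values so that the sketch compiles on its own. Real code would use G_GATE_FLAG_READONLY from the ggate header and KCOREADER / KCOWRITER from kclangc.h.

```c
/*
 * Stand-in constants (placeholder values, not copied from the real
 * headers) for G_GATE_FLAG_READONLY, KCOREADER and KCOWRITER.
 */
#define SKETCH_GATE_READONLY	0x1u
#define SKETCH_KC_READER	0x1u
#define SKETCH_KC_WRITER	0x2u

/* Hypothetical translation from ggate flags to database open flags. */
static unsigned int
sketch_openflags(unsigned int ggflags)
{
	/* A read-only device only needs a reader connection. */
	if (ggflags & SKETCH_GATE_READONLY)
		return (SKETCH_KC_READER);
	/* Anything else is opened for writing. */
	return (SKETCH_KC_WRITER);
}
```

As far as I know, a Kyoto Cabinet writer connection can also read, so a separate "write-only" case should not be needed, but treat that as an assumption.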

The new device gets the given media size and sector size (both can be altered by command-line arguments). After some more bookkeeping, the g_gate_ioctl(G_GATE_CMD_CREATE...) call actually creates it.

2. Performing IO requests

The ggvd_serve() function is what serves IO coming in from the kernel and passes it to the database.

The crucial part of ggvd_serve() is the ioctl call and the switch() statement that follows it:

	for (;;) {
		...
		ggio.gctl_length = bsize;
		ggio.gctl_error = 0;
		g_gate_ioctl(G_GATE_CMD_START, &ggio);
		error = ggio.gctl_error;
		switch (error) {
		...
		}

		switch (ggio.gctl_cmd) {
		case BIO_READ:
			error = ggvd_read(db, &ggio, &bsize);
			break;
		case BIO_WRITE:
			error = ggvd_write(db, &ggio, &bsize);
			break;
		case BIO_DELETE:
			error = ggvd_delete(db, &ggio, &bsize);
			break;
		default:
			error = EOPNOTSUPP;
		}

		ggio.gctl_error = error;
		g_gate_ioctl(G_GATE_CMD_DONE, &ggio);
	}

This part of the code basically accepts "start IO" messages from the kernel, performs work on them and then signals them as done. The ggvd_read(), ggvd_write() and ggvd_delete() functions handle specific message types.

These functions are very similar so I'll just explain how ggvd_write() works:

static int
ggvd_write(KCDB *db, struct g_gate_ctl_io *ggio, size_t *bsize)
{
	size_t togo = ggio->gctl_length;	/* bytes still to be written */
	off_t off = ggio->gctl_offset;		/* device offset of the next sector */
	char *data = ggio->gctl_data;
	char dbkey[20];
	int ksize;

	/* One transaction per IO request. */
	kcdbbegintran(db, 1);
	while (togo > 0) {
		/* The +1 stores the terminating NUL as part of the key. */
		ksize = format_sector_key(dbkey, off) + 1;
		if (!kcdbset(db, dbkey, ksize, data, sectorsize)) {
			errx(1, "Error in kcdbset");
		}
		togo -= sectorsize;
		data += sectorsize;
		off += sectorsize;
	}
	/* Commit the transaction. */
	kcdbendtran(db, 1);

	return (0);
}

This function divides the incoming IO request into sector-sized chunks, then writes each chunk as a database record. To ensure data consistency as well as performance, it wraps the database operations in a transaction. Playing with transaction parameters will not avoid the "wdrain" bug, though.
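The format_sector_key() helper is not listed above. As a rough sketch of what such a helper could look like, the version below turns a byte offset into a decimal sector number. It is an assumption, not the actual ggvd code: the sketch_ names are made up, and it takes the sector size as a parameter instead of reading a global.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

#define SKETCH_KEY_MAX	20	/* same size as the dbkey[20] buffer */

/*
 * Hypothetical key formatter: writes the sector number for byte offset
 * "off" as a decimal string and returns its length without the NUL.
 */
static int
sketch_format_sector_key(char *dbkey, off_t off, unsigned int sectorsize)
{
	int len;

	/* Requests are expected to start on a sector boundary. */
	assert(off >= 0 && sectorsize > 0 && off % (off_t)sectorsize == 0);
	len = snprintf(dbkey, SKETCH_KEY_MAX, "%jd",
	    (intmax_t)(off / (off_t)sectorsize));
	assert(len > 0 && len < SKETCH_KEY_MAX);
	return (len);
}

/* Number of database records a request of "length" bytes turns into. */
static size_t
sketch_sector_count(size_t length, unsigned int sectorsize)
{
	return (length / sectorsize);
}
```

With 512-byte sectors, offset 4096 would get the key "8", and an 8192-byte request would touch 16 records. Any other encoding that is unique per sector and fits in the key buffer would do just as well.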

3. Using ggvd

ggvd can be built by issuing a common "make" command and is very simple to use. Here's an example:

# ./ggvd create myfile.kch
ggate0
# newfs -U /dev/ggate0
# mount /dev/ggate0 /mnt

Note that Kyoto Cabinet is filename-sensitive so the file extension (like ".kch") controls which type of database it will create.
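To illustrate the suffix rule, here is a small sketch that mirrors the mapping as I remember it from the Kyoto Cabinet documentation. The function sketch_kc_type_for() is a made-up name and is not part of the library, which does this selection internally when the database is opened. The sketch also ignores the special non-file names and tuning parameters that Kyoto Cabinet accepts.

```c
#include <string.h>

/*
 * Hypothetical helper: describes which database type a file name
 * would select, judging only by its suffix.
 */
static const char *
sketch_kc_type_for(const char *path)
{
	const char *dot = strrchr(path, '.');

	if (dot == NULL)
		return ("unknown");
	if (strcmp(dot, ".kch") == 0)
		return ("file hash database");
	if (strcmp(dot, ".kct") == 0)
		return ("file tree database");
	if (strcmp(dot, ".kcd") == 0)
		return ("directory hash database");
	if (strcmp(dot, ".kcf") == 0)
		return ("directory tree database");
	return ("unknown");
}
```

So "myfile.kch" from the example above ends up as a file hash database.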

The ggvd project as it currently stands will work for very light IO loads, but as soon as a big IO request comes along, the kernel will choke on it and deadlock.

4. What next?

The ggvd project may be used as a skeleton for future, more sophisticated modules, or just as example code for learning ggate. When the "wdrain" bug gets fixed (which will probably not be soon as it's fairly complex), this code will automagically work out of the box.

#1 Re: Writing a GEOM GATE module, part 4

Added on 2012-07-07T18:56 by Vjacheslav Borisov

Thank you.

ggvd work fine.

How do I look "wdrain" bug?

#2 Re: Writing a GEOM GATE module, part 4

Added on 2012-07-07T20:28 by

http://www.google.hr/search?q=freebsd+wdrain+hang

Comments are subject to moderation and will be deleted if deemed inappropriate. All content is © Ivan Voras. Comments are owned by their authors... who agree to basically surrender all rights by publishing them here :)