Not enough room for wired pages?

One of my systems runs FreeBSD/amd64 9.0-STABLE and uses ZFS for all it’s worth. The system has 8 gibibytes of memory installed, which should be enough to please even ZFS’ ARC. After a few hours of uptime, though, not everyone is happy, and one of the unhappy ones is GnuPG.

The mlock(2) system call comes in handy if you want to prevent your pages from being paged out. GnuPG tries to use mlock(2) to secure a few pages of memory, which it then uses for storing sensitive data such as decrypted private keys.

As you might have guessed, there is a potential for abuse. Only a user with a uid of 0 (zero) is allowed to use the mlock(2) system call, and this is the reason why some of us like to have the gpg2 executable marked as “setuid root”. GnuPG will change to the uid of the real user once some housekeeping is all set, including the call to mlock(2).

There exists a sysctl(3) OID named vm.max_wired that controls how many pages may be wired at the same time, i.e. the number of pages secured from being paged out through the use of the mlock(2) system call. The value of this sysctl(3) is normally derived at boot time and is set to one third of the amount of physical memory. See the function vm_pageout() in the stable/9/sys/vm/vm_pageout.c file, from around line 1477 onwards.

Things get more complicated, as it turns out that the memory used by ZFS’ ARC is also wired and is counted against the vm.max_wired sysctl(3).

I figured that allowing up to 7 gibibytes of wired memory should be enough to meet the needs of both the kernel and the userland, while still preventing the entire physical memory from being wired, which in turn would prevent any paging and swapping from occurring at all.

Do note the value for the vm.max_wired sysctl(3) is measured in 4K pages, where 4K equals 4096 bytes.

7 GiB equals 7168 MiB (multiply by 1024), which equals 7340032 KiB (again, multiply by 1024). Divide 7340032 KiB by 4 KiB and you’ll get 1835008 4K pages.

I added these lines to the /etc/sysctl.conf file:

# Allow for up to 7 GiB = 1835008 4K pages of wired memory:
vm.max_wired=1835008

To apply the new value right away, without rebooting, you can also run the following command as root:

root@enterprise:~>sysctl vm.max_wired=1835008

I guess not everyone will agree with me, but I propose that the logic for the vm.max_wired sysctl(3) be changed so that wired pages belonging to the kernel are exempted from the count. That way, the vm.max_wired sysctl(3) would only control the amount of userland wired pages.

The kernel is, after all, limited to the amount of memory dictated by the vm.kmem_size_max tunable, and the memory used by ZFS’ ARC is limited by the vfs.zfs.arc_max tunable, which in turn must be smaller than the vm.kmem_size_max tunable.

Here’s the source for a small test program I created to debug this issue:

#include <sys/mman.h>

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
  void *p;
  long s = sysconf(_SC_PAGESIZE);

  if ( (p = malloc(s)) == NULL) {
    fprintf(stderr,
            "%s: malloc(s) => malloc(%ld): %s (%d)\n",
            argv[0], s, strerror(errno), errno);
    return 1;
  } // if

  if (mlock(p, s) == -1) {
    fprintf(stderr,
            "%s: mlock(p, s) => mlock(%p, %ld): %s (%d)\n",
            argv[0], p, s, strerror(errno), errno);
    return 1;
  } // if

  return 0;
} // main()

// mlocktest.c

Running this program as a regular user would produce:

./mlocktest: mlock(p, s) => mlock(0x800c08000, 4096): Operation not permitted (1)

Before tweaking the vm.max_wired sysctl(3), this program would sometimes produce the following when run as root, depending on the uptime and the size of the ZFS’ ARC:

./mlocktest: mlock(p, s) => mlock(0x800c08000, 4096): Resource temporarily unavailable (35)