Lecture 18: System call costs

System call costs experiment

The following program is a simple system call benchmarker. It runs a tight loop 1 million times; each iteration makes zero or more system calls, depending on the option argument. Paste the code into syscall.cc (or download it), then compile with c++ -O2 syscall.cc -o syscall (or c++ -std=gnu++11 -O2 syscall.cc -o syscall) and run with (for example) ./syscall -0 or ./syscall -o.

#include <unistd.h>
#include <time.h>
#include <stdio.h>
#include <inttypes.h>
#include <fcntl.h>
#include <stdlib.h>

static void* initial_brk;

// Baseline: makes no system call at all.
unsigned f_return0() {
    return 0;
}

unsigned f_getpid() {
    return getpid();
}

unsigned f_getppid() {
    return getppid();
}

unsigned f_time() {
    return time(nullptr);
}

unsigned f_clock_gettime() {
    timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return ts.tv_nsec;
}

unsigned f_clock_gettime_coarse() {
    timespec ts;
    clock_gettime(CLOCK_REALTIME_COARSE, &ts);
    return ts.tv_nsec;
}

unsigned f_sbrk() {
    return reinterpret_cast<uintptr_t>(sbrk(0));
}

unsigned f_brk() {
    return brk(initial_brk);
}

unsigned f_open_close() {
    int fd = open("/dev/null", O_RDONLY);
    close(fd);
    return fd;
}

unsigned f_close() {
    return close(0);
}

void usage() {
    fprintf(stderr, "Usage: ./syscall [-0pPtcCsbox]\n");
    exit(1);
}

int main(int argc, char** argv) {
    // `volatile` forces the compiler to reload `f` before each call, so the
    // indirect call cannot be inlined or optimized away.
    unsigned (*volatile f)(void) = f_getpid;
    initial_brk = sbrk(0);

    int opt;
    while ((opt = getopt(argc, argv, "0pPtcCsbox")) != -1) {
        switch (opt) {
        case '0':
            f = f_return0;
            break;
        case 'p':
            f = f_getpid;
            break;
        case 'P':
            f = f_getppid;
            break;
        case 't':
            f = f_time;
            break;
        case 'c':
            f = f_clock_gettime;
            break;
        case 'C':
            f = f_clock_gettime_coarse;
            break;
        case 's':
            f = f_sbrk;
            break;
        case 'b':
            f = f_brk;
            break;
        case 'o':
            f = f_open_close;
            break;
        case 'x':
            f = f_close;
            break;
        default:
            usage();
        }
    }

    if (optind != argc) {
        usage();
    }

    timespec ts0;
    clock_gettime(CLOCK_REALTIME, &ts0);
    unsigned long n = 0;

    // Call the selected function 1,000,000 times. The results are summed into
    // n and printed below, so the compiler cannot discard the calls' work.
    for (unsigned i = 0; i != 1000000; ++i) {
        n += f();
    }

    timespec ts1;
    clock_gettime(CLOCK_REALTIME, &ts1);

    double t0 = ts0.tv_sec + ts0.tv_nsec / 1e9;
    double t1 = ts1.tv_sec + ts1.tv_nsec / 1e9;
    printf("result: %lu in %.06fs\n", n, t1 - t0);
}

Run this program with the different arguments.

  1. Which system calls are most expensive and which least expensive? Do system call costs divide into classes?
  2. Try to explain why some system calls are so much more expensive than others. For instance, use tools such as gdb, strace, /usr/bin/time -v, or perf to trace the program's operation. (One way to test a hypothesis about the cheapest timing calls is sketched below.)
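
For question 2, one hypothesis worth testing is that some of the cheap calls, such as time and clock_gettime, never actually enter the kernel at all: on Linux, the vDSO can service them entirely in user space. One way to check is to benchmark a variant that forces the raw system call through syscall(2) alongside the library version. A minimal sketch, assuming a Linux system where SYS_clock_gettime is defined (f_clock_gettime_raw is a new function, not part of the program above):

#include <unistd.h>
#include <sys/syscall.h>
#include <time.h>

// Forces the real clock_gettime system call, bypassing any vDSO fast path.
unsigned f_clock_gettime_raw() {
    timespec ts;
    syscall(SYS_clock_gettime, CLOCK_REALTIME, &ts);  // always traps into the kernel
    return ts.tv_nsec;
}

Hooking this function up to an unused option letter in main and comparing it with ./syscall -c should indicate how much of any difference comes from the kernel crossing itself.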

The C10K problem

Much OS research in the late 1990s and early 2000s was concerned with addressing performance problems with a specific set of system calls, namely network connection system calls. Wide-area network connections have high latency, so a server (such as a web or FTP server) that services wide-area clients will often be waiting for those clients. A single server in the late 1990s could theoretically service the active workload provided by 10,000 simultaneous network clients, but in practice many servers broke down well before that level, because certain critical system calls had design flaws that prevented them from scaling. This “10,000 connection” problem became known as the “C10K problem.”

The following paper traced the C10K problem to two specific kernel functions, select (a system call) and ufalloc (an internal kernel function that allocates the numerically smallest unused file descriptor, invoked by open, socket, accept, etc.):

“Scalable Kernel Performance for Internet Servers Under Realistic Loads.” Gaurav Banga and Jeffrey C. Mogul. In Proc. USENIX ATC 1998. Link
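
To see why ufalloc could become a bottleneck, recall that POSIX requires open, socket, accept, and related calls to return the numerically smallest unused file descriptor. A rough model of that requirement (a sketch of the specification's cost, not the actual kernel code) makes the scaling problem visible: with roughly 10,000 descriptors in use, every new connection pays a scan proportional to the size of the descriptor table.

#include <cstddef>
#include <vector>

// Rough model of the ufalloc requirement: return the numerically smallest
// unused descriptor. With N descriptors mostly in use, each allocation scans
// O(N) slots. (Not the actual kernel implementation.)
int ufalloc_model(const std::vector<bool>& fd_in_use) {
    for (std::size_t fd = 0; fd != fd_in_use.size(); ++fd) {
        if (!fd_in_use[fd]) {
            return static_cast<int>(fd);   // lowest-numbered free slot
        }
    }
    return -1;                             // descriptor table is full
}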

Read the manual page for select and/or poll and try to figure out, from these system calls’ specifications, why they perform badly on servers with 10,000 or more open, but mostly idle, connections. Develop your own hypothesis and then check it out by talking with other groups or reading the paper. Then sketch a solution for this problem: system calls that serve the same need as select, but that could scale to large numbers of idle connections.

“A Scalable and Explicit Event Delivery Mechanism for UNIX.” Gaurav Banga, Jeffrey C. Mogul, and Peter Druschel. In Proc. USENIX ATC 1999. Link
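
The interfaces that eventually addressed the select side of the problem in production kernels, such as Linux's epoll and BSD's kqueue, take an approach similar to the one this paper advocates: the application registers interest in each descriptor once, and each wait returns only the descriptors that are ready, so waiting does not cost time proportional to the number of idle connections. Below is a minimal sketch of an epoll-based accept-and-read loop (Linux-specific; error handling omitted; event_loop and listen_fd are illustrative names, with listen_fd assumed to be an already-listening socket).

#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

// Minimal epoll event loop sketch (Linux only; error handling omitted).
// Interest in each descriptor is registered once with epoll_ctl; each
// epoll_wait then returns only the ready descriptors, so a wait over
// thousands of mostly idle connections stays cheap.
void event_loop(int listen_fd) {
    int ep = epoll_create1(0);

    epoll_event ev = {};
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);    // register interest once

    while (true) {
        epoll_event ready[64];
        int n = epoll_wait(ep, ready, 64, -1);       // blocks; returns ready fds only
        for (int i = 0; i != n; ++i) {
            int fd = ready[i].data.fd;
            if (fd == listen_fd) {
                // New connection: accept it and add it to the interest set.
                int conn = accept(listen_fd, nullptr, nullptr);
                epoll_event cev = {};
                cev.events = EPOLLIN;
                cev.data.fd = conn;
                epoll_ctl(ep, EPOLL_CTL_ADD, conn, &cev);
            } else {
                // Data (or EOF) on an existing connection.
                char buf[4096];
                ssize_t r = read(fd, buf, sizeof buf);
                if (r <= 0) {
                    close(fd);   // also removes the descriptor from the epoll set
                } else {
                    // ... handle r bytes of request data ...
                }
            }
        }
    }
}

Contrast this with a select-based loop, which must pass the full descriptor set to the kernel on every call and then scan the whole set on return, even when only one connection is active.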