Lecture 9: Blocking and wait queues

Blocking

A thread is blocked when it takes up no CPU time
Managing blocked threads is an important operating system task
- Threads frequently need to wait for an event or state to change
- It’s bad to waste CPU time on threads that have no actual work to do
- Wastes energy/battery life, leaves less CPU time for actual work
Alternative to blocking: polling
- Thread remains runnable
- “Are we there yet? Are we there yet?”

Is this process blocked?

void process_main() {
    sys_sleep_for_100_seconds();
}

...

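uintptr_t proc::syscall(regstate* regs) {
    switch (regs->reg_rax) {
    ...
    case SYSCALL_SLEEP_FOR_100_SECONDS: {
        // `ticks` advances HZ times per second
        unsigned long endticks = ticks + 100 * HZ;
        // the signed difference stays correct even if `ticks` wraps around
        while (long(endticks - ticks) > 0) {
            this->yield();
        }
        return 0;
    }
    ...
    }
}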

It depends on which thread you mean!

The user context is blocked for 100 seconds
- The CPU never runs in user mode on behalf of the process
The kernel task context is never blocked
- The proc remains runnable throughout
Advantages and disadvantages?

Blocking goals

Blocking should be efficient
- Cheap to block, cheap to wake up
Blocking should be fine-grained
- Frequently a thread wants to block until some state changes
- Examples: reading from a network connection, waiting for a child process to exit
- Want threads to remain blocked until it’s very likely the state has changed
- Coarse-grained blocking means frequent spurious wakeups: a thread runs even though the state it’s waiting for has not changed
Blocking should be comprehensive
- When a user context has no work to do, the corresponding kernel task should also block
Blocking must be reliable
- No lost wakeups: any thread that should be runnable must eventually run

System calls to illustrate blocking

Kernel maintains an unsigned integer stage that only increases
sys_set_stage(i): Set stage to at least i
- The new stage is ≥ i and ≥ the old stage
sys_block_until(i): Block this process until stage ≥ i

Bad blocking

// helpers
void proc::block() {
    this->pstate_ = ps_blocked;
    this->yield();
}

void proc::unblock() {
    this->pstate_ = ps_runnable;
    cpus[runq_cpu_].enqueue(this);
}

...

std::atomic<uint64_t> stage;

    case SYSCALL_SET_STAGE: {
        stage = max(stage.load(), regs->reg_rdi);
        spinlock_guard guard(ptable_lock);
        for (int i = 1; i != NPROC; ++i) {
            if (ptable[i])
                ptable[i]->unblock();
        }
        return 0;
    }

    case SYSCALL_BLOCK_UNTIL:
        while (stage < regs->reg_rdi) {
            this->block();
        }
        return 0;

Sleep–wakeup race
- A race condition between the code that blocks (“sleep”) and the code that unblocks (“wakeup”) causes a lost wakeup, and a task blocks forever

Key problem: Atomic blocking and unlocking

while (stage < regs->reg_rdi) {
    ********** THE CRITICAL MOMENT **********
    this->pstate_ = ps_blocked;
    this->yield();
}

Lost wakeup when p->unblock() happens on another CPU at the critical moment
Despite wakeup and state change, task marks itself blocked and yields, blocking potentially forever
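
The interleaving can be replayed deterministically in a single-threaded model. This is a sketch under simplifying assumptions: `model` and `replay_lost_wakeup` are hypothetical names, and each statement stands for one step taken by the CPU named in its comment.

```cpp
#include <cassert>
#include <cstdint>

enum pstate_t { ps_runnable, ps_blocked };

// Hypothetical model of the shared state: the stage and one waiter's pstate_.
struct model {
    uint64_t stage = 0;
    pstate_t pstate = ps_runnable;
};

// Replays the bad interleaving step by step.
// Returns true if the waiter ends up blocked although stage >= want.
bool replay_lost_wakeup(uint64_t want) {
    model m;
    // CPU 0 (waiter): evaluates the loop condition `stage < want`.
    bool must_block = m.stage < want;
    // CPU 1 (waker): all of sys_set_stage runs inside the critical moment.
    m.stage = want;
    m.pstate = ps_runnable;   // unblock(): the waiter is not blocked yet
    // CPU 0 (waiter): acts on its stale check.
    if (must_block) {
        m.pstate = ps_blocked;
    }
    return m.pstate == ps_blocked && m.stage >= want;
}
```

In this model `replay_lost_wakeup(1)` reports a lost wakeup, while `replay_lost_wakeup(0)` does not, because the waiter never decides to block.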

Synchronization invariants for blocking

A proc may be on at most one CPU’s run queue.
A CPU’s run queue is protected by c->runq_lock_. cpustate::enqueue() acquires that lock.
Any context may change any task’s p->pstate_ from proc::ps_blocked to proc::ps_runnable.
Only task p may change p->pstate_ from proc::ps_runnable to proc::ps_blocked.
The proc::yield() function must be called with no spinlocks held.

Check twice?

while (stage < regs->reg_rdi) {
    this->pstate_ = ps_blocked;
    std::atomic_thread_fence(std::memory_order_seq_cst);
    if (stage >= regs->reg_rdi) {
        this->pstate_ = ps_runnable;
    }
    this->yield();
}

Idea 1: Block, then check

Mark self as blocked before checking the wakeup condition
Update wakeup state before searching for blocked tasks
Locks or atomics can provide ordering
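
Both ordering rules can be checked by enumerating every interleaving of a two-step waiter with a two-step waker. This is a toy model that assumes each step is atomic and sequentially consistent; `model`, `count_lost_wakeups`, and the four step sequences are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

enum pstate_t { ps_runnable, ps_blocked };

struct model {
    uint64_t stage = 0;
    pstate_t pstate = ps_runnable;
    bool stale_check = false;   // result of a check made before blocking
};

using step = std::function<void(model&)>;
constexpr uint64_t want = 1;

// Runs all 6 interleavings of a 2-step waiter and a 2-step waker and counts
// those that end with the waiter blocked although stage >= want.
int count_lost_wakeups(const std::vector<step>& waiter,
                       const std::vector<step>& waker) {
    int lost = 0;
    for (unsigned mask = 0; mask < 16; ++mask) {
        int bits = 0;
        for (int b = 0; b < 4; ++b) {
            bits += (mask >> b) & 1;
        }
        if (bits != 2) {
            continue;           // exactly two of the four slots go to the waiter
        }
        model m;
        std::size_t wi = 0, ki = 0;
        for (int slot = 0; slot < 4; ++slot) {
            if (mask & (1u << slot)) {
                waiter[wi++](m);
            } else {
                waker[ki++](m);
            }
        }
        if (m.pstate == ps_blocked && m.stage >= want) {
            ++lost;
        }
    }
    return lost;
}

std::vector<step> check_then_block() {
    return {[](model& m) { m.stale_check = m.stage < want; },
            [](model& m) { if (m.stale_check) m.pstate = ps_blocked; }};
}
std::vector<step> block_then_check() {
    return {[](model& m) { m.pstate = ps_blocked; },
            [](model& m) { if (m.stage >= want) m.pstate = ps_runnable; }};
}
std::vector<step> update_then_wake() {
    return {[](model& m) { m.stage = want; },
            [](model& m) { m.pstate = ps_runnable; }};
}
std::vector<step> wake_then_update() {
    return {[](model& m) { m.pstate = ps_runnable; },
            [](model& m) { m.stage = want; }};
}
```

In this model only the combination of `block_then_check` with `update_then_wake` has no losing interleaving. Reversing either side leaves one schedule in which the waiter stays blocked.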

Avoid spurious wakeups

sys_set_stage() examines every process, blocked or not
Wakes up every blocked process, regardless of what it’s waiting for
Prefer \(O(W)\) work, where \(W\) is the number of kernel tasks blocked in sys_block_until()

Idea 2: Associate each condition with a list of blocked tasks

Linked list of waiters

spinlock waiters_lock;
list<proc, blocking_links_> waiters;

    case SYSCALL_SET_STAGE: {
        stage = max(stage.load(), regs->reg_rdi);
        spinlock_guard guard(waiters_lock);
        while (auto p = waiters.pop_front()) {
            p->unblock();
        }
        return 0;
    }

    case SYSCALL_BLOCK_UNTIL:
        assert(!this->blocking_links_.is_linked());
        while (stage < regs->reg_rdi) {
            spinlock_guard guard(waiters_lock);
            waiters.push_back(this);
            this->pstate_ = ps_blocked;
            this->yield();
        }

Linked list of waiters, attempt 2

    case SYSCALL_BLOCK_UNTIL:
        assert(!this->blocking_links_.is_linked());
        while (stage < regs->reg_rdi) {
            spinlock_guard guard(waiters_lock);
            if (stage < regs->reg_rdi) {
                waiters.push_back(this);
                this->pstate_ = ps_blocked;
            }
            this->yield();
        }

Linked list of waiters, attempt 3

    case SYSCALL_BLOCK_UNTIL:
        assert(!this->blocking_links_.is_linked());
        while (stage < regs->reg_rdi) {
            spinlock_guard guard(waiters_lock);
            if (stage < regs->reg_rdi) {
                waiters.push_back(this);
                this->pstate_ = ps_blocked;
            }
            guard.unlock();
            this->yield();
        }
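
The difference between attempts 2 and 3 is whether the lock is still held when `yield()` runs. A user-space sketch can make that visible, with `std::unique_lock` standing in for `spinlock_guard`. All names here are hypothetical, and `yield_model` only records the lock state instead of switching tasks.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

std::mutex waiters_lock;          // stand-in for the kernel spinlock
uint64_t stage = 0;               // protected by waiters_lock in this model
int enqueued = 0;                 // number of times the caller joined the list
bool lock_held_at_yield = true;   // recorded by yield_model()

// Stand-in for proc::yield(): records whether the caller still holds the lock.
void yield_model(const std::unique_lock<std::mutex>& guard) {
    lock_held_at_yield = guard.owns_lock();
}

// One iteration of the attempt-3 loop body.
void block_once(uint64_t want) {
    std::unique_lock<std::mutex> guard(waiters_lock);
    if (stage < want) {
        ++enqueued;               // models waiters.push_back(this) + ps_blocked
    }
    guard.unlock();               // release before yielding, per the invariant
    yield_model(guard);
}   // the guard's destructor sees it owns nothing, so there is no double unlock
```

After `block_once(1)`, `lock_held_at_yield` is false. Without the explicit `guard.unlock()`, the guard would still own the lock at the yield point, which is the invariant violation in attempts 1 and 2.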

Waiting on multiple conditions

What if a task wants to block until one of several conditions occurs?
- There’s only one blocking_links_ per proc
- proc can be a member of at most one linked list at a time
When are the links needed?
- Only while a kernel task is blocked
What space is available while the kernel task is blocked?

Idea 3: Store `p`’s blocking-related links as a local variable on `p`’s kernel task stack

Local variables live in kernel task stacks
Kernel task stacks are preserved while a task is blocked

Wait queues

The wait queue abstraction represents these ideas

`struct waiter`, `struct wait_queue`

struct waiter {
    proc* p_;
    wait_queue* wq_;
    list_links links_;
};

struct wait_queue {
    list<waiter, &waiter::links_> q_;
    spinlock lock_;
};
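
Keeping the links in a stack-allocated waiter lets one task sit on several wait queues at once. The sketch below is a single-threaded model with no locking. The `_model` types and their methods are hypothetical stand-ins, and raw `prev_`/`next_` pointers replace `list_links`.

```cpp
#include <cassert>

enum pstate_t { ps_runnable, ps_blocked };
struct proc_model { pstate_t pstate_ = ps_runnable; };

struct wait_queue_model;

struct waiter_model {
    proc_model* p_ = nullptr;
    wait_queue_model* wq_ = nullptr;   // queue this waiter is linked into, if any
    waiter_model* prev_ = nullptr;     // intrusive links live inside the waiter
    waiter_model* next_ = nullptr;

    void prepare(proc_model* p, wait_queue_model& wq);
    void clear();
};

struct wait_queue_model {
    waiter_model* head_ = nullptr;
    waiter_model* tail_ = nullptr;

    void push_back(waiter_model* w) {
        w->prev_ = tail_;
        w->next_ = nullptr;
        if (tail_) { tail_->next_ = w; } else { head_ = w; }
        tail_ = w;
        w->wq_ = this;
    }
    void erase(waiter_model* w) {
        if (w->prev_) { w->prev_->next_ = w->next_; } else { head_ = w->next_; }
        if (w->next_) { w->next_->prev_ = w->prev_; } else { tail_ = w->prev_; }
        w->prev_ = w->next_ = nullptr;
        w->wq_ = nullptr;
    }
    // Unlinks every waiter and marks its task runnable: O(W) in the number
    // of waiters on this queue. Returns how many were woken.
    int wake_all() {
        int n = 0;
        while (waiter_model* w = head_) {
            erase(w);
            w->p_->pstate_ = ps_runnable;
            ++n;
        }
        return n;
    }
};

void waiter_model::prepare(proc_model* p, wait_queue_model& wq) {
    p_ = p;
    p->pstate_ = ps_blocked;   // block first; the caller checks its condition next
    wq.push_back(this);
}

void waiter_model::clear() {
    if (wq_) { wq_->erase(this); }
    if (p_) { p_->pstate_ = ps_runnable; }
}
```

A task that declares two local `waiter_model` objects and prepares each on a different queue is woken by `wake_all()` on either queue. It then calls `clear()` on both waiters before they go out of scope, so no queue keeps a pointer into a dead stack frame.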

Using `wait_queue`

wait_queue stage_waiters;

    case SYSCALL_SET_STAGE: {
        stage = max(stage.load(), regs->reg_rdi);
        stage_waiters.wake_all();
        return 0;
    }

    case SYSCALL_BLOCK_UNTIL: {
        waiter w;
        while (true) {
            w.prepare(stage_waiters);
                // expands to `stage_waiters.push_back(&w)`
                // + `this->pstate_ = ps_blocked`
            if (stage >= regs->reg_rdi) {
                break;
            }
            w.maybe_block();
                // expands to `this->yield()` + other stuff
        }
        w.clear();
            // expands to `stage_waiters.erase(&w)`
            // + `this->pstate_ = ps_runnable`
        return 0;
    }

`block_until` shorthand

    case SYSCALL_BLOCK_UNTIL: {
        waiter().block_until(stage_waiters, [&] () {
            return stage >= regs->reg_rdi;
        });
        return 0;
    }
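
A familiar user-space analogue of `block_until` is `std::condition_variable::wait` with a predicate. The sketch below swaps in that standard-library mechanism for the kernel wait queue, so it illustrates the shape of the interface rather than the kernel's implementation. `stage_model` is a hypothetical name.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical user-space model; not the kernel's wait_queue.
struct stage_model {
    std::mutex lock_;
    std::condition_variable waiters_;
    uint64_t stage_ = 0;              // protected by lock_

    void set_stage(uint64_t i) {
        {
            std::lock_guard<std::mutex> guard(lock_);
            if (stage_ < i) {
                stage_ = i;           // update the state first...
            }
        }
        waiters_.notify_all();        // ...then wake every waiter
    }

    // Blocks the calling thread until stage_ >= i; returns the stage seen.
    uint64_t block_until(uint64_t i) {
        std::unique_lock<std::mutex> guard(lock_);
        // wait(guard, pred) returns at once if pred is already true.
        // Otherwise it releases the lock while sleeping and re-checks
        // pred under the lock after every wakeup.
        waiters_.wait(guard, [&] () { return stage_ >= i; });
        return stage_;
    }
};
```

As with the `block_until` shorthand, the caller supplies only the condition. The loop, the re-check after each wakeup, and the lock handling stay inside the primitive.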

Links on stacks (drawing)

Process 2 calls sys_block_until(1)
Process 3 calls sys_block_until(1)
Process 4 calls sys_block_until(1)
Process 5 calls sys_set_stage(1)
