
CS 235 Advanced Operating Systems, Winter 2008

Lab 3: User Environments

Due 11:59pm Tuesday, February 19

Introduction

In this lab you will implement the basic kernel facilities required to get a protected user-mode environment (i.e., "process") running. You will enhance the JOS kernel to set up the data structures to keep track of user environments, create a single user environment, load a program image into it, and start it running. You will also make the JOS kernel capable of handling any system calls the user environment makes and handling any other exceptions it causes.

Note: In this lab, the terms environment and process are interchangeable -- they have roughly the same meaning. We introduce the term "environment" instead of the traditional term "process" in order to stress the point that JOS environments do not provide the same semantics as UNIX processes, even though they are roughly comparable.

Getting Started

Download our reference code for lab 3 from lab3.tar.gz and untar it, then merge it into your CVS repository as you did for Lab 2. (See the CVS hints.)

Lab 3 contains a number of new source files, which you should browse through:

`inc/`	`env.h`	Public definitions for user-mode environments
	`syscall.h`	Public definitions for system calls from user environments to the kernel
	`lib.h`	Public definitions for the user-mode support library
`kern/`	`env.h`	Kernel-private definitions for user-mode environments
	`env.c`	Kernel code implementing user-mode environments
	`sched.h`	Schedule multiple user environments
	`syscall.h`	Kernel-private definitions for system call handling
	`syscall.c`	System call implementation code
`lib/`	`Makefrag`	Makefile fragment to build user-mode library, `obj/lib/libuser.a`
	`entry.S`	Assembly-language entry point for user environments
	`libmain.c`	User-mode library setup code called from `entry.S`
	`syscall.c`	User-mode system call stub functions
	`console.c`	User-mode implementations of `putchar` and `getchar`, providing console I/O
	`exit.c`	User-mode implementation of `exit`
	`panic.c`	User-mode implementation of `panic`
`user/`	`*`	Various test programs to check lab 3 functionality

In addition, a number of the source files we handed out for lab2 are modified in lab3. To see the differences, you can type:

$ cvs diff -u -rLAB2 -rLAB3

(using whatever tag names you chose when merging in the lab).

Lab Requirements

This lab is divided into three parts. As in lab 2, you will need to do all of the regular exercises described in the lab and write up brief answers to the questions posed in the lab. Please attempt at least one challenge problem. This lab is more challenging than the last; if you cannot complete your challenge problem, write up the design you were aiming for in technical detail. If you can complete it, provide a short (e.g., one or two paragraph) description of what you did. More challenge suggestions are welcome: send them to the class mailing list! Place the write-up in a file called answers.txt (plain text) or answers.html (HTML format) in the top level of your lab3 directory before handing in your work.

Passing all the gmake grade tests does not mean your code is perfect. It may have subtle bugs that will only be tickled by future labs. Keep in mind that debugging an operating system is hard: there are abstraction boundaries, but you can't necessarily place much trust in them since nothing is really enforcing them. If you get all sorts of weird crashes that don't seem to be explainable by a single bug in the layer you're working on, it's likely that they're explainable by a single bug in a different layer -- usually the virtual memory system.

Hand-In Procedure

As before, you can test your code against our test scripts by running gmake grade. When you are ready to hand in your lab code and write-up, run gmake tarball in your jos directory. This will create a file called lab3-yourusername.tar.gz, which you should submit via CourseWeb by 11:59pm on Tuesday, February 19. If you have problems with CourseWeb, you may also email me the file.

Part 1: User Environments

The new include file inc/env.h contains basic definitions for user environments in JOS. The kernel uses the Env data structure to keep track of critical data pertaining to each user environment. You will create just one environment at first, but you will design the JOS kernel to support multiple simultaneously active environments. In Part 3 of this lab you'll take advantage of this functionality by allowing a user environment to fork other environments.

As you can see in kern/env.c, the kernel maintains three main global variables pertaining to environments:

Env *envs = NULL;		// All environments
Env *curenv = NULL;	        // The current env
static Env *free_envs = NULL;	// Free list

Once JOS gets up and running, the envs pointer points to an array of Env structures representing all the environments in the system. In our design, the JOS kernel will support a maximum of NENV simultaneously active environments, although there will typically be far fewer running environments at any given time. (NENV is a constant #define'd in inc/env.h.) Once it is allocated, the envs array will contain a single instance of the Env data structure for each of the NENV possible environments.

The JOS kernel keeps all of the inactive Env structures on the free_envs list. This allows efficient environment allocation and deallocation, much as the free_pages list does for pages.
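To make the free-list idea concrete, here is a minimal sketch. It assumes a hypothetical cut-down MiniEnv struct holding only the fields the list needs (the real struct Env lives in inc/env.h) and a small illustrative NENV. Pushing the slots from last to first leaves envs[0] at the head, so the first allocation hands out slot 0. Both allocation and deallocation are constant-time pointer updates.

```c
#include <stddef.h>

#define NENV 8  /* illustrative only; the real NENV is #define'd in inc/env.h */

enum { ENV_FREE = 0, ENV_RUNNABLE, ENV_NOT_RUNNABLE };

/* Hypothetical cut-down Env holding only what the free list needs. */
typedef struct MiniEnv {
	struct MiniEnv *env_next;   /* next env on the free list */
	unsigned env_status;
} MiniEnv;

MiniEnv envs[NENV];
MiniEnv *free_envs = NULL;

/* Push every slot from last to first, so envs[0] ends up at the head. */
void free_list_init(void)
{
	free_envs = NULL;
	for (int i = NENV - 1; i >= 0; i--) {
		envs[i].env_status = ENV_FREE;
		envs[i].env_next = free_envs;
		free_envs = &envs[i];
	}
}

/* O(1) allocation: pop the head of the list; NULL means no free slot.
 * The caller is expected to fill in env_status and the other fields. */
MiniEnv *free_list_pop(void)
{
	MiniEnv *e = free_envs;
	if (e != NULL) {
		free_envs = e->env_next;
		e->env_next = NULL;
	}
	return e;
}

/* O(1) deallocation: mark the slot free and push it back on the head. */
void free_list_push(MiniEnv *e)
{
	e->env_status = ENV_FREE;
	e->env_next = free_envs;
	free_envs = e;
}
```

The names free_list_init, free_list_pop, and free_list_push are illustrative, not JOS functions. In the lab, this work is presumably split between env_init(), env_alloc(), and the code that frees an environment.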

The kernel uses the curenv variable to keep track of the currently executing environment at any given time. During boot up, before the first environment is run, curenv is initially set to NULL.

Environment State

The Env structure is defined in inc/env.h as follows (although more fields will be added in future labs):

struct Env {
        Env *env_next;                  // Next env on the free list

        envid_t env_id;                 // Unique environment identifier
        envid_t env_parent_id;          // env_id of this env's parent
        unsigned env_status;            // Status of the environment
 
        pde_t *env_pgdir;               // Address space page directory
                                        // (kernel virtual address)
        struct Trapframe env_tf;        // Saved registers
        uint32_t env_runs;              // Number of times environment has run
};

We now briefly describe the state kept by the kernel for each user environment.

env_id

An integer value that uniquely identifies the environment currently using this Env structure (i.e., using this particular slot in the envs array). After a user environment terminates, the kernel may subsequently re-allocate the same Env structure to a different environment, but the env_id will be different. (After many, many allocations, however, the same env_id may reappear.) The Env structure for envid_t e is located at envs[ENVX(e)] (unless environment e was killed and the slot was reused in the meantime).
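One way ENVX and fresh env_ids can fit together is sketched below. The sketch assumes the definitions from MIT's JOS: LOG2NENV is 10, and a generation counter starts at bit 12. That would explain the first environment's id of 00001000 seen later in this lab. Check inc/env.h and env_alloc() in your tree for the actual values. The next_env_id function is a hypothetical helper.

```c
#include <stdint.h>

typedef int32_t envid_t;

/* Assumed values, modeled on MIT JOS's inc/env.h; check your own copy. */
#define LOG2NENV     10
#define NENV         (1 << LOG2NENV)
#define ENVX(envid)  ((envid) & (NENV - 1))   /* low bits select the envs[] slot */
#define ENVGENSHIFT  12                       /* hypothetical: generation bits start here */

/* Mint a fresh env_id for slot idx, given the id the slot held last time
 * (0 if never used). The generation bits change on every reuse, so a stale
 * envid_t no longer matches the slot's new env_id. */
envid_t next_env_id(envid_t old_id, int idx)
{
	uint32_t generation = ((uint32_t)old_id + (1u << ENVGENSHIFT)) & ~(uint32_t)(NENV - 1);
	if (generation == 0 || generation > (uint32_t)INT32_MAX)
		generation = 1u << ENVGENSHIFT;     /* wrapped: restart, keeping ids positive */
	return (envid_t)(generation | (uint32_t)idx);
}
```

Because ENVX only masks off the low bits, two different env_ids that share a slot index map to the same envs[] entry. This is why the parenthetical caveat above matters.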

env_parent_id

The env_id of the environment that created this environment. The environments form a tree or hierarchy, which will be useful for making security decisions about whether one environment can kill or map memory into another.

env_status

This variable holds one of the following values:

ENV_FREE: The Env structure is inactive, and therefore on the free_envs list.
ENV_RUNNABLE: The Env structure represents a currently active environment, and the environment is waiting to run on the processor.
ENV_NOT_RUNNABLE: The Env structure represents a currently active environment, but it is not currently ready to run: for example, because it is waiting for an interprocess communication (IPC) from another environment.

env_pgdir

This environment's address space. In x86-compatible processors, of course, an address space is represented by a page directory. The env_pgdir member is the kernel virtual address (>= KERNBASE) of the page directory.

env_tf

Holds the current state of an environment's registers while that environment is not running: i.e., when the kernel or a different environment is running. The kernel saves the processor state into env_tf when switching from user to kernel mode, so that the environment can later be resumed where it left off. We first saw struct Trapframe in Lab 2. (How did we use it there?)

env_runs

A simple counter that records how many times this environment has been run. Set to 0 when the environment is created.

env_next

A pointer for use in the singly-linked free environments list.

Like a Unix process, a JOS environment couples the concepts of "thread", or processor and stack context, and "address space", or memory context. The thread is defined primarily by the saved registers (the env_tf field), and the address space is defined by the page directory and page tables pointed to by env_pgdir. To run an environment, the kernel must set up the CPU with both the saved registers and the appropriate address space.

In JOS, individual environments do not have their own kernel stacks as processes do in Linux and other conventional UNIXes. Instead, all JOS kernel code runs on a single kernel stack, and the kernel saves user-mode register state explicitly in each struct Env's env_tf rather than implicitly on the relevant environment's kernel stack.

Allocating the Environments Array

In lab 2, you allocated memory in mem_init() for the pages array, which is a table the kernel uses to keep track of which pages are free and which are not. You will now need to modify mem_init() further to allocate a similar array of Env structures, called envs.

Exercise 1. Modify mem_init() in kern/pmap.c to allocate and map the envs array. This array consists of NENV instances of the Env structure, and is analogous to the pages array you created in Lab 2.
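Here is a sketch of the allocation half of Exercise 1. It assumes your Lab 2 code has a simple boot-time bump allocator. Here, boot_alloc_sketch is a hypothetical stand-in that works over a static arena, and its name and signature may not match yours. The Env struct and NENV value are illustrative placeholders, and mapping the array is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PGSIZE 4096
#define ROUNDUP(a, n)  (((uintptr_t)(a) + (n) - 1) / (n) * (n))

/* Hypothetical stand-in for the real struct Env from inc/env.h. */
typedef struct Env { struct Env *env_next; int env_id; unsigned env_status; } Env;
#define NENV 1024

/* Simulated free physical memory past the end of the kernel image. */
static unsigned char arena[64 * PGSIZE];
static uintptr_t next_free = 0;   /* offset of the first unused byte in arena */

/* Hypothetical bump allocator: returns n zeroed bytes aligned to align,
 * or NULL if the arena is exhausted. Nothing is ever freed. */
void *boot_alloc_sketch(size_t n, size_t align)
{
	uintptr_t start = ROUNDUP((uintptr_t)arena + next_free, align) - (uintptr_t)arena;
	if (start + n > sizeof(arena))
		return NULL;
	next_free = start + n;
	memset(arena + start, 0, n);
	return arena + start;
}

Env *envs;

/* One Env per possible environment, page aligned so that the same memory
 * can later be mapped read-only into user space. */
void alloc_envs(void)
{
	envs = boot_alloc_sketch(NENV * sizeof(Env), PGSIZE);
}
```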

Creating and Running Environments

You will now write the code in kern/env.c necessary to run a user environment. Because we do not yet have a filesystem, we will set up the kernel to load a static ELF executable image that is embedded within the kernel itself.

Once you integrate our Lab 3 code with your Lab 2 solutions, you will notice that our makefiles generate a number of binary images in the obj/user/ directory. If you look at kern/Makefrag, you will notice some magic that "links" these binaries directly into the kernel executable as if they were .o files. The -b binary option on the linker command line causes these files to be linked in as "raw" uninterpreted binary files rather than as regular .o files produced by the compiler. (As far as the linker is concerned, these files do not have to be ELF images at all -- they could be anything, such as text files or pictures!) If you look at obj/kern/kernel.sym after building the kernel, you will notice that the linker has "magically" produced a number of funny symbols with names like _binary_obj_user_hello_start, _binary_obj_user_hello_end, and _binary_obj_user_hello_size. The linker generates these symbol names by mangling the file names of these binary files; the symbols provide the regular kernel code with a way to reference the embedded binary files.

In this lab, the kernel will start up and run one of those binary images. The code to select a binary image is in kern/init.c. The grade script links different binary images into your kernel, to test different properties of your user environment handling. If you're not running the grade script, the kernel normally runs the hello program, defined in user/hello.c, which will print

hello, world!

in the old-school manner when you've progressed far enough through this lab. You're free to run whatever binary you want, but don't change the version inside #ifdef TEST. In addition, our makefile system will let you run a particular program by typing gmake run-programname. For example, gmake run-hello will run the user/hello.c program (without a GUI), regardless of how you've edited kern/init.c.

To summarize some of QEMU and our Makefiles' convenient debugging features:

gmake run runs QEMU on the current kernel.
gmake run-programname runs QEMU on a kernel compiled to run programname.
gmake run-gdb and gmake run-gdb-programname act similarly, but start up QEMU to wait for an attachment from a GDB process.
If QEMU dies too quickly for you to see its output, try gmake run O=1 or gmake run-prog O=1. The O=1 (that's an "Oh", not a zero) tells QEMU to print the kernel's output to the terminal as well as to the screen.
Within QEMU press Shift-PageUp and Shift-PageDown to scroll through multiple screens of output.

In i386_init() in kern/init.c you'll see code to run one of these binary images in an environment. However, the critical functions to set up user environments are not complete; you will need to fill them in.

Exercise 2 (Long!). In the file env.c, finish coding the following functions:

env_init(): Initialize all of the Env structures in the envs array and add them to the free_envs list.
env_mem_init(): Allocate a page directory for a new environment and initialize the kernel portion of the new environment's address space.
load_elf(): Parse an ELF binary image, much like the boot loader already does, and load its contents into the user address space of a new environment.
env_create(): Allocate an environment with env_alloc and call load_elf to load an ELF binary into it.
env_run(): Run the given environment in user mode.
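The core of load_elf() can be sketched as a loop over program headers. For every loadable (ELF_PROG_LOAD) segment, it copies p_filesz bytes from the image to virtual address p_va and zeroes the remaining p_memsz - p_filesz bytes, which hold .bss. This sketch makes several assumptions:

- It uses JOS-style Elf and Proghdr layouts, redeclared here so the code compiles alone.
- It copies into a simulated user region, user_mem, instead of into pages mapped into env_pgdir.

In the kernel you would presumably also allocate and map those pages first, and record e_entry as the environment's starting eip in env_tf. Here, load_elf_sketch is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ELF_MAGIC      0x464C457FU   /* "\x7FELF" read as a little-endian word */
#define ELF_PROG_LOAD  1

/* 32-bit ELF file header and program header, laid out as in the ELF spec. */
struct Elf {
	uint32_t e_magic;
	uint8_t  e_elf[12];
	uint16_t e_type, e_machine;
	uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
	uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
};
struct Proghdr {
	uint32_t p_type, p_offset, p_va, p_pa, p_filesz, p_memsz, p_flags, p_align;
};

/* Simulated user address space: SIM_SIZE bytes starting at virtual SIM_BASE. */
#define SIM_BASE 0x00800000u
#define SIM_SIZE 0x10000u
unsigned char user_mem[SIM_SIZE];

/* Returns the entry point, or 0 if the image is malformed or a segment
 * falls outside the simulated range. */
uint32_t load_elf_sketch(const uint8_t *binary, size_t size)
{
	struct Elf elf;
	if (size < sizeof elf)
		return 0;
	memcpy(&elf, binary, sizeof elf);   /* copy out: binary may be unaligned */
	if (elf.e_magic != ELF_MAGIC)
		return 0;
	if (elf.e_phoff > size
	    || (size - elf.e_phoff) / sizeof(struct Proghdr) < elf.e_phnum)
		return 0;

	for (uint16_t i = 0; i < elf.e_phnum; i++) {
		struct Proghdr ph;
		memcpy(&ph, binary + elf.e_phoff + i * sizeof ph, sizeof ph);
		if (ph.p_type != ELF_PROG_LOAD)
			continue;
		/* Reject segments that are inconsistent or fall outside user_mem. */
		if (ph.p_filesz > ph.p_memsz
		    || ph.p_va < SIM_BASE || ph.p_memsz > SIM_SIZE
		    || ph.p_va - SIM_BASE > SIM_SIZE - ph.p_memsz
		    || ph.p_offset > size || ph.p_filesz > size - ph.p_offset)
			return 0;
		unsigned char *dst = user_mem + (ph.p_va - SIM_BASE);
		memcpy(dst, binary + ph.p_offset, ph.p_filesz);          /* file-backed part */
		memset(dst + ph.p_filesz, 0, ph.p_memsz - ph.p_filesz);   /* .bss part */
	}
	return elf.e_entry;
}
```

Zeroing the .bss tail explicitly matters because freshly allocated pages are not guaranteed to be zero.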

As you write these functions, you might find cprintf's new %e converter useful: it prints a description corresponding to an error code. For example,

	r = -E_NO_MEM;
	panic("env_alloc: %e", r);

will panic with the message "env_alloc: out of memory".
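One plausible way such a converter works is a table indexed by the negated error code, sketched below under assumptions. The numeric values are modeled on MIT JOS's inc/error.h. Apart from "out of memory", the message wording is hypothetical, and error_desc is not a JOS function.

```c
#include <stddef.h>

/* Assumed error codes, modeled on MIT JOS's inc/error.h. */
enum {
	E_UNSPECIFIED = 1,
	E_BAD_ENV     = 2,
	E_INVAL       = 3,
	E_NO_MEM      = 4,
	MAXERROR      = 4
};

/* Indexed by the positive error code; entry 0 is unused (NULL). */
static const char *const error_string[MAXERROR + 1] = {
	[E_UNSPECIFIED] = "unspecified error",
	[E_BAD_ENV]     = "bad environment",
	[E_INVAL]       = "invalid parameter",
	[E_NO_MEM]      = "out of memory",
};

static const char unknown_error[] = "unknown error";

/* What a %e handler might do with its int argument: accept a negative
 * error code, negate it safely, and look up its description. */
const char *error_desc(int err)
{
	unsigned code = err < 0 ? 0u - (unsigned)err : (unsigned)err;
	if (code > MAXERROR || error_string[code] == NULL)
		return unknown_error;
	return error_string[code];
}
```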

Once you are done you should compile your kernel and run it. If all goes well, your system should crash when the user program tries to make a system call, since you haven't implemented system calls yet. This will appear as a General Protection Fault, trap type 0xd. The TRAP frame's EIP should point at an int $0x30 instruction in hello's code. (Look at obj/user/hello.asm to check the EIP.) Here is a call graph of the code up to the point where the user code is invoked. Make sure you understand the purpose of each step.

start (kern/entry.S)
i386_init
    cons_init
    mem_init
    page_init
    idt_init
    env_init
    env_create
    env_run
        env_iret

At this point, QEMU will start running user/hello.c in user mode! To see how this happens, use gmake run-gdb and set a GDB breakpoint at env_iret with the b env_iret command; env_iret should be the last function you hit before actually entering user mode. (You must be in 32-bit mode to set the breakpoint. GDB loads the kernel's symbols from the kernel ELF file, which is how it can translate env_iret to a code address.)

Step through env_iret; the processor should enter user mode after the iret instruction. (How can you tell?) You should then see the first instruction in the user environment's executable, which is the cmpl instruction at the label start in lib/entry.S. If you continue past this point, hello should run successfully until it first hits an int $48 instruction, which is what user-mode code executes to make a system call. (See lib/syscall.c to see how this works.) Then, your trap code from the previous lab should activate and kill the environment! (We've changed trap() to handle uncaught user-mode exceptions by killing the offending environment.) If you cannot get to this point, then something is wrong with your address space setup or program loading code; go back and fix it before continuing.

If you run gmake grade at this point, you should pass the divzero, breakpoint, softint, and badsegment tests, and get 20 points. (Your breakpoint [backtrace] test will fail, however; this is fixed in Exercise 10.)

Question:

Did you have to do anything to make the user/softint program behave correctly (i.e., generate a general protection fault, as the grade script expects)? Why is this the correct behavior? What happens if the kernel actually allows softint's int $14 instruction to invoke the kernel's page fault handler (which is interrupt number 14)?

Part 2: User-Level Exceptions and System Calls

Now, we'll update the exception handling support you added to the last lab, using it to provide important operating system functionality.

The Breakpoint Exception

In the last lab, you turned the breakpoint exception, interrupt number 3 (T_BRKPT), into a primitive debugging instruction that invokes the JOS kernel monitor. The user-mode implementation of panic() in lib/panic.c, for example, performs an int3 after displaying its panic message. Make sure at this point that this functionality works! The breakpoint user program tests it by invoking an int3 instruction.

Challenge Note: If you implemented the single-stepping challenge in Lab 2, you might want to verify that your code works on user-level programs too.

Question:

Executing int3 at user level might deliver a general protection fault to the kernel, rather than a breakpoint exception, depending on how you initialized the breakpoint entry in the IDT (i.e., your call to SETGATE from idt_init). What change would you make to cause user-level breakpoints to generate a GPF? Why does this functionality exist?

Page Faults

The page fault exception, interrupt number 14 (T_PGFLT), is a particularly important one that we will exercise heavily throughout this lab and the next. When the processor takes a page fault, it stores the linear address that caused the fault in a special processor control register, CR2. In trap.c we have provided the beginnings of a special function, page_fault_handler(), to handle page fault exceptions.

Exercise 3. Modify trap() to dispatch page fault exceptions to page_fault_handler(). You should now be able to pass the faultread, faultreadkernel, faultwrite, and faultwritekernel tests in gmake grade. If any of them don't work, figure out why and fix them.

You will further refine the kernel's page fault handling below, as you implement system calls.

System Calls

User processes ask the kernel to do things for them by invoking system calls. When the user process invokes a system call, the processor enters kernel mode, the processor and the kernel cooperate to save the user process's state, and the kernel executes appropriate code in order to carry out the system call. When it's done, it resumes the user process.

The exact details of how the user process gets the kernel's attention and how it specifies which call it wants to execute vary from system to system. In JOS, we will use the int instruction, which causes a processor interrupt. In particular, int $48 will cause a system call interrupt; we have defined the constant T_SYSCALL as 48. You will have to set up the interrupt descriptor to allow user processes to cause that interrupt. This causes no ambiguity, since hardware cannot generate interrupt 48.
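The privilege setting for a gate lives in its descriptor's DPL field. The sketch below assumes the gate descriptor layout from the Intel manuals (Volume 3A):

- make_gate packs roughly what SETGATE fills in.
- int_allowed models the processor's check for an explicit int n. If the current privilege level is numerically greater than the gate's DPL, the processor raises a general protection fault instead of entering the handler.

Both helpers are hypothetical; JOS's SETGATE macro (presumably in inc/mmu.h) takes its own argument list.

```c
#include <stdint.h>

#define STS_IG32 0xE   /* 32-bit interrupt gate (IF cleared on entry) */
#define STS_TG32 0xF   /* 32-bit trap gate (IF left unchanged) */
#define T_SYSCALL 48

/* The two 32-bit words of an x86 IDT gate descriptor. */
struct GateWords { uint32_t lo, hi; };

/* dpl is the least-privileged CPL (numerically largest; 0 = kernel,
 * 3 = user) that may reach this gate with an explicit int instruction. */
struct GateWords make_gate(uint32_t offset, uint16_t sel, unsigned type, unsigned dpl)
{
	struct GateWords g;
	g.lo = ((uint32_t)sel << 16) | (offset & 0xFFFFu);
	g.hi = (offset & 0xFFFF0000u)
	     | (1u << 15)               /* P: present */
	     | ((dpl & 3u) << 13)       /* DPL */
	     | ((type & 0xFu) << 8);    /* gate type; the S bit (12) stays 0 */
	return g;
}

/* Software "int n" check: allowed only if CPL <= gate DPL, else #GP. */
int int_allowed(struct GateWords g, unsigned cpl)
{
	unsigned dpl = (g.hi >> 13) & 3u;
	return cpl <= dpl;
}
```

Hardware interrupts and processor exceptions skip this DPL check, which is why a kernel-only gate can still receive real faults.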

In JOS, we will pass the system call number and the system call arguments in registers. This way, we don't need to grub around in the user environment's stack or instruction stream. The system call number will go in %eax, and the arguments (up to five of them) will go in %edx, %ecx, %ebx, %edi, and %esi, respectively. The kernel passes the return value back in %eax. The assembly code to invoke a system call has been written for you, in syscall() in lib/syscall.c. You should read through it and make sure you understand what is going on. You may also find it helpful to read inc/syscall.h.

Exercise 4. Add a handler in the kernel for interrupt number T_SYSCALL. You will have to edit kern/trapentry.S and kern/trap.c's idt_init(). You also need to change trap() to handle the system call interrupt by calling syscall() (defined in kern/syscall.c) with the appropriate arguments, and then arranging for the return value to be passed back to the user environment in %eax.

Finally, you need to implement syscall() in kern/syscall.c; it should dispatch to one of the sys_ functions defined there. See inc/syscall.h for system call numbers. Make sure syscall() returns -E_INVAL if the system call number is invalid. You'll only need SYS_cputs, SYS_cgetc, SYS_getenvid, and SYS_env_destroy for now, but you might as well add stubs for them all.
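The data flow for Exercise 4 can be sketched as follows. The sketch makes these assumptions:

- It uses a cut-down Trapframe with the general registers in pushal order.
- The syscall numbers are hypothetical, modeled on MIT JOS; use the ones from inc/syscall.h.
- syscall_dispatch stands in for kern/syscall.c's syscall().
- The sys_ functions are stubs that only record their arguments.

```c
#include <stdint.h>

/* Hypothetical numbering modeled on MIT JOS; use inc/syscall.h's values. */
enum { SYS_cputs = 0, SYS_cgetc, SYS_getenvid, SYS_env_destroy, NSYSCALLS };
#define E_INVAL   3      /* assumed; see inc/error.h */
#define T_PGFLT   14
#define T_SYSCALL 48

/* General registers in pushal order, plus the trap number. */
struct PushRegs {
	uint32_t reg_edi, reg_esi, reg_ebp, reg_oesp;
	uint32_t reg_ebx, reg_edx, reg_ecx, reg_eax;
};
struct Trapframe {
	struct PushRegs tf_regs;
	uint32_t tf_trapno;
};

/* Stubs that record what happened, in place of real kernel work. */
uint32_t last_cputs_len = 0;
uint32_t last_fault_trapno = 0;
static void sys_cputs(uint32_t s, uint32_t len) { (void)s; last_cputs_len = len; }
static int32_t sys_cgetc(void) { return 0; }
static int32_t sys_getenvid(void) { return 0x1000; }
static int32_t sys_env_destroy(uint32_t envid) { (void)envid; return 0; }
static void page_fault_stub(struct Trapframe *tf) { last_fault_trapno = tf->tf_trapno; }

/* Dispatch on the call number; anything unrecognized gets -E_INVAL. */
int32_t syscall_dispatch(uint32_t num, uint32_t a1, uint32_t a2,
                         uint32_t a3, uint32_t a4, uint32_t a5)
{
	(void)a3; (void)a4; (void)a5;   /* unused by these four calls */
	switch (num) {
	case SYS_cputs:       sys_cputs(a1, a2); return 0;
	case SYS_cgetc:       return sys_cgetc();
	case SYS_getenvid:    return sys_getenvid();
	case SYS_env_destroy: return sys_env_destroy(a1);
	default:              return -E_INVAL;
	}
}

/* Dispatch step of trap(): page faults go to their handler. A system call
 * takes its number from the saved %eax and its arguments from %edx, %ecx,
 * %ebx, %edi, %esi. The result overwrites the saved %eax, which is what
 * the user environment sees when its registers are restored. */
void trap_dispatch_sketch(struct Trapframe *tf)
{
	struct PushRegs *r = &tf->tf_regs;
	switch (tf->tf_trapno) {
	case T_PGFLT:
		page_fault_stub(tf);
		break;
	case T_SYSCALL:
		r->reg_eax = (uint32_t)syscall_dispatch(r->reg_eax, r->reg_edx, r->reg_ecx,
		                                        r->reg_ebx, r->reg_edi, r->reg_esi);
		break;
	default:
		/* The lab's trap() kills a user environment that causes an
		 * unexpected trap; omitted here. */
		break;
	}
}
```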

Run the hello program under your kernel. It should print "hello, world" on the console and then cause a page fault in user mode. If this does not happen, it probably means your system call handler isn't quite right. If the kernel doesn't appear to be receiving a system call interrupt, check your call to SETGATE: are the privileges right?

Challenge! Implement system calls using the sysenter and sysexit instructions instead of using int $48 and iret.

The sysenter/sysexit instructions were designed by Intel to be faster than int/iret. They do this by using registers instead of the stack and by making assumptions about how the segmentation registers are used. The exact details of these instructions can be found in Volume 2B of the Intel reference manuals.

The easiest way to add support for these instructions in JOS is to add a sysenter_handler in kern/trapentry.S that creates the same trap frame that is normally created by an int $48 instruction (being sure to save the correct return address and stack pointer provided by the user environment). Then, instead of calling into trap, push the arguments to syscall and call syscall directly. Once syscall returns, set everything up for and execute the sysexit instruction.

You will also need to add code to kern/init.c to set up the necessary model-specific registers (MSRs). Look at the enable_sep_cpu function in this diff for an example of this; you can find an implementation of wrmsr to add to inc/x86.h here. Finally, lib/syscall.c must be changed to support making a system call with sysenter. Here is a possible register layout for the sysenter instruction:

	eax                - syscall number
	edx, ecx, ebx, edi - arg1, arg2, arg3, arg4
	esi                - return pc
	ebp                - return esp
	esp                - trashed by sysenter

GCC's inline assembler does not support directly loading values into ebp, so you will need to add code to save (push) and restore (pop) it yourself (and you may want to do the same thing for esi as well). The return address can be put into esi by using an instruction like leal after_sysenter_label, %esi.

Note that this only supports 4 arguments, so you will need to leave the old method of doing system calls around to support 5 argument system calls as well.

User-mode Environment Setup

Now, you'll fix the user-mode page fault in user/hello.c.

JOS is designed to export as much kernel information (physical names) to user programs as possible. In particular, JOS programs expect to be able to see how many physical pages are free, and the state of every other environment in the system. (Question: Is this an information leak?) Rather than providing system calls for environments to extract the information, JOS simply maps read-only copies of the pages[] and envs[] arrays into every environment's address space.

Exercise 5. Edit mem_init to set up mappings for the UPAGES address range, which should map to a read-only version of pages[] and UENVS, which should map to a read-only version of envs[].
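Mapping a range at a given permission can be sketched as below. The sketch assumes a simulated flat page table (pt) rather than JOS's two-level page directory. The map_range_sketch function is a hypothetical helper; your Lab 2 code presumably already has a similar routine.

For UPAGES and UENVS, two details matter:

- The size is the array's byte size rounded up to whole pages.
- The permission is PTE_U | PTE_P with PTE_W clear. User-mode writes then fault, while the kernel keeps its own writable mapping of the same physical pages.

```c
#include <stdint.h>

typedef uint32_t pte_t;
#define PGSIZE 4096u
#define PTE_P  0x001u   /* present */
#define PTE_W  0x002u   /* writeable */
#define PTE_U  0x004u   /* user-accessible */
#define ROUNDUP(a, n) ((((a) + (n) - 1) / (n)) * (n))

/* Simulated flat page table covering NPT_ENTRIES pages starting at va 0. */
#define NPT_ENTRIES 64
pte_t pt[NPT_ENTRIES];

/* Map [va, va+size) to [pa, pa+size) with permissions perm | PTE_P.
 * Returns 0, or -1 if an address is unaligned or the range is too big. */
int map_range_sketch(uint32_t va, uint32_t size, uint32_t pa, uint32_t perm)
{
	if (va % PGSIZE || pa % PGSIZE)
		return -1;
	size = ROUNDUP(size, PGSIZE);
	if (va / PGSIZE + size / PGSIZE > NPT_ENTRIES)
		return -1;
	for (uint32_t off = 0; off < size; off += PGSIZE)
		pt[(va + off) / PGSIZE] = (pa + off) | perm | PTE_P;
	return 0;
}
```

A read-only user mapping would then be requested with perm set to PTE_U alone.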

Why the crash, even after this exercise? The umain function tries to access env->env_id. The JOS library OS is supposed to set the global pointer env to point at the current environment's struct Env, in the read-only copy of the envs[] array you allocated in Part 1. This global pointer lets the environment efficiently access its state. But currently the pointer is just null.

Exercise 6. JOS user programs start running at the top of lib/entry.S. Trace through, find the point where env should be set, and set it. Note that lib/entry.S has already defined envs to point at the UENVS mapping you set up in Exercise 5. Hint: You'll want to use a system call.

This is the first point in the lab where you test the user-level read-only mapping of envs[] at UENVS, so you may want to check your code from Part 1 if you have problems here. And don't forget that envid_ts aren't just linear indexes into the envs[] array!
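The indexing point can be sketched like this. The sketch assumes a hypothetical cut-down Env and an illustrative NENV of 1024. It also uses stub versions of envs and sys_getenvid() in place of the UENVS mapping and the real system call stub. The set_env_pointer function is not a JOS function; finding where the assignment belongs in the startup path is still up to you.

```c
#include <stdint.h>

typedef int32_t envid_t;
#define NENV 1024                            /* assumed; see inc/env.h */
#define ENVX(envid) ((envid) & (NENV - 1))

/* Hypothetical cut-down Env; the real one is in inc/env.h. */
struct Env { envid_t env_id; unsigned env_status; };

/* Stand-ins for the read-only UENVS mapping and the getenvid system call. */
struct Env fake_envs[NENV];
const struct Env *envs = fake_envs;
envid_t fake_current_id = 0x1000;
envid_t sys_getenvid(void) { return fake_current_id; }

const struct Env *env;

/* An envid_t is not an array index: strip the generation bits with ENVX
 * before indexing into envs[]. */
void set_env_pointer(void)
{
	env = &envs[ENVX(sys_getenvid())];
}
```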

At this point, user/hello should print "hello, world", then "i am environment 00001000". It then attempts to "exit" by calling sys_env_destroy() (see lib/libmain.c and lib/exit.c). Since the kernel currently only supports one user environment, it should report that it has destroyed all environments and then drop into the "idle loop", which for JOS is just the kernel monitor.

Page faults and memory protection

In this section of the lab, you'll begin refining JOS's response to user-level page fault exceptions, which happen when an application tries to access an invalid address or an address for which it has no permissions. Memory protection is a crucial operating system feature, since it can help the OS ensure that bugs in one program cannot corrupt other programs or the operating system itself.

On an invalid access, the processor stops the program at the instruction causing the fault and then traps into the kernel with information about the attempted operation. If the fault is fixable, the kernel can fix it and let the program continue running. If the fault is not fixable, then the program cannot continue, since it will never get past the instruction causing the fault.

As an example of a fixable fault, consider an automatically extended stack. In many systems the kernel initially allocates a single stack page, and then if a program faults accessing pages further down the stack, the kernel will allocate those pages automatically and let the program continue. By doing this, the kernel only allocates as much stack memory as the program needs, but the program can work under the illusion that it has an arbitrarily large stack.

System calls present an interesting problem for memory protection. Most system call interfaces let user programs pass pointers to the kernel. These pointers point at user buffers to be read or written. The kernel then dereferences these pointers while carrying out the system call. There are two problems with this:

A page fault in the kernel is taken a lot more seriously than a page fault in a user program. If the kernel page faults, that's usually a kernel bug, and the fault handler will panic the kernel (and hence the whole system). In a system call, when the kernel is dereferencing pointers to the user's address space, we need a way to prevent or catch any page faults these dereferences cause.
The kernel typically has more memory permissions than the user program. The user program might ask the kernel to read from or write to a location in kernel memory that the user program cannot access but that the kernel can. If the kernel is not careful, a buggy or malicious user program can trick the kernel into using its greater privilege in unintended ways, possibly so as to destroy the integrity of the kernel completely.

This second danger is one instance of a classic security problem known as the "confused deputy" problem. The kernel is acting as a trusted "deputy", which has the special privileges necessary to implement important services needed by untrusted users -- but if users can confuse the kernel into using those special privileges in unintended ways, the security model breaks down.

For both of these reasons the kernel must be extremely careful when handling pointers presented by user programs.

You will now implement solutions to these two problems by writing a function, user_mem_check, that checks that the memory addresses a user specified are OK for that user to access. Then, anywhere a user pointer appears, your kernel will call user_mem_assert to check the pointer. (User_mem_assert calls user_mem_check.) If there is any problem with the pointer, user_mem_assert will destroy the corresponding user environment.

Exercise 7. Implement kern/pmap.c's user_mem_check function. Check that the supplied range is valid user memory (i.e., below ULIM), and that the user has the necessary permission throughout the range. Make sure you set the user_mem_check_addr variable to the faulting address, if there is a fault.
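Here is a sketch of the checking logic. It makes these assumptions:

- A simulated flat page table replaces a real page-directory walk.
- ULIM is taken from MIT JOS's memlayout.h.
- A return value of -1 stands in for whatever error code your kernel uses.

The function checks each page the range touches and records the first offending address. That is the start of the range if the first page is bad, otherwise the start of the first bad page. A range that wraps around the top of the address space is rejected. The user_mem_check_sketch name is hypothetical.

```c
#include <stdint.h>

typedef uint32_t pte_t;
#define PGSIZE 4096u
#define PTE_P  0x001u
#define PTE_W  0x002u
#define PTE_U  0x004u
#define ULIM   0xEF800000u   /* assumed value (MIT JOS memlayout.h); check yours */

/* Simulated page table for the low NPAGES pages; all else is unmapped. */
#define NPAGES 256
pte_t sim_pt[NPAGES];

static pte_t lookup_pte(uint32_t va)
{
	uint32_t vpn = va / PGSIZE;
	return vpn < NPAGES ? sim_pt[vpn] : 0;
}

uint32_t user_mem_check_addr;

/* Returns 0 if every byte of [va, va+len) is below ULIM and mapped with at
 * least perm | PTE_P; otherwise sets user_mem_check_addr and returns -1. */
int user_mem_check_sketch(uint32_t va, uint32_t len, uint32_t perm)
{
	if (len == 0)
		return 0;
	perm |= PTE_P;
	uint32_t last = va + (len - 1);     /* last byte; may wrap around */
	int wraps = last < va;
	for (uint32_t page = va & ~(PGSIZE - 1); ; page += PGSIZE) {
		uint32_t addr = page < va ? va : page;   /* first byte checked here */
		if (addr >= ULIM || (lookup_pte(addr) & perm) != perm) {
			user_mem_check_addr = addr;
			return -1;
		}
		if (!wraps && page >= (last & ~(PGSIZE - 1)))
			return 0;                            /* checked the last page */
	}
}
```

A wrapping range always fails, because the loop must reach ULIM before the page counter can overflow.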

Exercise 8. Change sys_cputs in kern/syscall.c to correctly check the user's supplied pointer before using it. user_mem_assert may be useful.

Change kern/init.c to run user/buggyhello instead of user/hello. This code dereferences an almost-null pointer, causing a segmentation fault. When you compile your kernel and boot it, the environment should be destroyed, and the kernel should not panic. You should see:

	[00000000] new env 00001000
	[00001000] user_mem_check va 00000001
	[00001000] free env 00001000
	Idle loop - nothing more to do!

(The user_mem_check va may differ slightly, but it should be on the same page.) Now change kern/init.c to run user/evilhello. This code tries to be a bit sneakier and print the contents of valid kernel memory, rather than random unmapped memory. Still, when you compile your kernel and boot it, the kernel should not panic; you should see:

	[00000000] new env 00001000
	[00001000] user_mem_check va f0100020
	[00001000] free env 00001000

Exercise 9. Update page_fault_handler in kern/trap.c so that kernel-mode page faults call panic (as described above).
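Kernel-mode faults can be told apart from user-mode ones by looking at the saved code segment selector. The sketch below assumes the trap frame keeps the interrupted %cs in a field such as tf_cs, as JOS's does. The fault_in_kernel_mode helper is hypothetical; page_fault_handler would panic when it returns true.

```c
#include <stdint.h>

/* The low two bits of a segment selector are its requested privilege
 * level; for the %cs saved in a trap frame they give the privilege level
 * the CPU was running at when the fault happened (0 = kernel, 3 = user). */
int fault_in_kernel_mode(uint16_t saved_cs)
{
	return (saved_cs & 3) == 0;
}
```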

Question:

Would it be harder to implement a safe cputs system call that took a null-terminated string, instead of a string and a length? Why or why not?

User-Level Debugging Information

Like the JOS kernel, JOS user-level programs have debugging information linked in and ready to go. However, this information is a bit harder to get to. The kernel has ready-made __STAB_BEGIN__, __STAB_END__, __STABSTR_BEGIN__, and __STABSTR_END__ symbols telling it where to find the STABS and string table. In user-level applications, the linker script constructs a small structure containing these values that will be loaded at address USTABDATA (or 0x200000). The kernel must load that structure to find the tables, then look in the tables themselves.

The USTABDATA values are user pointers, of course, so they must be checked!

Exercise 10. Change debuginfo_eip in kern/kdebug.c to correctly check user-level pointers before accessing them.

Also, update your mon_backtrace in kern/monitor.c to behave better for user-level applications. This requires two changes. When producing a backtrace for a trapframe (tf != NULL), you should first print a symbolic backtrace line corresponding to tf->tf_eip. Second, during the backtrace, validate any user-level pointers you dereference, and print ? signs or break out of the backtrace rather than dereferencing an invalid pointer.

If you run the user/breakpoint.c program, then type backtrace at the monitor prompt, you should see a backtrace like this:

Stack backtrace:
     user/breakpoint.c:11: _Z5umainiPPc+47 (0 arg)
  0: ebp eeffdfd0  eip 0080007b  args 00000000 00000000 eeffdff0 0080004c
     lib/libmain.c:42: libmain+3f (2 arg)
  1: ebp eeffdff0  eip 00800031  args 00000000 00000000 ? ?
     lib/entry.S:48: <unknown>+0 (0 arg)

Note the ? marks after args in the last frame. This frame is at the very top of the stack, so after two arguments the addresses go above USTACKTOP and into unmapped memory.
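The frame walk behind this output can be sketched as below, leaving out the symbolic lookup that debuginfo_eip provides. The sketch makes these assumptions:

- It uses the usual x86 frame layout: saved %ebp at ebp, return address at ebp+4, and arguments from ebp+8.
- It walks a simulated stack ending at an assumed USTACKTOP of 0xeeffe000. That value is consistent with the sample above, where arguments stop being readable right after eeffdffc.
- A readable() helper stands in for a user_mem_check-style test.

The backtrace_sketch function is hypothetical, and its formatting is simplified.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SIM_TOP     0xeeffe000u   /* assumed USTACKTOP, matching the sample */
#define STACK_WORDS 64
#define SIM_BOTTOM  (SIM_TOP - 4u * STACK_WORDS)
#define MAX_DEPTH   32
uint32_t stack_words[STACK_WORDS];

/* Stand-in for validating a 4-byte user read. */
static int readable(uint32_t addr)
{
	return addr >= SIM_BOTTOM && addr <= SIM_TOP - 4 && addr % 4 == 0;
}

static uint32_t read_word(uint32_t addr)
{
	return stack_words[(addr - SIM_BOTTOM) / 4];
}

/* Follow saved-%ebp links, printing "?" for unreadable arguments and
 * stopping at an unreadable frame instead of dereferencing it.
 * Output goes to buf (n must be > 0). */
void backtrace_sketch(uint32_t ebp, char *buf, size_t n)
{
	size_t used = 0;
	memset(buf, 0, n);
	for (int depth = 0; ebp != 0 && depth < MAX_DEPTH && used < n; depth++) {
		if (!readable(ebp) || !readable(ebp + 4))
			break;
		used += (size_t)snprintf(buf + used, n - used, "%d: ebp %08x eip %08x args",
		                         depth, (unsigned)ebp, (unsigned)read_word(ebp + 4));
		for (uint32_t i = 0; i < 4 && used < n; i++) {
			uint32_t a = ebp + 8 + 4 * i;
			if (readable(a))
				used += (size_t)snprintf(buf + used, n - used, " %08x", (unsigned)read_word(a));
			else
				used += (size_t)snprintf(buf + used, n - used, " ?");
		}
		if (used < n)
			used += (size_t)snprintf(buf + used, n - used, "\n");
		ebp = read_word(ebp);
	}
}
```

The MAX_DEPTH cap is a design choice that guards against a corrupted stack whose saved %ebp links form a cycle.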

Part 3: Creating User Environments and Cooperative Multitasking

Now, you'll implement some new JOS kernel system calls to allow user-level environments to create additional new environments. You will also implement cooperative round-robin scheduling, allowing the kernel to switch from one environment to another when the current environment voluntarily relinquishes the CPU (or exits). In the next lab you'll implement preemptive scheduling, which allows the kernel to re-take control of the CPU from an environment even if the environment does not cooperate.

Round-Robin Scheduling

Your first task in this part of the lab is to change the JOS kernel so that it does not always just run the environment in envs[0], but instead can alternate between multiple environments in "round-robin" fashion. Round-robin scheduling in JOS works as follows:

The function sched_yield() in the new kern/sched.c is responsible for selecting a new environment to run. It searches sequentially through the envs[] array in circular fashion, starting just after the previously running environment (or at the beginning of the array if there was no previously running environment), picks the first environment it finds with a status of ENV_RUNNABLE (see inc/env.h), and calls env_run() to jump into that environment.
User environments call the sys_yield() system call to invoke the kernel's sched_yield() function, and thereby voluntarily give up the CPU to a different environment. As you can see in user/idle.c, the idle environment does this routinely.
If no environments are runnable, sched_yield() drops into the idle loop, which in JOS is just the kernel monitor.
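The circular search described above can be sketched in isolation. In this model, `envs[]` is a small standalone array, `struct Env` carries only a simplified `env_status` field, and `sched_pick` is a hypothetical helper that only chooses an index. In the kernel, sched_yield() would pass the chosen environment to env_run(), or fall back to the monitor when the result is -1. The `NENV` value here is illustrative, not the one from inc/env.h.

```c
#include <assert.h>

#define NENV 8   /* illustrative; JOS defines its own NENV */

enum env_status { ENV_FREE, ENV_RUNNABLE, ENV_NOT_RUNNABLE };

struct Env {
    enum env_status env_status;   /* simplified: the real struct has more */
};

static struct Env envs[NENV];

/* Pick the next environment to run. The search starts just after `prev`
 * (or at envs[0] if prev < 0, meaning nothing ran before), wraps around,
 * and considers `prev` itself last, so a lone runnable environment keeps
 * running. Returns the index found, or -1 if nothing is runnable. */
static int sched_pick(int prev)
{
    assert(prev < NENV);
    int start = (prev < 0) ? 0 : prev + 1;

    for (int k = 0; k < NENV; k++) {
        int i = (start + k) % NENV;
        if (envs[i].env_status == ENV_RUNNABLE)
            return i;
    }
    return -1;   /* caller drops into the idle loop (the monitor) */
}
```

For example, with only envs[1] and envs[5] runnable, picks starting from 1 alternate 5, 1, 5, ..., and a search starting after the last slot wraps back to 1.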

Exercise 11. Implement round-robin scheduling in sched_yield() as described above. Don't forget to modify syscall() to dispatch sys_yield().

Modify kern/init.c to create two (or more!) environments that all run the program user/yield.c. You should see the environments switch back and forth between each other five times before terminating, at which point the idle loop runs. If this does not happen or the output looks wrong, then fix your code before proceeding.

Question:

In your implementation of env_run() you should have called lcr3(). This loads the %cr3 register, and instantly changes the addressing context used by the MMU. But virtual addresses, such as e itself, have meaning relative to a given address context. Why can the pointer e be dereferenced both before and after the addressing switch?

Challenge! Add a less trivial scheduling policy to the kernel, such as a strict priority scheduler that allows each environment to be assigned a priority and ensures that higher-priority environments are always chosen in preference to lower-priority environments. If you're feeling really adventurous, try implementing a Unix-style priority-usage scheduler or even a lottery or stride scheduler. (Look up "lottery scheduling" and "stride scheduling" in Google.)

Write a test or two that verifies that your scheduling algorithm is working correctly (i.e., the right environments get run in the right order).
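One possible strict-priority policy is sketched below under the same standalone model. The `env_priority` field is hypothetical (a larger value means higher priority), as is `sched_pick_priority`. The highest-priority runnable environment always wins, and ties are broken by the same circular search, so equal-priority environments still take turns.

```c
#include <assert.h>

#define NENV 8   /* illustrative */

enum env_status { ENV_FREE, ENV_RUNNABLE, ENV_NOT_RUNNABLE };

struct Env {
    enum env_status env_status;
    int env_priority;   /* hypothetical field: larger value = higher priority */
};

static struct Env envs[NENV];

/* Strict priority with round-robin among equals: scan circularly from just
 * after `prev`, remembering the first runnable environment seen with the
 * highest priority so far. Using '>' rather than '>=' keeps the earliest one
 * in circular order, so equal-priority environments alternate. */
static int sched_pick_priority(int prev)
{
    assert(prev < NENV);
    int start = (prev < 0) ? 0 : prev + 1;
    int best = -1;

    for (int k = 0; k < NENV; k++) {
        int i = (start + k) % NENV;
        if (envs[i].env_status != ENV_RUNNABLE)
            continue;
        if (best < 0 || envs[i].env_priority > envs[best].env_priority)
            best = i;
    }
    return best;   /* -1 if nothing is runnable */
}
```

For example, with environments 3 and 6 runnable at priority 5 and environment 2 runnable at priority 1, successive picks alternate between 3 and 6, and environment 2 runs only once neither of them is runnable. Checks of exactly this kind make a reasonable starting point for the test the challenge asks for.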

Challenge! The JOS kernel currently does not allow applications to use the x86 processor's x87 floating-point unit (FPU), MMX instructions, or Streaming SIMD Extensions (SSE). Extend the Env structure to provide a save area for the processor's floating point state, and extend the context switching code to save and restore this state properly when switching from one environment to another. The FXSAVE and FXRSTOR instructions may be useful, but note that these are not in the old i386 user's manual because they were introduced in more recent processors. Write a user-level test program that does something cool with floating-point.
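A save area for this challenge might look like the sketch below. It assumes GCC-style inline assembly on an x86 target. `struct fpu_state`, `fpu_save`, `fpu_restore`, and `fpu_mxcsr` are hypothetical names, and embedding such a field in `struct Env` is left to you. FXSAVE writes a 512-byte image that must be 16-byte aligned, and FXRSTOR reloads it. In the kernel you will likely also need to set the OSFXSR bit in %cr4, so that SSE instructions are enabled and FXSAVE reliably includes the XMM registers and MXCSR.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 512-byte FXSAVE image; the instruction faults unless it is 16-byte aligned. */
struct fpu_state {
    uint8_t fxsave_area[512];
} __attribute__((aligned(16)));

static_assert(sizeof(struct fpu_state) == 512, "FXSAVE image is 512 bytes");

/* Save the x87/MMX/SSE state of the current CPU into *s. */
static inline void fpu_save(struct fpu_state *s)
{
    __asm__ volatile("fxsave %0" : "=m"(*s));
}

/* Reload x87/MMX/SSE state previously written by fpu_save. */
static inline void fpu_restore(const struct fpu_state *s)
{
    __asm__ volatile("fxrstor %0" : : "m"(*s));
}

/* MXCSR (the SSE control/status register) sits at byte offset 24 of the image. */
static uint32_t fpu_mxcsr(const struct fpu_state *s)
{
    uint32_t v;
    memcpy(&v, s->fxsave_area + 24, sizeof v);
    return v;
}
```

A context switch would call fpu_save on the outgoing environment's area and fpu_restore on the incoming one's; a user-level test can, for instance, change a rounding mode in one environment and check that another environment's results are unaffected.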

System Calls for Environment Creation

Although your kernel is now capable of running and switching between multiple user-level environments, it is still limited to running environments that the kernel initially set up. You will now implement the necessary JOS system calls to allow user environments to create and start other new user environments.

Unix provides the fork() system call as its process creation primitive. Unix fork() copies the entire address space of the calling process (the parent) to create a new process (the child). The only differences between the two that are observable from user space are their process IDs and parent process IDs (as returned by getpid and getppid). In the parent, fork() returns the child's process ID, while in the child, fork() returns 0. By default, each process gets its own private address space, and neither process's modifications to memory are visible to the other.
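These Unix semantics can be observed on an ordinary POSIX host (this is not JOS code) with fork(), getpid(), getppid(), and waitpid(). In the hypothetical helper below, the child increments its own copy of a local variable and checks its parent process ID; the parent waits for the child and then looks at its own copy.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that increments its own copy of `counter` and checks that
 * getppid() names the parent. Returns the parent's copy of `counter`
 * afterwards (expected 0), or -1 if anything went wrong. */
static int fork_private_memory_demo(void)
{
    int counter = 0;
    pid_t parent = getpid();
    pid_t pid = fork();

    if (pid < 0)
        return -1;                       /* fork failed */
    if (pid == 0) {                      /* child: fork() returned 0 */
        counter++;                       /* changes only the child's copy */
        _exit(counter == 1 && getppid() == parent ? 0 : 1);
    }

    /* parent: fork() returned the child's process ID */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* The child's increment happened in a separate address space. */
    assert(counter == 0);
    return counter;
}
```

On a POSIX system, fork_private_memory_demo() returns 0, because the child's increment is invisible to the parent.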

You will provide a different, much more primitive set of JOS system calls for creating new user-mode environments. With these system calls you will be able to implement a Unix-like fork() entirely in user space, in addition to other styles of environment creation. The new system calls you will write for JOS are as follows:

sys_exofork: This system call creates a new environment with an almost blank slate: nothing is mapped in the user portion of its address space, and it is not runnable. The new environment will have the same register state as the parent environment at the time of the sys_exofork call. In the parent, sys_exofork will return the envid_t of the newly created environment (or a negative error code if the environment allocation failed). In the child, however, it will return 0. (Since the child starts out marked as not runnable, sys_exofork will not actually return in the child until the parent has explicitly allowed this by marking the child runnable using....)
sys_env_set_status: Sets the status of a specified environment to ENV_RUNNABLE or ENV_NOT_RUNNABLE. This system call is typically used to mark a new environment ready to run, once its address space and register state have been fully initialized.
sys_page_alloc: Allocates a page of physical memory and maps it at a given virtual address in a given environment's address space.
sys_page_map: Copies a page mapping (not the contents of a page!) from one environment's address space to another, leaving a memory sharing arrangement in place so that the new and the old mappings can both be used to access the same page of physical memory.
sys_page_unmap: Unmaps a page mapped at a given virtual address in a given environment.

In any of the system calls that accept environment IDs, the JOS kernel supports the convention that a value of 0 means "the current environment." This convention is implemented by envid2env() in kern/env.c.
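A simplified model of the lookup this convention implies is sketched below. It is not the code in kern/env.c: `envid2env_model`, the tiny `NENV`, and the `E_BAD_ENV` value are assumptions for illustration. It assumes that the low bits of an environment ID index `envs[]` (via an ENVX-style macro) and that `checkperm` means "the target must be the current environment or one of its immediate children"; compare with the comments in your kern/env.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NENV 8                              /* illustrative; a power of two */
#define ENVX(envid) ((envid) & (NENV - 1))  /* index part of an environment ID */
#define E_BAD_ENV 2                         /* assumed error code value */

typedef int32_t envid_t;

enum env_status { ENV_FREE, ENV_RUNNABLE, ENV_NOT_RUNNABLE };

struct Env {                                /* simplified */
    envid_t env_id;
    envid_t env_parent_id;
    enum env_status env_status;
};

static struct Env envs[NENV];
static struct Env *curenv;

/* Convert an environment ID to a pointer. envid 0 means "the current
 * environment". With checkperm set, only the current environment or its
 * immediate children may be named. */
static int envid2env_model(envid_t envid, struct Env **env_store, int checkperm)
{
    struct Env *e;

    if (envid == 0) {
        *env_store = curenv;
        return 0;
    }
    e = &envs[ENVX(envid)];
    if (e->env_status == ENV_FREE || e->env_id != envid ||
        (checkperm && e != curenv && e->env_parent_id != curenv->env_id)) {
        *env_store = NULL;
        return -E_BAD_ENV;
    }
    *env_store = e;
    return 0;
}
```

The `e->env_id != envid` test is what rejects a stale ID whose slot has since been reused by a different environment.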

We have provided a very primitive implementation of a Unix-like fork() in the test program user/dumbfork.c. This test program uses the above system calls to create and run a child environment with a copy of its own address space. The two environments then switch back and forth using sys_yield as in the previous exercise. The parent exits after 10 iterations, whereas the child exits after 20.

Exercise 12. Implement the system calls described above in kern/syscall.c. You will need to use various functions in kern/pmap.c and kern/env.c, particularly envid2env(). Whenever you call envid2env(), pass 1 in the checkperm parameter to check permissions. Be sure you check for any invalid system call arguments, returning -E_INVAL in that case. Test your JOS kernel with user/dumbfork and make sure it works before proceeding.
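Argument checking for a call such as sys_page_alloc might look like the sketch below. The rules it encodes are assumptions modeled on common JOS conventions, not requirements stated in this handout: the address must be page-aligned and below UTOP, and `perm` must include PTE_U and PTE_P and may otherwise contain only PTE_W and PTE_AVAIL. The PTE bit values are the x86 architectural ones; the UTOP and E_INVAL values are placeholders, so use the definitions in inc/memlayout.h and inc/error.h. `check_page_args` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE    4096u
#define PTE_P     0x001u      /* present (x86 PTE bit 0) */
#define PTE_W     0x002u      /* writable (bit 1) */
#define PTE_U     0x004u      /* user-accessible (bit 2) */
#define PTE_AVAIL 0xE00u      /* bits 9-11, available for software use */
#define UTOP      0xeec00000u /* placeholder: use the value from inc/memlayout.h */
#define E_INVAL   3           /* placeholder: use the value from inc/error.h */

static_assert(UTOP % PGSIZE == 0, "UTOP must be page-aligned");

/* Validate the (va, perm) arguments of a page-mapping system call.
 * Returns 0 if they are acceptable, -E_INVAL otherwise. */
static int check_page_args(uint32_t va, int perm)
{
    if (va >= UTOP || va % PGSIZE != 0)
        return -E_INVAL;                     /* kernel address or misaligned */
    if ((perm & (PTE_U | PTE_P)) != (PTE_U | PTE_P))
        return -E_INVAL;                     /* PTE_U and PTE_P are required */
    if (perm & ~(PTE_U | PTE_P | PTE_W | PTE_AVAIL))
        return -E_INVAL;                     /* no other bits may be set */
    return 0;
}
```

Running these checks before touching any page tables means a bad call leaves the target environment's address space unchanged.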

Challenge! Add the additional system calls necessary to read all of the vital state of an existing environment as well as set it up. Then implement a user mode program that forks off a child environment, runs it for a while (e.g., a few iterations of sys_yield()), then takes a complete snapshot or checkpoint of the child environment, runs the child for a while longer, and finally restores the child environment to the state it was in at the checkpoint and continues it from there. Thus, you are effectively "replaying" the execution of the child environment from an intermediate state. Make the child environment perform some interaction with the user using sys_cgetc() or readline() so that the user can view and mutate its internal state, and verify that with your checkpoint/restart functionality you can give the child environment a case of selective amnesia, making it "forget" everything that happened beyond a certain point.

This completes the lab.
