Lab 3: User Environments
Due 11:59pm Tuesday, February 19
Introduction
In this lab you will implement the basic kernel facilities
required to get a protected user-mode environment (i.e., "process") running.
You will enhance the JOS kernel
to set up the data structures to keep track of user environments,
create a single user environment,
load a program image into it,
and start it running.
You will also make the JOS kernel capable
of handling any system calls the user environment makes
and handling any other exceptions it causes.
Note:
In this lab, the terms environment and process are
interchangeable -- they have roughly the same meaning. We introduce the
term "environment" instead of the traditional term "process"
in order to stress the point that JOS environments do not provide
the same semantics as UNIX processes,
even though they are roughly comparable.
Getting Started
Download our reference code for lab 3 from lab3.tar.gz and untar it, then merge it into your
CVS repository as you did for Lab 2. (See the CVS hints.)
Lab 3 contains a number of new source files,
which you should browse through:
inc/
- env.h: Public definitions for user-mode environments
- syscall.h: Public definitions for system calls from user environments to the kernel
- lib.h: Public definitions for the user-mode support library
kern/
- env.h: Kernel-private definitions for user-mode environments
- env.c: Kernel code implementing user-mode environments
- sched.h: Schedule multiple user environments
- syscall.h: Kernel-private definitions for system call handling
- syscall.c: System call implementation code
lib/
- Makefrag: Makefile fragment to build user-mode library, obj/lib/libuser.a
- entry.S: Assembly-language entry point for user environments
- libmain.c: User-mode library setup code called from entry.S
- syscall.c: User-mode system call stub functions
- console.c: User-mode implementations of putchar and getchar, providing console I/O
- exit.c: User-mode implementation of exit
- panic.c: User-mode implementation of panic
user/
- *: Various test programs to check lab 3 functionality
In addition, a number of the source files we handed out for lab2
are modified in lab3.
To see the differences, you can type:
$ cvs diff -u -rLAB2 -rLAB3
(using whatever tag names you chose when merging in the lab).
Lab Requirements
This lab is divided into three parts.
As in lab 2,
you will need to do all of the regular exercises described in the lab
and write up brief answers
to the questions posed in the lab.
Please attempt at least one challenge problem.
This lab is more challenging than the last; if you cannot complete your
challenge problem, write up the design you were aiming for in technical detail.
If you can complete it, provide
a short (e.g., one or two paragraph) description of what you did.
More challenge suggestions are welcome: send them to the class mailing list!
Place the write-up in a file called answers.txt (plain text)
or answers.html (HTML format)
in the top level of your lab3 directory
before handing in your work.
Passing all the gmake grade tests
does not mean your code is perfect. It may have subtle bugs that will
only be tickled by future labs.
Keep in mind that debugging an operating system is hard:
there are abstraction boundaries, but you
can't necessarily place much trust in them since nothing is really
enforcing them. If you get all sorts of weird crashes
that don't seem to be explainable by a single bug in the layer you're
working on, it's likely that they're explainable by a single bug in
a different layer -- usually the virtual memory system.
Hand-In Procedure
As before,
you can test your code against our test scripts
by running gmake grade .
When you are ready to hand in your lab code and write-up, run gmake tarball in your jos directory. This
will create a file called lab3-yourusername.tar.gz ,
which you should submit via CourseWeb by 11:59pm on Tuesday, February 19.
If you have problems with CourseWeb, you may also email me the file.
Part 1: User Environments
The new include file inc/env.h
contains basic definitions for user environments in JOS.
The kernel uses the Env data structure
to keep track of critical data pertaining to each user environment.
You will create just one environment at first,
but you will design the JOS kernel
to support multiple simultaneously active environments.
In Part 3 of this lab you'll take advantage of this functionality
by allowing a user environment to fork other environments.
As you can see in kern/env.c,
the kernel maintains three main global variables
pertaining to environments:
Env *envs = NULL; // All environments
Env *curenv = NULL; // The current env
static Env *free_envs = NULL; // Free list
Once JOS gets up and running,
the envs pointer points to an array of Env structures
representing all the environments in the system.
In our design,
the JOS kernel will support a maximum of NENV
simultaneously active environments,
although there will typically be far fewer running environments
at any given time.
(NENV is a constant #define'd in inc/env.h.)
Once it is allocated,
the envs array will contain
a single instance of the Env data structure
for each of the NENV possible environments.
The JOS kernel keeps all of the inactive Env structures
on the free_envs list.
This allows efficient environment allocation and
deallocation, much as the free_pages list does for pages.
The kernel uses the curenv variable
to keep track of the currently executing environment at any given time.
During boot up, before the first environment is run,
curenv is initially set to NULL .
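The free-list bookkeeping described above can be sketched in ordinary C. This is a minimal model, not the lab's kernel code: it assumes a tiny NENV of 8, uses a reduced struct Env with only three fields, and the function names env_init_sketch, env_alloc_sketch, and env_free_sketch are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NENV 8                          /* assumption: tiny table for illustration */

typedef int32_t envid_t;
enum { ENV_FREE, ENV_RUNNABLE, ENV_NOT_RUNNABLE };

struct Env {                            /* reduced version of the real structure */
    struct Env *env_next;               /* next env on the free list */
    envid_t env_id;                     /* unique environment identifier */
    unsigned env_status;                /* ENV_FREE, ENV_RUNNABLE, ... */
};

static struct Env env_table[NENV];      /* stands in for the array mem_init() allocates */
static struct Env *envs = env_table;
static struct Env *free_envs = NULL;

/* Mark every slot free and link the slots so that envs[0] is handed out first. */
static void env_init_sketch(void)
{
    free_envs = NULL;
    for (int i = NENV - 1; i >= 0; i--) {
        envs[i].env_id = 0;
        envs[i].env_status = ENV_FREE;
        envs[i].env_next = free_envs;
        free_envs = &envs[i];
    }
}

/* Pop the head of the free list; returns NULL when all NENV slots are in use. */
static struct Env *env_alloc_sketch(void)
{
    struct Env *e = free_envs;
    if (e == NULL)
        return NULL;
    free_envs = e->env_next;
    e->env_next = NULL;
    e->env_status = ENV_RUNNABLE;
    return e;
}

/* Push a slot back onto the free list in constant time. */
static void env_free_sketch(struct Env *e)
{
    assert(e->env_status != ENV_FREE);  /* a double free would corrupt the list */
    e->env_status = ENV_FREE;
    e->env_next = free_envs;
    free_envs = e;
}
```

Both allocation and deallocation touch only the list head, which is why the free list is cheaper than scanning the whole envs array for an ENV_FREE slot.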
Environment State
The Env structure
is defined in inc/env.h as follows
(although more fields will be added in future labs):
struct Env {
Env *env_next; // Next env on the free list
envid_t env_id; // Unique environment identifier
envid_t env_parent_id; // env_id of this env's parent
unsigned env_status; // Status of the environment
pde_t *env_pgdir; // Address space page directory
// (kernel virtual address)
struct Trapframe env_tf; // Saved registers
uint32_t env_runs; // Number of times environment has run
};
We now briefly describe the state kept by the kernel for each user
environment.
- env_id
- An integer value that uniquely identifies the environment currently
using this
Env structure (i.e., using this particular
slot in the envs array). After a user environment
terminates, the kernel may subsequently re-allocate the same
Env structure to a different environment, but the
env_id will be different. (After many, many allocations,
however, the same env_id may reappear.)
The Env structure for envid_t
e is located at envs[ENVX(e)] (unless
environment e was killed, and the slot was reused in
the meantime).
- env_parent_id
- The
env_id of the environment that created this environment.
The environments form a tree or hierarchy,
which will be useful for making security decisions
about whether one environment can kill or map memory into another.
- env_status
- This variable holds one of the following values:
ENV_FREE
- The Env structure is inactive,
and therefore on the free_envs list.
ENV_RUNNABLE
- The Env structure
represents a currently active environment,
and the environment is waiting to run on the processor.
ENV_NOT_RUNNABLE
- The Env structure
represents a currently active environment,
but it is not currently ready to run:
for example, because it is waiting
for an interprocess communication (IPC)
from another environment.
- env_pgdir
- This environment's address space. In x86-compatible processors, of
course, an address space is represented by a page directory. The
env_pgdir member is the kernel virtual address
(>= KERNBASE ) of the page directory.
- env_tf
- Holds the current state of an environment's registers while that
environment is not running: i.e., when the kernel or a different
environment is running. The kernel saves the processor state into
env_tf when switching from user to kernel mode, so that the
environment can later be resumed where it left off. We first saw
struct Trapframe in Lab 2. (How did
we use it there?)
- env_runs
- A simple counter that records how many times this environment has been
run. Set to 0 when the environment is created.
- env_next
- A pointer for use in the singly-linked free environments list.
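To make the env_id description concrete, here is a small sketch of how an envid_t can be turned back into an Env slot. It assumes that NENV is 1024 and that ENVX() simply keeps the low bits of the identifier; the real definitions are in inc/env.h and may differ. The helper env_lookup_sketch is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t envid_t;

#define LOG2NENV 10                         /* assumption: NENV is a power of two */
#define NENV     (1 << LOG2NENV)
#define ENVX(envid) ((envid) & (NENV - 1))  /* assumption: slot index is in the low bits */

struct Env {                                /* reduced version of the real structure */
    envid_t env_id;
    unsigned env_status;                    /* 0 means free in this sketch */
};

static struct Env envs[NENV];

/*
 * Return the Env that currently owns id, or NULL if the slot is free
 * or has since been reused by an environment with a different id.
 */
static struct Env *env_lookup_sketch(envid_t id)
{
    struct Env *e = &envs[ENVX(id)];
    if (e->env_status == 0 || e->env_id != id)
        return NULL;
    return e;
}
```

Comparing the stored env_id against the requested one is what catches a stale identifier whose slot has been handed to a newer environment.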
Like a Unix process, a JOS environment couples the concepts of "thread", or
processor and stack context, and "address space", or memory context. The
thread is defined primarily by the saved registers (the env_tf
field), and the address space is defined by the page directory and page
tables pointed to by env_pgdir . To run
an environment, the kernel must set up the CPU with both the saved
registers and the appropriate address space.
In JOS,
individual environments do not have their own kernel stacks
as processes do in Linux and other conventional UNIXes.
Instead, all JOS kernel code runs on a single kernel stack,
and the kernel saves user-mode register state explicitly
in each struct Env 's env_tf
rather than implicitly on the relevant environment's kernel stack.
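The pairing of saved registers and address space can be modeled in a few lines. The sketch below is only an illustration under stated assumptions: struct Cpu stands in for the hardware, the Trapframe is cut down to three registers, and env_pgdir_pa (a physical page directory address), env_save_sketch, and env_run_sketch are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct Trapframe {                 /* reduced: the real one holds every register */
    uint32_t tf_eip;
    uint32_t tf_esp;
    uint32_t tf_eax;
};

struct Env {
    struct Trapframe env_tf;       /* registers saved while the env is not running */
    uint32_t env_pgdir_pa;         /* hypothetical: physical address of the page directory */
    uint32_t env_runs;             /* number of times the env has run */
};

struct Cpu {                       /* stand-in for the processor state */
    uint32_t cr3;                  /* current page directory */
    struct Trapframe regs;         /* live user registers */
};

static struct Env *curenv = NULL;

/* On entry to the kernel: copy the live registers into the current env. */
static void env_save_sketch(const struct Cpu *cpu)
{
    if (curenv != NULL)
        curenv->env_tf = cpu->regs;
}

/* To resume an env: install its address space, then its saved registers. */
static void env_run_sketch(struct Cpu *cpu, struct Env *e)
{
    curenv = e;
    e->env_runs++;
    cpu->cr3 = e->env_pgdir_pa;
    cpu->regs = e->env_tf;
}
```

Because the state lives in env_tf rather than on a per-environment kernel stack, one kernel stack is enough for every environment in this model.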
Allocating the Environments Array
In lab 2,
you allocated memory in mem_init()
for the pages array,
which is a table the kernel uses to keep track of
which pages are free and which are not.
You will now need to modify mem_init() further
to allocate a similar array of Env structures,
called envs.
Exercise 1.
Modify mem_init() in kern/pmap.c
to allocate and map the envs array.
This array consists of
NENV instances of the Env structure,
and is analogous to the pages array you created in Lab 2.
|
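Since the envs array is later mapped into user address spaces page by page, it helps to reserve it in whole pages. The following sketch shows only the size arithmetic; it assumes NENV is 1024 (check inc/env.h for the real value), and round_up and envs_bytes are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

#define PGSIZE 4096u     /* x86 page size */
#define NENV   1024u     /* assumption: the real value is in inc/env.h */

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Bytes to reserve for NENV structures of env_size bytes each, in whole pages. */
static size_t envs_bytes(size_t env_size)
{
    return round_up((size_t)NENV * env_size, PGSIZE);
}
```

In the kernel you would pass sizeof(struct Env) as env_size, then zero the region so every slot starts in a known state.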
Creating and Running Environments
You will now write the code in kern/env.c
necessary to run a user environment.
Because we do not yet have a filesystem,
we will set up the kernel to load a static ELF executable image
that is embedded within the kernel itself.
Once you integrate our Lab 3 code with your Lab 2 solutions,
you will notice that our makefiles generate a number of binary images
in the obj/user/ directory.
If you look at kern/Makefrag,
you will notice some magic that "links" these binaries
directly into the kernel executable
as if they were .o files.
The -b binary option on the linker command line
causes these files to be linked in as "raw" uninterpreted binary files
rather than as regular .o files produced by the compiler.
(As far as the linker is concerned,
these files do not have to be ELF images at all --
they could be anything, such as text files or pictures!)
If you look at obj/kern/kernel.sym after building the kernel,
you will notice that the linker has "magically" produced
a number of funny symbols with names like
_binary_obj_user_hello_start,
_binary_obj_user_hello_end, and
_binary_obj_user_hello_size.
The linker generates these symbol names
by mangling the file names of these binary files;
the symbols provide the regular kernel code with a way
to reference the embedded binary files.
In this lab, the kernel will start up and run one of those binary images.
The code to select a binary image is in kern/init.c .
The grade script links different binary images into your kernel, to test
different properties of your user environment handling. If you're not
running the grade script, the kernel normally runs the hello
program, defined in user/hello.c , which will print
hello, world!
in the old-school manner when you've progressed far enough through this lab.
You're free to run whatever binary you want, but
don't change the version inside #ifdef TEST .
In addition, our makefile system will let you run a particular program by
typing gmake run-programname . For
example, gmake run-hello will run the
user/hello.c program (without a GUI), regardless of how you've
edited kern/init.c .
To summarize some convenient debugging features of QEMU and our Makefiles:
- gmake run runs QEMU on the current kernel.
- gmake run-programname runs QEMU on a kernel compiled to run programname .
- gmake run-gdb and gmake run-gdb-programname act similarly, but start up QEMU to wait for an attachment from a GDB process.
- If QEMU dies too quickly for you to see its output, try
gmake run O=1 or gmake
run-prog O=1 . The O=1 (that's an
"Oh", not a zero) tells QEMU to print the kernel's output to the terminal
as well as to the screen.
- Within QEMU press Shift-PageUp and Shift-PageDown to scroll through multiple screens of output.
In i386_init() in kern/init.c
you'll see code to run one of these binary images in an environment.
However,
the critical functions to set up user environments are not complete;
you will need to fill them in.
Exercise 2 (Long!).
In the file env.c ,
finish coding the following functions:
- env_init():
- Initialize all of the Env structures
in the envs array
and add them to the free_envs list.
- env_mem_init():
- Allocate a page directory for a new environment
and initialize the kernel portion
of the new environment's address space.
- load_elf():
- Parse an ELF binary image,
much like the boot loader already does,
and load its contents into the user address space
of a new environment.
- env_create():
- Allocate an environment with env_alloc
and call load_elf to load an ELF binary into it.
- env_run():
- Run the given environment in user mode.
As you write these functions,
you might find cprintf 's new %e converter
useful -- it prints a description corresponding to an error code.
For example,
r = -E_NO_MEM;
panic("env_alloc: %e", r);
will panic with the message "env_alloc: out of memory".
|
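The core of ELF loading is a loop over the program headers that copies each loadable segment and zero-fills the rest. The sketch below shows that loop against a flat byte buffer instead of a real address space, so it deliberately leaves out page allocation and page-directory handling, which the real load_elf needs. It assumes a little-endian host; the structure and function names are hypothetical, with field layouts following the 32-bit ELF format.

```c
#include <stdint.h>
#include <string.h>

#define ELF_MAGIC 0x464C457Fu   /* "\x7F" "ELF" read as a little-endian uint32_t */
#define PT_LOAD   1u            /* program header type: loadable segment */

struct Elf {                    /* 32-bit ELF file header, 52 bytes */
    uint32_t e_magic;
    uint8_t  e_elf[12];
    uint16_t e_type, e_machine;
    uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum;
    uint16_t e_shentsize, e_shnum, e_shstrndx;
};

struct Proghdr {                /* 32-bit ELF program header, 32 bytes */
    uint32_t p_type, p_offset, p_va, p_pa;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
};

/*
 * Copy every PT_LOAD segment of image into mem, which models a flat
 * address space covering [0, mem_size). Returns the entry point,
 * or -1 if the image is malformed or a segment does not fit.
 */
static int64_t load_elf_sketch(const uint8_t *image, size_t image_size,
                               uint8_t *mem, size_t mem_size)
{
    struct Elf eh;
    if (image_size < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh);
    if (eh.e_magic != ELF_MAGIC)
        return -1;

    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        struct Proghdr ph;
        uint64_t off = (uint64_t)eh.e_phoff + (uint64_t)i * sizeof ph;
        if (off > image_size || image_size - off < sizeof ph)
            return -1;
        memcpy(&ph, image + off, sizeof ph);
        if (ph.p_type != PT_LOAD)
            continue;
        if (ph.p_filesz > ph.p_memsz)
            return -1;
        if (ph.p_offset > image_size || image_size - ph.p_offset < ph.p_filesz)
            return -1;
        if (ph.p_va > mem_size || mem_size - ph.p_va < ph.p_memsz)
            return -1;
        /* The first p_filesz bytes come from the file ... */
        memcpy(mem + ph.p_va, image + ph.p_offset, ph.p_filesz);
        /* ... and the remainder up to p_memsz (e.g. .bss) is zeroed. */
        memset(mem + ph.p_va + ph.p_filesz, 0, ph.p_memsz - ph.p_filesz);
    }
    return eh.e_entry;
}
```

The difference between p_filesz and p_memsz is easy to overlook: a program whose uninitialized globals are not zeroed may appear to work and then fail much later.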
Once you are done you should compile your kernel and run it.
If all goes well, your system should crash when the user program
tries to make a system call, since you haven't implemented
system calls yet. This will appear as a General Protection Fault, trap
type 0xd. The trap frame's EIP should point at an int $0x30
instruction in hello 's code. (Look at
obj/user/hello.asm to check the EIP.)
Here is a call graph of the code up to the point where the user
code is invoked.
Make sure you understand the purpose of each step.
- start (kern/entry.S)
- i386_init
- cons_init
- mem_init
- page_init
- idt_init
- env_init
- env_create
- env_run
At this point, QEMU will start running user/hello.c
in user mode!
To see how this happens,
use gmake run-gdb
and set a GDB breakpoint at env_iret,
which should be the last function you hit before actually entering user mode,
with the b env_iret command. (You must be in
32-bit mode to set the breakpoint. GDB loads the kernel's symbols from the kernel ELF file, which is how it can translate env_iret to a code address.)
Step through env_iret ;
the processor should enter user mode after the iret instruction.
(How can you tell?)
You should then see the first instruction
in the user environment's executable,
which is the cmpl instruction at the label start
in lib/entry.S.
If you continue past this point, hello should run successfully
until it first hits an int $48 instruction,
which is what user-mode code executes
to make a system call.
(See lib/syscall.c to see how this works.)
Then, your trap code from the previous lab should activate
and kill the environment!
(We've changed trap() to handle
uncaught user-mode exceptions by killing the offending environment.)
If you cannot get to this point,
then something is wrong with your address space setup
or program loading code;
go back and fix it before continuing.
If you run gmake grade at this point, you should pass the
divzero , breakpoint , softint , and
badsegment tests, and get 20 points. (Your breakpoint
[backtrace] test will fail, however; this is fixed in Exercise
10.)
Question:
- Did you have to do anything
to make the user/softint program behave correctly
(i.e., generate a general protection fault, as the grade script expects)?
Why is this the correct behavior?
What happens if the kernel actually allows softint's
int $14 instruction to invoke the kernel's page fault handler
(which is interrupt number 14)?
|
Part 2: User-Level Exceptions and System Calls
Now, we'll update the exception handling support you added to the last
lab, using it to provide important operating
system functionality.
The Breakpoint Exception
In the last lab, you turned the breakpoint exception, interrupt number 3
(T_BRKPT ), into a primitive debugging instruction that invokes
the JOS kernel monitor. The user-mode implementation of panic()
in lib/panic.c, for example, performs an int3 after
displaying its panic message. Make sure at this point that this
functionality works! The breakpoint user program tests it by
invoking an int3 instruction.
Challenge Note: If you implemented
the single-stepping challenge in Lab 2, you might want to verify that your
code works on user-level programs too. |
Question:
- Executing
int3 at user level might deliver a general
protection fault to the kernel, rather than a breakpoint exception,
depending on how you initialized the breakpoint entry in the IDT
(i.e., your call to SETGATE from
idt_init ). What change would you make to cause
user-level breakpoints to generate a GPF? Why does this
functionality exist? |
Page Faults
The page fault exception, interrupt number 14 (T_PGFLT),
is a particularly important one that we will exercise heavily
throughout this lab and the next.
When the processor takes a page fault,
it stores the linear address that caused the fault
in a special processor control register, CR2.
In trap.c
we have provided the beginnings of a special function,
page_fault_handler(),
to handle page fault exceptions.
Exercise 3.
Modify trap()
to dispatch page fault exceptions
to page_fault_handler().
You should now be able to get gmake grade
to succeed on the faultread, faultreadkernel,
faultwrite, and faultwritekernel tests.
If any of them don't work, figure out why and fix them.
|
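Dispatching on the trap number can be pictured as a single switch. The sketch below is a model rather than the lab's trap(): it returns an action code instead of calling the real handlers, uses a two-field Trapframe, and assumes that an unexpected trap taken in kernel mode should panic. The names trap_dispatch_sketch and the ACT_ constants are hypothetical.

```c
#include <stdint.h>

#define T_BRKPT  3      /* breakpoint exception */
#define T_PGFLT 14      /* page fault exception */

struct Trapframe {      /* reduced: only the fields this sketch reads */
    uint32_t tf_trapno; /* trap number recorded by the entry code */
    uint32_t tf_cs;     /* code segment selector at the time of the trap */
};

enum action { ACT_MONITOR, ACT_PAGE_FAULT, ACT_DESTROY_ENV, ACT_PANIC };

/* Decide what the kernel should do with a trap. */
static enum action trap_dispatch_sketch(const struct Trapframe *tf)
{
    switch (tf->tf_trapno) {
    case T_BRKPT:
        return ACT_MONITOR;         /* drop into the kernel monitor */
    case T_PGFLT:
        return ACT_PAGE_FAULT;      /* hand off to page_fault_handler() */
    default:
        /* The low two bits of CS are the privilege level; 3 means user mode. */
        if ((tf->tf_cs & 3) == 3)
            return ACT_DESTROY_ENV; /* uncaught user-mode exception */
        return ACT_PANIC;           /* assumption: kernel-mode surprises are fatal */
    }
}
```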
You will further refine the kernel's page fault handling below,
as you implement system calls.
System Calls
User processes ask the kernel to do things for them by
invoking system calls. When the user process invokes a system call,
the processor enters kernel mode,
the processor and the kernel cooperate
to save the user process's state,
and the kernel executes appropriate code in order to carry out the system
call. When it's done, it resumes the user process.
The exact
details of how the user process gets the kernel's attention
and how it specifies which call it wants to execute vary
from system to system.
In the x86 kernel, we will use the int
instruction, which causes a processor interrupt.
In particular, int $48
will cause a system call interrupt.
We have defined the constant
T_SYSCALL to 48. You will have to
set up the interrupt descriptor to allow user processes to
cause that interrupt;
this causes no ambiguity since hardware cannot cause it.
In the x86 kernel, we will pass the system call number and
the system call arguments in registers. This way, we don't
need to grub around in the user environment's stack
or instruction stream. The
system call number will go in %eax , and the
arguments (up to five of them) will go in %edx ,
%ecx , %ebx , %edi ,
and %esi , respectively. The kernel passes the
return value back in %eax . The assembly code to
invoke a system call has been written for you, in
syscall() in lib/syscall.c . You
should read through it and make sure you understand what
is going on.
You may also find it helpful to read inc/syscall.h .
Exercise 4.
Add a handler in the kernel
for interrupt number T_SYSCALL .
You will have to edit kern/trapentry.S and
kern/trap.c 's idt_init() . You
also need to change trap() to handle the
system call interrupt by calling syscall()
(defined in kern/syscall.c)
with the appropriate arguments,
and then arranging for
the return value to be passed back to the user environment
in %eax .
Finally, you need to implement syscall() in
kern/syscall.c ; it should dispatch to one of the
sys_ functions defined there.
See inc/syscall.h for system call numbers.
Make sure syscall() returns -E_INVAL
if the system call number is invalid.
You'll only need SYS_cputs , SYS_cgetc ,
SYS_getenvid , and SYS_env_destroy for now,
but might as well add stubs for them all.
Run the hello program under your kernel.
It should print "hello, world " on the console
and then cause a page fault in user mode.
If this does not happen, it probably means
your system call handler isn't quite right.
If the kernel doesn't appear to be receiving a system call interrupt,
check your call to SETGATE : are the privileges right?
|
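The register convention and the dispatch step can be sketched together as follows. This is a model under stated assumptions: the system call numbering and the value of E_INVAL are placeholders (the real ones are in the JOS headers), the handlers are stubs that return fixed values, and SyscallRegs, syscall_sketch, and handle_syscall_sketch are hypothetical names.

```c
#include <stdint.h>

#define E_INVAL 3       /* assumption: the real value is defined in the JOS headers */

/* Assumption: placeholder numbering; see inc/syscall.h for the real one. */
enum { SYS_cputs, SYS_cgetc, SYS_getenvid, SYS_env_destroy, NSYSCALLS };

struct SyscallRegs {    /* the saved user registers that carry a system call */
    uint32_t eax;       /* system call number in, return value out */
    uint32_t edx, ecx, ebx, edi, esi;   /* arguments 1 through 5 */
};

/* Dispatch on the call number; unknown numbers yield -E_INVAL. */
static int32_t syscall_sketch(uint32_t num, uint32_t a1, uint32_t a2,
                              uint32_t a3, uint32_t a4, uint32_t a5)
{
    (void)a1; (void)a2; (void)a3; (void)a4; (void)a5;  /* stubs ignore arguments */
    switch (num) {
    case SYS_cputs:       return 0;        /* stub: would print a2 bytes at a1 */
    case SYS_cgetc:       return 0;        /* stub: no character available */
    case SYS_getenvid:    return 0x1000;   /* stub: fixed environment id */
    case SYS_env_destroy: return 0;        /* stub: would destroy env a1 */
    default:              return -E_INVAL;
    }
}

/* What the trap handler does: unpack registers, call, store the result in eax. */
static void handle_syscall_sketch(struct SyscallRegs *r)
{
    int32_t ret = syscall_sketch(r->eax, r->edx, r->ecx, r->ebx, r->edi, r->esi);
    r->eax = (uint32_t)ret;
}
```

Writing the result into the saved eax, rather than returning it directly, is what makes the value visible to the user environment once its registers are restored.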
Challenge!
Implement system calls using the sysenter and
sysexit instructions instead of using
int $48 and iret .
The sysenter/sysexit instructions were designed
by Intel to be faster than int/iret . They do
this by using registers instead of the stack and by making
assumptions about how the segmentation registers are used.
The exact details of these instructions can be found in Volume
2B of the Intel reference manuals.
The easiest way to add support for these instructions in JOS
is to add a sysenter_handler in
kern/trapentry.S that creates the same trap frame
that is normally created by an int $48
instruction (being sure to save the correct return address and
stack pointer provided by the user environment). Then,
instead of calling into trap , push the arguments
to syscall and call syscall
directly. Once syscall returns, set everything
up for and execute the sysexit instruction.
You will also need to add code to kern/init.c to
set up the necessary model-specific registers (MSRs). Look at
the enable_sep_cpu function in this diff for an
example of this; you can find an implementation of
wrmsr to add to inc/x86.h here.
Finally, lib/syscall.c must be changed to support
making a system call with sysenter . Here is a
possible register layout for the sysenter
instruction:
eax - syscall number
edx, ecx, ebx, edi - arg1, arg2, arg3, arg4
esi - return pc
ebp - return esp
esp - trashed by sysenter
GCC's inline assembler does not support directly loading
values into ebp , so you will need to add code to
save (push) and restore (pop) it yourself (and you may want to
do the same thing for esi as well). The return
address can be put into esi by using an
instruction like leal after_sysenter_label,
%esi .
Note that this only supports 4 arguments, so you will need to
leave the old method of doing system calls around
to support 5 argument system calls as well.
|
User-mode Environment Setup
Now, you'll fix the user-mode page fault in
user/hello.c .
JOS is designed to export as much kernel information (physical names) to
user programs as possible. In particular, JOS programs expect to
be able to see how many physical pages are free, and the state of
every other environment in the system. (Question: Is this an
information leak?) Rather than providing system calls for
environments to extract the information, JOS simply maps read-only
copies of the pages[] and envs[] arrays into every environment's
address space.
Exercise 5. Edit mem_init to set up
mappings for the UPAGES address range, which should
map to a read-only version of pages[] and
UENVS , which should map to a read-only version of
envs[] .
|
Why the crash, even after this exercise? The umain
function tries to
access env->env_id .
The JOS library OS is supposed to set the global pointer env
to point at the current environment's struct Env , in the
read-only copy of the envs[] array you allocated in Part 1.
This global pointer lets the environment efficiently access its state.
But currently the pointer is just null.
Exercise 6. JOS user programs start running at the top of
lib/entry.S . Trace through, find the point where
env should be set, and set it. Note that
lib/entry.S has already defined envs to
point at the UENVS mapping you set up in Exercise 5. Hint:
You'll want to use a system call.
This is the first point in the lab where you test the user-level
read-only mapping of envs[] at UENVS , so
you may want to check your code from Part 1 if you have problems
here. And don't forget that envid_t s aren't just
linear indexes into the envs[] array!
|
At this point, user/hello should print "hello,
world ", then "i am environment 00001000 ". It
then attempts to "exit" by calling sys_env_destroy() (see
lib/libmain.c and lib/exit.c). Since the kernel
currently only supports one user environment, it should report that
it has destroyed all environments and then drop into the "idle loop",
which for JOS is just the kernel monitor.
Page faults and memory protection
In this section of the lab, you'll begin refining JOS's response to
user-level page fault exceptions, which happen when an application tries to
access an invalid address or an address for which it has no permissions.
Memory protection is a crucial operating system feature, since it can help
the OS ensure that bugs in one program cannot corrupt other programs or the
operating system itself.
On an invalid access, the processor stops the program at the instruction
causing the fault and then traps into the kernel with information about the
attempted operation. If the fault is fixable, the kernel can fix it and
let the program continue running. If the fault is not fixable, then the
program cannot continue, since it will never get past the instruction
causing the fault.
As an example of a fixable fault, consider an automatically extended stack.
In many systems the kernel initially allocates a single stack page, and then
if a program faults accessing pages further down the stack, the kernel
will allocate those pages automatically and let the program continue.
By doing this, the kernel only allocates as much stack memory as
the program needs, but the program can work under the illusion that it
has an arbitrarily large stack.
System calls present an interesting problem for memory protection.
Most system call interfaces let user programs pass pointers to the
kernel. These pointers point at user buffers to be read or written.
The kernel then dereferences these pointers
while carrying out the system call.
There are two problems with this:
- A page fault in the kernel
is taken a lot more seriously than a page fault in a user program.
If the kernel page faults, that's usually a kernel bug, and the
fault handler will panic the kernel
(and hence the whole system).
In a system call,
when the kernel is dereferencing pointers to the user's address space,
we need a way to prevent or catch any page faults these dereferences cause.
- The kernel typically has more memory permissions than the user program.
The user program might ask the kernel to read from or write to a
location in kernel memory that the user program cannot access but that
the kernel can.
If the kernel is not careful,
a buggy or malicious user program can trick the kernel
into using its greater privilege in unintended ways,
possibly so as to destroy the integrity of the kernel completely.
This second danger is one instance of a classic security problem
known as the "confused deputy" problem.
The kernel is acting as a trusted "deputy",
which has the special privileges necessary
to implement important services needed by untrusted users --
but if users can confuse the kernel into using those special privileges
in unintended ways, the security model breaks down.
For both of these reasons the kernel must be extremely careful when
handling pointers presented by user programs.
You will now implement solutions to these two problems by writing a
function, user_mem_check , that checks that the memory
addresses a user specified are OK for that user to access. Then, anywhere
a user pointer appears, your kernel will call user_mem_assert
to check the pointer. (user_mem_assert calls
user_mem_check .) If there is any problem with the pointer,
user_mem_assert will destroy the corresponding user
environment.
Exercise 7.
Implement kern/pmap.c 's user_mem_check
function. Check that the supplied range is valid user memory
(i.e., below ULIM ), and that the user has the necessary permission
throughout the range. Make sure you set the
user_mem_check_addr variable to the faulting address,
if there is a fault. |
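The page-by-page range check can be sketched against a toy page table. Everything specific here is an assumption for illustration: the ULIM and E_FAULT values, the single mapped page returned by demo_page_perm, and the name user_mem_check_sketch. The real function walks the environment's page directory instead of calling a lookup stub.

```c
#include <stdint.h>

#define PGSIZE  4096u
#define PTE_P   0x001u          /* x86 page table entry: present */
#define PTE_W   0x002u          /* writable */
#define PTE_U   0x004u          /* user-accessible */
#define ULIM    0xef800000u     /* assumption: see the JOS memory layout header */
#define E_FAULT 6               /* assumption: the real value is in the JOS headers */

static uint32_t user_mem_check_addr;   /* first faulting address, if any */

/* Toy page table: one read/write user page at 0x00800000, nothing else. */
static uint32_t demo_page_perm(uint32_t page_va)
{
    return page_va == 0x00800000u ? (PTE_P | PTE_U | PTE_W) : 0;
}

/*
 * Check that [va, va+len) lies below ULIM and that every page in it is
 * mapped with at least perm | PTE_P | PTE_U. Returns 0 on success, or
 * -E_FAULT with user_mem_check_addr set to the first bad address.
 */
static int user_mem_check_sketch(uint32_t va, uint32_t len, uint32_t perm)
{
    uint64_t end = (uint64_t)va + len;   /* 64-bit so va + len cannot wrap */
    uint32_t need = perm | PTE_P | PTE_U;
    uint64_t addr = va;

    while (addr < end) {
        uint32_t page = (uint32_t)addr & ~(PGSIZE - 1);
        if (addr >= ULIM || (demo_page_perm(page) & need) != need) {
            user_mem_check_addr = (uint32_t)addr;
            return -E_FAULT;
        }
        addr = (uint64_t)page + PGSIZE;  /* first byte of the next page */
    }
    return 0;
}
```

Note that the reported address is va itself when the first page is bad, and the start of the page for any later page, so the first failing byte is what gets recorded.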
Exercise 8.
Change sys_cputs in kern/syscall.c to
correctly check the user's supplied pointer before using it.
user_mem_assert may be useful.
Change kern/init.c to run user/buggyhello
instead of user/hello . This code dereferences an
almost-null pointer, causing a segmentation fault.
When you compile your kernel and boot it,
the environment should be destroyed,
and the kernel should not panic.
You should see:
[00000000] new env 00001000
[00001000] user_mem_check va 00000001
[00001000] free env 00001000
Idle loop - nothing more to do!
(The user_mem_check va may differ slightly, but it should be on the same page.)
Now change kern/init.c to run user/evilhello .
This code tries to be a bit sneakier and print the contents of valid
kernel memory, rather than random unmapped memory.
Still, when you compile your kernel and boot it,
the kernel should not panic;
you should see:
[00000000] new env 00001000
[00001000] user_mem_check va f0100020
[00001000] free env 00001000
|
Exercise 9.
Update page_fault_handler in kern/trap.c
so that kernel-mode page faults call panic (as
described above).
|
Question:
Would it be harder to implement a safe cputs system
call that took a null-terminated string, instead of a string and a length?
Why or why not?
|
User-Level Debugging Information
Like the JOS kernel, JOS user-level programs have debugging information
linked in and ready to go. However, this information is a bit harder to
get to. The kernel has ready-made __STAB_BEGIN__ ,
__STAB_END__ , __STABSTR_BEGIN__ , and
__STABSTR_END__ symbols telling it where to find the STABS and
string table. In user-level applications, the linker script
constructs a small structure containing these values that will be loaded
at address USTABDATA (or 0x200000).
The kernel must load that
structure to find the tables, then look in the tables themselves.
The USTABDATA values are user pointers, of course, so they
must be checked!
Exercise 10.
Change debuginfo_eip in kern/kdebug.c to
correctly check user-level pointers before accessing them.
Also, update your mon_backtrace in
kern/monitor.c to behave better for user-level
applications. This requires two changes. When producing a
backtrace for a trapframe (tf != NULL ), you should
first print a symbolic backtrace line corresponding to
tf->tf_eip . Second, during the backtrace, validate
any user-level pointers you dereference, and print ?
signs or break out of the backtrace rather than dereferencing an
invalid pointer.
If you run the user/breakpoint.c program, then type
backtrace at the monitor prompt, you
should see a backtrace like this:
Stack backtrace:
user/breakpoint.c:11: _Z5umainiPPc+47 (0 arg)
0: ebp eeffdfd0 eip 0080007b args 00000000 00000000 eeffdff0 0080004c
lib/libmain.c:42: libmain+3f (2 arg)
1: ebp eeffdff0 eip 00800031 args 00000000 00000000 ? ?
lib/entry.S:48: <unknown>+0 (0 arg)
Note the ? marks after args in the
last frame. This frame is at the very top of the stack, so after
two arguments the addresses go above USTACKTOP and
into unmapped memory. |
Part 3: Creating User Environments and Cooperative Multitasking
Now, you'll implement some new JOS kernel system calls
to allow user-level environments to create
additional new environments.
You will also implement cooperative round-robin scheduling,
allowing the kernel to switch from one environment to another
when the current environment voluntarily relinquishes the CPU (or exits).
In the next lab you'll implement preemptive scheduling,
which allows the kernel to re-take control of the CPU from an environment
even if the environment does not cooperate.
Round-Robin Scheduling
Your first task in this lab is to change the JOS kernel
so that it does not always just run the environment in envs[0],
but instead can alternate between multiple environments
in "round-robin" fashion.
Round-robin scheduling in JOS works as follows:
- The function sched_yield() in the new kern/sched.c
is responsible for selecting a new environment to run.
It searches sequentially through the envs[] array
in circular fashion,
starting just after the previously running environment
(or at the beginning of the array
if there was no previously running environment),
picks the first environment it finds
with a status of ENV_RUNNABLE
(see inc/env.h),
and calls env_run() to jump into that environment.
- User environments call the
sys_yield()
system call
to invoke the kernel's sched_yield() function,
and thereby voluntarily give up the CPU to a different environment.
As you can see in user/idle.c,
the idle environment does this routinely.
- If no environments are runnable,
sched_yield() drops
into the idle loop, which in JOS is just the kernel monitor.
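The circular search described above can be sketched on its own, outside the kernel. The following is a minimal illustration, not the sched_yield() you are asked to write: the type and function names are hypothetical stand-ins for the definitions in inc/env.h, and the function only returns an index where the kernel would call env_run().

```c
/* Hypothetical stand-ins for the status values in inc/env.h. */
enum sketch_status { SK_FREE, SK_RUNNABLE, SK_NOT_RUNNABLE };

struct sketch_env {
	enum sketch_status status;
};

/*
 * Return the index of the next runnable environment, searching
 * circularly and starting just after index prev. Pass prev = -1 when
 * no environment ran previously, so the search starts at index 0.
 * The previously running environment is examined last.
 * Returns -1 if nothing is runnable (the caller would then idle).
 */
static int
pick_next(const struct sketch_env *envs, int nenv, int prev)
{
	int i;

	for (i = 1; i <= nenv; i++) {
		int idx = (prev + i) % nenv;
		if (envs[idx].status == SK_RUNNABLE)
			return idx;
	}
	return -1;
}
```

Because the loop runs nenv times, the search wraps all the way around to prev itself, so an environment that yields while it is the only runnable one is simply chosen again.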
Exercise 11.
Implement round-robin scheduling in sched_yield()
as described above. Don't forget to modify
syscall() to dispatch sys_yield().
Modify kern/init.c to create two (or more!) environments
that all run the program user/yield.c.
You should see the environments
switch back and forth between each other
five times before terminating,
at which point the idle loop runs.
If this does not happen or the output looks wrong,
then fix your code before proceeding.
|
Question:
In your implementation of env_run() you should have
called lcr3() .
This loads the %cr3 register, and instantly changes the
addressing context used by the MMU. But virtual addresses, such as
e itself, have meaning relative to a given address context.
Why can the pointer e be dereferenced both before and after
the addressing switch?
|
Challenge!
Add a less trivial scheduling policy to the kernel,
such as a strict priority scheduler that allows each environment
to be assigned a priority
and ensures that higher-priority environments
are always chosen in preference to lower-priority environments.
If you're feeling really adventurous,
try implementing a Unix-style priority-usage scheduler
or even a lottery or stride scheduler.
(Look up "lottery scheduling" and "stride scheduling" in Google.)
Write a test or two
that verifies that your scheduling algorithm is working correctly
(i.e., the right environments get run in the right order).
|
Challenge!
The JOS kernel currently does not allow applications
to use the x86 processor's x87 floating-point unit (FPU),
MMX instructions, or Streaming SIMD Extensions (SSE).
Extend the Env structure
to provide a save area for the processor's floating point state,
and extend the context switching code
to save and restore this state properly
when switching from one environment to another.
The FXSAVE and FXRSTOR instructions may be useful,
but note that these are not in the old i386 user's manual
because they were introduced in more recent processors.
Write a user-level test program
that does something cool with floating-point.
|
System Calls for Environment Creation
Although your kernel is now capable of running and switching between
multiple user-level environments,
it is still limited to running environments
that the kernel initially set up.
You will now implement the necessary JOS system calls
to allow user environments to create and start
other new user environments.
Unix provides the fork() system call
as its process creation primitive.
Unix fork() copies
the entire address space of the calling process (the parent)
to create a new process (the child).
The only differences between the two observable from user space
are their process IDs and parent process IDs
(as returned by getpid and getppid ).
In the parent,
fork() returns the child's process ID,
while in the child, fork() returns 0.
By default, each process gets its own private address space, and
neither process's modifications to memory are visible to the other.
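For readers who have not used fork() recently, the short POSIX example below illustrates both points: the two return values, and the privacy of each process's memory. It runs on an ordinary Unix host, not on JOS, and the function name fork_private_copy_demo is made up for this illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that overwrites its copy of a variable, then report
 * what the parent observes. Returns the parent's value of the
 * variable after the child has exited, or -1 on error.
 */
static int
fork_private_copy_demo(void)
{
	int value = 1;
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;		/* fork failed */
	if (pid == 0) {
		/* Child: fork() returned 0. This write is private. */
		value = 2;
		_exit(value);
	}
	/* Parent: fork() returned the child's process ID. */
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 2)
		return -1;
	return value;			/* unchanged by the child's write */
}
```

The child exits with its modified value as its exit status, which lets the parent confirm that the child really saw 2 while the parent's own copy remained 1.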
You will provide a different, much more primitive
set of JOS system calls
for creating new user-mode environments.
With these system calls you will be able to implement
a Unix-like fork() entirely in user space,
in addition to other styles of environment creation.
The new system calls you will write for JOS are as follows:
- sys_exofork
- This system call creates a new environment with an almost blank slate:
nothing is mapped in the user portion of its address space,
and it is not runnable.
The new environment will have the same register state as the
parent environment at the time of the
sys_exofork call.
In the parent, sys_exofork
will return the envid_t of the newly created
environment
(or a negative error code if the environment allocation failed).
In the child, however, it will return 0.
(Since the child starts out marked as not runnable,
sys_exofork will not actually return in the child
until the parent has explicitly allowed this
by marking the child runnable using....)
- sys_env_set_status
- Sets the status of a specified environment
to ENV_RUNNABLE or ENV_NOT_RUNNABLE.
This system call is typically used
to mark a new environment ready to run,
once its address space and register state
have been fully initialized.
- sys_page_alloc
- Allocates a page of physical memory
and maps it at a given virtual address
in a given environment's address space.
- sys_page_map
- Copies a page mapping (not the contents of a page!)
from one environment's address space to another,
leaving a memory sharing arrangement in place
so that the new and the old mappings can both be used
to access the same page of physical memory.
- sys_page_unmap
- Unmaps a page mapped at a given virtual address
in a given environment.
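The difference between copying a mapping and copying a page's contents is easy to see in a toy model. The sketch below is a host-side illustration only: it uses made-up types (sk_page, sk_space) in which an "address space" is a small array of page pointers, and it assumes slot indices are in range. It is not the kernel's page table code.

```c
#include <stddef.h>

#define SK_NSLOTS 4	/* hypothetical: slots per toy address space */

struct sk_page {
	int refcount;		/* number of mappings of this page */
	char data[16];
};

struct sk_space {
	struct sk_page *slot[SK_NSLOTS];	/* NULL means unmapped */
};

/* Drop the mapping in one slot, if there is one. */
static void
sk_unmap(struct sk_space *as, int i)
{
	if (as->slot[i] != NULL) {
		as->slot[i]->refcount--;
		as->slot[i] = NULL;
	}
}

/*
 * Make dst's slot refer to the same page as src's slot.
 * Only the pointer is copied; no page data moves.
 * Returns 0 on success, -1 if the source slot is unmapped.
 */
static int
sk_map(struct sk_space *src, int si, struct sk_space *dst, int di)
{
	struct sk_page *pg = src->slot[si];

	if (pg == NULL)
		return -1;
	pg->refcount++;		/* take the new reference first */
	sk_unmap(dst, di);	/* replace whatever was mapped before */
	dst->slot[di] = pg;
	return 0;
}
```

After a successful sk_map(), a write through either slot is visible through the other, and the page stays referenced until both mappings have been removed.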
In any of the system calls that accept environment IDs,
the JOS kernel supports the convention
that a value of 0 means "the current environment."
This convention is implemented by envid2env()
in kern/env.c.
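To make the convention concrete, here is a simplified host-side model of such a lookup. It is a sketch under stated assumptions, not the code in kern/env.c: the table size, the error number, and every sk_ name are hypothetical, and the permission rule shown (the target must be the current environment or one of its immediate children) is an assumption about what the permission check means.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t sk_envid_t;

#define SK_NENV      8	/* hypothetical table size, a power of two */
#define SK_E_BAD_ENV 2	/* hypothetical error number */

struct sk_env {
	sk_envid_t id;		/* 0 marks a free slot in this sketch */
	sk_envid_t parent_id;
};

static struct sk_env sk_envs[SK_NENV];
static struct sk_env *sk_curenv;

/*
 * Translate an environment ID into a table entry.
 * An ID of 0 always means "the current environment".
 * Returns 0 on success, or -SK_E_BAD_ENV with *out set to NULL.
 */
static int
sk_envid2env(sk_envid_t envid, struct sk_env **out, int checkperm)
{
	struct sk_env *e;

	if (envid == 0) {
		*out = sk_curenv;
		return 0;
	}
	/* The low bits of the ID select the table slot. */
	e = &sk_envs[envid & (SK_NENV - 1)];
	if (e->id != envid) {		/* free slot or stale ID */
		*out = NULL;
		return -SK_E_BAD_ENV;
	}
	/* Assumed rule: only self or an immediate child may be named. */
	if (checkperm && e != sk_curenv && e->parent_id != sk_curenv->id) {
		*out = NULL;
		return -SK_E_BAD_ENV;
	}
	*out = e;
	return 0;
}
```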
We have provided a very primitive implementation
of a Unix-like fork()
in the test program user/dumbfork.c.
This test program uses the above system calls
to create and run a child environment
with a copy of its own address space.
The two environments
then switch back and forth using sys_yield
as in the previous exercise.
The parent exits after 10 iterations,
whereas the child exits after 20.
Exercise 12.
Implement the system calls described above
in kern/syscall.c.
You will need to use various functions
in kern/pmap.c and kern/env.c,
particularly envid2env().
Whenever you call envid2env(),
pass 1 in the checkperm parameter
to check permissions.
Be sure you check for any invalid system call arguments,
returning -E_INVAL in that case.
Test your JOS kernel with user/dumbfork
and make sure it works before proceeding.
|
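As an illustration of the kind of argument checking meant here, the sketch below validates the address and permission arguments of a call like sys_page_alloc. It is a host-side sketch under assumptions, not a reference solution: the rules shown (address below the top of user space and page-aligned; PTE_U and PTE_P required; only PTE_W and the PTE_AVAIL bits additionally allowed) are one plausible policy, and the SK_UTOP and SK_E_INVAL values are placeholders for the real definitions in the JOS headers. The function name is hypothetical.

```c
#include <stdint.h>

#define SK_PGSIZE   4096u
#define SK_UTOP     0xef000000u	/* placeholder; use the JOS header value */
#define SK_E_INVAL  3		/* placeholder; use the JOS header value */

/* x86 page table entry bits. */
#define SK_PTE_P     0x001u	/* present */
#define SK_PTE_W     0x002u	/* writeable */
#define SK_PTE_U     0x004u	/* user-accessible */
#define SK_PTE_AVAIL 0xE00u	/* available for software use */

/*
 * Return 0 if va and perm are acceptable for mapping a user page,
 * or -SK_E_INVAL otherwise.
 */
static int
sk_check_page_args(uint32_t va, uint32_t perm)
{
	const uint32_t required = SK_PTE_U | SK_PTE_P;
	const uint32_t allowed = required | SK_PTE_W | SK_PTE_AVAIL;

	/* The address must be in user space and page-aligned. */
	if (va >= SK_UTOP || va % SK_PGSIZE != 0)
		return -SK_E_INVAL;
	/* Required bits must be set, and no disallowed bit may be. */
	if ((perm & required) != required || (perm & ~allowed) != 0)
		return -SK_E_INVAL;
	return 0;
}
```

Doing all such checks before allocating or mapping anything means an invalid call leaves no partial state behind to clean up.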
Challenge!
Add the additional system calls necessary
to read all of the vital state of an existing environment
as well as to set it.
Then implement a user-mode program that forks off a child environment,
runs it for a while (e.g., a few iterations of sys_yield()),
then takes a complete snapshot or checkpoint
of the child environment,
runs the child for a while longer,
and finally restores the child environment to the state it was in
at the checkpoint
and continues it from there.
Thus, you are effectively "replaying"
the execution of the child environment from an intermediate state.
Make the child environment perform some interaction with the user
using sys_cgetc() or readline()
so that the user can view and mutate its internal state,
and verify that with your checkpoint/restart functionality
you can give the child environment a case of selective amnesia,
making it "forget" everything that happened beyond a certain point.
|
This completes the lab.
Back to CS 235 Advanced Operating Systems,
Winter 2008
|