- quiz review
  -- Labs 1 & 2
  -- all lectures up to, but not including, this one
  -- will have copies of papers, lab handouts
  -- will need to share

- what have we covered?
  -- What is an OS?
     ... Makes hardware useful to the application programmer
     ... Provides abstractions and protection for applications
     ... Makes it easier to write applications; reduces impact of bugs;
         facilitates sharing
  -- Fundamental OS choices
     ... What functionality is provided?
     ... What programming failures or mistakes are tolerated, and how
         robustly?
  -- Operating systems at large
     ... Unix system call interface
     ... CTSS, Multics computing utility
     ... Plan 9: file system is everything
     ... Exokernel design principles:
           separate protection from management
           securely expose the hardware
         How is JOS an exokernel?
  -- Memory management
     ... Appel & Li: VM tricks
           garbage collection
           shared virtual memory
           persistent stores
           extending addressability (64-bit pointers on disk, 32 in memory)
           heap overflow detection
           data compression paging
     ... x86 addressing
           segmentation
           paging
           virtual, linear, physical addresses
     ... JOS addressing

           +------------------+  <- 0xFFFFFFFF (4GB)
           |      32-bit      |
           |  memory mapped   |
           |     devices      |
           |                  |
           /\/\/\/\/\/\/\/\/\/\

           /\/\/\/\/\/\/\/\/\/\
           |                  |
           |      Unused      |
           |                  |
           +------------------+  <- depends on amount of RAM
           |                  |
           |                  |
           | Extended Memory  |
           |                  |
           |                  |
           +------------------+  <- 0x00100000 (1MB)
           |     BIOS ROM     |
           +------------------+  <- 0x000F0000 (960KB)
           |  16-bit devices, |
           |  expansion ROMs  |
           +------------------+  <- 0x000C0000 (768KB)
           |   VGA Display    |
           +------------------+  <- 0x000A0000 (640KB)
           |                  |
           |    Low Memory    |
           |                  |
           +------------------+  <- 0x00000000

   Virtual memory map:                                Permissions
                                                      kernel/user

      4 Gig -------->  +------------------------------+
                       |                              | RW/--
                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                       :              .               :
                       :              .               :
                       :              .               :
                       |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~| RW/--
                       |                              | RW/--
                       |   Remapped Physical Memory   | RW/--
                       |                              | RW/--
      KERNBASE ----->  +------------------------------+ 0xf0000000
                       |  Cur. Page Table (Kern. RW)  | RW/--  PTSIZE
      VPT,KSTACKTOP--> +------------------------------+ 0xefc00000      --+
                       |         Kernel Stack         | RW/--  KSTKSIZE   |
                       | - - - - - - - - - - - - - - -|                 PTSIZE
                       |        Invalid memory        | --/--             |
      ULIM     ------> +------------------------------+ 0xef800000      --+
                       |  Cur. Page Table (User R-)   | R-/R-  PTSIZE
      UVPT      ---->  +------------------------------+ 0xef400000
                       |           RO PAGES           | R-/R-  PTSIZE
      UPAGES    ---->  +------------------------------+ 0xef000000
                       |           RO ENVS            | R-/R-  PTSIZE
   UTOP,UENVS ------>  +------------------------------+ 0xeec00000
   UXSTACKTOP -/       |     User Exception Stack     | RW/RW  PGSIZE
                       +------------------------------+ 0xeebff000
                       |        Invalid Memory        | --/--  PGSIZE
      USTACKTOP  --->  +------------------------------+ 0xeebfe000
                       |      Normal User Stack       | RW/RW  PGSIZE
                       +------------------------------+ 0xeebfd000
                       |                              |
                       |                              |
                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                       .                              .
                       .                              .
                       .                              .
                       |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
                       |                              |
      UTEXT -------->  +------------------------------+ 0x00800000
                       |                              |  2*PTSIZE
      0 ------------>  +------------------------------+

           types
     ... virtualization
           Disco: OS changes
           VMware: no OS changes
             ballooning, content-based page sharing, idle memory tax

**** SCHEDULING ****

- Conflicting goals
  -- High throughput
     ... Minimize waste
     ... Waste => time spent not achieving application goals
           Bookkeeping time
           If you do some work you better make it worthwhile!
           Minimize context switches: CPU, TLB, cache, page faults
  -- Low latency
     ... Run something as soon as it's ready
     ... Allocate finite resource among demanding principals

- Scheduling fundamentals
  -- Scheduling based on a time quantum
     ... Provided by a timer interrupt
     ... You'll see: Lab 4
  -- Applications run for some number of quanta, or until they give up
     the CPU voluntarily
  -- How to set the quantum?
     ... Long => high throughput
     ... Short => low latency
     ... BSD: 1/10s since ~1980
           Longest "human-visible" latency
           But a very long time scale for certain programmatic events
           No user-level packet handling, that's for sure!
     ... Linux: 1/100s on x86, 1/1024s on some alphas, etc.
  -- Every quantum, OS decides what to run next
     ... Also if one application becomes non-runnable
     ... In practice, a bunch of heuristics that are not necessarily well
         understood
           "Even popular priority-based approaches such as decay-usage
           scheduling are poorly understood, despite the fact that they
           are employed by numerous operating systems, including Unix."
           [Stride Scheduling..., Waldspurger and Weihl]
     ... Can we do better?

- Dimensions to the scheduling question
  -- Parameters
     ... What parameters are used to determine how a process is scheduled?
     ... Examples: priority level, fraction of CPU, sets of deadlines
  -- Fairness
     ... Do two processes with the same objective parameter settings
         achieve the same fraction of the CPU?
     ... Take 2 processes, A and B. Run them in the order ABABABABAB...
         Is this fair?
     ... Maybe not! What if A runs only 50% of its quantum and then goes
         to sleep (waking up before B finishes)? A will get only 50% as
         much CPU as B! This is not generally considered fair; fairness
         is measured in terms of actual time spent running.
  -- Progress/Liveness
     ... Can any one process be starved by others?
     ... Strict priority scheduling: Yes! Round-robin: No! Unix: Yes!
  -- Admission control
     ... Can the OS provide hard guarantees about whether a new process's
         requirements can be met?
     ... For example, provide a deadline by which an application MUST
         have run for at least one quantum.
     ... This is "real-time scheduling".
     ... Soft real time: miss deadline and CD will skip
     ... Hard real time: miss deadline and plane will crash
  -- Cooperation
     ... Can related processes run together (to reduce cache thrash)?
     ... Can one process "donate" the rest of its allocation to another?
           Example: IPC
           A sends message to B, blocks on reply
           Want B to run at A's "priority"!
           Compare these orders: ABDABDABD, ADBADBADB, AAABBBDDD, ...
           All equally fair, first minimizes A->B latency
  -- Multiprocessor issues
     ... Affinity scheduling: try to keep a thread on the same processor
         (interprocessor cache thrashing very expensive)
  -- Computational complexity
     ... O(n)? O(1)? O(log n)?
     ... Round robin: O(1)!

- Decay-usage scheduling: The Unix base
  -- Kernel contains a number of priority-marked run queues
  -- Kernel round-robins among processes on the highest-priority run queue
  -- Priorities recomputed dynamically
     ... Two factors: niceness and estimated CPU usage
     ... Favor interactive, short-running processes: CPU usage is low
  -- BSD model
     ... p_nice: niceness -- lower numbers run more frequently
     ... p_estcpu: estimated CPU usage
           Incremented by timer interrupt
           Decayed every second when process runnable:
             p_estcpu := (2*load / (2*load + 1)) * p_estcpu
           If load = 1, p_estcpu decays to 2/3 of its value (drops by 1/3)
           If load ~= 0, p_estcpu decays to 0
     ... Run queue is p_usrpri/4, where
             p_usrpri := min(50 + 0.25*p_estcpu + 2*p_nice, 127)
           If p_estcpu gains 100 per second (always running, 10ms timer
           interrupts), load ~= 1, and p_nice = 0: p_estcpu tends toward
           300 (fixed point of x = (2/3)*x + 100) and p_usrpri toward 125
           If p_estcpu = 0 (no running), p_usrpri tends toward 50
           Lower p_usrpri = higher priority
     ... Sleeping increases priority
           p_slptime keeps track of sleep time
           When process wakes,
             p_estcpu *= (2*load / (2*load + 1))**p_slptime
           Longer p_slptime, less p_estcpu
     ... What's the scheduling complexity?
           Depends
           Need not be more than O(1)

- Lottery scheduling
  -- Issue [uniformly-distributed] lottery tickets t_i to processes
  -- Set T = total number of tickets = \sum_i t_i
  -- Chance of winning the next quantum is t_i/T
  -- "Microeconomic" model: you can give your tickets to others!
     ... Models cooperation
  -- What if a process uses a fraction of its quantum?
     ... Don't want to be penalized for being generous!
     ... Should get to run more frequently
     ... Compensation tickets: If P uses fraction f of its quantum,
         inflate its tickets by 1/f until it next runs
  -- Problem: random choice, expected error can grow large
     (O(\sqrt{n_a}) for n_a allocations), unpredictable latencies,
     unnecessarily many context switches

- Stride scheduling
  -- Make lottery scheduling deterministic
  -- Take the reciprocal of #tickets: "stride"
  -- Increment "pass" by "stride" when a process runs
       Pass grows quickly --> stride large --> tickets small
       Pass grows slowly --> stride small --> tickets large
  -- Run the process with the minimum pass on each quantum!
  -- Sketch (members of a Client class):

       const int STRIDE1 = (1 << 20);  /* large number represents "1" */
       int _tickets, _stride, _pass;

       void Client::init(int tickets) {
           this->_tickets = tickets;
           this->_stride = STRIDE1 / tickets;  /* fixed-point 1/tickets */
           this->_pass = this->_stride;
       }

       void Client::run() {
           /* this is the element with the lowest _pass */
           run_this();
           this->_pass += this->_stride;
           reinsert();  /* put back into the list ordered by _pass */
       }

  -- Can handle dynamic ticket changes by scaling
  -- Can handle sleeping/waking up
     ... When a process wakes up, set its pass to the current minimum pass
     ... When a process goes to sleep partway through its allocation,
         don't increment by the whole stride -- instead just the fraction
         proportional to how much of the quantum it used
  -- Complexity: O(lg n)
     ... Need a heap to implement the stride list

- BVT (Borrowed Virtual Time)
  -- Begin with an algorithm a lot like stride scheduling
     ... Actual Virtual Time == pass, weight == tickets,
         mcu advance == stride
  -- Add latency sensitivity with a TIME WARP
     ... Subtract W_i from pass_i
     ... Borrow time from your future executions to run immediately now
     ... Wake up from sleep: set pass_i := global_pass; this would not
         necessarily let you run right away, but subtract warp, and you do!
     ... More warp => lower apparent pass => runs sooner!
  -- Long-term fairness guaranteed despite warp
       But don't want too crazy warps or it will be very long term!
  -- Additional parameters
     ... Warp time limit L_i: detects real-time processes running amok,
         unwarps them by setting W_i := 0
     ... Unwarp time requirement U_i: amount of time before process may
         warp again
  -- Figure 3, Figure 4
  -- We will revisit next lecture
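
The stride and BVT ideas above can be tried out in a few lines of C++. This is a minimal sketch under stated assumptions, not the algorithm from either paper: the names `Client`, `effective_pass`, `pick_next`, and `simulate` are hypothetical, the warp is a constant offset that is always in effect, the warp limits L_i and U_i are omitted, and a linear scan stands in for the heap. With `warp == 0` it reduces to plain stride scheduling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Fixed-point representation of "1", as in the notes.
constexpr std::int64_t STRIDE1 = 1 << 20;

// Hypothetical client record for this sketch.
struct Client {
    std::string name;
    std::int64_t stride;  // STRIDE1 / tickets
    std::int64_t pass;    // virtual time consumed so far
    std::int64_t warp;    // BVT-style warp; 0 gives plain stride scheduling

    Client(std::string n, int tickets, std::int64_t w = 0)
        : name(std::move(n)), stride(stride_for(tickets)), pass(stride), warp(w) {}

    // Tickets must be positive so the reciprocal is defined.
    static std::int64_t stride_for(int tickets) {
        assert(tickets > 0);
        return STRIDE1 / tickets;
    }

    // The scheduler compares pass minus warp, so a warped client looks
    // "earlier" in virtual time than it really is.
    std::int64_t effective_pass() const { return pass - warp; }
};

// Index of the client with the minimum effective pass. Ties go to the
// lowest index. A linear scan keeps the sketch short; a heap gives O(lg n).
std::size_t pick_next(const std::vector<Client>& clients) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < clients.size(); ++i) {
        if (clients[i].effective_pass() < clients[best].effective_pass()) {
            best = i;
        }
    }
    return best;
}

// Run the given number of full quanta and return the order in which
// clients ran, one name per quantum.
std::string simulate(std::vector<Client> clients, int quanta) {
    std::string order;
    for (int q = 0; q < quanta; ++q) {
        Client& c = clients[pick_next(clients)];
        order += c.name;
        c.pass += c.stride;  // advance by one stride per quantum used
    }
    return order;
}
```

In this model, `simulate({Client("A", 2), Client("B", 1)}, 6)` yields `"AABAAB"`: A holds twice the tickets and gets twice the quanta. Giving B a warp of two strides with equal tickets, `simulate({Client("A", 1), Client("B", 1, 2 * STRIDE1)}, 6)` yields `"BBABAB"`: B runs first, then the two alternate, so B's lead stays bounded by its warp.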