Note the use of “system entry” for system call.
“However, the structure of files is controlled by the programs that use them, not by the system.” [p2, 1]
This is a bigger deal than it might appear at first. In many early systems, the system (or system library) strictly divided files into typed categories. Text files, for example, were not always seekable; seek was allowed only in a file of binary records. It’s a great advantage of Unix that its file system is flexible enough to efficiently support a wide range of file use cases—small, large; text, binary; etc. (Note that some of the prior ugliness even made its way into language definitions: consider Pascal’s “file of RECORD”.)
“If arbitrary links to directories were permitted, it would be quite difficult to detect when the last connection from the root to a directory was severed.” [p3, 1]
Question: Why? What type of problem is this?
Contrast the mkdir system call described here with the one you’re familiar with. (We did this in class.)
“Because anyone may set the set-user-ID bit on one of his own files, this mechanism is generally available without administrative intervention. For example, this protection scheme easily solves the MOO accounting problem posed by ‘Aleph-null.’” [p4, 1]
The reference to “Aleph-null” is totally worth chasing. [2]
“Each call to the system may potentially result in an error return, which for simplicity is not represented in the calling sequence.” [p4, 1]
The shame starts early! (It’s a cliché that Unix users rarely check system call return values for errors, and therefore rarely write correct programs.)
See the discussion around “Worse is Better”, a famous article by Richard P. Gabriel.
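A minimal sketch of what checking looks like, using Python's os module as a thin wrapper over the system calls. In C the error arrives as a return value of -1 plus errno; Python surfaces the same errno as an OSError exception. The helper name try_open is made up for illustration.

```python
import errno
import os

def try_open(path):
    """Open path read-only; return (fd, None) on success or (None, errno) on failure."""
    try:
        return os.open(path, os.O_RDONLY), None
    except OSError as e:
        # This is the error return that the paper's calling sequences leave out.
        return None, e.errno

fd, err = try_open("/no/such/file")  # hypothetical path, assumed not to exist
if err is not None:
    print("open failed:", errno.errorcode[err])
else:
    os.close(fd)
```

The point is only that every call has a failure path; whether it shows up as a return value or an exception, the program has to decide what to do with it.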
“To create a new file or completely rewrite an old one, there is a create system call that creates the given file if it does not exist…” [p5, 1]
Actually, no; the system call is called creat, with no final e. “Ken Thompson was once asked what he would do differently if he were redesigning the UNIX system. His reply: ‘I’d spell creat with an e.’” (Kernighan & Pike, The UNIX Programming Environment, p.204)
“The file system maintains no locks visible to the user…. Although it is possible for the contents of a file to become scrambled when two users write on it simultaneously, in practice difficulties do not arise. We take the view that locks are neither necessary nor sufficient, in our environment, to prevent interference between users of the same file….” [p5, 1]
They protest too much here, and modern Unix does support advisory file locks. Why? (And what does “advisory” mean?)
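One way to explore the second question is to try it. The sketch below assumes a Unix system where flock(2) is available and uses Python's fcntl module; the function name advisory_lock_demo is hypothetical. It opens the same file twice, locks it through one descriptor, and then sees what the other descriptor is still allowed to do.

```python
import fcntl
import os
import tempfile

def advisory_lock_demo():
    """Return (second_lock_granted, bytes_written_without_lock)."""
    fd, path = tempfile.mkstemp()
    other = os.open(path, os.O_WRONLY)  # a second, independent open of the same file
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # the first opener takes an exclusive lock
        try:
            fcntl.flock(other, fcntl.LOCK_EX | fcntl.LOCK_NB)
            granted = True
        except OSError:
            granted = False  # lock is held through the other open
        # The kernel does not stop a non-holder from writing.
        written = os.write(other, b"scribble")
        return granted, written
    finally:
        os.close(other)
        os.close(fd)
        os.unlink(path)
```

On a typical local file system the second lock request is refused but the write still goes through: the lock constrains only programs that ask for it.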
“For each open file there is a pointer, maintained inside the system, that indicates the next byte to be read or written.” [p5, 1]
Why inside the system (kernel)? Seems like something they’d keep outside the kernel for simplicity—and, in fact, in initial versions, the pointer was outside the kernel. “Evolution” discusses how it moved into the kernel.
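A small experiment shows one consequence of keeping the pointer in the kernel. This is a sketch, assuming a Unix system with fork; the name shared_offset_demo is made up. A child process inherits the parent's open file, reads part of it, and exits; the parent then reads from the same descriptor.

```python
import os
import tempfile

def shared_offset_demo():
    """Return what the parent reads after a child has read the first 6 bytes."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello world")
        os.lseek(fd, 0, os.SEEK_SET)  # rewind to the start
        pid = os.fork()
        if pid == 0:
            try:
                os.read(fd, 6)        # child consumes "hello "
            finally:
                os._exit(0)           # leave without running the parent's cleanup
        os.waitpid(pid, 0)
        return os.read(fd, 5)         # parent continues where the child stopped
    finally:
        os.close(fd)
        os.unlink(path)
```

The parent sees b"world", not b"hello": the two processes share one read/write pointer. Consider how that would work if each process kept its own copy of the pointer in user space.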
“The space on all disks that contain a file system is divided into a number of 512-byte blocks…” [p6, 1]
Sectors are still 512 bytes! But modern file systems have much bigger blocks.
“The system recognizes when a program has made accesses to sequential blocks of a file, and asynchronously pre-reads the next block. This significantly reduces the running time of most programs while adding little to system overhead.” [p7, 1]
Readahead and a buffer cache were already implemented in 1970! (And earlier, most likely.)
Question: It is extremely unlikely that readahead was implemented on the first, PDP–7 version of Unix. Why not?
“A program that reads or writes files in units of 512 bytes has an advantage over a program that reads or writes a single byte at a time, but the gain is not immense; it comes mainly from the avoidance of system overhead.” [p7, 1]
Question: What system overhead?
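To make the comparison concrete, here is a rough sketch that counts how many read calls each strategy makes on a 4096-byte file. The names count_reads and read_count_demo are hypothetical, and the count is only a proxy for the per-call cost; it measures nothing about time.

```python
import os
import tempfile

def count_reads(path, chunk):
    """Read the whole file with os.read(fd, chunk); return the number of calls made."""
    fd = os.open(path, os.O_RDONLY)
    calls = 0
    try:
        while True:
            calls += 1                  # each os.read is one trip into the kernel
            if not os.read(fd, chunk):  # an empty result means end of file
                return calls
    finally:
        os.close(fd)

def read_count_demo():
    """Return (calls reading 1 byte at a time, calls reading 512 bytes at a time)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * 4096)
        os.close(fd)
        return count_reads(path, 1), count_reads(path, 512)
    finally:
        os.unlink(path)
```

For this file the byte-at-a-time reader makes 4097 calls and the block reader makes 9 (the last call in each case is the one that reports end of file), yet both touch the same eight disk blocks.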
“To the system itself, one of [the strengths of the i-list] is the fact that each file has a short, unambiguous name related in a simple way to the protection, addressing, and other information needed to access the file. It also permits a quite simple and rapid algorithm for checking the consistency of a file system, for example, verification that the portions of each device containing useful information and those free to be allocated are disjoint and together exhaust the space on the device. This algorithm is independent of the directory hierarchy….” [p7, 1]
Do you agree? Can a directory-independent algorithm fully check the consistency of a file system?
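The block check the quote describes can be sketched in a few lines. The data model here is an assumption made for illustration, not the paper's on-disk format: inode_blocks maps an i-number to the list of data blocks that file claims, free_blocks is the free list, and blocks are numbered 0 to total_blocks - 1 with reserved areas left out.

```python
def blocks_consistent(total_blocks, inode_blocks, free_blocks):
    """Check that used and free blocks are disjoint and together cover the device."""
    used = set()
    for blocks in inode_blocks.values():
        for b in blocks:
            if b in used:
                return False          # two files (or one file twice) claim this block
            used.add(b)
    free = set(free_blocks)
    if len(free) != len(free_blocks):
        return False                  # a block appears twice on the free list
    if not used.isdisjoint(free):
        return False                  # a block is both in use and free
    return (used | free) == set(range(total_blocks))  # nothing lost, nothing out of range

print(blocks_consistent(4, {1: [0, 1], 2: [2]}, [3]))
```

Note that the function never looks at a directory. It is worth asking what kinds of damage it therefore cannot see.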
Does the Unix described in these papers support symbolic links?
“The program text segment begins at location 0 in the virtual address space.”
Unlike on modern systems, dereferencing a null pointer could not fault in v6 Unix! (Or, at least, a faulting null pointer would have to have a non-zero representation.)
Question: How does the pipe described in Section 5.2 [1] differ from a modern Unix pipe (which you can learn about with man 2 pipe)?
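For the modern side of the comparison, here is a minimal sketch of a pipe between a parent and a child, assuming a Unix system with fork. The function name pipe_demo is made up. Look closely at what os.pipe returns and set it against the calling sequence in the paper.

```python
import os

def pipe_demo():
    """Child writes b"ping" into a pipe; parent reads until end of file."""
    r, w = os.pipe()          # modern pipe: a read end and a write end
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)       # the child only writes
            os.write(w, b"ping")
        finally:
            os._exit(0)
    os.close(w)               # parent must drop its write end to ever see EOF
    data = b""
    while True:
        chunk = os.read(r, 512)
        if not chunk:
            break
        data += chunk
    os.close(r)
    os.waitpid(pid, 0)
    return data
```

The close(w) in the parent is not optional: a reader sees end of file only once every write end is closed.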
“Actually it would be surprising, and in fact unwise for efficiency reasons, to expect authors of commands such as ls to provide such a wide variety of output options.” [p10, 1]
Have modern command-line programs like ls followed this advice? (No!) So was it good advice? What “efficiency reasons” do the authors mean?
Note how up-to-date the shell seems.
“[Opening a file descriptor with a specific number] is easy because, by agreement, the smallest unused file descriptor is assigned when a new file is opened (or created); it is only necessary to close file 0 (or 1) and open the named file.” [p11, 1]
Can you come up with a simpler interface for opening a specifically-numbered file descriptor? There would be reasons to prefer a less implicit interface; in particular, the “smallest-unused” property causes lock contention on the file descriptor table in modern multiprocessor OSes—even if, as is frequently the case, the opening application doesn’t care about the file descriptor number!
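Both styles can be tried side by side. The sketch below assumes a single-threaded process (another thread opening files would break the implicit version, which is rather the point) and uses dup2 as one existing interface that names the target descriptor explicitly. The function name lowest_unused_demo is hypothetical.

```python
import os

def lowest_unused_demo():
    """Return (implicit_ok, explicit_ok) for the two ways of getting a chosen fd."""
    first = os.open(os.devnull, os.O_RDONLY)
    second = os.open(os.devnull, os.O_RDONLY)

    # Implicit: free a number, then rely on open() handing back the lowest unused one.
    os.close(first)
    reopened = os.open(os.devnull, os.O_RDONLY)
    implicit_ok = (reopened == first)
    os.close(reopened)

    # Explicit: tell the kernel which number should refer to the file.
    os.dup2(second, first)
    explicit_ok = os.fstat(first).st_ino == os.fstat(second).st_ino

    os.close(first)
    os.close(second)
    return implicit_ok, explicit_ok
```

The implicit version works only because nothing else in the process touched the descriptor table between the close and the open.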
“Curiously, the primitives that became wait were considerably more general than the present scheme. …” [p6, 3]
smes and rmes are worth pondering! Why did these seemingly useful primitives fail for their intended purpose? Why, perhaps, could Ritchie say “I can recall no other use of messages”? Would we use the primitives today? If not, how could the smes/rmes design be made more useful?
“In the midst of our jubilation, we discovered that the chdir…command had stopped working. There was much reading of code and anxious introspection about how the addition of fork could have broken the chdir call. Finally the truth dawned: in the old system chdir was an ordinary command; it adjusted the current directory of the (unique) process attached to the terminal. Under the new system, the chdir command correctly changed the current directory of the process created to execute it, but this process promptly terminated and had no effect whatsoever on its parent shell!” [p7, 3]
Classic “D’oh”.
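The same surprise is easy to reproduce today. A minimal sketch, assuming a Unix system with fork (the function name chdir_in_child is made up): a child process changes directory and exits, and the parent checks whether its own current directory moved.

```python
import os

def chdir_in_child(target):
    """Return True if the parent's working directory is unchanged by the child's chdir."""
    before = os.getcwd()
    pid = os.fork()
    if pid == 0:
        try:
            os.chdir(target)  # changes only the child's current directory
        finally:
            os._exit(0)       # ...and the child promptly terminates
    os.waitpid(pid, 0)
    return os.getcwd() == before

print(chdir_in_child("/"))
```

This is why cd is still a shell builtin rather than an ordinary command.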
Is the iocall notation for redirection [p8, 3] not truly ugly, so ugly as to make usage unlikely? As Ritchie says later, “The mental leap needed to see this possibility and to invent the notation is large indeed.” [p11, 3] Notation and other interface details really matter!
“Pipes appeared…at the suggestion (or perhaps insistence) of M.D. McIlroy, a long-time advocate of the non-hierarchical control flow that characterizes coroutines.” [p10, 3]
Question: What is hierarchical control flow? (Answer: Functions, since the called function completely fits within the execution of the calling function.) Computer scientists often overuse hierarchy, in my opinion; it’s interesting to think of pipes as purposefully non-hierarchical.
Dennis Ritchie has a lovely remembrance of Doug McIlroy’s insistence on pipes dating from 1964(!). “Point 1’s garden hose connection analogy…is the one that ultimately whacked us on the head to best effect.”
One of the great things about the Evolution paper is the excitement in discovery it communicates, particularly relating to process control and pipes. Sense the pride behind “The new facility was enthusiastically received”.
“In practice [that the PDP–7 I/O unit was a word, not a byte] meant merely that all programs dealing with character streams ignored null characters, because null was used to pad a file to an even number of characters.” [p3, 3]
“Thus a further file system convention was required: each directory had to contain an entry tty for a special file that referred to the terminal of the process that opened it. If by accident one changed into some directory that lacked this entry, the shell would loop hopelessly; about the only remedy was to reboot. (Sometimes the missing link could be made from the other terminal.)” [p6, 3]
“Lastly, exit(status) terminates a process, destroys its image, closes its open files, and generally obliterates it.” [1]
“Perhaps paradoxically, the success of the UNIX system is largely due to the fact that it was not designed to meet any predefined objectives.” [1]
“Our goals throughout the effort, when articulated at all, have always been to build a comfortable relationship with the machine and to explore ideas and inventions in operating systems and other software. We have not been faced with the need to satisfy someone else’s requirements, and for this freedom we are grateful.” [1]
“This may be a thinly disguised version of the ‘salvation through suffering’ philosophy, but in our case it worked.” [1]
“The undertaking was more ambitious than it might seem; because we disdained all existing software, ….” [3]
“What a failure of imagination!” [3]
“The reader will not, on the average, go far wrong if he reads each occurrence of ‘we’ with unclear antecedent as ‘Thompson, with some assistance from me.’” [3]
[1] D. M. Ritchie and K. Thompson, “The UNIX Time-Sharing System”, 1974; revised from Communications of the ACM 17(7), July 1974, pp.365–375
[2] “Aleph-null”, “Computer Recreations”, Software—Practice and Experience 1(2), April–June 1971, pp.201–204, http://onlinelibrary.wiley.com/doi/10.1002/spe.v1:2/issuetoc
[3] Dennis M. Ritchie, “The Evolution of the Unix Time-sharing System”, AT&T Bell Laboratories Technical Journal 63(6), Part 2, Oct. 1984, pp.1577–93