ChickadeeFS

ChickadeeFS, or chkfs for short, is the file system design you’ll implement in problem set 4. This page describes it.

Structure

A chkfs is comprised of 4096-byte blocks. (Thus, each chkfs block corresponds to a memory page.) A chkfs can contain up to nine kinds of block:

  1. The superblock.
  2. Swap blocks.
  3. Free-block-bitmap blocks.
  4. Inode blocks.
  5. Indirect-extent blocks.
  6. Data blocks.
  7. Directory blocks.
  8. Journal blocks. (The journal is described separately.)

Of these, only data blocks contain file data (the other kinds contain file metadata, such as file size information and directory listings). User programs can directly read and write data blocks, and can read (but not directly write) directory blocks. All other kinds of block are read and modified only by special-purpose system calls.

The chkfs groups the block types into regions, which appear in a fixed order.

  1. First comes exactly one superblock, in block 0.
  2. Then, zero or more swap blocks (zero in the handout).
  3. Then, one or more free-block-bitmap blocks (one in the handout).
  4. Then, one or more inode blocks (14 in the handout).
  5. Then, one or more blocks in the data region, which comprises data, indirect-extent, directory, and free blocks (~32000 in the handout).
  6. Finally, zero or more blocks in the journal region (128 in the handout).

Chickadeefs structures are declared in chickadeefs.hh. The structures are part of a namespace chkfs, so, for example, you refer to an inode structure using the notation chkfs::inode. It may be useful to add using namespace chkfs; to a function definition or scope. This brings all the chkfs namespace definitions into scope so you can leave off the chkfs:: prefix.

Superblock

The superblock contains overall file system information, such as the total number of blocks and the location of the first inode block. It is stored in disk block 0, chkfs::superblock_offset bytes into the block (this equals 512, but you should use the symbolic constant). This is because the first 512 bytes of the disk are reserved for the boot sector.

struct superblock {
    uint64_t magic;               // must equal `chkfs::magic`
    blocknum_t nblocks;           // # blocks in file system
    blocknum_t nswap;             // # blocks in swap space (unused)
    inum_t ninodes;               // # inodes in file system
    blocknum_t njournal;          // # journal blocks in file system
    blocknum_t swap_bn;           // first swap space block
    blocknum_t fbb_bn;            // first free block bitmap block
    blocknum_t inode_bn;          // first inode block
    blocknum_t data_bn;           // first data-area block
    blocknum_t journal_bn;        // first block in journal
};

Superblock components named *_bn are block numbers. For instance, in our handout file system, sb.fbb_bn == 1. This indicates that the first block in the free block bitmap is block number 1, which covers disk bytes 4096–8191 (and disk sectors 8–15).

The superblock need never be written.

Free block bitmap

The free block bitmap contains one bit per file system block, for a total of sb.nblocks bits. The bit for block bn is 1 iff that block is in the data area and is free for allocation. A single FBB block can represent up to 32768 blocks (and your code need not support chkfs file systems larger than that). The bits are stored in little-endian order.

The bitset_view library type is useful for working with bit vectors such as the free block bitmap. This function looks up the value of the bit for block bn:

bool block_is_free(void* fbb, blocknum_t bn) {
    bitset_view fbb_view(reinterpret_cast<uint64_t*>(fbb), chkfs::bitsperblock);
    return fbb_view[bn];
}

This function sets the bit to 1:

void mark_block_free(void* fbb, blocknum_t bn) {
    bitset_view fbb_view(reinterpret_cast<uint64_t*>(fbb), chkfs::bitsperblock);
    fbb_view[bn] = true;
}

Some bit-vector operators are pretty simple to implement on your own; for example:

bool block_is_free(void* fbb, blocknum_t bn) {
    uint8_t fbb_bytes = reinterpret_cast<uint8_t*>(fbb);
    return (fbb_bytes[bn / 8] & (1 << (bn % 8))) != 0;
}

But bitset_view offers more complex operations, such as find_lsb(i) (find the index of the least-significant 1 bit in the bit vector, starting from index i).

Inodes

The inode area is an array of inodes. An inode block contains space for 64 inodes; inode number inum is located in block number sb.inode_bn + inum/inodesperblock, at byte offset sizeof(inode) * (inum%inodesperblock).

Two inode numbers are special: inode number 0 is always free (so code can use inode number 0 to represent “no file”), and inode number 1 is the root directory.

struct inode {
    uint32_t type;                // file type (regular, directory, or 0/none)
    uint32_t size;                // file size
    uint32_t nlink;               // # hard links to file
    uint32_t flags;               // flags (currently unused)
    std::atomic<uint32_t> mlock;  // used in memory, 0 when loaded from disk
    uint32_t mbcindex;            // used in memory, 0 when loaded from disk
    extent direct[ndirect];       // extents
    extent indirect;
};

struct extent {
    blocknum_t first;             // first block (0 means none)
    uint32_t count;               // number of blocks (0 means end)
};

The inode::type member is either 0, meaning the inode is free, or type_directory or type_file, indicating the inode holds directory data or file data.

inode::size is the number of bytes in the file.

inode::nlink is the number of hard links to the file.

The inode::mlock and inode::mbcindex fields are meaningful only in memory. They’re provided to make programming easier. The clean_inode_block function initialize these values when a block of inodes is loaded from disk.

Extents

A file system extent is a contiguous range of blocks. ChickadeeFS defines extents using type chkfs::extent, which is a pair of first, the first block number in the extent, and count, the number of blocks in the extent. Extents compactly and efficiently represent common file system layouts, such as contiguously-allocated blocks, though for some operations they are more complex to work with than FFS-style block trees.

Extents can represent the same file layout in many ways. For example, consider a file comprising these 16 data blocks in order:

24 25 26 27 28 40 41 42 43 44 45 46 30 32 33 34

Equivalent representations of this file include:

(1)  4 extents: {first: 24, count: 5}, {first: 40, count: 7}, {first: 30, count: 1},
                {first: 32, count: 3}

(2)  7 extents: {first: 24, count: 3}, {first: 27, count: 2}, {first: 40, count: 1},
                {first: 41, count: 6}, {first: 30, count: 1}, {first: 32, count: 1},
                {first: 33, count: 2}

(3) 16 extents: {first: 24, count: 1}, {first: 25, count: 1}, {first: 26, count: 1},
                {first: 27, count: 6}, {first: 28, count: 1}, {first: 40, count: 1},
                {first: 41, count: 6}, {first: 42, count: 1}, {first: 43, count: 1},
                {first: 44, count: 6}, {first: 45, count: 1}, {first: 46, count: 1},
                {first: 30, count: 6}, {first: 32, count: 1}, {first: 33, count: 1},
                {first: 34, count: 1}

Each inode has space for up to 4 extents in the inode::direct array. These are used in order, and terminated by an extent with count == 0. If more than 4 extents are needed, the 5th and subsequent extents will be stored separate from the inode, in an indirect extent. This is a contiguous range of data blocks that contains extents rather than file data. For instance, the 7-extent version of the file above might be represented this way:

inode->direct[0] = {first: 24, count: 3}
inode->direct[1] = {first: 27, count: 2}
inode->direct[2] = {first: 40, count: 1}
inode->direct[3] = {first: 41, count: 6}
inode->indirect  = {first: 29, count: 1}

disk block 29 contents:
extent[0] (bytes 0-7)   = {first: 30, count: 1}
extent[1] (bytes 8-15)  = {first: 32, count: 1}
extent[2] (bytes 16-23) = {first: 33, count: 2}
extent[3] (bytes 24-31) = {first: 0, count: 0} (terminates extent list)

To read a file in order, the file system first reads each direct extent, then (if necessary) loads the indirect extent to find the locations of the remaining data extents.

Inode size

In the common case, inode sizes and block allocations match up. That is, if the file size is sz, then the file contains nb = round_up(sz, blocksize) / blocksize blocks. But it is also OK for the inode’s size to be bigger or smaller than its allocated blocks. The size can be bigger than the allocation when some data extent has first==0 and count!=0. A file with such a “hole” is called a sparse file. When read, the missing extent appears to contain zero-valued bytes; when written, the file system should allocate blocks for file data on the fly. The size can be smaller than the allocation when allocated extents represent more blocks than the file size requires. This can be used to preallocate space in case the file is extended. (The chickadeefsck checker program warns about this kind of extra allocated space.)

In the common case, an inode with nlink == 0 is also free (has type == 0). However, in some cases, it’s important to preserve an inode with no links. This can happen, for example, if a process first opens a file, and then unlinks it without closing the open file descriptor. In this situation, Unix-like systems don’t recycle the inode’s extents or free the inode itself until the last open reference to the file is closed.

Directories

Directories are represented by inodes with type == type_directory. Like regular files, directories have data stored in extents; but unlike regular files, that data represents a directory structure. The data in a directory file comprises an array of chkfs::dirent structures, and the size of a directory inode must be a multiple of sizeof(chkfs::dirent) == 128.

struct dirent {
    inum_t inum;                  // inode # (0 = none)
    char name[maxnamelen + 1];    // file name (null terminated)
};

Directory entries with inum == 0 are free; they are ignored on read. An entry with inum > 0 says that this directory contains a link to the given file under the specified name. Directories must not contain duplicate names or entries with negative inum.