ChickadeeFS, or chkfs for short, is the file system design you’ll implement in problem set 4. This page describes it.
Structure
A chkfs is comprised of 4096-byte blocks. (Thus, each chkfs block corresponds to a memory page.) A chkfs can contain up to nine kinds of block:
- The superblock.
- Swap blocks.
- Free-block-bitmap blocks.
- Inode blocks.
- Indirect-extent blocks.
- Data blocks.
- Directory blocks.
- Journal blocks. (The journal is described separately.)
Of these, only data blocks contain file data (the other kinds contain file metadata, such as file size information and directory listings). User programs can directly read and write data blocks, and can read (but not directly write) directory blocks. All other kinds of block are read and modified only by special-purpose system calls.
The chkfs groups the block types into regions, which appear in a fixed order.
- First comes exactly one superblock, in block 0.
- Then, zero or more swap blocks (zero in the handout).
- Then, one or more free-block-bitmap blocks (one in the handout).
- Then, one or more inode blocks (14 in the handout).
- Then, one or more blocks in the data region, which comprises data, indirect-extent, directory, and free blocks (~32000 in the handout).
- Finally, zero or more blocks in the journal region (128 in the handout).
Chickadeefs structures are declared in chickadeefs.hh
. The
structures are part of a namespace chkfs
, so, for example, you refer
to an inode structure using the notation chkfs::inode
. It may be
useful to add using namespace chkfs;
to a function definition or
scope. This brings all the chkfs
namespace definitions into scope so
you can leave off the chkfs::
prefix.
Superblock
The superblock contains overall file system information, such as the total
number of blocks and the location of the first inode block. It is stored in
disk block 0, chkfs::superblock_offset
bytes into the block (this equals
512, but you should use the symbolic constant). This is because the
first 512 bytes of the disk are reserved for the boot sector.
struct superblock {
uint64_t magic; // must equal `chkfs::magic`
blocknum_t nblocks; // # blocks in file system
blocknum_t nswap; // # blocks in swap space (unused)
inum_t ninodes; // # inodes in file system
blocknum_t njournal; // # journal blocks in file system
blocknum_t swap_bn; // first swap space block
blocknum_t fbb_bn; // first free block bitmap block
blocknum_t inode_bn; // first inode block
blocknum_t data_bn; // first data-area block
blocknum_t journal_bn; // first block in journal
};
Superblock components named *_bn
are block numbers. For instance, in our
handout file system, sb.fbb_bn == 1
. This indicates that the first block in
the free block bitmap is block number 1, which covers disk bytes 4096–8191
(and disk sectors 8–15).
The superblock need never be written.
Free block bitmap
The free block bitmap contains one bit per file system block, for a total of
sb.nblocks
bits. The bit for block bn
is 1 iff that block is in the data
area and is free for allocation. A single FBB block can represent up to 32768
blocks (and your code need not support chkfs file systems larger than that).
The bits are stored in little-endian order.
The bitset_view
library type is useful for working with bit vectors such as
the free block bitmap. This function looks up the value of the bit for block
bn
:
bool block_is_free(void* fbb, blocknum_t bn) {
bitset_view fbb_view(reinterpret_cast<uint64_t*>(fbb), chkfs::bitsperblock);
return fbb_view[bn];
}
This function sets the bit to 1:
void mark_block_free(void* fbb, blocknum_t bn) {
bitset_view fbb_view(reinterpret_cast<uint64_t*>(fbb), chkfs::bitsperblock);
fbb_view[bn] = true;
}
Some bit-vector operators are pretty simple to implement on your own; for example:
bool block_is_free(void* fbb, blocknum_t bn) {
uint8_t fbb_bytes = reinterpret_cast<uint8_t*>(fbb);
return (fbb_bytes[bn / 8] & (1 << (bn % 8))) != 0;
}
But bitset_view
offers more complex operations, such as find_lsb(i)
(find
the index of the least-significant 1 bit in the bit vector, starting from
index i
).
Inodes
The inode area is an array of inodes. An inode block contains space for 64
inodes; inode number inum
is located in block number sb.inode_bn + inum/inodesperblock
, at byte offset sizeof(inode) * (inum%inodesperblock)
.
Two inode numbers are special: inode number 0 is always free (it cannot be
used), and inode number 1 is the root directory. [As explained in the discussion
of directories later in this document, inode number 0 must always be free
because Chickadee indicates that a directory entry should be ignored by
giving that directory entry an inum
of 0.]
struct inode {
uint32_t type; // file type (regular, directory, or 0/none)
uint32_t size; // file size
uint32_t nlink; // # hard links to file
uint32_t flags; // flags (currently unused)
std::atomic<uint32_t> mlock; // used in memory
extent direct[ndirect]; // direct extents
extent indirect;
};
struct extent {
blocknum_t first; // first block (0 means none)
uint32_t count; // number of blocks (0 means end)
};
The inode::type
member is either 0, meaning the inode is free, or
type_directory
or type_file
, indicating the inode holds directory data or
file data.
inode::size
is the number of bytes in the file.
inode::nlink
is the number of hard links to the file.
The inode::mlock
is available for in-memory use; it is cleared to 0
whenever an inode is loaded into memory. The current plan is described in
problem set 4.
Extents
A file system extent is a contiguous range of blocks. ChickadeeFS
defines extents using type chkfs::extent
, which is a pair of first
, the
first block number in the extent, and count
, the number of blocks in the
extent. Extents compactly and efficiently represent common file system
layouts, such as contiguously-allocated blocks, though for some operations
they are more complex to work with than FFS-style block trees.
Extents can represent the same file layout in many ways. For example, consider a file comprising these 16 data blocks in order:
24 25 26 27 28 40 41 42 43 44 45 46 30 32 33 34
Equivalent representations of this file include:
(1) 4 extents: {first=24, count=5}; {first=40, count=7}; {first=30, count=1};
{first=32, count=3}.
(2) 7 extents: {first=24, count=3}; {first=27, count=2}; {first=40, count=1};
{first=41, count=6}; {first=30, count=1}; {first=32, count=1};
{first=33, count=2}.
(3) 16 extents: {first=24, count=1}; {first=25, count=1}; {first=26, count=1};
… 11 more single-block extents …;
{first=33, count=1}; {first=34, count=1}.
Each inode has space for up to 4 extents in the inode::direct
array. These
are used in order, and terminated by an extent with count=0
. If more than 4
extents are needed, the 5th and subsequent extents will be stored separate
from the inode, in an indirect extent. This is a contiguous range of data
blocks that contains extents rather than file data. For instance, the
7-extent version of the file above might be represented this way:
inode->direct[0] = {first=24, count=3}
inode->direct[1] = {first=27, count=2}
inode->direct[2] = {first=40, count=1}
inode->direct[3] = {first=41, count=6}
inode->indirect = {first=29, count=1}
disk block 29 contents:
extent[0] (bytes 0-7) = {first=30, count=1}
extent[1] (bytes 8-15) = {first=32, count=1}
extent[2] (bytes 16-23) = {first=33, count=2}
extent[3] (bytes 24-31) = {first=0, count=0} (terminates extent list)
To read a file in order, the file system first reads each direct extent, then (if necessary) loads the indirect extent to find the locations of the remaining data extents.
Inode size
In the common case, inode sizes and block allocations match up. That is, if
the file size is sz
, then the file contains nb = round_up(sz, blocksize) / blocksize
blocks. But it is also OK for the inode’s size to be bigger or
smaller than its allocated blocks. The size can be bigger than the allocation
when some data extent has first=0
and count!=0
. A file with such a “hole”
is called a sparse file. When
read, the missing extent appears to contain zero-valued bytes; when written,
the file system should allocate blocks for file data on the fly. The size can
be smaller than the allocation when allocated extents represent more blocks
than the file size requires; this can be used to preallocate space in case the
file is extended.
In the common case, the nlink
field is zero iff the inode is free
(i.e., type == 0
). However, in some cases, an allocated inode will have zero link
count. This happens if the last link to a file is removed while some program
has a reference to the inode. The inode should be freed and the file data
recycled only after the last reference to the inode is dropped.
Directories
A directory—that is, an inode with type == type_directory
—comprises an array
of direntry
structures. Its size must be a multiple of sizeof(direntry) == 128
.
struct dirent {
inum_t inum; // inode # (0 = none)
char name[maxnamelen + 1]; // file name (null terminated)
};
A directory entry with inum == 0
is ignored, and entries with negative
inum
are illegal. An entry with inum > 0
says that this directory contains
a link to the given file under the specified name. A directory should
not contain duplicate names.