Masstree
- B-family tree for key-value storage
- A trie of B– trees
- Each layer in the trie is a B– tree that resolves one
8-byte slice of the key
- Goal: chase/understand performance limits
- github.com/kohler/masstree-beta
Coding patterns
- NO READ LOCKS EVER FOR ANY REASON EVER NO NO NO
- Requires garbage collection (read-copy update)
- Read validation (hand-over-hand during traversal)
- Write locking (latching): just as cheap as compare-and-swap
(hand-over-hand during traversal)
Why B-trees?
- Delicious fat prefetchable nodes
- Not too big or prefetching’s counterproductive
- We chose 4 lines, 15-way fanout
Why Masstree?
- 8-byte key slices are quick to compare
- # byte comparisons for most operations
same as or better than B-trees
- Simplify internal nodes: only compare key slices
(all nodes with same slice are in same leaf)
Factor analysis
Insert was hard
- Temptation: lock leaf when you find it
- But leaf might point to a lower layer
- Splits in lower layer require adjusting upper layer (we do this lazily)
Remove was hard
- We clean up empty leaves, “twigs,” and empty layers
- Fun but so hard
- Problem: can’t lock out concurrent readers
- When can you recycle the memory?
- What must a reader do if they end up in the wrong place?
Silo: transactional Masstree
- Durable serial transactions
- With no centralized contention points
- Optimistic concurrency control


Is a single compare-and-swap expensive?


Are overwrites useful?
What about partitioning?
