Readings summary

Your work

Three papers used 9 or more unique adverbs (words ending in “-ly”). What are the adverb-heavy papers?
- “Ceph”, “CoralCDN”, “OmniLedger”
Which is which?
- amusingly, anonymously, durably, hourly, imperfectly, liberally, meaningfully, preferably, stably, strikingly, suddenly
  - “CoralCDN”
- adequately, economically, experimentally, mildly, pessimistically, redundantly, seriously, transitively, unfairly
  - “OmniLedger”
- definitively, exceedingly, gladly, incompletely, maximally, ordinarily, overwhelmingly, sensibly, singly, unduly, variably
  - “Ceph”
Six papers have no unique adverbs, including two really important ones. What are they?
- “It’s the Critical Path!”, “Distributed Aggregation…”, “MapReduce”, “Mesos”, “Finding Bugs in Testing”, “Paxos”
- Jeff Dean may not like adverbs! TensorFlow only has one unique adverb (“enormously”).
The Bitcoin paper has one unique adverb. What is it?
- “digitally”
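The adverb counts above could be reproduced with something like the following. This is a minimal sketch, assuming "adverb" means any word ending in "-ly" (as stated above) and "unique" means the word appears in exactly one paper. The function names and the toy corpus are hypothetical, not the script or texts actually used.

```python
import re
from collections import Counter

def adverbs(text):
    """Return the set of lowercase words ending in "-ly" (a crude adverb test).

    This also matches non-adverbs such as "only" or "apply".
    """
    return set(re.findall(r"\b[a-z]+ly\b", text.lower()))

def unique_adverbs(papers):
    """Map each paper title to the "-ly" words that appear in no other paper."""
    per_paper = {title: adverbs(text) for title, text in papers.items()}
    # Count how many papers each word occurs in (document frequency).
    doc_freq = Counter(word for words in per_paper.values() for word in words)
    return {
        title: sorted(w for w in words if doc_freq[w] == 1)
        for title, words in per_paper.items()
    }

# Hypothetical toy corpus standing in for the full paper texts.
papers = {
    "A": "Nodes fail suddenly and are quickly replaced.",
    "B": "Signatures are digitally verified, quickly.",
    "C": "There is nothing to see here.",
}
result = unique_adverbs(papers)
print(result)
```

In the toy corpus, "quickly" appears in two papers, so it is not unique to either, and paper "C" ends up with an empty list, like the six papers with no unique adverbs.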

I computed the most popular word in each paper, excluding common words like “the”.
Can you guess the association?
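One way to compute such a "most popular word" is sketched below. The stop-word list and the function name are assumptions for illustration; the list actually used for the slides is not given.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption, not the real one).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is",
              "are", "for", "on", "we", "that"}

def most_popular_word(text, stop_words=STOP_WORDS):
    """Return the most frequent non-stop-word in text, or None if there is none."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(1)[0][0] if counts else None

# Hypothetical sentence standing in for a paper's full text.
sample = ("The scheduler places tasks, and the scheduler retries "
          "tasks that fail on the scheduler.")
top = most_popular_word(sample)
print(top)
```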

The author listed on the most papers we read was on 7 papers. Who were they?
- Michael Isard, MSR/Google
- “Quincy”, “DryadLINQ”, “Distributed Aggregation”, “COST!”, “Timely Dataflow”
- AND “TensorFlow”, “Dynamic TensorFlow”
The next most frequently listed author was on 6 papers. Who were they?
- Ion Stoica, Berkeley
- “Mesos”, “Spark”, “X-Trace”, “GraphX”, “Ray”, “RLlib”

Two authors were both listed on two papers published 14 years apart. Who were they?
- Jeff Dean and Sanjay Ghemawat, Google
- “MapReduce” → “Distributed TensorFlow”

Lazy vs. eager execution
- Both have advantages; many systems evolve toward supporting both
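The distinction can be illustrated with a toy example. This is a minimal sketch, not how any of the systems named in these slides is implemented. The `Lazy` class and `eager_map` helper are hypothetical names. Eager execution runs each operation as soon as it is called, while lazy execution only records operations and runs them when a result is requested, which leaves room to optimize the recorded pipeline first.

```python
class Lazy:
    """Records map operations and runs them only when compute() is called."""

    def __init__(self, data, ops=()):
        self.data = data
        self.ops = tuple(ops)

    def map(self, fn):
        # No work happens here: just extend the recorded pipeline.
        return Lazy(self.data, self.ops + (fn,))

    def compute(self):
        # All recorded operations are applied in a single pass over the data.
        out = []
        for x in self.data:
            for fn in self.ops:
                x = fn(x)
            out.append(x)
        return out

def eager_map(fn, data):
    """Applies fn immediately and materializes the result."""
    return [fn(x) for x in data]

calls = []  # records every invocation of double()

def double(x):
    calls.append(x)
    return 2 * x

eager = eager_map(double, [1, 2, 3])          # runs right away
eager_calls = len(calls)

pipeline = Lazy([1, 2, 3]).map(double).map(lambda x: x + 1)
lazy_calls_before = len(calls) - eager_calls  # still zero: nothing has run

lazy = pipeline.compute()                     # work happens here
lazy_calls_after = len(calls) - eager_calls
```

The eager version is easier to debug because every intermediate value exists immediately. The lazy version sees the whole pipeline before running it, which is what makes whole-graph optimization possible.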
Optimization of dataflow execution graphs
- e.g., the evolution of Spark
Usability, performance, and supportability
- Motivation of MapReduce, Spark, TensorFlow, PyTorch
- Motivation of MPI, timely dataflow, Parameter Server
- Different supportability approaches behind X-Trace, Pivot Tracing, formal methods
Persistence of bit twiddling
- From machine words in MPI and p4, through shared memory and serialization in Spark, to tensor representations in Dask, PyTorch, and TensorFlow
- Cache memory affecting scheduling policies in Dask
- Everything in TritonSort 😉
New hardware and new problems need new systems
- Let’s write them