Your work
Papers
- You have read approximately 44 papers
- Published 1990 through 2020
- Containing ~432,000 words
Word count outliers
- Shortest paper?
- Bitcoin
- Longest paper?
- Quincy
- Word count ratio?
- About 1:4.6 (3,573 vs. 16,420 words)
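The ratio comes from dividing the two word counts above. A minimal sketch, assuming the counts live in a dictionary (the name `word_counts` is hypothetical; the slides do not say how the counts were gathered):

```python
# Word counts for the two outlier papers, taken from the slide above.
# The dictionary name and layout are assumptions made for illustration.
word_counts = {"Bitcoin": 3573, "Quincy": 16420}

# Pick the paper titles with the smallest and largest counts.
shortest = min(word_counts, key=word_counts.get)
longest = max(word_counts, key=word_counts.get)

# Longest-to-shortest ratio, reported as 1:ratio.
ratio = word_counts[longest] / word_counts[shortest]
print(f"{shortest} : {longest} = 1:{ratio:.1f}")
```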
Tag cloud
acm aggregation algorithm applications based block case change client cluster code communication computation control data described design different disk distributed edge efficient example execution experiments figure file framework function general graph group http implementation including input iteration job key large latency learning level local log machine management map memory message model network node number object operations optimization order parallel partitioning per performance processing programming provide query read receive reduce requests requires resources results running scale scheduling section server shared single size spark state storage store support system table tasks trace training transactions types used usenix user value work worker write
created at TagCrowd.com
Unique words
- I computed a list of words that each appeared in only one of the papers we read.
- Can you guess the association?
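One way such a list could be computed is sketched below. This is not necessarily how the list on these slides was produced: the tokenizer (lowercase runs of letters), the function name `unique_words`, and the two toy texts are all assumptions made for illustration.

```python
import re
from collections import defaultdict

def unique_words(papers):
    """Map each paper title to the words that occur in no other paper."""
    seen_in = defaultdict(set)  # word -> set of titles that contain it
    for title, text in papers.items():
        for word in set(re.findall(r"[a-z]+", text.lower())):
            seen_in[word].add(title)

    result = {title: set() for title in papers}
    for word, titles in seen_in.items():
        if len(titles) == 1:          # the word appears in exactly one paper
            (only_title,) = titles
            result[only_title].add(word)
    return result

# Toy stand-ins for the real paper texts (invented for illustration).
papers = {
    "Paper A": "The proposer sends a message to each acceptor",
    "Paper B": "Each worker sends a message about the ukulele",
}
unique = unique_words(papers)
print(unique)
```

Words shared by both toy texts (such as "message") drop out, which is why the surviving words can identify a paper.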
Unique words: Easy
- AmazonDynamoDB
- “How AWS Uses Formal Methods”
- BitcoinNG
- “OmniLedger”
- Clockwork
- “Clockwork”
- Pivot
- “Pivot Tracing”
- Pitchfork
- “Haystack”
- Proposer
- “Paxos Made Simple”
- Specs
- “How AWS Uses Formal Methods”
Unique words: Harder
- Egalitarian
- “Raft”
- Escrow
- “Bitcoin”
- Enormously
- “TensorFlow”
- Evil
- “CoralCDN”
- Exceedingly
- “Ceph”
- Homomorphism
- “DryadLINQ”
- Notation
- “Spark SQL”
- Orphan
- “Haystack”
- Radix
- “TritonSort”
- Triangle
- “GraphX”
Unique words: Nearly impossible
- Afternoon
- “MapReduce”
- Badly
- “F1”
- Elephant
- The original ORNL MPI report
- Draw
- “Haystack”
- Flesh
- “Raft”
- Gender
- “Spark SQL”
- Height
- “Concurrent Contracts”
- Journey
- “Ray”
- Multiplicative
- “A Bridging Model for Parallel Computations”
- Plausible
- “How AWS Uses Formal Methods”
- Profound
- “X-Trace”
- Purple
- “PLink”
- Rectangle
- “Timely Dataflow”
- Scouring
- “Ceph”
- Skin
- ORNL MPI
- Suffering
- “Quincy”
- Thumb
- “TritonSort”
- Ukulele
- “Parameter Server”
Adverbs
- Three papers used 9 or more unique adverbs (words ending in “-ly”). What are the adverb-heavy papers?
- “Ceph”, “CoralCDN”, “OmniLedger”
- Which is which?
- amusingly, anonymously, durably, hourly, imperfectly, liberally, meaningfully, preferably, stably, strikingly, suddenly
- “CoralCDN”
- adequately, economically, experimentally, mildly, pessimistically, redundantly, seriously, transitively, unfairly
- “OmniLedger”
- definitively, exceedingly, gladly, incompletely, maximally, ordinarily, overwhelmingly, sensibly, singly, unduly, variably
- “Ceph”
- Six papers have no unique adverbs, including two really important ones. What are they?
- “It’s the Critical Path!”, “Distributed Aggregation…”, “MapReduce”, “Mesos”, “Finding Bugs in Testing”, “Paxos”
- Jeff Dean may not like adverbs! TensorFlow only has one unique adverb (“enormously”).
- The Bitcoin paper has one unique adverb. What is it?
- “digitally”
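The adverb test used here (a unique word ending in "-ly") is a simple suffix filter. A minimal sketch, assuming the unique words are already available per paper; the function name `unique_adverbs` is hypothetical, and the sample data uses only the unique words these slides report for Bitcoin and TensorFlow:

```python
def unique_adverbs(unique_by_paper):
    """Keep only the words ending in "ly" from each paper's unique words."""
    return {
        title: sorted(w for w in words if w.endswith("ly"))
        for title, words in unique_by_paper.items()
    }

# Unique words mentioned on the slides for two papers (not the full lists).
unique_by_paper = {
    "Bitcoin": {"escrow", "digitally"},
    "TensorFlow": {"enormously"},
}
adverbs = unique_adverbs(unique_by_paper)
print(adverbs)
```

The suffix rule is only a heuristic: it would also count non-adverbs such as "family" or "supply" if they happened to be unique to a paper.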
Popular words
- I computed the most popular word in each paper, excluding common words like “the”.
- Can you guess the association?
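A rough sketch of the "most popular word" computation follows. The stop list, the tokenizer, and the function name `most_popular_word` are assumptions; the slides do not say which common words were excluded.

```python
import re
from collections import Counter

# A tiny illustrative stop list; the list actually used is not given.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "each"}

def most_popular_word(text, stop_words=STOP_WORDS):
    """Return the most frequent non-stop word, or None if there is none."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(1)[0][0] if counts else None

# Invented sample sentence, not a quotation from any paper.
sample = "The leader appends the entry and the leader notifies each follower"
top = most_popular_word(sample)
print(top)
```

When two words tie, `Counter.most_common` returns the one encountered first, so a real analysis might want an explicit tie-breaking rule.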
Popular words: Easy
- Akamai
- “Akamai”
- Clockwork
- “Clockwork”
- CoralCDN
- “CoralCDN”
- Dask
- “Dask”
- DryadLINQ
- “DryadLINQ”
- F1
- “F1”
- Mesos
- “Mesos”
- MPI
- MPI Standard
- P4
- “p4”
- RDDs
- “Resilient Distributed Datasets”
- Reduce
- “MapReduce”
- TensorFlow
- “TensorFlow”
- TLA
- “How AWS Uses Formal Methods”
Popular words: Harder
- Server
- “Parameter Server”
- Scheduling
- “The TensorFlow Partitioning and Scheduling Problem”
- Block
- “Bitcoin”
- File
- “Ceph”
- Job
- “Quincy”
- Tasks
- “Borg”
- Replicas
- “PBFT”
- Leader
- “Raft”
- Photo
- “Haystack”
Popular words: Shared
- Data
- “Dask vs. Spark”
- Data
- “NetSolve/D”
- Data
- “TritonSort”
- Graph
- “GraphX”
- Graph
- “Scalability!”
Most read authors
- The author listed on the most papers we read was on 7 papers. Who were they?
- Michael Isard, MSR/Google
- “Quincy”, “DryadLINQ”, “Distributed Aggregation”, “COST!”, “Timely Dataflow”
- AND “TensorFlow”, “Dynamic TensorFlow”
- The next most frequently listed author was on 6 papers. Who were they?
- Ion Stoica, Berkeley
- “Mesos”, “Spark”, “X-Trace”, “GraphX”, “Ray”, “RLlib”
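Counting appearances per author is a short tally. In the sketch below the author lists are deliberately partial: only the two authors named on this slide are filled in, using the paper titles listed above, and the variable names are hypothetical.

```python
from collections import Counter

# Partial author lists: only the two authors named on the slide appear.
# The real papers have more co-authors, which are omitted here.
isard_papers = ["Quincy", "DryadLINQ", "Distributed Aggregation", "COST!",
                "Timely Dataflow", "TensorFlow", "Dynamic TensorFlow"]
stoica_papers = ["Mesos", "Spark", "X-Trace", "GraphX", "Ray", "RLlib"]

papers = {title: ["Michael Isard"] for title in isard_papers}
papers.update({title: ["Ion Stoica"] for title in stoica_papers})

# Tally how many papers each author is listed on.
author_counts = Counter(a for authors in papers.values() for a in authors)
ranking = author_counts.most_common(2)
print(ranking)
```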
Longest activity authors
- Two authors were both listed on two papers published 14 years apart. Who were they?
- Jeff Dean and Sanjay Ghemawat, Google
- “MapReduce” → “Distributed TensorFlow”
Charts
Themes
- Lazy vs. eager execution
- Both have advantages; many systems evolve toward supporting both
- Optimization of dataflow execution graphs
- e.g., the evolution of Spark
- Usability, performance, and supportability
- Motivation of MapReduce, Spark, TensorFlow, PyTorch
- Motivation of MPI, timely dataflow, Parameter Server
- Different supportability approaches behind X-Trace, pivot tracing, formal methods
- Persistence of bit twiddling
- From machine words in MPI and p4, through shared memory and serialization in Spark, to tensor representations in Dask, PyTorch, and TensorFlow
- Cache memory affecting scheduling policies in Dask
- Everything in TritonSort 😉
- New hardware and new problems need new systems
- Let’s write them