Reading
- “PipeDream: Generalized Pipeline Parallelism for DNN Training”, Deepak Narayanan, Aaron Harlap, Amar Phanishayee, Vivek Seshadri, Nikhil R. Devanur, Gregory R. Ganger, Phillip B. Gibbons, Matei Zaharia (SOSP 2019)
- “Efficient Memory Management for Large Language Model Serving with PagedAttention”, Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, Ion Stoica (SOSP 2023)
Reading questions
In what ways do the systems ideas applicable to DNN training differ from those applicable to LLM serving? How do the systems characteristics of these two workloads affect the systems on which they run?