4/9 Scalable AI scheduling

Reading

These papers present some of the most important early work on AI-specific distributed systems.

  1. “Scaling distributed machine learning with the Parameter Server”, Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, Bor-Yiing Su (OSDI 2014; presentation available)

  2. “Gandiva: Introspective cluster scheduling for deep learning”, Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Zhenhua Han, Pratyush Patel, Xuan Peng, Hanyu Zhao, Quanlu Zhang, Fan Yang, Lidong Zhou (OSDI 2018; presentation available)

Reading questions

Set up a meeting with Eddie to discuss project ideas.