Hi!
As of May 2024, I am a Research Scientist at Meta.
For close to 12 years prior to that, I worked at the creative haven of Microsoft Research in Redmond, where I was a Senior Principal Researcher.
There, among other things, I started and led Project Fiddle,
where my collaborators and I worked on systems for large-scale DNN training and serving.
I was also fortunate to work with a group of fearless engineers as one of a small group of software-systems architects
for Microsoft's early forays into first-party (1P) AI supercomputing.
Earlier still, many moons ago, I got my PhD in Computer Science at Carnegie Mellon University where I worked with Dave Andersen.
With my collaborators there, my work centered on: (i) FAWN (energy-efficient distributed systems, especially for large-scale random-access IO-bound workloads),
and (ii) Incast (catastrophic TCP throughput collapse in datacenter-scale workloads).
"Incast" has since become an adjective, and "wimpy" is used in straight-faced technical conversations; who'd a thunk it?
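For intuition, here is a toy, back-of-the-envelope sketch of the incast effect; it is illustrative only (not the model from the FAST 2008 paper), and all parameter names and values are made up. Many synchronized senders overflow a shallow shared switch buffer, and any drop stalls the whole barrier for a retransmission timeout (RTO ≫ RTT), so goodput collapses once fan-in exceeds buffer capacity:

```python
# Toy sketch of incast (illustrative only; parameters are invented).
# Synchronized senders overflow a shallow shared switch buffer; any
# drop stalls the whole barrier for an RTO >> RTT, collapsing goodput.
def goodput(n_senders, pkts_each=8, buf=32, rtt_ms=0.1, rto_ms=200.0):
    """Packets delivered per millisecond for one synchronized block read."""
    total = n_senders * pkts_each
    if total <= buf:                 # everything fits: done in one RTT
        return total / rtt_ms
    delivered, lost, time_ms = buf, total - buf, rtt_ms
    while lost > 0:                  # each retransmission round costs an RTO
        time_ms += rto_ms
        sent = min(lost, buf)
        delivered += sent
        lost -= sent
    return delivered / time_ms

for n in (2, 4, 8, 16):
    print(f"{n:2d} senders -> goodput {goodput(n):7.1f} pkts/ms")
```

In this toy model, goodput scales up with fan-in until the buffer is exceeded, then falls off a cliff; this mirrors (in caricature) why fine-grained RTOs help, as in the SIGCOMM 2009 paper below.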
The goal of my research is to enable the creation of high-performance, efficient systems for large-scale data-intensive computing.
To this end, my work follows two broad approaches:
- Radically rethinking datacenter architecture
  - Energy-efficient clusters for data-intensive computing - FAWN
  - Free-space optics for datacenter networks - ProjecToR
- Building robust distributed systems and protocols
  - Efficient systems for distributed deep learning - Fiddle
  - First to analyze and solve TCP throughput collapse and latency insensitivity in clustered systems - TCP Incast
  - Distributed KV systems - FAWN-KV, Flex-KV (and the underlying Ouroboros protocol)
  - Reliable and scalable multicast - Ricochet, Plato
  - Compositional reasoning for systematically testing distributed systems - ModP
I have also occasionally worked on problems in geo-distributed systems:
- High-performance multi-hop wireless transfers using opportunistic caching - Ditto
- Scalable one-hop overlay routing - NuRON
- Systems and abstractions for connected devices - Bolt (storage) and Beam (programming framework and runtime)
Papers & Selected Articles
Data-driven Forecasting of Deep Learning Performance on GPUs
Seonho Lee, Amar Phanishayee, Divya Mahajan
ASPLOS 2025.
-
DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving
Foteini Strati, Sara McAllister, Amar Phanishayee, Jakub Tarnawski, Ana Klimovic
ICML 2024.
-
Integrated Hardware Architecture and Device Placement Search
Irene Wang, Jakub Tarnawski, Amar Phanishayee, Divya Mahajan
ICML 2024.
-
MGit: A Model Versioning and Management System
Wei Hao, Daniel Mendoza, Rafael da Silva, Deepak Narayanan, Amar Phanishayee, Asaf Cidon, Junfeng Yang
ICML 2024.
-
Blox: A Modular Toolkit for Deep Learning Schedulers
Saurabh Agarwal, Amar Phanishayee, Shivaram Venkataraman
EuroSys 2024.
[longer version: arxiv:2312.12621 | Dec 2023.]
-
Workload-Aware Hardware Accelerator Mining for Distributed Deep Learning Training
Muhammad Adnan, Amar Phanishayee, Janardhan Kulkarni, Prashant J. Nair, Divya Mahajan
arxiv:2404.14632 | April 2024.
-
Packrat: Automatic Reconfiguration for Latency Minimization in CPU-based DNN Serving
Ankit Bhardwaj, Amar Phanishayee, Deepak Narayanan, Mihail Tarta, Ryan Stutsman
arXiv:2311.18174 | Nov 2023.
-
A Study on the Intersection of GPU Utilization and CNN Inference
Jack Kosaian, Amar Phanishayee
arXiv:2212.07936 | December 2022.
-
Harmony: Overcoming the hurdles of GPU memory capacity to train massive DNN models on commodity servers
Youjie Li, Amar Phanishayee, Derek Murray, Jakub Tarnawski, Nam Sung Kim
VLDB 2022.
[longer version with Appendix: arxiv:2202.01306]
-
Looking Beyond GPUs for DNN Scheduling on Multi-Tenant Clusters
Jayashree Mohan, Amar Phanishayee, Janardhan Kulkarni, Vijay Chidambaram
USENIX OSDI 2022.
[longer version with Appendix: arxiv:2110.06073]
-
Piper: Multidimensional Planner for DNN Parallelization
Jakub Tarnawski, Deepak Narayanan, Amar Phanishayee
NeurIPS 2021.
-
Efficient Large-Scale Language Model Training on GPU Clusters using Megatron-LM
Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Anand Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia
SC 2021 (Awarded Best Student Paper).
-
Boosting the Throughput and Accelerator Utilization of Specialized CNN Inference Beyond Increasing Batch Size
Jack Kosaian, Amar Phanishayee, Matthai Philipose, Debadeepta Dey, Rashmi Vinayak
ICML 2021.
-
Memory-Efficient Pipeline-Parallel DNN Training
Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, Matei Zaharia
ICML 2021.
-
Doing more with less: Training large DNN models on commodity servers for the masses
Youjie Li, Amar Phanishayee, Derek Murray, Nam Sung Kim
HotOS Workshop 2021.
-
CheckFreq: Frequent, Fine-Grained DNN Checkpointing
Jayashree Mohan, Amar Phanishayee, Vijay Chidambaram
USENIX FAST 2021.
-
Analyzing and Mitigating Data Stalls in DNN Training
Jayashree Mohan, Amar Phanishayee, Ashish Raniwala, Vijay Chidambaram
VLDB 2021.
[longer version: arXiv:2007.06775]
-
Efficient Algorithms for Device Placement of DNN Graph Operators
Jakub Tarnawski, Amar Phanishayee, Nikhil R. Devanur, Divya Mahajan, Fanny Nina Paravecino
NeurIPS 2020.
-
Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads
Deepak Narayanan, Keshav Santhanam, Fiodar Kazhamiaka, Amar Phanishayee, Matei Zaharia
USENIX OSDI 2020.
-
The Non-IID Data Quagmire of Decentralized Machine Learning
Kevin Hsieh, Amar Phanishayee, Onur Mutlu, and Phillip B Gibbons
ICML 2020.
-
Daydream: Accurately Estimating the Efficacy of Performance Optimizations for DNN Training
Hongyu Zhu, Amar Phanishayee, Gennady Pekhimenko
USENIX ATC 2020.
-
Blink: Fast and Generic Collectives for Distributed ML
Guanhua Wang, Shivaram Venkataraman, Amar Phanishayee, Jorgen Thelin, Nikhil Devanur, Ion Stoica
arXiv:1910.04940
MLSys 2020.
-
THEMIS: Fair and Efficient GPU Cluster Scheduling
Kshiteej Mahajan, Arjun Balasubramanian, Arjun Singhvi, Shivaram Venkataraman, Aditya Akella, Amar Phanishayee, Shuchi Chawla
USENIX NSDI 2020.
-
Analysis and Exploitation of Dynamic Pricing in the Public Cloud for ML Training
Deepak Narayanan, Keshav Santhanam, Fiodar Kazhamiaka, Amar Phanishayee, Matei Zaharia
VLDB DISPA Workshop 2020.
-
PipeDream: Generalized Pipeline Parallelism for DNN Training
Deepak Narayanan, Aaron Harlap, Amar Phanishayee, Vivek Seshadri, Nikhil Devanur, Greg Ganger, Phil Gibbons, Matei Zaharia
ACM SOSP 2019.
-
The Case for Unifying Data Loading in Machine Learning Clusters
Aarati Kakaraparthy, Abhay Venkatesh, Amar Phanishayee, Shivaram Venkataraman
USENIX HotCloud Workshop 2019.
-
Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads
Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, Fan Yang
USENIX ATC 2019.
Philly Traces
-
Accelerating Deep Learning Workloads Through Efficient Multi-Model Execution
Deepak Narayanan, Keshav Santhanam, Amar Phanishayee, Matei Zaharia
NeurIPS Systems for ML Workshop 2018.
-
Compositional Programming and Testing of Dynamic Distributed Systems
Ankush Desai, Amar Phanishayee, Shaz Qadeer, Sanjit Seshia
ACM OOPSLA 2018.
-
Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training
Liang Luo, Jacob Nelson, Luis Ceze, Amar Phanishayee, Arvind Krishnamurthy
ACM SOCC 2018.
-
TBD: Benchmarking and Analyzing Deep Neural Network Training
Hongyu Zhu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Amar Phanishayee, Bianca Schroeder, Gennady Pekhimenko
IISWC 2018.
-
Gist: Efficient Data Encoding for Deep Neural Network Training
Animesh Jain, Amar Phanishayee, Jason Mars, Lingjia Tang, Gennady Pekhimenko
ISCA 2018.
-
Atomic In-Place Updates for Non-Volatile Main Memories with Kamino-Tx
Amir Saman Memaripour, Anirudh Badam, Amar Phanishayee, Ram Alagappan, Yanqi Zhou, Karin Strauss, Steven Swanson
ACM EuroSys 2017.
-
RAIL: A Case for Redundant Arrays of Inexpensive Links in Data Center Networks
Danyang Zhuo, Manya Ghobadi, Ratul Mahajan, Amar Phanishayee, Xuan Zou, Hang Guan, Arvind Krishnamurthy, Thomas Anderson
ACM NSDI 2017.
-
ProjecToR: Agile Reconfigurable Data Center Interconnect
Manya Ghobadi, Ratul Mahajan, Amar Phanishayee, Nikhil Devanur, Janardhan Kulkarni, Gireeja Ranade, Pierre-Alexandre Blanche, Houman Rastegarfar, Madeleine Glick, and Daniel Kilper
ACM SIGCOMM 2016.
[Appendix]
-
Beam: Ending Monolithic Applications for Connected Devices
Chenguang Shen, Rayman Preet Singh, Amar Phanishayee, Aman Kansal, Ratul Mahajan
USENIX ATC 2016.
-
Evaluation of Elastic Modulation Gains in Microsoft’s Optical Backbone in North America
Manya Ghobadi, Jamie Gaudette, Ratul Mahajan, Amar Phanishayee, Buddy Klinkers, Daniel Kilper
Optical Fiber Communication Conference 2016.
-
It’s time to end monolithic apps for connected devices
Rayman Preet Singh, Chenguang Shen, Amar Phanishayee, Aman Kansal, Ratul Mahajan
USENIX Login Mag, Oct 2015.
-
A case for ending monolithic apps for connected devices
Rayman Preet Singh, Chenguang Shen, Amar Phanishayee, Aman Kansal, Ratul Mahajan
HotOS Workshop 2015.
-
Bolt: Data management for connected homes
Trinabh Gupta, Rayman Preet Singh, Amar Phanishayee, Jaeyeon Jung, Ratul Mahajan
USENIX NSDI 2014.
-
Towards a storage system for connected homes
Trinabh Gupta, Amar Phanishayee, Jaeyeon Jung, Ratul Mahajan
ACM Large Scale Distributed Systems Workshop, 2013.
-
Chaining For Flexible and High-performance KV Systems
Amar Phanishayee
Ph.D. Thesis, Department of Computer Science, Carnegie Mellon University, CMU-CS-12-139, Sept. 2012.
-
Flex-KV: Enabling High-performance and Flexible KV Systems
Amar Phanishayee, David G. Andersen, Himabindu Pucha, Anna Povzner, Wendy Belluomini
ACM Management of Big Data Systems Workshop, September 2012.
-
FAWN: A Fast Array of Wimpy Nodes
David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan
Communications of the ACM, July 2011, pages 101--109.
-
Scaling All-Pairs Overlay Routing to the Thousands
David Sontag, Yang Zhang, Amar Phanishayee, David Andersen, David Karger
ACM CoNEXT 2009.
-
FAWN: A Fast Array of Wimpy Nodes
David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan
ACM SOSP 2009 (Awarded Best Paper).
-
Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication
Vijay Vasudevan, Amar Phanishayee, Hiral Shah, Elie Krevat, David G. Andersen, Gregory R. Ganger, Garth A. Gibson, Brian Mueller
ACM SIGCOMM 2009.
-
FAWNdamentally Power-efficient Clusters
Vijay Vasudevan, Jason Franklin, David G. Andersen, Amar Phanishayee, Lawrence Tan, Michael Kaminsky, Iulian Moraru
HotOS Workshop 2009.
-
Ditto: A System for Opportunistic Caching in Multi-hop Wireless Mesh Networks
Fahad Dogar, Amar Phanishayee, Himabindu Pucha, Olatunji Ruwase, David Andersen
ACM MobiCom 2008.
-
Measurement and Analysis of TCP Throughput Collapse in Cluster-Based Storage Systems
Amar Phanishayee, Elie Krevat, Vijay Vasudevan, David G. Andersen, Gregory R. Ganger, Garth A. Gibson, Srinivasan Seshan
USENIX FAST 2008.
-
On Application-level Approaches to Avoiding TCP Throughput Collapse in Cluster-Based Storage Systems
Elie Krevat, Vijay Vasudevan, Amar Phanishayee, David G. Andersen, Gregory R. Ganger, Garth A. Gibson, Srinivasan Seshan
Petascale Data Storage Workshop at Supercomputing 2007.
-
Ricochet: Lateral Error Correction for Time-Critical Multicast
Mahesh Balakrishnan, Ken Birman, Amar Phanishayee, Stefan Pleisch
USENIX NSDI 2007.
-
Scalable Multicast Platforms for a New Generation of Robust Distributed Applications
Ken Birman, Mahesh Balakrishnan, Danny Dolev, Tudor Marian, Krzysztof Ostrowski, Amar Phanishayee
IEEE Comsware 2007.
-
PLATO: Predictive Latency-Aware Total Ordering
Mahesh Balakrishnan, Ken Birman, Amar Phanishayee
IEEE SRDS 2006.