Alay Shah

Machine Learning Software Engineer

FEDML, Inc.

Biography

Alay Shah is an ML Software Engineer at FEDML, currently leading the development of an AI/ML platform that facilitates distributed execution and deployment for GenAI tasks. Anything at the intersection of Machine Learning and Distributed Systems excites him the most!

Interests
  • Large Language Models
  • Distributed Systems
  • Deep Learning
  • Computer Vision
  • Machine Learning
Education
  • Master’s in Computer Science, 2021

    University of Southern California

  • Post Graduate Diploma in Data Science, 2018

    International Institute of Information Technology

  • Bachelor’s in Mechanical Engineering, 2017

    Gujarat Technological University

Experience

FEDML, Inc.
ML Software Engineer
September 2023 – Present, California
  • Building an AI/ML platform that facilitates distributed execution and deployment for GenAI tasks.
  • Tech lead of the orchestration and scheduling layer that enables spot jobs and model deployments on a decentralized compute plane.
  • Projects where I’ve played a significant role: Launch, Deploy, Compute, Storage
  • Technologies: Python, Java, SQL, Git, Docker, Kubernetes, PyTorch, MQTT, Redis, Bash, Jenkins, Jira
 
Palantir Technologies, Inc.
Backend Software Engineer
July 2021 – September 2023, California
  • Implemented monitoring and alerting systems for databases, resulting in a 75% reduction in downtime and a 50% improvement in response time.
  • Developed Python automation scripts to transition assets to a multi-tenant environment, reducing human effort by 90%.
  • Contributed to improving authorization frameworks for data protection in multi-tenant setups.
  • Technologies: Java, Python, Golang, Bash, SQL, Git, Docker, Kubernetes, AWS, Observability
 
USC Viterbi School of Engineering
Research Assistant
August 2020 – May 2021, California
  • Advised by Professor Salman Avestimehr
  • Research Areas: Distributed Systems, Deep Learning, Computer Vision, Federated Learning, Machine Learning
  • Projects: FedCV, FedSegment
  • Technologies: Python, PyTorch, Communication Protocols
 
Amazon Web Services
Software Engineer Intern
May 2020 – August 2020, Washington
  • Developed a visualization dashboard for forecast reports, aiding evaluation and improvement of ML model forecasts through user-friendly, scenario-driven charts with customizable options.
  • Technologies: Vue, JavaScript, Java

Skills

Programming Languages

Python, Java, Golang, C/C++, Bash, SQL

Tools & Technologies

Docker, Kubernetes, AWS, Git, Redis, MQTT, Observability

Machine Learning

PyTorch, TensorFlow, Keras

Projects

FedSegment
A Federated Learning Framework for Image Segmentation
Distributed Healthcare Resource Allocation System with Dynamic Offloading
Implemented a computational offloading distributed system based on client-server architecture using UDP and TCP sockets.

Contact