Alay Shah is a Machine Learning Software Engineer at TensorOpera AI, currently building an AI / ML platform that facilitates distributed execution and deployment for GenAI tasks. Anything that falls at the intersection of AI / ML and Distributed Systems excites him the most!
Master’s in Computer Science, 2021
University of Southern California
Post Graduate Diploma in Data Science, 2018
International Institute of Information Technology
Bachelor’s in Mechanical Engineering, 2017
Gujarat Technological University
Python, Java, Golang, C/C++, Bash, SQL
Docker, Kubernetes, AWS, Git, Redis, R2, Postgres, MQTT, Telemetry & Observability
PyTorch, TensorFlow, Keras, TensorRT
In this work, we optimized both the engine and the platform. Our study reveals that, with the growing complexity of LLM applications, platform latency will be the major bottleneck. Our take is that, instead of optimizing local inference speed, industrial research should focus more on simplifying the serving gateway and optimizing the platform.