M.S. Computer Engineering at NYU Tandon. Incoming Software Development Engineer Intern at Amazon.
I work across ML systems, backend engineering, and applied research. Recent work includes CUDA kernel experiments on H100 GPUs, production-minded NLP pipelines, and backend systems built in startup and research settings. My research on dysarthric speech led to an IEEE SPCOM 2024 paper and a journal acceptance in 2026.
- Fused Linear Attention: CUDA study of fused and hybrid attention kernels on H100, with profiling, correctness checks, and memory-traffic analysis.
- Deadline Detection System: RoBERTa and BERT NER pipeline for contract deadline extraction, with MLflow, Docker, and review routing.
- Portfolio: projects, writing, publications, and current work.
- inference optimization and GPU systems
- backend systems for ML products
- practical ML infrastructure and evaluation
- Portfolio: https://bhanuu01.github.io
- LinkedIn: https://www.linkedin.com/in/bhanujakarumuru/
- Email: bk3170@nyu.edu


