Pratham Arora
Final-year CS & AI at Plaksha University. I build production ML systems — RAG pipelines, LLM agents, applied NLP. Researching where frontier VLMs fail at physical-world reasoning.
Available for full-time roles from August 2026 · Tailored AI/ML and SDE resumes available on the resume page
Experience
AI Engineering Intern — Cotality (PropTech SaaS)
June 2025 – July 2025
85% cost reduction · 50k+ LOC per day · 15+ languages · 12 API endpoints
- Architected FastAPI backend parsing 50,000+ LOC/day across 15+ languages via Abstract Syntax Tree (AST) analysis, enabling automated real-time documentation generation across polyglot codebases.
- Reduced AI embedding generation costs by 85% by designing a hash-based change detection system with SHA-256 on AST nodes, chunking codebases function-by-function and skipping unchanged functions to eliminate redundant vector database updates.
- Built RAG pipeline using LangChain, Azure OpenAI embeddings, and Cosmos DB with DiskANN vector search for semantic code retrieval; migrated from FAISS as retrieval latency requirements tightened with repository growth.
- Designed and delivered 12 production RESTful API endpoints with typed Pydantic response schemas, powering real-time code documentation preview in the client-facing web interface.
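The hash-based change detection described above can be sketched roughly as follows. This is a minimal illustration that assumes Python source only and uses the standard-library `ast` module; the production system handled 15+ languages, and its parsers, embedding calls, and vector store are not shown. The names `function_hashes` and `changed_functions` are hypothetical.

```python
import ast
import hashlib


def function_hashes(source: str) -> dict[str, str]:
    """Map each function name to a SHA-256 digest of its AST dump."""
    tree = ast.parse(source)
    hashes: dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast.dump leaves out line/column attributes by default, and
            # comments never reach the AST, so moving or re-commenting a
            # function does not change its digest.
            dumped = ast.dump(node)
            # Simplification: keyed by bare name, so same-named functions
            # in different scopes would collide in this sketch.
            hashes[node.name] = hashlib.sha256(dumped.encode("utf-8")).hexdigest()
    return hashes


def changed_functions(old_source: str, new_source: str) -> set[str]:
    """Return names of functions that are new or whose AST digest changed."""
    old = function_hashes(old_source)
    new = function_hashes(new_source)
    return {name for name, digest in new.items() if old.get(name) != digest}


old_src = "def a():\n    return 1\n\ndef b():\n    return 2\n"
new_src = "def a():\n    return 1\n\n\ndef b():\n    return 3\n\ndef c():\n    pass\n"
to_reembed = changed_functions(old_src, new_src)
```

Only the functions in `to_reembed` would need fresh embeddings; unchanged functions keep their existing vectors, which is where the saving comes from.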
Projects
VR LLM Conversational Agent
AI/ML
Most VR agents respond in 3–5 seconds, long enough to break immersion. I built a conversational agent that averages 1.8 s by parallelizing Gemini 2.5 Flash calls with Google Cloud STT/TTS, caching partial results, and preprocessing audio before transmission.
AI Resume Builder & Interview Prep Tool
Full Stack
Tailoring a resume for each role is tedious and opaque. This tool takes your resume and a job description, generates an ATS-optimised version using Gemini 2.5 Flash via its streaming API, and surfaces role-specific interview questions by matching your experience against a structured DSA/STAR prep database.
Kelp Forest Semantic Segmentation
AI/ML
Kelp forests are a critical ocean ecosystem, but detecting them in satellite imagery is hard because atmospheric interference corrupts images unpredictably. I trained a U-Net with an EfficientNet-B3 backbone for segmentation and built a streak-detection pipeline to filter corrupted training images before they hurt model accuracy.
Anuj Desai Associates — CA Firm Website
Full Stack
Designed and shipped a full-stack site for a CA firm with a careers page and a private admin dashboard where firm partners review and manage candidate applications, powered by Firebase Realtime Database.
Research
Undergraduate Researcher — Plaksha University
Jan. 2026 – Present
Visual Benchmarking of VLMs | Supervised by Prof. Pankaj Pansari
- Constructing a 500-image benchmark dataset across five physical estimation tasks (weight, volume, angle, fit, structural stability) with instrument-verified ground truth to evaluate frontier VLMs on physical-world grounding under naturalistic conditions.
- Preliminary evaluation across 200 images: Gemini 3 Pro achieves 21.57% MAPE on angle estimation vs. 28.99% for GPT-5.2; both models score near chance (0.476) on structural stability, identifying an unsolved failure mode in current frontier models.
- Designing a human-baseline study platform to collect performance data across all five tasks.
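For reference, the MAPE figures above follow the standard mean absolute percentage error definition. The sketch below is a plain-Python illustration, not the benchmark's evaluation code; the function name `mape` and the sample values are made up for the example.

```python
def mape(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute percentage error, returned as a percentage."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        # MAPE is undefined when a ground-truth value is zero.
        raise ValueError("ground-truth values must be non-zero")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Hypothetical angle measurements in degrees (not benchmark data).
measured = [100.0, 200.0]
predicted = [110.0, 180.0]
score = mape(measured, predicted)  # each prediction is off by 10%
```

Because every error is scaled by its ground-truth value, MAPE makes estimates comparable across items of very different magnitude, but it cannot be computed for a true value of zero.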
Undergraduate Researcher — Plaksha University
June 2023 – Aug. 2023
Supervised by Prof. Sandeep Manjanna
- Developed preprocessing pipeline (contrast adjustment, noise reduction, edge detection) to optimise Segment Anything Model (SAM) for agricultural crop-weed segmentation on sparse datasets, maximising zero-shot segmentation quality on out-of-distribution agricultural imagery.
- Improved batch processing throughput for 10,000+ image datasets by implementing vectorized NumPy operations and Python multiprocessing.