Research
Institute for Signal Processing & System Theory (ISS), University of Stuttgart — controllable generative models for autonomous driving.
Semantically Controlled Video Generation for Autonomous Driving
ISS · University of Stuttgart · Sep 2025 — Feb 2026. Supervisors: Khaled Seyam (PhD), Prof. Dr.-Ing. Bin Yang.
Designed a three-stage controllable video generation pipeline — Semantic-Native VAE → future semantic prediction → ControlNet-guided rendering on Stable Video Diffusion XT — for KITTI-360 driving scenes, addressing a core limitation of Ctrl-V (TMLR 2025).
Built a parameter-efficient Semantic-Native VAE (~200K trainable parameters) that reached 89.7% mIoU on validation clips (+35.4 pp over the RGB-palette baseline). The final system reached FID 21.91, FVD-I3D 255.2, and 50.64% mIoU on 487 held-out clips, outperforming Ctrl-V by 20.2 pp in mIoU.
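For readers unfamiliar with the metric, mIoU averages the per-class intersection-over-union between predicted and ground-truth label maps. The following is a minimal sketch in plain Python, assuming flattened lists of integer class labels; the function name, the toy labels, and the choice to skip classes absent from both inputs are illustrative assumptions, not the project's evaluation code.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both maps (intersection) or in either map (union).
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: classes absent from both maps are skipped
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six "pixels".
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)  # per-class IoU: 1/3, 2/3, 1/2
```

A percentage-point (pp) difference between two systems is then a plain subtraction of two such scores expressed in percent.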
Industry Experience
Neural Codec TTS with Controllable Duration & Voice Cloning — Sony Europe
Research Assistant · Stuttgart · May 2025 — May 2026. Supervised by Dr. Hassan Shahmohammadi.
Trained a 12-layer autoregressive Transformer language model for neural-codec TTS that predicts discrete audio tokens. Implemented speaking-rate prefix conditioning: discretised per-utterance token rates into rate bins, learned a rate embedding, and injected it as a prefix to control speech tempo without forced alignment.
Executed a phased training workflow: large-scale multi-speaker pre-training to learn speaker-invariant linguistic and prosodic structure, followed by post-training for duration control and voice cloning. Built the training stack with multi-GPU distributed training, Prefix-LM masking, and repetition-aware sampling.
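The rate-conditioning idea described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the project's code: the rate is taken to be audio tokens per text token, the bin edges and the reserved ID offset are hypothetical values, and the rate bin is represented as a reserved vocabulary ID whose embedding the model would learn.

```python
import bisect

# Hypothetical values chosen only for illustration.
RATE_BIN_EDGES = [4.0, 6.0, 8.0]   # ascending boundaries -> 4 bins (0..3)
RATE_TOKEN_OFFSET = 1000           # first vocabulary ID reserved for rate bins


def rate_bin(rate, edges=RATE_BIN_EDGES):
    """Index of the bin that contains `rate` (0 = slowest bin)."""
    return bisect.bisect_right(edges, rate)


def build_prefix(text_ids, num_audio_tokens):
    """Prepend a rate token to the text tokens of one training utterance."""
    # Assumed rate definition: audio tokens emitted per text token.
    rate = num_audio_tokens / len(text_ids)
    return [RATE_TOKEN_OFFSET + rate_bin(rate)] + list(text_ids)


# Training: the bin is measured from the utterance itself.
prefix = build_prefix(text_ids=[5, 6, 7], num_audio_tokens=18)  # rate = 6.0

# Inference: the caller picks a bin directly to request a tempo.
slow_prefix = [RATE_TOKEN_OFFSET + 0] + [5, 6, 7]
```

Because the bin is computed from token counts alone, no forced alignment between text and audio is needed to obtain the conditioning signal.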
LLM Agent / Web Application — IKTD, University of Stuttgart
Student Assistant · Stuttgart · Oct 2024 — Sep 2025.
Developed a production RAG-based LLM agent for industrial supply-chain QA over heterogeneous technical documents (PDFs, spreadsheets), with multi-step query decomposition, embedding-based retrieval, and source citation; exposed via a stable API and dashboards.
Software Engineer, Embedded Communication Systems — Bosch Global Software Technologies
Coimbatore, India · Jul 2021 — Mar 2024.
Owned the AUTOSAR Classic Platform COM stack for EV charging ECUs: designed, implemented, and integrated CAN-based inter-ECU communication protocols for battery management and EV charging coordination. Covered the full software lifecycle, from architecture through MISRA-C coding, unit testing, and integration, across multiple project releases.
Collaborated with hardware, systems, and validation teams to integrate the communication stack with power electronics and charging infrastructure; drove module-level technical decisions and code reviews as the communication-software owner within a cross-functional automotive team.
Engineering Projects
ScreenAI — Multimodal Desktop Assistant (Open Source)
Personal project · Mar 2026 — Present.
Open-source tray-based Electron assistant that streams multimodal AI answers about any screen region via a global hotkey; voice-first Jarvis mode captures the screen, transcribes speech with ElevenLabs STT, generates a step-by-step visual guide, and plays the response back via streaming TTS.
Engineered the full native desktop stack: HiDPI-aware region capture, secure preload/IPC boundaries, multi-provider LLM routing (Gemini, OpenAI), local API-key storage, and packaged Windows / macOS installer builds distributed via GitHub Releases.
Links: live site, GitHub.
Anytrace — AI Founder Detection for Early-Stage VCs (Hackathon)
TUM.ai × Yellow × Project A Hackathon · Munich · May 2026.
Co-built an AI deal-sourcing tool that detects founders before they show up in any startup database, by reading the public signals of a VC's trusted network and surfacing convergence patterns that point to early-stage talent. Shipped end-to-end within the hackathon: ranked outreach feed, interactive network graph explorer, evidence-linked founder dossiers, and an automated daily digest.
Demo: YouTube.
Powerly — AI Renewable Energy Designer (Hackathon)
Tech Europe Big Berlin Hackathon · Reonic Track · Berlin · Apr 2026.
Co-built an AI tool that produces a complete residential renewable-energy proposal in seconds (solar PV, battery, heat pump, and EV wallbox sized to the household), presented as Budget / Balanced / Premium options with cost, self-sufficiency, and payback figures, each refinable live by the user.
Engine: deterministic sizing rules + retrieval over 1,000+ real installations + an LLM for composition, with a safe rule-based fallback. Owned roof intelligence and the 3D experience: built a CV pipeline over photogrammetry building models that detects roof planes and auto-places panels, and integrated Google's photorealistic 3D Maps for live click-to-place panel editing.
Links: GitHub, demo.
WhatsApp AI Lead Qualification & Scheduling Agent (Customer Deployed)
Production deployment for a fitness business · Mar — Apr 2026.
Shipped a production AI agent on WhatsApp: qualifies leads conversationally, proposes time slots, obtains coach approval, and sends Google Meet confirmations — deployed to a paying customer with zero manual intervention in the booking flow.
Stack: Next.js, Vercel AI SDK, Neon Postgres, Upstash Redis, Meta Cloud API. Engineered stateful lead tracking, webhook deduplication, per-number rate limiting, human-handoff triggers, and Vercel Cron for automated reminders.
AI Factory Scheduling Agent (Hackathon)
Physical AI × Manufacturing Hackathon · Forgis · Zurich · Feb 2026.
Co-built an AI agent that schedules factory production orders on the Arke MES, takes operator approval over Telegram, and executes each phase on a robot arm with vision-based verification. Designed and implemented the scheduling algorithm as a mixed-integer linear program (Google OR-Tools) producing provably optimal sequences that minimise deadline tardiness across multi-phase orders.
Built the vision-verification module that checks the workspace state after each robot move; simulated the full schedule live on an SO-101 robot arm during the 24h on-site demo.
Real-Time Voice Meeting Agent for Discord (In progress)
Personal project · Jan 2026 — Present.
Building an autonomous voice agent that joins Discord voice channels, engages in real-time conversation using STT/TTS, generates structured meeting summaries, and maintains persistent memory across sessions for longitudinal context.
YouTube Learning Companion — DeepNoteAI (Open Source)
Personal project · Feb — Jun 2025.
AI web app that converts long-form YouTube videos into structured study sessions with segment-wise explanations and comprehension questions, built on a multi-expert LLM pipeline with an async backend and FAISS-based vector retrieval.
Links: GitHub.
Stack
Frameworks: PyTorch, TensorFlow, HuggingFace Diffusers & Transformers.
Infrastructure: SLURM (multi-GPU H200/A6000), FP16 mixed-precision training, Weights & Biases, distributed training.
Generative Modeling: Latent diffusion (SVD, DDPM, flow matching), ControlNet, VAE / VQ-VAE, autoregressive token models.
Computer Vision: Semantic segmentation (Cityscapes / KITTI-360), DRN-D-105, FID / FVD / mIoU evaluation.
Web & Apps: React, Next.js, Vercel AI SDK, Electron, Tailwind CSS.
Languages: Python, TypeScript / JavaScript, C / C++ (AUTOSAR / embedded).