Mohammed Jaseel Kunnathodika

M.Sc. student in Electrical Engineering at the University of Stuttgart, working on controllable generative models for vision and speech. Based in Stuttgart, Germany. Open to research internships and research engineer roles.

Research interests: controllable generative models; video generation for autonomous driving; neural audio codecs and TTS; representation learning for structured control.

Research

Institute for Signal Processing & System Theory (ISS), University of Stuttgart — controllable generative models for autonomous driving.

Semantically Controlled Video Generation for Autonomous Driving

ISS · University of Stuttgart · Sep 2025 – Feb 2026. Supervisors: Khaled Seyam (PhD), Prof. Dr.-Ing. Bin Yang.

Designed a three-stage controllable video generation pipeline — Semantic-Native VAE → future semantic prediction → ControlNet-guided rendering on Stable Video Diffusion XT — for KITTI-360 driving scenes, addressing a core limitation of Ctrl-V (TMLR 2025).

Built a parameter-efficient Semantic-Native VAE (~200K trainable parameters) reaching 89.7% mIoU on validation clips (+35.4 pp over an RGB-palette baseline). The final system reached FID 21.91, FVD-I3D 255.2, and 50.64% mIoU on 487 held-out clips, outperforming Ctrl-V by 20.2 pp in mIoU.

Industry Experience

Neural Codec TTS with Controllable Duration & Voice Cloning — Sony Europe

Research Assistant · Stuttgart · May 2025 — May 2026. Supervised by Dr. Hassan Shahmohammadi.

Trained a 12-layer autoregressive Transformer language model for neural-codec TTS predicting discrete audio tokens; implemented speaker-rate prefix conditioning by discretising per-utterance token rates into rate bins, learning a rate embedding, and injecting it as a prefix to control speech tempo without forced alignment.
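The rate-bin prefix idea above can be illustrated with a minimal sketch. It is not the project's code: the bin edges, the token-id layout, and the definition of "rate" as audio tokens per text token are all assumptions made for illustration. In a real model the prefix id would index a learned embedding table; here it is only prepended to the sequence.

```python
from bisect import bisect_right

# Hypothetical values for illustration only; none come from the project.
RATE_BIN_EDGES = [4.0, 6.0, 8.0, 10.0]  # audio tokens per text token, ascending
RATE_TOKEN_BASE = 2000                   # assumed id of the first rate-bin token


def rate_bin(num_audio_tokens: int, num_text_tokens: int) -> int:
    """Map an utterance's token rate to a discrete bin index (0..len(edges))."""
    if num_text_tokens <= 0:
        raise ValueError("utterance must contain at least one text token")
    rate = num_audio_tokens / num_text_tokens
    # bisect_right returns how many edges are <= rate, i.e. the bin index.
    return bisect_right(RATE_BIN_EDGES, rate)


def with_rate_prefix(text_ids: list[int], bin_index: int) -> list[int]:
    """Prepend the rate-bin token so the model can condition on tempo."""
    return [RATE_TOKEN_BASE + bin_index] + list(text_ids)


# Training: derive the bin from the utterance itself.
bin_index = rate_bin(num_audio_tokens=300, num_text_tokens=50)
sequence = with_rate_prefix([5, 6, 7], bin_index)
```

At inference time, the bin index would be chosen directly (a lower bin for faster speech under this definition of rate) instead of being measured from audio.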

Executed a phased training workflow: large-scale multi-speaker pre-training to learn speaker-invariant linguistic / prosodic structure, followed by post-training for duration control and voice cloning. Built the training stack with Prefix-LM masking, repetition-aware sampling, and distributed multi-GPU training.

LLM Agent / Web Application — IKTD, University of Stuttgart

Student Assistant · Stuttgart · Oct 2024 — Sep 2025.

Developed a production RAG-based LLM agent for industrial supply-chain QA over heterogeneous technical documents (PDFs, spreadsheets), with multi-step query decomposition, embedding-based retrieval, and source citation; exposed via a stable API and dashboards.

Software Engineer, Embedded Communication Systems — Bosch Global Software Technologies

Coimbatore, India · Jul 2021 — Mar 2024.

Module owner of the AUTOSAR Classic Platform COM stack for EV charging ECUs: designed, implemented, and integrated CAN-based inter-ECU communication protocols for battery management and EV charging coordination. Covered the full software lifecycle, from architecture through MISRA-C coding, unit testing, and integration, across multiple project releases.

Collaborated with hardware, systems, and validation teams to integrate the communication stack with power electronics and charging infrastructure; drove module-level technical decisions and code reviews within a cross-functional automotive team.

Engineering Projects

ScreenAI — Multimodal Desktop Assistant (Open Source)

Personal project · Mar 2026 — Present.

Open-source tray-based Electron assistant that streams multimodal AI answers about any screen region via a global hotkey; voice-first Jarvis mode captures the screen, transcribes speech with ElevenLabs STT, generates a step-by-step visual guide, and plays the response back via streaming TTS.

Engineered the full native desktop stack: HiDPI-aware region capture, secure preload/IPC boundaries, multi-provider LLM routing (Gemini, OpenAI), local API-key storage, and packaged Windows / macOS installer builds distributed via GitHub Releases.

Links: live site, GitHub.

Anytrace — AI Founder Detection for Early-Stage VCs (Hackathon)

TUM.ai × Yellow × Project A Hackathon · Munich · May 2026.

Co-built an AI deal-sourcing tool that detects founders before they show up in any startup database, by reading the public signals of a VC's trusted network and surfacing convergence patterns that point to early-stage talent. Shipped end-to-end within the hackathon: ranked outreach feed, interactive network graph explorer, evidence-linked founder dossiers, and an automated daily digest.

Demo: YouTube.

Powerly — AI Renewable Energy Designer (Hackathon)

Tech Europe Big Berlin Hackathon · Reonic Track · Berlin · Apr 2026.

Co-built an AI tool that produces a complete residential renewable-energy proposal in seconds (solar PV, battery, heat pump, and EV wallbox sized to the household), presented as Budget / Balanced / Premium options with cost, self-sufficiency, and payback estimates, each refinable live by the user.

Engine: deterministic sizing rules + retrieval over 1,000+ real installations + an LLM for composition, with a safe rule-based fallback. Owned roof intelligence and the 3D experience: built a CV pipeline over photogrammetry building models that detects roof planes and auto-places panels, and integrated Google's photorealistic 3D Maps for live click-to-place panel editing.

Links: GitHub, demo.

WhatsApp AI Lead Qualification & Scheduling Agent (Customer Deployed)

Production deployment for a fitness business · Mar — Apr 2026.

Shipped a production AI agent on WhatsApp: qualifies leads conversationally, proposes time slots, obtains coach approval, and sends Google Meet confirmations — deployed to a paying customer with zero manual intervention in the booking flow.

Stack: Next.js, Vercel AI SDK, Neon Postgres, Upstash Redis, Meta Cloud API. Engineered stateful lead tracking, webhook deduplication, per-number rate limiting, human-handoff triggers, and Vercel Cron for automated reminders.

AI Factory Scheduling Agent (Hackathon)

Physical AI × Manufacturing Hackathon · Forgis · Zurich · Feb 2026.

Co-built an AI agent that schedules factory production orders on the Arke MES, takes operator approval over Telegram, and executes each phase on a robot arm with vision-based verification. Designed and implemented the scheduling algorithm as a mixed-integer linear program (Google OR-Tools) producing provably optimal sequences that minimise deadline tardiness across multi-phase orders.

Built the vision-verification module that checks the workspace state after each robot move; simulated the full schedule live on an SO-101 robot arm during the 24h on-site demo.
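The scheduling objective above (minimising deadline tardiness) can be illustrated on a toy instance. This sketch is not the OR-Tools MILP from the project: it assumes a single machine and single-phase orders, uses made-up processing times and deadlines, and finds the optimum by exhaustive enumeration, which is only practical for a handful of orders.

```python
from itertools import permutations

# Hypothetical orders: name -> (processing_time, deadline), arbitrary time units.
ORDERS = {"A": (3, 4), "B": (2, 2), "C": (4, 10)}


def total_tardiness(sequence, orders):
    """Sum of max(0, completion_time - deadline) over a back-to-back sequence."""
    clock = 0
    tardiness = 0
    for name in sequence:
        duration, deadline = orders[name]
        clock += duration  # orders run one after another on one machine
        tardiness += max(0, clock - deadline)
    return tardiness


def best_sequence(orders):
    """Try every ordering and keep the one with the least total tardiness."""
    return min(
        permutations(sorted(orders)),
        key=lambda seq: total_tardiness(seq, orders),
    )


best = best_sequence(ORDERS)
print(best, total_tardiness(best, ORDERS))  # ('B', 'A', 'C') 1
```

A MILP formulation expresses the same objective with ordering and completion-time variables, so a solver can prove optimality without enumerating every sequence.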

Real-Time Voice Meeting Agent for Discord (In progress)

Personal project · Jan 2026 — Present.

Building an autonomous voice agent that joins Discord voice channels, engages in real-time conversation using STT/TTS, generates structured meeting summaries, and maintains persistent memory across sessions for longitudinal context.

YouTube Learning Companion — DeepNoteAI (Open Source)

Personal project · Feb — Jun 2025.

AI web app converting long-form YouTube videos into structured study sessions with segment-wise explanations and comprehension questions; multi-expert LLM pipeline with an async backend and FAISS-based vector retrieval.

Links: GitHub.

Education

M.Sc. Electrical Engineering — Smart Systems

University of Stuttgart, Germany · Oct 2024 — Present. Specialising in machine learning, generative modelling, and signal processing.

B.Tech Electrical and Electronics Engineering

National Institute of Technology, Calicut · 2017 — 2021. CGPA 8.01 / 10. Suspension & Steering Lead, Formula Student Team Unwired.

Stack

Frameworks: PyTorch, TensorFlow, HuggingFace Diffusers & Transformers.

Infrastructure: SLURM (multi-GPU H200/A6000), FP16 mixed-precision, Weights & Biases, distributed training.

Generative Modelling: Latent diffusion (SVD, DDPM, flow matching), ControlNet, VAE / VQ-VAE, autoregressive token models.

Computer Vision: Semantic segmentation (Cityscapes / KITTI-360), DRN-D-105, FID / FVD / mIoU evaluation.

Web & Apps: React, Next.js, Vercel AI SDK, Electron, Tailwind CSS.

Languages: Python, TypeScript / JavaScript, C / C++ (AUTOSAR / embedded).