I build and evaluate multimodal AI systems.

I work on vision–language models, large-scale video understanding, and practical MLOps for deploying these systems in real products. I’m especially interested in retrieval-augmented pipelines, representation learning, and evaluation for safety and robustness.

  • Multimodal LLMs & video understanding
  • Large-scale evaluation & content safety
  • End-to-end pipelines from research to deployment

Current focus

  • Designing and evaluating video search & summarization pipelines.
  • Fine-tuning vision–language models for content safety and multi-label classification.
  • Building tools to read and analyze research papers more efficiently.

About

I am a machine learning researcher working on multimodal AI and real-world deployment of vision–language systems. My recent work focuses on video search and summarization, content understanding, and evaluation pipelines that connect research models to production constraints.

Broadly, I enjoy working at the boundary between research ideas and systems that actually ship: designing models and representations, and then building the data, training, and inference infrastructure needed to make them useful in practice.

Research & Publications

I am currently organizing my publications, patents, and project notes. A more detailed, paper-style list will appear here soon.

Interests

  • Multimodal representation learning (text–image–video)
  • Evaluation and alignment of vision–language models
  • Retrieval-augmented generation and video summarization
  • Safety, brand suitability, and content classification

Selected themes

  • Building evaluation pipelines for large-scale video catalogs
  • Leveraging LLMs and VLMs for automatic labeling and curation
  • Designing long-form understanding tasks over video segments

Projects

Below are a few representative areas I’ve worked on recently. Some are research-oriented; others are closer to production ML engineering.

Video Search & Summarization (VSS)

End-to-end pipeline for indexing and retrieving scenes and moments from large video catalogs using multimodal embeddings, ASR, and LLM-based summarization; a minimal retrieval sketch follows below.

  • Multimodal retrieval
  • LLM summarization
  • Evaluation
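
A minimal sketch of the retrieval core, assuming segment embeddings are precomputed offline. Segment, embed_text, and search are illustrative names rather than the production pipeline's API, and the text encoder here is a deterministic random stub standing in for a real CLIP-style model:

  from dataclasses import dataclass

  import numpy as np

  @dataclass
  class Segment:
      video_id: str
      start_s: float          # segment start time (seconds)
      end_s: float            # segment end time (seconds)
      asr_text: str           # transcript snippet from ASR
      embedding: np.ndarray   # L2-normalized multimodal embedding

  def embed_text(query: str, dim: int = 512) -> np.ndarray:
      # Stub for a real text encoder: a deterministic random unit vector.
      rng = np.random.default_rng(abs(hash(query)) % (2**32))
      v = rng.standard_normal(dim)
      return v / np.linalg.norm(v)

  def search(query: str, index: list[Segment], k: int = 5) -> list[Segment]:
      # Rank segments by cosine similarity (dot product of unit vectors).
      q = embed_text(query)
      scores = np.array([seg.embedding @ q for seg in index])
      return [index[i] for i in np.argsort(-scores)[:k]]

In the full pipeline, the retrieved segments' frames and ASR text would then feed the LLM-based summarization step.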

Content Safety & Moderation

Fine-tuning vision–language models for multi-label content classification (e.g., safety, topics, and suitability), with an emphasis on interpretable outputs and robust thresholds; a threshold-tuning sketch follows below.

  • VLM fine-tuning
  • Classification
  • Safety
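
To make "robust thresholds" concrete, here is a hedged sketch of per-label threshold tuning for a multi-label classifier: sigmoid scores from the fine-tuned model are compared against cutoffs chosen per label to maximize F1 on a validation set. The function names and array shapes are hypothetical:

  import numpy as np

  def sigmoid(x: np.ndarray) -> np.ndarray:
      return 1.0 / (1.0 + np.exp(-x))

  def tune_thresholds(val_logits: np.ndarray, val_targets: np.ndarray) -> np.ndarray:
      # val_logits:  (n_samples, n_labels) raw scores from the fine-tuned model
      # val_targets: (n_samples, n_labels) binary ground-truth labels
      probs = sigmoid(val_logits)
      thresholds = np.full(probs.shape[1], 0.5)
      for j in range(probs.shape[1]):
          best_f1 = 0.0
          for t in np.linspace(0.05, 0.95, 19):
              pred = probs[:, j] >= t
              tp = np.sum(pred & (val_targets[:, j] == 1))
              fp = np.sum(pred & (val_targets[:, j] == 0))
              fn = np.sum(~pred & (val_targets[:, j] == 1))
              f1 = 2 * tp / max(2 * tp + fp + fn, 1)
              if f1 > best_f1:
                  best_f1, thresholds[j] = f1, t
      return thresholds

  def predict(logits: np.ndarray, thresholds: np.ndarray) -> np.ndarray:
      # Binary multi-label decisions, one tuned cutoff per label.
      return sigmoid(logits) >= thresholds

Per-label cutoffs matter because label frequencies differ widely; a single global threshold tends to under-serve rare labels.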

Tools for Reading Papers

Early-stage tooling to make reading and revisiting research papers more efficient: linking highlights to exact locations, attaching notes, and experimenting with LLM support; a small data-model sketch follows below.

  • Research tooling
  • LLM assistance
  • Prototyping
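
A small sketch of the anchoring idea behind "linking highlights to exact locations": store character offsets plus the quoted text, so a highlight can be re-located if the extracted text shifts between versions. The schema and names are illustrative, not the tool's actual design:

  from dataclasses import dataclass, field

  @dataclass
  class Highlight:
      paper_id: str
      page: int
      start: int       # character offset where the highlight begins
      end: int         # character offset where the highlight ends
      quote: str       # exact highlighted text, used as a fallback anchor
      notes: list[str] = field(default_factory=list)

  def relocate(h: Highlight, page_text: str) -> tuple[int, int] | None:
      # Prefer the stored offsets if the quote still matches there;
      # otherwise fall back to searching for the quoted text on the page.
      if page_text[h.start:h.end] == h.quote:
          return h.start, h.end
      pos = page_text.find(h.quote)
      if pos >= 0:
          return pos, pos + len(h.quote)
      return None  # quote no longer present; flag for manual review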

Experience (short)

A detailed CV is available on request; this is a compact snapshot of my recent work.

  • Machine Learning Researcher

    Industry · Multimodal AI & Video Understanding

    Working on video understanding, multimodal retrieval, and large-scale evaluation frameworks that connect LLMs/VLMs to real production workloads.

  • PhD in Computer Science

    Deep Learning & Representation Learning

    Research on model design and training methods with a focus on reducing human effort and deploying practical systems.

Contact

I’m open to conversations about research collaboration, postdoctoral opportunities, applied ML projects, and practical deployment of multimodal systems.

Email: ahmadmobeen24@gmail.com