Staff AI Engineer at LinkedIn leading three cross-cutting GenAI reliability initiatives spanning pre-deployment evaluation, post-deployment monitoring, and AI-driven observability across the platform’s GenAI product portfolio.
Conference speaker, peer reviewer (NeurIPS, IEEE Computer), and co-author of published AI research. MBA with Distinction from London Business School · B.Tech Computer Science from IIT Delhi · Fellow of BCS · IEEE Senior Member.
Technical lead for three cross-cutting GenAI reliability initiatives spanning pre-deployment evaluation, post-deployment monitoring, and AI-driven observability across LinkedIn’s GenAI product portfolio.
ML Tech Lead for Dash.ai, Dropbox’s AI-powered universal search product.
Worked on personalization, user representation models, and fairness in recommender systems, delivering measurable impact on systems affecting billions of users.
Worked on search and NLP systems for educational content platforms, building foundational experience in information retrieval and natural language processing at scale.
Delivered a presentation on the challenges of quality assurance and performance evaluation for production LLMs, covering robustness engineering practices for real-world deployments.
Led a hands-on workshop session demystifying the core agentic loop by building a tool-calling agent from scratch, as part of the Agentic AI Summit.
Explained model decisions both globally and locally using techniques such as SHAP values, and discussed building trust in generative AI by focusing on logprobs and models' chain-of-thought reasoning traces.
Participated in a group discussion on AI compliance and ethics, addressing questions on balancing the need for explanation and transparency against model performance.
Presented a framework for objective-driven evaluation pipelines, mixing golden sets and automated scoring to accelerate model iteration.
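The golden-set half of such a pipeline can be sketched in a few lines; this is a minimal illustration, not the presented framework itself, and the function name, example items, and model stub are invented (real pipelines add rubric-based and LLM-judge scoring on top of exact matching):

```python
def golden_set_accuracy(examples, model_fn):
    """Score a model against a curated golden set: the fraction of
    inputs whose output exactly matches the expected answer."""
    hits = sum(model_fn(inp) == expected for inp, expected in examples)
    return hits / len(examples)

# Toy golden set and a stub "model" standing in for a real LLM call.
golden = [("2+2", "4"), ("capital of France", "Paris")]
stub_model = lambda q: "4" if q == "2+2" else "Paris"
acc = golden_set_accuracy(golden, stub_model)  # 1.0 on this toy set
```

Automated scoring against a fixed golden set gives a fast, repeatable signal between model iterations, which is the acceleration the talk describes.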
Shared strategies to detect prompt and embedding drift, and to design feedback loops that surface real-world hallucinations before they reach end users.
Student-led discussion on responsible data practices for GenAI.
Wide-ranging Q&A on large-scale ML, LLM safety, and career advice for AI practitioners.
Finance and healthcare leaders discussed the technical and regulatory hurdles of launching GenAI while meeting strict data-control and audit standards.
Explored transparency, fairness and accountability trade-offs when shipping LLM products under emerging AI-governance regimes.
Introduces matched-pair calibration, a test that measures subgroup-level exposure bias in score-based ranking systems by comparing outcomes of near-identical item pairs.
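A heavily simplified toy of the matched-pair idea, not the paper's actual test: take pairs of near-identical items from different subgroups and average the score gap (the helper name, pair scores, and interpretation below are invented for illustration):

```python
def matched_pair_gap(pairs):
    """Toy illustration: each pair holds ranking scores for two
    near-identical items from different subgroups. The mean gap
    estimates subgroup-level scoring bias; zero suggests calibration."""
    gaps = [a - b for a, b in pairs]
    return sum(gaps) / len(gaps)

# Scores for matched items: (subgroup A, subgroup B).
pairs = [(0.82, 0.75), (0.64, 0.60), (0.91, 0.88)]
gap = matched_pair_gap(pairs)  # positive: group A scored consistently higher
```

Because the paired items are near-identical, a persistent nonzero gap is harder to explain away by item-quality differences, which is what makes the matched-pair construction useful for exposure-bias testing.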
Proposes a benchmark for assessing attribution quality in AI-generated content by evaluating style embeddings alongside large language model judges to determine how well generated output can be traced back to its original sources. Accepted for presentation at the ICDM RARA Workshop (Grounding Documents with Reasoning, Agents, Retrieval, and Attribution) on 11/12/2025. https://raraworkshop.github.io/
A structured evaluation framework for LLM systems that emphasizes defining clear objectives, using curated datasets pre-deployment, and ongoing quality assessment post-deployment via feedback and monitoring.
Explains how the temperature parameter in large language models affects creativity, determinism, and output diversity, illustrating its real-world implications for reliability and user experience.
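The mechanism is simple to sketch: temperature rescales the logits before the softmax, so low values sharpen the next-token distribution (more deterministic) and high values flatten it (more diverse). A minimal illustration, with made-up logit values:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature before the softmax.
    T < 1 sharpens the distribution; T > 1 flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.2)  # near one-hot: top token dominates
hot = softmax_with_temperature(logits, 2.0)   # closer to uniform: more diversity
```

This is why temperature 0 is favored for reliability-sensitive tasks and higher temperatures for creative generation.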
In this *Meet the Writer* interview, LinkedIn Staff AI Engineer Misam Abbas shares his journey from Meta and Dropbox to building trustworthy AI systems that balance ethics, diversity, and innovation. He discusses his writing process, fascination with AI temperature tuning, and the importance of explainability in large language models. Beyond code, Abbas champions mentorship, storytelling, and widening community participation in the evolution of AI.
Describes a platform that lets retail investors pool commitments and due diligence costs to acquire private-equity stakes in startups.
An introduction to sentiment analysis arguing that computers don't actually understand emotions but can make educated guesses about sentiment by analyzing text computationally. While these tools can process vast amounts of text quickly and identify broadly positive or negative sentiment, they still fall short of human-level performance and struggle with nuances like sarcasm.
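The "educated guess" framing can be shown with a toy lexicon-based scorer; the word lists and function name are invented for illustration, and real systems use learned models rather than hand-built lexicons:

```python
POSITIVE = {"great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "hate", "terrible", "sad"}

def sentiment_score(text):
    """Count positive vs. negative lexicon hits; no understanding involved."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

# The sarcasm blind spot: word counting reads this as "positive".
sentiment_score("Oh great, another outage")  # -> "positive"
```

The last line illustrates the limitation the article highlights: counting words captures surface polarity but misses sarcasm entirely.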
Served as an official ethics reviewer, assessing compliance of research submissions for both the Regular and the Datasets and Benchmarks tracks.
Served as a peer reviewer for the IEEE Computer journal, completing two review assignments in early 2026 via Publons/Clarivate.
Served as a peer reviewer for a paper on Responsible AI.
Judge for the student-run AI hackathon, scoring projects on impact, creativity and technical execution.
Judged participants in the Agent Hackathon organized by MindStudio.ai.
Judge for the Globee Artificial Intelligence Awards, reviewing cutting edge AI products and solutions.
Judge for the Webby Awards in the AI General and Features category.