Naveen Sama | AI/ML Engineer

Who I Am

About Me

I'm an AI/ML Engineer with 6+ years of experience turning complex AI research into systems that handle millions of real-world interactions. At PNC Financial Services, I architect conversational AI, LLM agents, and RAG pipelines that automate banking workflows at scale.

My work spans the full AI lifecycle — from data pipelines and model fine-tuning to production deployment, evaluation, and monitoring. I specialize in LLM agent orchestration, retrieval-augmented generation, and LLMOps infrastructure.

Location: San Jose, CA
Education: M.S. Big Data Analytics, SDSU, 2024 (GPA 3.94)
Email: naveensama0797@gmail.com
Focus: LLM Agents · RAG · MLOps · Fine-tuning

Python: 95%
LLM Agents: 93%
RAG / Vector DB: 88%
MLOps: 85%
PyTorch / TF: 87%
AWS / Cloud: 83%

Portfolio

Featured Projects

🤖

LangGraph Research Agent

Autonomous multi-step research agent with persistent memory, tool orchestration, and ReAct reasoning loops.

LangGraph · OpenAI · FastAPI · Docker

Multi-agent StateGraph with planner, researcher, summarizer, and critic nodes. Streaming responses, conversation checkpointing, and ReAct tool-use loops deployed via FastAPI + Docker.

LangGraph · OpenAI · FastAPI · Redis

View on GitHub →

🔍

RAG Document QA System

End-to-end retrieval-augmented generation pipeline for document Q&A with hybrid search and reranking.

LangChain · ChromaDB · FastAPI · Streamlit

Hybrid retrieval (dense embeddings + BM25), contextual compression reranking, and a Streamlit chat UI. Ingests PDFs and Word documents, splitting them into overlapping chunks with metadata filtering.

LangChain · ChromaDB · OpenAI · Streamlit

View on GitHub →
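
The hybrid retrieval described for the RAG Document QA System has to merge two ranked lists, one from dense embeddings and one from BM25. One common way to do that is reciprocal rank fusion (RRF). The sketch below is an illustration under that assumption, not the project's actual fusion logic; the function name, the document IDs, and `k=60` are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a value
    taken from the project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a dense retriever and a BM25 retriever.
dense_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_c", "doc_a"]
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

RRF only needs ranks, so it avoids having to normalize BM25 scores and cosine similarities onto a shared scale.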

📈

LLM Time Series Forecaster

Benchmarks 8 LLM prompting strategies vs classical ML for short-term electricity load forecasting.

GPT-4o · Claude 3.5 · LSTM · Prophet

24 forecasters evaluated across 3 LLMs. Key finding: GPT-4o with statistical context + chain-of-thought achieves 3.41% MAPE — outperforming SARIMA, XGBoost, and LSTM baselines.

GPT-4o · Claude · LLaMA 3.1 · XGBoost · LSTM

View on GitHub →
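
For readers unfamiliar with the metric quoted for the LLM Time Series Forecaster, MAPE (mean absolute percentage error) averages the absolute forecast error as a percentage of the actual value. A minimal sketch follows; the function name and the sample load values are made up for illustration and are not data or results from the project.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes both sequences have the same non-zero length and that no
    actual value is zero (the metric is undefined there).
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and equal in length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical hourly load values (units arbitrary) and their forecasts.
example_actual = [100.0, 200.0]
example_forecast = [110.0, 190.0]
example_mape = mape(example_actual, example_forecast)
```

Here the two errors are 10% and 5%, so the result is about 7.5%.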

🚀

MLOps Pipeline

Production-grade end-to-end ML pipeline with experiment tracking, model registry, and automated deployment.

MLflow · Docker · AWS ECS · Grafana

Full lifecycle: data validation (Great Expectations), MLflow tracking, staging→production promotion, Dockerized FastAPI serving on AWS ECS, Prometheus + Grafana monitoring, GitHub Actions CI/CD.

MLflow · FastAPI · Docker · AWS · Prometheus

View on GitHub →

⚾

MLB Win Predictor

ML system predicting MLB home team win probability using advanced sabermetrics and automated report generation.

scikit-learn · SQL · Plotly · Docker

SQL feature extraction from Retrosheet data. Cramér's V + Tschuprow's T correlation analysis, Random Forest importance, brute-force variable selection. Auto-generates self-contained HTML analytics reports.

scikit-learn · Pandas · Plotly · SQL · Docker

View on GitHub →
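
Cramér's V, one of the association measures named for the MLB Win Predictor, scales the chi-square statistic of a contingency table to the range 0 to 1. The sketch below uses the plain textbook formula without bias correction; the function name and the example tables are hypothetical, and it assumes a table of at least 2x2 with no empty row or column. It is not taken from the project's code.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-square: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Normalize by sample size and the smaller table dimension minus one.
    min_dim = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * min_dim))


# Hypothetical 2x2 tables: perfectly associated vs. independent categories.
perfect = cramers_v([[10, 0], [0, 10]])
independent = cramers_v([[5, 5], [5, 5]])
```

The first table gives a value of 1 (the categories determine each other) and the second gives 0 (no association).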

🏥

Healthcare ML Suite

Predictive models for diabetes, heart disease, and cancer detection with SHAP explainability and class imbalance handling.

XGBoost · SHAP · SMOTE · Jupyter

Ensemble models (XGBoost + Random Forest) with SHAP explainability for clinical interpretability. SMOTE for class imbalance. ROC-AUC > 0.92 across all three prediction tasks.

XGBoost · scikit-learn · SHAP · SMOTE

View on GitHub →

Career

Experience

Jun 2024 — Present

AI/ML Engineer

PNC Financial Services · San Jose, CA

Conversational AI using LangGraph + Pinecone handling 1M+ monthly conversations at sub-2.5s p95 latency, reducing live agent escalations by 20%
Extended to voice banking with LiveKit WebRTC, Whisper STT, and ElevenLabs TTS — 50K+ monthly interactions at sub-1.5s latency
KYC agent with human-in-the-loop review for OFAC sanctions screening, cutting review time by 60%; CrewAI credit memo agent reducing drafting time from days to hours
Document AI service: 94% F1, 1,000+ docs/day, reducing manual form processing by 70% across agents and business units
Internal RAG chatbot with Neo4j knowledge graph — 98.9% citation compliance, 2,000+ daily users, sub-2s p95 latency
Fine-tuned Llama 3 8B, Mistral, Llama Nemotron via QLoRA/PEFT on EKS GPU nodes — 90%+ inference cost reduction vs external APIs
LLM evaluation with DeepEval, RAGAS, LLM-as-a-Judge, and RLHF-informed prompt tuning; reduced hallucination rates across all production agents

Jan 2024 — May 2024

Machine Learning Engineer Intern

Blue Cross Blue Shield of Michigan · San Diego, CA

Member risk-scoring pipeline on SageMaker (XGBoost + SHAP) achieving 0.87 AUC-ROC for chronic disease progression prediction
LLM evaluation frameworks for ContractsGPT and SecureGPT on AWS Bedrock, reducing hallucinations by 5% and improving latency by 10%

May 2023 — Dec 2023

Data Science Intern

SDSU Research Foundation · San Diego, CA

Predictive model for high-priority customer classification (XGBoost, RF, LightGBM + SMOTE) at 84% recall; deployed on GCP Vertex AI
NLP pipeline (NLTK, spaCy, PyTorch) on 100GB+ FHIR EHR data reducing clinician review time from 5 min to 1 min; 92% F1 on 35K+ patient records

Jan 2021 — Jul 2022

Software Development Engineer

Amazon · India

Distributed competitor catalog scraping system processing 10M+ listings at 98% crawl success across 5+ marketplaces
Spark-based selection gap pipeline with BERT taxonomy classification covering 85% of catalog nodes; reduced selection gap by 5%
NLP query rewriting pipeline at sub-20ms latency reducing zero-result searches by 18% across multiple marketplaces

May 2019 — Jan 2021

Software Engineer

Accenture · India

Full-stack Java Spring Boot + React.js enterprise apps with JWT/OAuth 2.0; reduced UI defects by 40% and improved developer velocity by 30%
Data migration pipeline that moved 50M+ historical records at 99.5% accuracy with zero production downtime
Containerized microservices on Kubernetes with blue-green deployments; cut release cycles from 2 weeks to 2 days