Data Scientist & AI Engineer

Sai Swapna Gollapudi

Building Intelligence at Scale — LLMs, NLP & ML Systems

Data Scientist at Amazon with 8+ years of experience designing and productionizing machine learning systems that operate at massive scale. Specializing in Large Language Models, NLP pipelines, and end-to-end ML architecture.

8+ Years Experience · AMLC 2025 Conference

Who I Am

About Me

I'm a Data Scientist at Amazon, where I build machine learning systems that operate at the intersection of scale and precision. My work spans personalized learning platforms, fraud detection, search relevance, and LLM evaluation — each touching millions of users globally.

With a Master's in Data Science from Indiana University Bloomington and nearly a decade of industry experience, I bring both theoretical rigor and hands-on engineering depth to every problem I tackle.

My research on Abuse Detection was accepted and presented at the Amazon Machine Learning Conference (AMLC) 2025 — a recognition of the novel multi-layer ML architecture I designed for enterprise-scale fraud pattern detection.

Outside of model building, I'm passionate about translating complex ML outputs into clear, actionable insights for business stakeholders — bridging the gap between research and real-world impact.

🧠 Large Language Models

Designing LLM-driven pipelines for skill extraction, feature engineering, code generation, and evaluation at production scale using Claude, GPT, and open-source models.

⚙️ ML Systems Engineering

End-to-end model development on AWS SageMaker — from experimentation on TB-scale datasets to productionized, monitored, and scalable deployments.

🔍 Search & Recommendations

Built semantic search and learning-to-rank systems to replace legacy BM25 models, delivering measurable NDCG improvements validated through stakeholder-driven A/B tests.

📊 Anomaly & Fraud Detection

Designed multi-layer unsupervised ML architectures using Isolation Forest, Autoencoders, and DBSCAN to detect emerging fraud patterns across high-volume enterprise data.

Career

Work Experience

Amazon
Data Scientist
March 2022 — Present

Personalized Learning Experience (Learn)

  • Designed foundational ML architecture for a personalized learning platform processing 3M+ training documents
  • Built LLM-driven skill extraction pipeline using prompt engineering on unstructured training content
  • Implemented BERT/RoBERTa embedding-based skill normalization with K-Means and HDBSCAN clustering
  • Productionized models on AWS SageMaker with scalability, reproducibility, and monitoring
Python · PyTorch · BERT · RoBERTa · SageMaker · K-Means · HDBSCAN · Claude Sonnet

Abuse Detection
⭐ AMLC 2025: Accepted & Presented

  • Led end-to-end experimentation on TB-scale datasets, including transformations to handle missing and inconsistent data
  • Proposed a multi-layer ML architecture combining unsupervised anomaly detection with a consensus model to reduce false positives
  • Built an LLM pipeline to extract structured features from policy documents in zero-shot and few-shot settings
  • Designed LLM-assisted Python code generation with automated correction to support production migration
Isolation Forest · Autoencoders · DBSCAN · One-Class SVM · SageMaker · Lambda · Glue · spaCy

Effortless Resolution Indicator

  • Developed effort estimation framework analyzing 2M+ monthly employee contacts across multiple channels
  • Created composite effort score combining operational metrics with ML-extracted behavioral and linguistic signals
  • Applied PCA, t-SNE, and clustering to identify effort archetypes and explain score drivers
PCA · t-SNE · Autoencoders · NLP · Prompt Engineering · SQL

LLM Evaluation Pipeline

  • Developed evaluation pipeline for globally deployed chatbot processing 50,000+ query-response pairs daily
  • Reduced processing time from 30 minutes to 30 seconds per 100 messages via parallel processing optimization
  • Benchmarked the pipeline against AWS Bedrock Guardrails for hallucination detection and retrieval quality
SageMaker · SQS · Lambda · Bedrock API · Glue · QuickSight

Search Engine Development & Topic Modeling

  • Replaced the BM25 model with a learning-to-rank algorithm, demonstrating NDCG improvements within two weeks
  • Designed an A/B testing framework to measure search impact and refine ranking strategies
  • Developed a taxonomy generation system combining BERTopic and an LLM for document classification
BERTopic · Learning-to-Rank · A/B Testing · Claude

Capital One
Senior Data Scientist
May 2021 — March 2022

Identity Theft Detection

  • Built ML models to detect identity theft in the customer login process using CNN, Random Forest, XGBoost, and LSTM
  • Built end-to-end data pipelines with feature engineering and statistical analysis
  • Developed Tableau dashboards to monitor model performance and evaluate metrics
CNN · XGBoost · LSTM · Python · SQL · Tableau

Capital One
Data Science Intern
June 2020 — Aug 2020

Digital Footprint Pattern Classification

  • Built a model to identify and classify patterns in customers' digital footprints using clickstream data
  • Applied LSTM/GRU, association rule mining, and convolutional neural networks
LSTM · GRU · CNN · Association Rule Mining

Tata Consultancy Services
Data Analyst
July 2016 — July 2019

Predictive Classification & Financial Analytics

  • Built predictive classification models for retail order return analysis using Logistic Regression, Decision Trees, and Random Forest, with SMOTE to address class imbalance
  • Built a risk classification model for accounts receivable
  • Developed automated financial analysis dashboards using RStudio, Spark, SQL, and Hadoop
Python · Sklearn · SMOTE · RStudio · Spark · Hadoop · SQL

Expertise

Skills & Technologies

Languages
Python · R · SQL · Spark · PySpark · HiveQL · Java · JavaScript · React · NodeJS · HTML · CSS
Cloud & Platforms
AWS SageMaker · Bedrock · S3 · Lambda · SQS · SNS · OpenSearch · Glue · GCP · Databricks · MLflow · Hadoop
ML & Deep Learning
PyTorch · TensorFlow · Keras · Scikit-learn · Hugging Face · FastAI · XGBoost · Random Forest · SVM
LLMs & Generative AI
Claude · GPT · BERT · RoBERTa · Prompt Engineering · Fine-tuning · Zero-shot · Few-shot · RAG · Embeddings
NLP & Algorithms
spaCy · NLTK · BERTopic · K-Means · HDBSCAN · DBSCAN · Isolation Forest · Autoencoders · PCA · t-SNE
Visualization & Analytics
Tableau · QuickSight · A/B Testing · Statistical Analysis · Feature Engineering · Temporal Validation

Academic Background

Education

Master of Science — Data Science
Indiana University Bloomington
Bloomington, IN  ·  Aug 2019 – May 2021
B.Tech — Mechanical Engineering
JNTU Hyderabad
Hyderabad, India  ·  Graduated May 2016

Get In Touch

Contact

Let's connect.

Whether you're interested in collaboration, research opportunities, or just want to talk AI and ML — feel free to reach out. I'm always open to meaningful conversations.

Open to Opportunities

I'm currently based in the US and open to Data Scientist, ML Engineer, and AI Research roles — particularly those focused on LLMs, NLP systems, and large-scale ML infrastructure.

