Saurabh Chhajed

Lead Data & AI Engineer

📍 Hyderabad, Telangana, India

Professional Summary

Seasoned Big Data Analytics and Machine Learning Engineer with 15+ years of experience building, and leading teams that build, cloud-based and on-premises big data platforms and ML solutions, specializing in distributed processing on AWS, GCP, and other cloud platforms.
Good knowledge of application development in the Big Data ecosystem (Hadoop, MapReduce, Spark, Trino, Hive, Airflow) and cloud-native computing technologies (Docker, Kubernetes).
Hands-on experience developing web and large-scale data processing applications in Java, Scala, and Python.
Good knowledge of end-to-end machine learning model training, optimization, monitoring, and deployment, covering both offline and online models for recommender systems.
Good knowledge of algorithms, data structures, data processing frameworks, and design patterns.

Professional Experience

Lead Data/AI Engineer

Thoughtworks India Pvt Ltd, Hyderabad, India | Oct 2024 - Present

Client: Leading U.S. Home Improvement Retailer

Spearheaded architecture and delivery of a scalable Promotion Analytics Engine, driving campaign insights and analytics based on 20+ financial parameters.
Developed robust, production-grade ETL pipelines based on DBT models, combining PySpark and BigQuery for batch and near-real-time transformations.
Integrated LightGBM regression models via BigQuery ML to enable accurate volume forecasting.
Led a team of 10 engineers, overseeing design, architecture, and code reviews to ensure quality and scalability.
Collaborated with stakeholders across Product, Data Science, and Architecture teams.

Lead Data Engineer (LMTS)

Salesforce India Pvt Ltd, Hyderabad, India | June 2022 - Sept 2024

Salesforce’s Unified Intelligence Platform (UIP) is an enterprise-scale internal data lake and analytics ecosystem, facilitating petabyte-scale data ingestion, exploration, transformation, and visualization.

Led the architecture and development of a metadata-driven ingestion pipeline processing petabytes of data, integrating Kafka, Spark, Scala, Trino, and Airflow for scalable batch and streaming ingestion.
Designed GDPR-compliant data leak scanners with sampling techniques, cutting scanning costs by 40%.
Built a high-throughput Leak Management Pipeline that cleans leaked PII across 3,000+ record types and several petabytes of data.
Engineered an advanced tokenization service securing sensitive identifiers while enabling efficient analytics.
Developed Airflow Operators and frameworks to streamline ingestion and exploration workflows.
Implemented system monitoring and alerting with Grafana and PagerDuty for real-time system visibility.
Created a workload analytics dashboard using Apache Superset, optimizing Spark cluster resource utilization by ~20%.
Led a team of 5 engineers, driving design reviews, code quality initiatives, and operational improvements.

Lead Data/ML Engineer

American Express (via Impetus Technologies Inc.), Phoenix, Arizona, USA | Dec 2014 - June 2022

Architected and built a merchant recommender system using an ensemble of CatBoost, collaborative filtering, and Word2Vec models on Spark, enhancing Amex marketing personalization.
Developed end-to-end ML pipelines: feature extraction, model training, hyperparameter tuning (distributed grid search), monitoring, and deployment to online scoring systems.
Reduced hyperparameter tuning time by 40% through distributed Spark-based optimization.
Designed and deployed microservices-based model serving architecture for real-time, geo-personalized merchant recommendations.
Built Model/Feature Monitoring solutions to track GINI, PSI, and accuracy metrics, ensuring model health.
Spearheaded development of an online offer personalization engine with Hadoop, Hive, MapRDB, and Elasticsearch, improving campaign launch speed by 50%.
Collaborated cross-functionally with Product, Data Science, and Marketing teams using Agile/SAFe methodologies.

Application Developer

JP Morgan Chase & Co., India | Aug 2012 - Dec 2014

Developed a multi-cluster distributed data management platform for high-availability, low-latency processing.
Built real-time Search and Analytics solutions using ELK Stack (Elasticsearch, Logstash, Kibana).
Designed real-time order update systems using distributed caching (GemFire).
Evangelized code quality tools (Sonar, Jira, Crucible), improving team code health.
Conducted a Hadoop PoC for analyzing cross-application usage patterns.
Worked on messaging products providing end-to-end integration between business-critical applications, handling approximately 2-3 million message exchanges daily.
Gained exposure to multithreading and Java performance tuning, including GC algorithms and GC tuning.

Systems Engineer

General Electric (GE) Company (TCS), India | Nov 2009 - July 2012

Led re-architecture of a large-scale ASP/IIS application to a Java/Spring microservices framework.
Designed and developed RESTful APIs consumed by multiple clients.
Improved critical business process execution time by 50%, saving $30,000.
Optimized database queries and developed complex PL/SQL procedures.

Technical Skills

Big Data Platforms

Spark Core/ML/Streaming
Trino
Hive
Hadoop
Apache Iceberg
Airflow

Cloud Platforms

AWS (EMR, EC2, S3, IAM etc.)
GCP (BigQuery, Dataproc, Cloud Composer, Storage etc.)

Programming

Java
Scala
Python
SQL
PL/SQL
Shell Scripting

Distributed Systems

Kubernetes
Docker
Microservices Architecture

Data Security

Tokenization
GDPR Compliance
Leak Management Pipelines

Additional Technologies

Elasticsearch
HBase
MapRDB
Pivotal GemFire
Apache Superset
DBT

DevOps & CI/CD

Jenkins
Maven
Gradle
Terraform
Git
JIRA
HashiCorp Vault

Education

Institute of Engineering and Technology, Indore, MP | 2009 - B.E. in Computer Science Engineering - Top 1% of batch

Certifications & Publications

GCP Certified Professional Data Engineer
Cloudera Certified Hadoop Developer
MapR Certified Spark Developer
Authored a book on ELK (Elasticsearch, Logstash and Kibana) for PacktPub
Performed technical reviews of multiple Big Data and ML books for PacktPub

Awards & Accolades

Outstanding Employee of the Year among 1500+ in the Big Data group
DZone Most Valuable Blogger (Top 5%) for contributions on web tech, big data, and open source
Top Performer in TCS Initial Learning Program (400+ trainees)
5× TCS Gems “On the Spot” awardee for exceptional performance and client recognition

Contact Information

LinkedIn | GitHub | Email