Lead Data & AI Engineer
📍 Hyderabad, Telangana, India
Professional Summary
- Seasoned Big Data Analytics and Machine Learning Engineer with 15+
years of experience building, and leading teams that build, cloud-based
and on-premises big data platforms and ML solutions, specializing in
distributed processing on AWS, GCP, and other cloud platforms.
- Good knowledge of application development in the Big Data ecosystem
(Hadoop, MapReduce, Spark, Trino, Hive, Airflow) and cloud-native
computing technologies (Docker, Kubernetes).
- Hands-on experience developing web and large-scale data processing
applications in Java, Scala, and Python.
- Good knowledge of end-to-end machine learning model training,
optimization, monitoring, and deployment, covering both offline and
online models for recommender systems.
- Good knowledge of algorithms, data structures, data processing
frameworks, and design patterns.
Professional Experience
Lead Data/AI Engineer
Thoughtworks India Pvt Ltd, Hyderabad, India | Oct 2024 - Present
Client: Leading U.S. Home Improvement Retailer
- Spearheaded architecture and delivery of a scalable Promotion
Analytics Engine, driving campaign insights and analytics based on 20+
financial parameters.
- Developed robust, production-grade ETL pipelines based on DBT models,
combining PySpark and BigQuery for batch and near-real-time
transformations.
- Integrated LightGBM regression models via BigQuery ML to enable
accurate volume forecasting.
- Led a team of 10 engineers, overseeing design, architecture, and
code reviews to ensure quality and scalability.
- Collaborated with stakeholders across Product, Data Science, and
Architecture.
Lead Data Engineer (LMTS)
Salesforce India Pvt Ltd, Hyderabad, India | June 2022 - Sept 2024
Salesforce’s Unified Intelligence Platform (UIP) is an
enterprise-scale internal data lake and analytics ecosystem,
facilitating petabyte-scale data ingestion, exploration, transformation,
and visualization.
Tech Stack: AWS EMR | S3 | Airflow | Spark | Scala | Python |
Kubernetes | Docker | Trino | Iceberg | Jupyter Notebooks
- Led the architecture and development of a metadata-driven ingestion
pipeline processing petabytes of data, integrating Kafka, Spark, Scala,
Trino, and Airflow for scalable batch and streaming ingestion.
- Designed GDPR-compliant data leak scanners with sampling techniques,
cutting scanning costs by 40%.
- Built a high-throughput Leak Management Pipeline that cleans leaked
PII across 3,000+ record types and several petabytes of data.
- Engineered an advanced tokenization service securing sensitive
identifiers while enabling efficient analytics.
- Developed Airflow Operators and frameworks to streamline ingestion
and exploration workflows.
- Implemented system monitoring and alerting with Grafana and
PagerDuty for real-time system visibility.
- Created a workload analytics dashboard using Apache Superset,
optimizing Spark cluster resource utilization by ~20%.
- Led a team of 5 engineers, driving design reviews, code quality
initiatives, and operational improvements.
Lead Data/ML Engineer
American Express (via Impetus Technologies Inc.), Phoenix, Arizona, USA | Dec 2014 - June 2022
- Architected and built a merchant recommender system using an
ensemble of CatBoost, collaborative filtering, and Word2Vec models on
Spark, enhancing Amex marketing personalization.
- Developed end-to-end ML pipelines: feature extraction, model
training, hyperparameter tuning (distributed grid search), monitoring,
and deployment to online scoring systems.
- Reduced hyperparameter tuning time by 40% through distributed
Spark-based optimization.
- Designed and deployed microservices-based model serving architecture
for real-time, geo-personalized merchant recommendations.
- Built Model/Feature Monitoring solutions to track GINI, PSI, and
accuracy metrics, ensuring model health.
- Spearheaded development of an online offer personalization engine
with Hadoop, Hive, MapRDB, and Elasticsearch, improving campaign launch
speed by 50%.
- Collaborated cross-functionally with Product, Data Science, and
Marketing teams using Agile/SAFe methodologies.
Application Developer
JP Morgan Chase & Co., India | Aug 2012 - Dec 2014
- Developed a multi-clustered distributed data management platform for
high-availability, low-latency processing.
- Built real-time Search and Analytics solutions using ELK Stack
(Elasticsearch, Logstash, Kibana).
- Designed real-time order update systems using distributed caching
(Gemfire).
- Evangelized code quality tools (Sonar, Jira, Crucible), improving
team code health.
- Conducted a Hadoop PoC for analyzing cross-application usage
patterns.
- Worked on messaging products providing end-to-end integration
between many business-critical applications, handling approximately
2-3 million message exchanges daily.
- Gained exposure to multithreading and Java performance tuning,
including GC algorithms and GC tuning.
Systems Engineer
General Electric (GE) Company (TCS), India | Nov 2009 - July 2012
- Led re-architecture of a large-scale ASP/IIS application to a
Java/Spring microservices framework.
- Designed and developed RESTful APIs consumed by multiple
clients.
- Improved critical business process execution time by 50%, saving
$30,000.
- Optimized database queries and developed complex PL/SQL
procedures.
Technical Skills
- Spark Core/ML/Streaming
- Trino
- Hive
- Hadoop
- Apache Iceberg
- Airflow
- AWS (EMR, EC2, S3, IAM, etc.)
- GCP (BigQuery, Dataproc, Cloud Composer, Cloud Storage, etc.)
Programming
- Java
- Scala
- Python
- SQL
- PL/SQL
- Shell Scripting
Distributed Systems
- Kubernetes
- Docker
- Microservices Architecture
Data Security
- Tokenization
- GDPR Compliance
- Leak Management Pipelines
Additional Technologies
- Elasticsearch
- HBase
- MapRDB
- Pivotal Gemfire
- Apache Superset
- DBT
DevOps & CI/CD
- Jenkins
- Maven
- Gradle
- Terraform
- Git
- JIRA
- Hashicorp Vault
Education
Institute of Engineering and Technology, Indore, MP | 2009
B.E. in Computer Science Engineering - Top 1% of batch
Certifications & Publications
- GCP Certified Professional Data Engineer
- Cloudera Certified Hadoop Developer
- MapR Certified Spark Developer
- Authored a book on ELK (Elasticsearch, Logstash, and Kibana) for
PacktPub
- Performed technical reviews of multiple Big Data and ML books for
PacktPub
Awards & Accolades
- Outstanding Employee of the Year among 1,500+ employees in the Big
Data group
- DZone Most Valuable Blogger (Top 5%) for contributions on web tech,
big data, and open source
- Top Performer in TCS Initial Learning Program (400+ trainees)
- 5× TCS Gems “On the Spot” awardee for exceptional performance and
client recognition
LinkedIn | GitHub | Email