ABDUL WAHAB SYED

Data Engineer
St. Louis, MO

Summary

Data Engineer with 7+ years of experience designing, building, and optimizing scalable data pipelines and cloud-native analytics solutions. Started career as an ETL Developer, gaining strong foundations in data integration, transformation, and warehousing, then transitioned into modern data engineering with hands-on expertise in Python, PySpark, and SQL. Skilled in developing both batch and real-time data pipelines using Spark, Kafka, and cloud services across the AWS, Azure, and GCP ecosystems. Proficient in orchestrating workflows with Apache Airflow and Azure Data Factory, managing data lakes and warehouses (Snowflake, Synapse, BigQuery), and delivering clean, reliable, analytics-ready data for BI and machine learning teams. Adept at data modeling, performance tuning, and implementing robust data quality and validation frameworks. Proven ability to collaborate across teams to drive data initiatives that support strategic business goals.

Overview

7 years of professional experience

Work History

Data Engineer

HCL Technologies Ltd
10.2017 - 12.2022

Client: Techem GmbH / Consumers Energy


  • Designed and developed scalable, cloud-native data pipelines (batch and streaming) using PySpark and Apache Spark across AWS, Azure, and GCP platforms.
  • Built robust ETL/ELT workflows using tools like Azure Data Factory, AWS Glue, and Databricks to ingest, cleanse, and transform large-scale structured and semi-structured data.
  • Developed real-time data streaming solutions with Kafka and AWS Kinesis to support business-critical insights and real-time analytics.
  • Implemented orchestration and workflow automation using Apache Airflow and Azure Data Factory to schedule, monitor, and manage end-to-end data pipelines.
  • Managed and optimized data lakes and data warehouses including Azure Synapse, Snowflake, BigQuery, and AWS S3 for scalable storage and analytics.
  • Applied advanced data modeling techniques including star/snowflake schema designs and OLAP cube implementation for BI and reporting needs.
  • Integrated and transformed data across hybrid sources (SQL, NoSQL, API, Flat Files) into platforms like Cosmos DB, Cassandra, and PostgreSQL using PySpark and Sqoop.
  • Established automated data validation, anomaly detection, and error-handling frameworks in PySpark and SSIS to ensure data quality and pipeline reliability.
  • Designed and implemented Spark-based batch and streaming jobs to process and enrich data, optimizing performance via partitioning, caching, and tuning.
  • Developed Python-based reusable transformation libraries and services (Flask, Pandas) for consistent data processing and API delivery.
  • Migrated on-premises databases (Oracle, SQL Server, MongoDB, DB2) to cloud environments (Azure Data Lake Storage, AWS S3), ensuring secure and reliable transfer.
  • Delivered analytical dashboards using Tableau and Looker by creating data models and metric views on Snowflake and Azure SQL.
  • Leveraged Delta Lake and Unity Catalog for robust data lake versioning, governance, and compliance.
  • Collaborated with cross-functional teams (data scientists, BI analysts, DevOps) to align data solutions with business goals and enable machine learning use cases.
  • Created CI/CD pipelines with Azure DevOps and handled infrastructure management using ARM templates and Terraform for automated deployments.
  • Environment:
  • Languages & Frameworks: Python, PySpark, SQL, Scala, Pandas, Django, Flask
  • Cloud Platforms: AWS (S3, EMR, Kinesis, Glue, Redshift), Azure (Data Lake, Synapse, ADF, Databricks, SQL, DevOps), GCP (BigQuery, Dataflow, Cloud Storage)
  • Big Data & Streaming: Apache Spark, Kafka, Airflow, Sqoop, Hive, HDFS, NiFi, Oozie
  • Databases: Snowflake, Azure SQL, Cosmos DB, Cassandra, MongoDB, PostgreSQL, MySQL
  • Orchestration & DevOps: Apache Airflow, Azure Data Factory, Jenkins, Terraform, Azure ARM, Tidal Scheduler
  • Visualization & BI: Tableau, Looker
  • Tools & Others: Visual Studio Code, Git, Pytest, SSIS, TensorFlow (optimization), Docker, Unity Catalog

ETL Developer

Sapotech Solutions Pvt Ltd
08.2015 - 09.2017

Client: ValueLabs


  • Designed and implemented ETL pipelines to extract, transform, and load data from diverse sources into Hive and Teradata environments.
  • Automated data ingestion and preprocessing workflows using Hive scripts, Python, Bash, and One Automation tools.
  • Developed scalable Spark (Scala) and PySpark scripts for both batch and streaming data transformations.
  • Created and managed Hive tables and wrote complex HiveQL queries to handle structured and semi-structured data.
  • Utilized Sqoop for data movement between HDFS and data warehouses such as Teradata and Snowflake.
  • Built robust full and incremental load processes using TDCH and SnowSQL scripting.
  • Collaborated with data warehouse teams to design dimensional data models and document detailed ETL specifications.
  • Developed and optimized SQL stored procedures and transformation logic for data cleansing, validation, and enrichment.
  • Supported and monitored batch processing jobs across hourly, daily, weekly, and monthly schedules.
  • Worked with tools from the Hadoop ecosystem, including Pig, Flume, MapReduce, and Cloudera CDH, for distributed data processing.
  • Technologies & Tools: Hive, HDFS, Sqoop, Spark (Scala & PySpark), Hadoop, Python, Bash, Snowflake, Teradata, TDCH, SnowSQL, Pig, Flume, MapReduce, Cloudera CDH

Education

Master of Science - Data Science

Maryville University of Saint Louis
St. Louis, MO
05.2001 -

Bachelor of Technology - Electrical, Electronics And Communications Engineering

Jawaharlal Nehru Technological University Hyderabad, India
05.2001 -

Skills

Python

PySpark

SQL

Scala

Pandas

Django

Flask

AWS

S3

EMR

Kinesis

Glue

Redshift

Azure

Data Lake

Synapse

ADF

Databricks

Azure SQL

Azure DevOps

GCP

BigQuery

Dataflow

Cloud Storage

Apache Spark

Kafka

Airflow


Timeline

Data Engineer

HCL Technologies Ltd
10.2017 - 12.2022

ETL Developer

Sapotech Solutions Pvt Ltd
08.2015 - 09.2017

Master of Science - Data Science

Maryville University of Saint Louis
05.2001 -

Bachelor of Technology - Electrical, Electronics And Communications Engineering

Jawaharlal Nehru Technological University Hyderabad, India
05.2001 -