We are looking for a seasoned Data Engineer with over 10 years of experience in Big Data technologies, PySpark, Data Warehousing, and Cloud Data Platforms. The ideal candidate will have extensive expertise in designing and developing data pipelines and data transformations using tools such as Azure Data Factory, Azure Databricks, Ab Initio, Teradata, and SSIS.
The role involves working with PySpark and Spark SQL to perform data extraction, transformation, and aggregation from multiple sources and file formats. A strong understanding of Spark architecture and hands-on experience with Hive, HDFS, JSON, ORC, and Parquet is essential. The candidate should also have experience in data warehousing techniques such as slowly changing dimensions (SCD), surrogate key assignment, and change data capture (CDC).
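For context, the following is a minimal sketch of the kind of PySpark and Spark SQL work described above: extracting from multiple file formats, joining and transforming the data, and aggregating the result. Paths, column names, and the join key are hypothetical placeholders, not details from this role.

```python
# Minimal sketch of a PySpark extract-transform-aggregate flow.
# All paths and column names below are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl_sketch").getOrCreate()

# Extract: read from multiple file formats (Parquet and JSON here).
orders = spark.read.parquet("/data/raw/orders/")        # hypothetical path
customers = spark.read.json("/data/raw/customers/")     # hypothetical path

# Transform: join sources and derive a cleaned date column.
enriched = (
    orders.join(customers, on="customer_id", how="left")
          .withColumn("order_date", F.to_date("order_ts"))
)

# Aggregate: daily totals and counts per customer segment.
daily_totals = (
    enriched.groupBy("order_date", "segment")
            .agg(F.sum("amount").alias("total_amount"),
                 F.count("*").alias("order_count"))
)

# Load: write partitioned ORC output for downstream consumers.
daily_totals.write.mode("overwrite").partitionBy("order_date").orc(
    "/data/curated/daily_totals/"
)
```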
Proficiency in Python and Unix shell scripting, along with Airflow, Control-M, and Autosys for orchestration, is required. The ideal candidate will be involved in end-to-end ETL implementations, including requirements gathering, database and process design, implementation, testing, and production support.
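As an illustration of the orchestration side of the role, here is a minimal Airflow DAG sketch wiring extract, transform, and load steps into a daily schedule. The DAG id, schedule, and task callables are hypothetical placeholders.

```python
# Minimal sketch of an Airflow DAG orchestrating a daily ETL run.
# DAG id, schedule, and task bodies are hypothetical placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("extract from source systems")    # placeholder step

def transform():
    print("run PySpark transformations")    # placeholder step

def load():
    print("load into the warehouse")        # placeholder step

with DAG(
    dag_id="daily_orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Define task ordering: extract, then transform, then load.
    t_extract >> t_transform >> t_load
```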
Certifications such as Databricks Certified Associate Developer for Apache Spark, Azure Data Engineer Associate (DP-203), and Teradata 14 Certified Professional are highly desirable.