Must Have Skills
- Programming languages and frameworks: Python, Spark
- Experience with Linux utilities and SQL
- Experience using PySpark for data transformation
- Knowledge of AWS services such as Redshift, Glue, CloudFormation, EC2, S3, and Lambda
Good to Have Skills
- ETL Tool Experience
- AWS Exposure
Key Responsibilities
- Develop ETL pipelines using open-source tools to carry out data ingestion
- Write programs to extract data from the data lake and the curated data layer to meet business objectives
- Collaborate with different teams to understand the application and design the ETL pipeline accordingly
- Gather business and functional requirements, and translate them into robust, scalable, operable solutions that work well within the overall data architecture
- Participate in the full development life cycle, end-to-end, from design, implementation, and testing through documentation, delivery, support, and maintenance, and produce comprehensive, usable dataset documentation and metadata
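The ingestion-and-curation flow described in the responsibilities above can be sketched as a toy extract/transform/load pipeline. This is a minimal stdlib-only Python illustration so it stays self-contained; in a real deployment the same stages would typically be expressed with PySpark DataFrames reading from S3 and loading into Redshift. All field names (`customer`, `amount`) and the sample data are hypothetical.

```python
import csv
import io

def extract(raw_csv):
    """Extract: parse raw CSV records (standing in for the data lake)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows):
    """Transform: normalize names, cast amounts, and drop malformed records."""
    curated = []
    for row in rows:
        try:
            curated.append({
                "customer": row["customer"].strip().title(),
                "amount": float(row["amount"]),
            })
        except (KeyError, ValueError):
            continue  # skip records that fail validation
    return curated

def load(rows):
    """Load: aggregate into a curated summary (standing in for a warehouse write)."""
    return {
        "records": len(rows),
        "total_amount": sum(r["amount"] for r in rows),
    }

raw = "customer,amount\n alice ,10.5\nbob,not_a_number\ncarol,4.5\n"
summary = load(transform(extract(raw)))
print(summary)  # the malformed 'bob' row is dropped during transform
```

The three-function shape mirrors the separation of concerns the role calls for: extraction from a source layer, transformation into a curated layer, and loading for downstream business use.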