Must Have Skills
1. Programming in Python and Apache Spark
2. Experience with Linux utilities & SQL.
3. Experience using PySpark for data transformation.
4. Knowledge of AWS services such as Redshift, Glue, CloudFormation, EC2, S3, and Lambda.
Good to Have Skills
1. ETL Tool Experience
2. AWS Exposure
Key Responsibilities
1. Develop open-source ETL pipelines to carry out data ingestion
2. Write programs to extract data from the data lake and curated data layer to meet business objectives
3. Collaborate with different teams to understand the application and design the ETL pipeline accordingly
4. Gather business and functional requirements, and translate these requirements into robust, scalable, operable solutions that work well within the overall data architecture
5. Participate in the full development life cycle, end to end: design, implementation, testing, documentation, delivery, support, and maintenance. Produce comprehensive, usable dataset documentation and metadata
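To illustrate the kind of ETL ingestion work described above, here is a minimal extract-transform-load sketch using only the Python standard library. In the actual role, the extract step would typically read from a data lake (e.g. S3) and the load step would target a warehouse such as Redshift; this sketch substitutes in-memory CSV text and SQLite purely for illustration, and the `sales` schema and field names are hypothetical.

```python
import csv
import io
import sqlite3

def extract(raw: str) -> list[dict]:
    """Extract: parse raw CSV text into records."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(records: list[dict]) -> list[tuple]:
    """Transform: drop invalid rows and normalise values/types."""
    return [
        (r["id"], r["region"].strip().upper(), float(r["amount"]))
        for r in records
        if r["amount"] and float(r["amount"]) > 0
    ]

def load(rows: list[tuple], conn: sqlite3.Connection) -> int:
    """Load: write curated rows into the target table, return row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id TEXT, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

raw = "id,region,amount\n1,east ,10.5\n2,west,-3\n3,north,7\n"
conn = sqlite3.connect(":memory:")
loaded = load(transform(extract(raw)), conn)
print(loaded)  # 2 (the negative-amount row is dropped)
```

The same extract/transform/load structure carries over directly to PySpark, where `extract` becomes a `spark.read` call, `transform` becomes DataFrame operations, and `load` becomes a `write` to the curated layer.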