Key Responsibilities:
1. Develop ETL pipelines using open-source tools to carry out data ingestion (see the PySpark sketch after this list)
2. Write programs to extract data from the data lake and the curated data layer to meet business objectives
3. Collaborate with different teams to understand the application and design the ETL pipeline accordingly
4. Gather business and functional requirements, and translate these
requirements into robust, scalable, operable solutions that work well
within the overall data architecture
5. Participate in the full development life cycle end to end, from design, implementation, and testing to documentation, delivery, support, and maintenance, and produce comprehensive, usable dataset documentation and metadata
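For illustration only (not part of the posting), the following is a minimal sketch of the kind of PySpark ingestion pipeline these responsibilities describe: read raw data from a data lake path, apply a simple transformation, and write it to a curated layer. All paths and column names are hypothetical placeholders.

# Minimal PySpark ETL sketch; paths and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ingest-orders").getOrCreate()

# Extract: read raw CSV files landed in the data lake (hypothetical path).
raw = spark.read.csv("s3://data-lake/raw/orders/", header=True, inferSchema=True)

# Transform: drop rows missing a key and derive a date column.
curated = (
    raw.dropna(subset=["order_id"])
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: write Parquet to the curated layer, partitioned by date.
curated.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://data-lake/curated/orders/"
)

spark.stop()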
Requirements:
1. Programming languages: Python / Spark
2. Experience with Linux utilities and SQL
3. Experience using PySpark for data transformation
4. Knowledge of AWS services such as Redshift, Glue, CloudFormation, EC2, S3, and Lambda (see the boto3 sketch after this list)
5. Experience with an ETL tool would be an added advantage
6. AWS exposure would be an added advantage
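Again for illustration only, a minimal sketch of the kind of AWS integration the requirements mention: starting an AWS Glue ETL job from Python with boto3 and polling its status. The job name and region are hypothetical.

# Minimal boto3/Glue sketch; job name and region are hypothetical.
import time

import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Kick off a (hypothetical) Glue job that runs the ETL pipeline.
run = glue.start_job_run(JobName="curated-layer-etl")
run_id = run["JobRunId"]

# Poll until the job run reaches a terminal state.
while True:
    status = glue.get_job_run(JobName="curated-layer-etl", RunId=run_id)
    state = status["JobRun"]["JobRunState"]
    if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"):
        print(f"Glue job finished with state: {state}")
        break
    time.sleep(30)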