Job Duties
• Research, design, and develop computer and network software or specialised utility programs.
• Analyse user needs and develop software solutions, applying principles and techniques of computer science, engineering, and mathematical analysis.
• Update software, enhance existing software capabilities, and develop and direct software testing and validation procedures.
• Work with computer hardware engineers to integrate hardware and software systems and develop specifications and performance requirements.
Key Responsibilities
- Develop ETL pipelines using open-source technologies to carry out data ingestion (an illustrative sketch follows this list)
- Write programs to extract data from the data lake and curated data layer to meet business objectives
- Collaborate with different teams to build an understanding of the application and design the ETL pipeline
- Gather business and functional requirements, and translate these requirements into robust, scalable, operable solutions that work well within the overall data architecture
- Participate in the full development life cycle, end to end: design, implementation, testing, documentation, delivery, support, and maintenance, producing comprehensive, usable dataset documentation and metadata
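To give a flavour of the pipeline work described above, below is a minimal PySpark ETL sketch: read raw data from a data lake, apply a simple transformation, and write to a curated layer. The S3 paths, column names, and filter logic are hypothetical placeholders, not this team's actual pipeline.

```python
# Minimal PySpark ETL sketch: ingest raw events from a Parquet-based data lake,
# apply a simple transformation, and write them to a curated layer.
# All paths, column names, and filters are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("example-etl-pipeline")
    .getOrCreate()
)

# Extract: read raw data from the data lake (hypothetical S3 prefix).
raw = spark.read.parquet("s3://example-data-lake/raw/events/")

# Transform: drop malformed rows, standardise a column name, add a load date.
curated = (
    raw
    .filter(F.col("event_id").isNotNull())
    .withColumnRenamed("ts", "event_timestamp")
    .withColumn("load_date", F.current_date())
)

# Load: write to the curated layer, partitioned for downstream queries.
(
    curated.write
    .mode("overwrite")
    .partitionBy("load_date")
    .parquet("s3://example-curated-layer/events/")
)

spark.stop()
```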
Must Have Skills
- Programming languages and frameworks: Python / Spark.
- Experience with Linux utilities & SQL.
- Experience using PySpark for data transformation.
- Knowledge of AWS services such as Redshift, Glue, CloudFormation, EC2, S3, Lambda (see the example after this list).
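As an example of how these AWS services typically come together in this kind of role, the sketch below shows a Lambda handler that starts a Glue job when a new object lands in S3. The job name, argument names, and bucket layout are assumptions for illustration, not a specific production setup.

```python
# Illustrative AWS Lambda handler: when a new object lands in S3, start a Glue
# job to process it. Job name, argument names, and bucket layout are
# hypothetical placeholders.
import urllib.parse

import boto3

glue = boto3.client("glue")

GLUE_JOB_NAME = "example-curation-job"  # assumed job name for illustration


def handler(event, context):
    """Triggered by an S3 ObjectCreated event; kicks off a Glue job run."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

    response = glue.start_job_run(
        JobName=GLUE_JOB_NAME,
        Arguments={
            # Glue job arguments are passed as "--name": "value" strings.
            "--source_path": f"s3://{bucket}/{key}",
        },
    )
    return {"job_run_id": response["JobRunId"]}
```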
Good to Have Skills
- ETL Tool Experience
- AWS Exposure