Job Duties
• Research, design, and develop computer and network software or specialised utility programs.
• Analyse user needs and develop software solutions, applying principles and techniques of computer science, engineering, and mathematical analysis.
• Update software, enhance existing software capabilities, and develop and direct software testing and validation procedures.
• Work with computer hardware engineers to integrate hardware and software systems and develop specifications and performance requirements.
Key Responsibilities
Responsible for developing ETL pipelines using open-source tools to carry out data ingestion (a minimal sketch follows this list)
Write programs to extract data from the data lake and the curated data layer to meet business objectives
Collaborate with different teams to understand the application and design the ETL pipeline
Gather business and functional requirements, and translate these requirements into robust, scalable, operable solutions that work well within the overall data architecture
Participate in the full development life cycle, end to end, from design, implementation, and testing to documentation, delivery, support, and maintenance; produce comprehensive, usable dataset documentation and metadata
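As a rough, minimal sketch of the kind of ingestion pipeline described above (assuming PySpark and S3; the paths, schema, and column names are hypothetical placeholders, not part of the role):

    # Minimal, illustrative PySpark ETL sketch; paths and column names are
    # hypothetical placeholders, not part of the actual role or any real system.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("ingest-orders").getOrCreate()

    # Ingest raw data from a (hypothetical) data-lake location on S3.
    raw = spark.read.json("s3://example-data-lake/raw/orders/")

    # Apply simple transformations: drop malformed rows, normalise a timestamp,
    # and derive a partition column.
    curated = (
        raw.dropna(subset=["order_id", "order_ts"])
           .withColumn("order_ts", F.to_timestamp("order_ts"))
           .withColumn("order_date", F.to_date("order_ts"))
    )

    # Write the curated layer in a columnar format, partitioned by date.
    (curated.write
            .mode("overwrite")
            .partitionBy("order_date")
            .parquet("s3://example-curated-layer/orders/"))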
Must Have Skills
Programming languages: Python / Spark
Experience with Linux utilities & SQL.
Experience using PySpark for data transformation (a brief example follows this list).
Knowledge of AWS services such as Redshift, Glue, CloudFormation, EC2, S3, and Lambda.
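As an illustration of the Python/Spark and SQL skills listed above, a minimal hedged sketch follows; the dataset path, view name, and columns are assumptions for illustration only:

    # Illustrative only: combines PySpark with SQL, as called for in the skills list.
    # The dataset path, view name, and columns are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("curated-query").getOrCreate()

    # Load a (hypothetical) curated Parquet dataset and expose it to Spark SQL.
    orders = spark.read.parquet("s3://example-curated-layer/orders/")
    orders.createOrReplaceTempView("orders")

    # Express a business question in SQL: daily order counts for the last 7 days.
    daily_counts = spark.sql("""
        SELECT order_date, COUNT(*) AS order_count
        FROM orders
        WHERE order_date >= date_sub(current_date(), 7)
        GROUP BY order_date
        ORDER BY order_date
    """)

    daily_counts.show()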
Good to Have Skills
ETL Tool Experience
AWS Exposure