Job Summary:
An exciting opportunity for a Data Engineer to be part of a strategic initiative architecting, designing and building next-generation cloud-based data platforms.
Mandatory Skill-set:
- Degree in Data Engineering, Computer Science, or Information Technology;
- At least 5 years of experience in data engineering and modelling;
- Strong hands-on exposure to data science, data analytics and/or data engineering;
- Deep understanding of system design, data structure and algorithms, data modelling, data access, and data storage;
- Experience with Big Data platforms such as Hadoop, Spark, and NoSQL databases;
- Advanced SQL knowledge, including query authoring and hands-on experience with relational databases, as well as working familiarity with a variety of other databases;
- Good understanding of cloud technologies such as AWS, Azure, and Google Cloud;
- Experience with orchestration frameworks such as Azure Data Factory, Airflow;
- Proficiency in programming languages such as Python, R, Java, or Scala;
- Familiarity with building and using CI/CD pipelines;
- Familiarity with DevOps tools such as Git and Terraform;
- Strong hands-on exposure to ETL (Extract, Transform, Load) tools;
- Possess strong written and verbal communication abilities, effectively conveying information to various stakeholders.
Desired Skill-set:
- Experience in designing, building, and maintaining batch and real-time data pipelines;
- Experience with Databricks or Snowflake.
Responsibilities:
- Work closely with various teams in architecting, designing and building ingestion pipelines to collect, clean, merge, and harmonize data from different source systems;
- Be involved in architecting and scaling data analytics infrastructure in a cloud environment; develop systems, architectures, and platforms that can scale to large data volumes and variety;
- Improve and optimize workloads and processes so that performance levels support the continuous, accurate, reliable, and timely delivery of data products; ensure maximum database uptime;
- Monitor databases and ETL systems for capacity planning, maintenance, and performance tuning; diagnose issues and deploy measures to prevent recurrence;
- Construct, test, and update useful and reusable data models based on business data needs;
- Engage and collaborate with product managers, software engineers, data analysts and data scientists to build scalable, data-driven platforms and tools;
- Collaborate with data stewards to establish and enforce data governance policies, best practices and procedures;
- Assist in maintaining the master data catalogue to document data assets, metadata and lineage;
- Implement and enforce data security best practices, including access control, encryption, and data masking, to safeguard sensitive data;
- Be involved in managing production services that provide analytics capabilities to our various data users across the division.