Responsibilities:
· Data Pipeline Development: Develop and maintain data pipelines that extract, transform, and load (ETL) data from various sources into a centralized data storage system, such as a data warehouse or data lake.
· Data Integration: Integrate data from multiple sources and systems, including databases, APIs, log files, streaming platforms, and external data providers.
· Data Transformation and Processing: Develop data transformation routines to clean, normalize, and aggregate data. Apply data processing techniques to handle complex data structures, resolve missing or inconsistent data, and prepare data for analysis, reporting, or machine learning tasks.
· Contribute to common frameworks and best practices in code development, deployment, and automation/orchestration of data pipelines.
· Implement data governance in line with company standards.
· Partner with Data Analytics and Product leaders to design best practices and standards for developing and productionising analytic pipelines.
· Partner with Infrastructure leaders on architecture approaches to advance the data and analytics platform, including exploring new tools and techniques that leverage the cloud environment (Azure, Databricks, others).
· Monitoring and Support: Monitor data pipelines and data systems to detect and resolve issues promptly. Develop monitoring tools, alerts, and automated error handling mechanisms to ensure data integrity and system reliability.
Requirements:
· Proven experience as a Data Engineer (3+ years), with a strong track record of delivering scalable data pipelines.
· Extensive experience designing data solutions, including data modelling, is required.
· Extensive hands-on experience developing data processing jobs (PySpark/SQL), demonstrating a strong understanding of software engineering principles, is needed.
· Experience orchestrating data pipelines using technologies such as Azure Data Factory (ADF) or Airflow is necessary.
· Experience working with both real-time and batch data is important.
· Experience building data pipelines on Azure is crucial; experience with AWS data pipelines is beneficial.
· Fluency in SQL (any flavour), with experience using window functions and other advanced features, is required (an illustrative sketch follows this list).
· Understanding of DevOps tools, Git workflow and building CI/CD pipelines is essential.
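As a purely illustrative sketch of the kind of window-function query referred to above (the table and column names trades, instrument_id, trade_ts, and price are hypothetical, and the query should run on most SQL flavours), selecting the most recent row per partition might look like:

    -- Hypothetical example: latest trade per instrument using ROW_NUMBER() over a partition.
    SELECT instrument_id, trade_ts, price
    FROM (
        SELECT instrument_id,
               trade_ts,
               price,
               ROW_NUMBER() OVER (PARTITION BY instrument_id ORDER BY trade_ts DESC) AS rn
        FROM trades
    ) ranked
    WHERE rn = 1;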
Nice to Have:
· Domain knowledge of commodities, covering Sales, Trading, Risk, Supply Chain, Customer Interaction, etc., is highly desirable.
· Familiarity with Scrum methodology and experience working in a Scrum team can be advantageous. This includes understanding of Scrum roles, events, artifacts, and rules, and the ability to apply them in a practical context.
· Experience with streaming data processing technologies such as Apache Kafka, Apache Flink, or AWS Kinesis can be beneficial. This includes the ability to design and implement real-time data processing pipelines.