- Assist in developing and optimizing ETL processes using PySpark and Databricks.
- Collaborate with team members to design and implement data models and architecture in AWS.
- Maintain and monitor data pipelines to ensure data quality and reliability.
- Use GitHub for version control and collaboration on codebases.
- Participate in code reviews and contribute to coding and data engineering best practices.
- Support the team in implementing CI/CD practices for data workflows.
- Document processes and workflows for data engineering tasks.
- Familiarity with AWS services (e.g., S3, Redshift, Glue) and cloud computing concepts.
- Basic knowledge of Databricks and PySpark for data processing.
- Understanding of Git and experience with GitHub for version control.
- Awareness of DevOps principles and tools (e.g., CI/CD, Docker) is a plus.
- Databricks certification is a plus.