Key Responsibilities:
- Design, develop, and maintain scalable data pipelines and ETL processes using Databricks and Python.
- Collaborate with data scientists, analysts, and other engineers to ensure efficient data flow and processing.
- Manage and optimize data workflows in AWS cloud environments.
- Ensure data quality through cleansing, validation, and monitoring.
- Automate and optimize existing data pipelines to improve performance and reduce manual intervention.
- Work closely with stakeholders to understand business requirements and translate them into technical solutions.
- Troubleshoot and resolve issues related to data processing and workflow execution.
Requirements:
- 3-5+ years of hands-on experience with Databricks and Python in a production environment.
- Proven experience with AWS cloud services for data engineering workflows (e.g., S3, Lambda, Glue, Redshift).
- Strong knowledge of SQL and experience with relational databases.
- Solid understanding of data processing and transformation techniques.
- Familiarity with data architecture principles, ETL processes, and big data technologies.
- Excellent problem-solving skills and the ability to work independently or as part of a team.
- Experience with data visualization tools like Power BI or Tableau.
- Knowledge of Apache Spark and workflow orchestration tools such as Airflow is an advantage.
- Strong understanding of DevOps practices for data engineering (CI/CD pipelines, version control, etc.).