Job Summary:
We are seeking a skilled Data Engineer with 3-5+ years of hands-on experience with Databricks and Python to join our dynamic team. The ideal candidate will play a key role in designing, developing, and optimizing data pipelines and analytics solutions on Databricks. Experience with AWS is highly preferred.
Key Responsibilities:
- Develop and maintain scalable data pipelines and ETL processes using Databricks and Python.
- Optimize and troubleshoot existing Databricks workflows for performance and reliability.
- Collaborate with cross-functional teams to understand business requirements and translate them into technical solutions.
- Implement data processing solutions using Spark in Databricks.
- Integrate and manage data from multiple sources, ensuring accuracy, quality, and consistency.
- Leverage AWS cloud services (e.g., S3, Lambda, Redshift, Glue) for data storage, transformation, and analytics.
- Perform data profiling and validation to ensure the reliability of processed data.
- Monitor and maintain data pipelines, handling errors and improving performance on an ongoing basis.
Required Skills:
- 3-5+ years of hands-on experience with Databricks and Python development.
- Strong expertise in PySpark and data transformation techniques.
- Proficiency in writing efficient, scalable Python code.
- Experience with SQL and relational databases for data querying and manipulation.
- Familiarity with AWS services (e.g., S3, EC2, RDS) and cloud-based data engineering concepts.
- Knowledge of version control systems such as Git.
- Strong problem-solving and analytical skills with a focus on data processing and integration.
Preferred Skills:
- Working experience with Amazon Redshift, AWS Glue, or Amazon Athena.
- Familiarity with CI/CD pipelines for data engineering projects.
- Exposure to data visualization tools or frameworks.
- Knowledge of data security best practices and governance principles.
Qualifications:
- Bachelor's degree in Computer Science, Data Engineering, or a related field.
- Relevant certifications in Databricks, AWS, or Python development are a plus.