Job Description:
A Databricks Engineer is responsible for leading the migration of on-premises data warehouses to Databricks, designing robust data architectures, and ensuring seamless data integration and governance. The role involves implementing cloud-based solutions, optimizing data workflows, and extracting insights while maintaining data security and compliance standards.
Key Responsibilities:
· Lead end-to-end data migration projects from on-premises environments to Databricks with minimal downtime.
· Design and implement solutions on Databricks within AWS environments, adhering to functional and non-functional requirements.
· Configure Databricks clusters, write PySpark code, and develop CI/CD pipelines for deployment.
· Apply Delta Lake optimization techniques such as Z-ordering, auto-compaction, and vacuuming to improve query performance and storage efficiency (see the maintenance sketch after this list).
· Process near-real-time data using Auto Loader and Delta Live Tables (DLT) pipelines (see the ingestion sketch after this list).
· Optimize AWS and Databricks resources for performance and cost efficiency.
· Stay updated with advancements in AWS, Databricks services, and data engineering practices to recommend improvements.
· Implement engineering methodologies, standards, and best practices.
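To give candidates a concrete sense of the optimization work described above, here is a minimal PySpark sketch of routine Delta Lake maintenance on Databricks. The table name sales.transactions and the event_date column are hypothetical placeholders for illustration, not part of this posting.

```python
# Illustrative sketch of routine Delta Lake maintenance on Databricks.
# Table and column names are hypothetical placeholders.
from pyspark.sql import SparkSession

# In a Databricks notebook, `spark` is already provided; shown here for completeness.
spark = SparkSession.builder.getOrCreate()

# Z-ordering: co-locate related records in fewer files to speed up
# selective reads that filter on event_date.
spark.sql("OPTIMIZE sales.transactions ZORDER BY (event_date)")

# Auto-compaction: let Databricks automatically coalesce small files
# produced by frequent writes.
spark.sql("""
    ALTER TABLE sales.transactions SET TBLPROPERTIES (
        'delta.autoOptimize.autoCompact' = 'true',
        'delta.autoOptimize.optimizeWrite' = 'true'
    )
""")

# Vacuuming: remove data files no longer referenced by the table,
# retaining 7 days (168 hours) of history for time travel.
spark.sql("VACUUM sales.transactions RETAIN 168 HOURS")
```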
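Likewise, a minimal Auto Loader sketch for the near-real-time ingestion responsibility, assuming hypothetical S3 paths and a bronze.events target table; DLT pipelines wrap this same pattern in a managed, declarative framework.

```python
# Illustrative Auto Loader sketch for incremental file ingestion on Databricks.
# All paths and table names are hypothetical placeholders.
df = (
    spark.readStream
    .format("cloudFiles")                 # Auto Loader source
    .option("cloudFiles.format", "json")  # format of incoming raw files
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/events")
    .load("s3://my-bucket/raw/events/")
)

# Write newly arrived files into a bronze Delta table; availableNow
# processes the backlog and stops, suitable for scheduled jobs.
(
    df.writeStream
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/events")
    .trigger(availableNow=True)
    .toTable("bronze.events")
)
```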
Required Skills:
Experience:
· Minimum 5 years in data engineering, with expertise in Databricks, AWS/Azure services, or Informatica IDMC.
Technical Proficiency:
· Strong knowledge of Databricks, Python, PySpark, SQL, and NoSQL databases.
· Hands-on experience with Apache Spark, Hadoop, and cloud-based data architectures.
· Proficiency in building data pipelines using Python, Java, or Scala.
Data Management:
· Expertise in data modelling, schema design, and data governance.
Optimization Skills:
· Familiarity with Delta Lake optimization techniques such as Z-ordering and file compaction.
Certifications:
· Databricks certifications and Informatica certifications (preferred).
Soft Skills:
· Excellent analytical, problem-solving, communication, and collaboration abilities.
Preferred Skills:
· Experience with containerization and orchestration tools like Docker and Kubernetes.
· Knowledge of data visualization tools like Tableau or Power BI.
· Familiarity with DevOps principles, CI/CD pipelines, and version control systems like Git.
· Understanding of data cataloguing tools and data governance, especially Informatica IDMC.
This role demands a proactive approach to implementing innovative solutions and optimizing workflows to ensure high-performance data engineering.