A Databricks Engineer is responsible for migrating an existing on-premises data warehouse to Databricks and must bring sound knowledge of Databricks, a good understanding of data architecture and cloud solutions, and experience building and maintaining robust, integrated, and governed data infrastructure. The role involves extracting valuable insights from data while ensuring data security, compliance, and high-quality data management.
Roles and Responsibilities:
• Lead end-to-end data migration projects from on-premises environments to Databricks with minimal downtime.
• Work with architects and lead solution design to meet functional and non-functional requirements.
• Design and implement solutions on AWS, drawing on hands-on Databricks experience.
• Configure Databricks clusters, write PySpark code, and build CI/CD pipelines for deployments.
• Apply Delta Lake optimization techniques such as Z-ordering, auto compaction, and vacuuming (see the first sketch after this list).
• Process near-real-time data through Auto Loader and Delta Live Tables (DLT) pipelines (see the second sketch after this list).
• Bring a strong background in Python, and identify, communicate, and mitigate risks and issues.
• Identify and resolve data-related issues and provide support to ensure data availability and integrity.
• Optimize AWS and Databricks resource usage to control costs while meeting performance and scalability requirements.
• Stay up to date with AWS services, Databricks features, and data engineering best practices to recommend and implement new technologies and techniques.
• Proactively implement engineering methodologies, standards, and leading practices.
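
As context for the optimization bullet above, here is a minimal sketch of those three techniques, assuming a Databricks notebook where `spark` is predefined; the `events` table and `user_id` column are hypothetical placeholders:

```python
# Minimal sketch of Delta Lake maintenance on Databricks.
# Assumes an existing Delta table named `events` whose queries often
# filter on `user_id` (both names are hypothetical).

# Z-ordering co-locates related rows in the same files so that
# selective queries on user_id can skip unrelated files.
spark.sql("OPTIMIZE events ZORDER BY (user_id)")

# Vacuuming deletes data files no longer referenced by the table;
# RETAIN 168 HOURS keeps 7 days of history for time travel.
spark.sql("VACUUM events RETAIN 168 HOURS")

# Auto compaction is a table property rather than a one-off command:
# once set, Databricks compacts small files automatically after writes.
spark.sql("""
    ALTER TABLE events SET TBLPROPERTIES (
        'delta.autoOptimize.autoCompact' = 'true',
        'delta.autoOptimize.optimizeWrite' = 'true'
    )
""")
```

Z-ordering pays off most on high-cardinality columns used in query predicates, and the VACUUM retention window trades storage cost against how far back time travel can reach.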
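Likewise, a minimal sketch of near-real-time ingestion with Auto Loader, again assuming a Databricks notebook; the S3 paths and the `bronze_events` target table are hypothetical:

```python
# Minimal sketch of incremental ingestion with Auto Loader.
# All paths and the target table name are hypothetical placeholders.
stream = (
    spark.readStream.format("cloudFiles")          # Auto Loader source
    .option("cloudFiles.format", "json")           # format of incoming files
    .option("cloudFiles.schemaLocation",
            "s3://my-bucket/_schemas/events")      # inferred schema + evolution
    .load("s3://my-bucket/landing/events")         # landing zone to watch
)

(
    stream.writeStream
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/events")
    .trigger(availableNow=True)                    # drain new files, then stop
    .toTable("bronze_events")                      # target Delta table
)
```

In a Delta Live Tables pipeline, the same `cloudFiles` reader would sit inside a function decorated with `@dlt.table`, with DLT managing the checkpoint and schema locations.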
Requirements / Qualifications:
• Bachelor’s or master’s degree in computer science, data engineering, or a related field.
• Minimum 5 years of experience in data engineering, with expertise in AWS or Azure services, Databricks, and/or
Informatica IDMC.
• Proficiency in programming languages such as Python, Java, or Scala for building data pipelines.
• Ability to evaluate potential technical solutions and recommend fixes for data issues, especially performance assessment of complex data transformations and long-running data processes.
• Strong knowledge of SQL and NoSQL databases.
• Familiarity with data modelling and schema design.
• Excellent problem-solving and analytical skills.
• Strong communication and collaboration skills.
• Databricks and Informatica certifications are a plus.
Preferred Skills:
• Experience with big data technologies like Apache Spark and Hadoop on Databricks.
• Experience with AWS services focused on data and architecture.
• Knowledge of containerization and orchestration tools like Docker and Kubernetes.
• Familiarity with data visualization tools like Tableau or Power BI.
• Understanding of DevOps principles for managing and deploying data pipelines.
• Experience with version control systems (e.g., Git) and CI/CD pipelines.
• Knowledge of data governance and data cataloguing tools, especially Informatica IDMC.