Your Role
As a Lead Data Engineer, you will spearhead the development and optimization of our data infrastructure to enable advanced analytics capabilities. Collaborating with cross-functional teams, you will ensure seamless integration and delivery of robust, scalable, and efficient data solutions. In this leadership role, you will mentor and guide a team of engineers while contributing to the strategic direction of the organization’s data landscape.
Key Responsibilities
- Data Engineering & Development:
  - Design and implement large-scale data processing pipelines in Spark (Scala), handling structured and unstructured data.
  - Integrate Elasticsearch with Spark for efficient indexing, querying, and data retrieval.
  - Optimize Spark applications for performance and scalability, ensuring minimal latency and high throughput (a Spark/Elasticsearch sketch follows this list).
- Containerization & Orchestration:
  - Deploy data solutions on OpenShift Container Platform (OCP) using containerization and orchestration techniques.
  - Optimize workflows for containerized environments and ensure efficient resource utilization (a sizing sketch follows this list).
- DevOps Integration:
  - Collaborate with DevOps teams to streamline deployments through CI/CD pipelines and tools such as Jenkins, Bitbucket, and Ansible.
  - Implement robust monitoring and logging frameworks using tools such as Grafana, Prometheus, and Splunk to maintain platform stability (a metrics sketch follows this list).
- Collaboration & Stakeholder Management:
  - Work with data scientists, analysts, and infrastructure engineers to understand requirements and translate them into technical solutions.
  - Communicate technical designs and processes effectively to both technical and non-technical stakeholders.
- Leadership & Mentorship:
  - Lead and mentor a team of data engineers, fostering a culture of continuous learning and innovation.
  - Advocate for best practices in data engineering, DevOps, and cloud technologies.
- Documentation & Research:
  - Document workflows, pipelines, and configurations to ensure team knowledge sharing.
  - Stay abreast of emerging technologies and best practices in data engineering and cloud infrastructure.
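To ground the data engineering item above, here is a minimal sketch of a Spark (Scala) pipeline that reads structured input and indexes it into Elasticsearch through the elasticsearch-spark connector. The host, input path, schema, and index name are hypothetical placeholders, not details of this role.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.current_timestamp

object EventIndexer {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("event-indexer")
      // Connection settings consumed by the elasticsearch-spark connector
      // (requires the elasticsearch-spark dependency on the classpath)
      .config("es.nodes", "es.internal.example.com") // hypothetical host
      .config("es.port", "9200")
      .getOrCreate()

    // Read structured input; path and layout are illustrative only
    val events = spark.read
      .option("header", "true")
      .csv("hdfs:///data/raw/events")
      .withColumn("ingested_at", current_timestamp())

    // Index into Elasticsearch via the connector's DataFrame source
    events.write
      .format("org.elasticsearch.spark.sql")
      .option("es.resource", "events-index") // hypothetical index name
      .mode("append")
      .save()

    spark.stop()
  }
}
```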
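For the containerization item, executor sizing in a Kubernetes/OCP environment is commonly driven through Spark's standard configuration keys. The sketch below shows one plausible setup; the namespace, image, and sizing values are assumptions, not prescribed values.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: sizing a Spark application for a containerized
// (Kubernetes/OCP) environment. All values are illustrative.
object ContainerizedJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("containerized-etl")
      // Executor sizing: keep requests aligned with container quotas
      .config("spark.executor.instances", "4")
      .config("spark.executor.cores", "2")
      .config("spark.executor.memory", "4g")
      // Kubernetes-specific settings (honored when the master is k8s://...)
      .config("spark.kubernetes.namespace", "data-eng") // hypothetical namespace
      .config("spark.kubernetes.container.image",
        "registry.example.com/spark:3.5") // hypothetical image
      .getOrCreate()

    // ... pipeline logic ...
    spark.stop()
  }
}
```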
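For the monitoring item, one common approach is to expose Spark's built-in metrics in Prometheus format so that Grafana dashboards can sit on top of them; since Spark 3.0 this takes a single configuration flag. A minimal sketch, assuming Prometheus is configured to scrape the driver UI endpoint:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: with this flag set, Spark (3.0+) serves executor metrics
// from the driver UI at /metrics/executors/prometheus for Prometheus to scrape.
object MonitoredJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("monitored-etl")
      .config("spark.ui.prometheus.enabled", "true")
      .getOrCreate()

    // ... pipeline logic; Grafana panels would consume the scraped metrics ...
    spark.stop()
  }
}
```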
Requirements
- Education: Bachelor’s degree in Computer Science, Data Engineering, Information Technology, or a related field.
- Experience:
  - Minimum of 10 years in data engineering roles, with expertise in Hadoop, Spark, and related tools (e.g., HDFS, Hive, Ranger).
  - Hands-on experience with OpenShift Container Platform (OCP) and container orchestration tools such as Kubernetes.
  - Proven track record of building and scaling data infrastructure in large, complex environments.
- Technical Skills:
  - Proficiency in programming languages such as Scala, Python, or Java.
  - Deep understanding of DevOps practices, including CI/CD pipelines and automation tools such as Docker, Jenkins, and Ansible.
  - Familiarity with monitoring and logging tools such as Grafana, Prometheus, and Splunk.
  - Exposure to cloud platforms (AWS, Azure, GCP) and their data services is a plus.
- Soft Skills:
  - Exceptional problem-solving abilities and a proactive approach to resolving technical challenges.
  - Strong communication and collaboration skills to work effectively across diverse teams.
  - Ability to balance multiple priorities and deliver high-quality results in a dynamic environment.
Preferred Qualifications
- Experience in cloud-native data services (e.g., AWS Redshift, Azure Data Factory, GCP BigQuery).
- Knowledge of advanced analytics and machine learning pipelines.