Job Summary:
We are seeking a skilled and motivated Data Engineer with expertise in Scala, SQL, and Hadoop to join our dynamic team. In this role, you will design, develop, and maintain scalable data pipelines and systems to process large datasets. You will collaborate with cross-functional teams to deliver high-quality solutions for data storage, transformation, and analytics.
Key Responsibilities:
1. Data Pipeline Development
· Design, build, and optimize scalable data pipelines using Scala and Hadoop frameworks.
· Implement ETL processes for ingesting, transforming, and storing data from various sources.
2. Data Analysis and Query Optimization
· Write and optimize complex SQL queries for efficient data retrieval and transformation.
· Troubleshoot and resolve performance issues in queries and data workflows.
3. Big Data Ecosystem Management
· Manage Hadoop-based data infrastructure, including HDFS, Hive, and related components.
· Monitor system performance and optimize resource utilization in a distributed environment.
4. Collaboration and Problem Solving
· Work closely with data analysts, data scientists, and business stakeholders to understand requirements and translate them into technical solutions.
· Provide technical guidance and best practices for data engineering within the team.
5. Automation and Monitoring
· Develop scripts and tools to automate repetitive tasks and enhance system monitoring.
· Ensure data quality and consistency through validation and monitoring mechanisms.
6. Documentation and Compliance
· Document technical solutions, workflows, and processes to ensure knowledge sharing and reproducibility.
· Ensure compliance with data governance and security policies.
Skills and Qualifications:
Technical Skills:
· Proficiency in Scala programming and functional programming principles.
· Strong SQL skills, including query optimization and database design.
· Hands-on experience with Hadoop ecosystem tools such as HDFS, Hive, YARN, and MapReduce.
· Familiarity with other big data tools such as Spark, Kafka, or Flink is a plus.