We are looking for a skilled Data Engineer with strong proficiency in Apache Spark and Scala to join our growing data team. The ideal candidate has hands-on experience with big data technologies and will be responsible for building, maintaining, and optimizing data pipelines and distributed data processing systems. You will work closely with data scientists, analysts, and software engineers to ensure that data solutions meet business needs.
Key Responsibilities:
• Design, develop, and maintain data pipelines using Apache Spark and Scala.
• Implement both real-time and batch data processing solutions.
• Optimize the performance of data pipelines and distributed processing frameworks.
• Collaborate with data scientists and business teams to understand requirements and design scalable solutions.
• Ensure data quality, security, and integrity by implementing best practices.
• Support the migration of data and workloads to the cloud (e.g., AWS, GCP, or Azure).
• Conduct root-cause analysis on data and pipeline issues as they arise.
• Monitor and troubleshoot system performance and data quality.
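To give candidates a flavor of the day-to-day work, here is a minimal, dependency-free Scala sketch of the kind of batch aggregation logic these pipelines involve. The record shape and field names are purely illustrative; in production the same transformation would typically be expressed with Spark's DataFrame/Dataset API (`filter`, `groupBy`, `agg`) running on a cluster.

```scala
// Hypothetical event record -- names are illustrative, not part of this posting.
case class Event(userId: String, action: String, durationMs: Long)

// Batch-style aggregation: total "view" time per user.
// Plain Scala collections keep the sketch self-contained; with Spark this
// would be a DataFrame groupBy("userId").agg(sum("durationMs")).
def totalViewTimePerUser(events: Seq[Event]): Map[String, Long] =
  events
    .filter(_.action == "view")          // keep only view events
    .groupBy(_.userId)                   // bucket by user
    .view
    .mapValues(_.map(_.durationMs).sum)  // sum durations per user
    .toMap
```

The same filter/group/aggregate shape recurs across both batch and streaming jobs; fluency in expressing it idiomatically in Scala is what we mean by "strong proficiency" above.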
Requirements:
• Strong experience with Apache Spark and Scala for large-scale data processing.
• Proficiency in building distributed data processing systems using Spark.
• Experience in SQL for data manipulation and querying.
• Hands-on experience with cloud platforms like AWS, GCP, or Azure (preferably with tools like AWS Glue, Databricks, or EMR).
• Familiarity with big data ecosystems (e.g., Hadoop, HDFS, Hive, Kafka).
• Experience with version control systems like Git.
• Understanding of software engineering principles, including CI/CD pipelines.