Key Responsibilities
- Design and implement Spark Scala applications and data processing pipelines that process large volumes of structured and unstructured data.
- Integrate Elasticsearch with Spark to enable efficient indexing, querying, and retrieval of data.
- Optimize and tune Spark jobs for performance and scalability, including efficient data processing and indexing into Elasticsearch.
- Collaborate with data engineers, data scientists, and other stakeholders to understand requirements and translate them into technical specifications and solutions.
- Implement data transformations, aggregations, and computations using Spark RDDs, DataFrames, and Datasets, and integrate them with Elasticsearch.
- Develop and maintain scalable and fault-tolerant Spark applications, adhering to industry best practices and coding standards.
- Troubleshoot and resolve issues related to data processing, performance, and data quality in the Spark-Elasticsearch integration.
- Monitor and analyze job performance metrics, identify bottlenecks, and propose optimizations in both Spark and Elasticsearch components.
- Stay current with emerging trends and advances in big data technologies to drive continuous improvement and innovation.
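The responsibilities above center on Spark-to-Elasticsearch pipelines. As one possible sketch, the elasticsearch-hadoop connector (the `elasticsearch-spark` artifact) lets a DataFrame be written straight to an index via `saveToEs`; the input path, index name, host, and field names below are hypothetical placeholders, and the job assumes a running Spark and Elasticsearch environment:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.elasticsearch.spark.sql._ // adds saveToEs to DataFrames

object EventsToEs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("events-to-es")
      // es.* settings come from elasticsearch-hadoop; host/port are placeholders
      .config("es.nodes", "localhost")
      .config("es.port", "9200")
      .getOrCreate()

    // Hypothetical input: JSON event records with timestamp/action fields
    val events = spark.read.json("/data/events/*.json")

    // Example transformation and aggregation before indexing
    val daily = events
      .withColumn("day", to_date(col("timestamp")))
      .groupBy(col("day"), col("action"))
      .agg(count("*").as("hits"))

    // Index the result into a (hypothetical) Elasticsearch index
    daily.saveToEs("daily-action-counts")

    spark.stop()
  }
}
```

Submitted with `spark-submit` and the `elasticsearch-spark` jar on the classpath, this illustrates the read-transform-index shape the role calls for; a production pipeline would add error handling, schema enforcement, and tuned `es.*` settings.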
Key Requirements
- Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
- Strong experience in developing Spark applications. Experience with Spark Streaming is a plus.
- Proficiency in the Scala programming language and familiarity with functional programming concepts.
- In-depth understanding of Apache Spark architecture, RDDs, DataFrames, and Spark SQL.
- Experience integrating and working with Elasticsearch for data indexing and search applications.
- Solid understanding of Elasticsearch data modeling, indexing strategies, and query optimization.
- Experience with distributed computing, parallel processing, and working with large datasets.
- Familiarity with big data technologies such as Hadoop, Hive, and HDFS.
- Proficient in performance tuning and optimization techniques for Spark applications and Elasticsearch queries.
- Strong problem-solving and analytical skills with the ability to debug and resolve complex issues.
- Familiarity with version control systems (e.g., Git) and collaborative development workflows.
- Excellent communication and teamwork skills with the ability to work effectively in cross-functional teams.
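Several of the requirements above (functional programming in Scala, Spark's Dataset API) come down to the same collection-style operations. A small illustration in plain Scala — the `Event` type and its fields are hypothetical — showing a groupBy-and-aggregate of the kind that translates directly to a Spark `Dataset`:

```scala
// Hypothetical event record used only for illustration
case class Event(user: String, action: String, durationMs: Long)

object EventStats {
  // Total duration per user: the same shape as a Dataset
  // groupBy/agg in Spark, expressed on plain collections.
  def totalDurationByUser(events: Seq[Event]): Map[String, Long] =
    events
      .groupBy(_.user)
      .map { case (user, es) => user -> es.map(_.durationMs).sum }
}
```

On a Spark `Dataset[Event]`, the equivalent aggregation would be `ds.groupBy($"user").agg(sum($"durationMs"))`; candidates comfortable with immutable transformations like the above typically pick up the Dataset API quickly.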