Responsibilities:
- Design, develop, and optimize large-scale data pipelines for efficient data collection, processing, and storage.
- Monitor and maintain data pipelines to ensure high availability and consistency.
- Develop algorithms for processing and analyzing large datasets.
- Utilize distributed computing frameworks such as Hadoop and Spark for scalable data processing (see the illustrative sketch after this list).
- Design and manage large-scale data storage solutions, including HDFS, NoSQL databases such as Cassandra, and cloud data warehouses such as BigQuery.
- Implement robust data management practices to ensure data integrity and security.
- Collaborate with data scientists, analysts, and stakeholders to meet data requirements.
- Communicate complex data concepts effectively to non-technical stakeholders.
- Stay updated on the latest trends and advancements in big data technologies.
- Continuously enhance data infrastructure to improve performance and scalability.
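For context, the following is a minimal, hypothetical sketch of the kind of batch pipeline work described above, written with PySpark; the job name, paths, column names, and aggregation are illustrative assumptions, not a description of any actual system used in this role.

```python
# Illustrative PySpark batch job: read raw events, drop malformed records,
# aggregate per event type, and persist the result for downstream users.
# All paths and column names below are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("daily-event-aggregation")  # hypothetical job name
    .getOrCreate()
)

# Read raw JSON events from a (hypothetical) landing zone on HDFS.
events = spark.read.json("hdfs:///data/raw/events/2024-01-01/")

daily_counts = (
    events
    .filter(F.col("user_id").isNotNull())     # drop malformed records
    .groupBy("event_type")                    # aggregate per event type
    .agg(F.count("*").alias("event_count"))
)

# Persist the aggregate as Parquet for analysts and downstream jobs.
daily_counts.write.mode("overwrite").parquet(
    "hdfs:///data/curated/daily_event_counts/"
)

spark.stop()
```

In practice, a pipeline like this would typically run under an orchestrator and might write to Cassandra or BigQuery instead of Parquet on HDFS, depending on the downstream consumers.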
Requirements:
- Bachelor's degree or higher in Computer Science, Data Science, or a related field.
- At least 2 years of relevant professional experience preferred.
- Proven experience in designing and maintaining data pipelines and storage solutions.
- Strong proficiency with distributed computing frameworks, particularly Spark.
- Expertise in data storage and management systems such as HDFS, NoSQL databases like Cassandra, or cloud data warehouses like BigQuery.