Responsibilities:
- Design, develop, and optimize large-scale data pipelines to facilitate efficient data collection, processing, and storage.
- Maintain and monitor existing data pipelines, ensuring high data availability and consistency.
- Develop and refine data processing algorithms capable of handling and analyzing vast amounts of data.
- Utilize distributed computing frameworks such as Hadoop and Spark for scalable data processing.
- Design and maintain robust, large-scale data storage solutions using technologies such as HDFS, NoSQL databases like Cassandra, and BigQuery, and implement data management practices that ensure data integrity and security.
- Collaborate with data scientists, analysts, and other stakeholders to accurately understand data requirements and deliver effective solutions.
- Communicate complex data concepts and insights clearly to non-technical stakeholders.
- Continuously evaluate and enhance the existing data infrastructure and processes to improve performance and scalability.
Requirements:
- Bachelor's degree or higher in Computer Science, Data Science, or a related field. A minimum of 2 years of relevant professional experience is preferred.
- Proven experience in designing and maintaining data pipelines and storage solutions.
- Strong technical proficiency with distributed computing frameworks such as Hadoop and Spark.
- Expertise in data storage and management systems such as HDFS, NoSQL databases (e.g., Cassandra), or BigQuery.