Key Responsibilities:
- Data Pipeline Design & Management: Design, build, and maintain scalable and reliable data pipelines to support both real-time and batch processing requirements.
- Workflow Optimization & Monitoring: Manage, monitor, and optimize workflows in Apache Airflow to ensure data quality, integrity, and system performance.
- Data Integration: Develop and integrate data flows using Apache NiFi to ensure seamless data ingestion and transformation processes.
- Real-Time Data Streaming: Work extensively with Apache Kafka for data streaming and messaging across various data sources.
- Data Processing Solutions: Implement data processing solutions using PySpark and Spark Scala to handle large-scale datasets and complex transformations.
- Automation & Scripting: Write efficient code in Python and Java to automate data workflows and support data engineering needs, using shell scripting for operational tasks.
- Cross-functional Collaboration: Collaborate with cross-functional teams to understand data requirements and provide optimized engineering solutions.
- Data Security & Compliance: Ensure data security, compliance, and performance by following best practices in big data and distributed systems.
- Continuous Improvement: Continuously improve the performance, scalability, and reliability of data processing pipelines.
Required Skills and Experience:
- Apache Airflow: Extensive experience in managing, scheduling, and monitoring data pipelines.
- Apache NiFi: Strong experience in designing data flows for ingestion and transformation.
- Apache Kafka: In-depth knowledge of Kafka for real-time data streaming and messaging systems.
- PySpark & Spark Scala: Proficiency in using PySpark and Spark Scala for large-scale data processing.
- Programming Languages: Strong experience with Python and Java, with additional expertise in shell scripting.
- Big Data Knowledge: Familiarity with big data ecosystems and distributed data processing.
- Problem-Solving & Teamwork: Ability to work independently and collaboratively in a fast-paced environment, solving complex data engineering challenges.
Educational Qualifications:
- Required: Bachelor’s or Master’s degree in Computer Science, Information Technology, Data Engineering, or a related field.
Preferred Qualifications:
- Cloud Experience: Experience in cloud environments (AWS, GCP, Azure) with big data components.
- Version Control & CI/CD: Experience with version control tools (e.g., Git) and CI/CD practices in data engineering.
- Analytical & Communication Skills: Strong analytical, problem-solving, and communication skills.