Job Title: Senior Data Engineer (5-10 Years of Experience)
Position Summary:
A company is seeking a skilled Senior Data Engineer with 5 to 10 years of hands-on experience building and managing data pipelines and big data infrastructure. The ideal candidate has a strong technical background in data engineering tools, big data ecosystems, and programming languages. This role focuses on designing, implementing, and optimizing complex data workflows using technologies such as Apache Airflow, Apache NiFi, Apache Kafka, PySpark, and Spark (Scala). Candidates with strong Python, Java, and shell scripting skills are highly preferred.
Responsibilities:
- Design, build, and maintain scalable and reliable data pipelines to support real-time and batch processing requirements.
- Manage, monitor, and optimize workflows in Apache Airflow to ensure data quality and integrity.
- Develop and integrate data flows using Apache NiFi for seamless data ingestion and transformation.
- Work extensively with Apache Kafka for data streaming and messaging across various data sources.
- Implement data processing solutions using PySpark and Spark (Scala) to handle large-scale datasets and complex transformations.
- Write efficient code in Python and Java to automate data workflows and support data engineering needs.
- Utilize shell scripting for operational tasks and automation within the data environment.
- Collaborate with cross-functional teams to understand data requirements and provide optimized data engineering solutions.
- Ensure data security, compliance, and performance by following best practices in big data and distributed systems.
- Continuously improve the performance, scalability, and reliability of data processing pipelines.
Required Skills and Experience:
- Apache Airflow: Extensive experience in managing, scheduling, and monitoring data pipelines.
- Apache NiFi: Strong experience in designing data flows for data ingestion and transformation.
- Apache Kafka: In-depth knowledge of Kafka for real-time data streaming and messaging systems.
- PySpark & Spark Scala: Proficiency in using PySpark and Spark Scala for large-scale data processing.
- Programming Languages: Strong experience with Python and Java, with additional knowledge in shell scripting.
- Big Data Knowledge: Familiarity with big data ecosystems and distributed data processing.
- Soft Skills: Ability to work independently and collaboratively in a fast-paced environment while handling complex data engineering challenges.
Educational Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Information Technology, Data Engineering, or a related field.
Preferred Qualifications:
- Experience in cloud environments (AWS, GCP, or Azure) with big data components.
- Experience with version control tools (e.g., Git) and CI/CD practices in data engineering.
- Strong analytical, problem-solving, and communication skills.
Employment Type: Full-Time