We are seeking a Senior Data Engineer with 10+ years of experience in software development and expertise in Big Data technologies. The ideal candidate will design, develop, and maintain scalable data pipelines and analytics platforms that meet diverse business requirements.
Responsibilities:
Build and optimize large-scale data pipelines using Hadoop, Spark (Core, SQL, Streaming), and PySpark.
Manage and process structured and semi-structured data using Hive, Sqoop, Impala, and Kudu.
Develop and implement streaming data solutions with Apache Kafka.
Collaborate with cross-functional teams to understand technical requirements and implement data-driven solutions.
Perform data validation, quality checks, and reconciliation reports to ensure high data accuracy.
Work on AWS services (S3, Glue, Athena) for cloud-based data workflows and analytics.
Migrate legacy systems to open-source frameworks, leveraging technologies like Python and SQL.
Optimize performance and scalability for data processing and storage systems.
Utilize Java/J2EE, Hibernate, and Struts frameworks for backend application development.
Provide mentorship to team members and support in troubleshooting and debugging complex data issues.
Requirements:
Proficiency in Big Data ecosystems, Spark, Hadoop, and related technologies.
Strong experience with PySpark, data transformation, and aggregation techniques.
Hands-on with AWS services and cloud-native data pipelines.
Solid understanding of SQL, RDBMS, and data integration tools.
Expertise in Java/J2EE frameworks and application development.
Strong communication and problem-solving skills.