- Required skills:
  - Minimum 2 years' experience in Python and PySpark
  - Minimum 2 years' experience in Cloudera Hadoop, data engineering, and data pipelines (ETL: extract, transform, and load)
- Responsible for designing, developing, and maintaining data solutions for data generation, collection, and processing.
- This involves creating data pipelines, ensuring data quality, and implementing ETL processes to migrate and deploy data across systems.
- Act as a subject-matter expert (SME); collaborate with and manage the team to deliver.
- Own team decisions; engage with multiple teams and contribute to key decisions.
- Provide solutions to problems for your immediate team and across multiple teams.
- Hands-on experience in big data engineering using Python, PySpark, and Linux.
- Develop innovative data solutions to optimize data generation, collection, and processing.
- A track record of implementing systems using Hive, Impala, and Cloudera Data Platform is preferred.
- Implement advanced ETL processes to ensure efficient data migration and deployment.
- Collaborate with cross-functional teams to identify and address data quality issues.
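The ETL pattern referenced throughout the responsibilities above can be sketched as three small stages. This is a minimal illustration in plain Python; in the role itself these steps would run on the PySpark DataFrame API (e.g. `spark.read`, `withColumn`, `write`) against Cloudera, and the record fields and data-quality rules shown here are illustrative assumptions, not part of the posting.

```python
def extract(raw_rows):
    """Extract: parse raw CSV-style lines into records."""
    for line in raw_rows:
        name, amount = line.split(",")
        yield {"name": name.strip(), "amount": amount.strip()}

def transform(records):
    """Transform: apply data-quality rules and normalize types."""
    for rec in records:
        if not rec["name"]:
            continue  # data-quality rule: drop records with no name
        yield {"name": rec["name"].lower(), "amount": float(rec["amount"])}

def load(records, target):
    """Load: append cleaned records to the target store (a list here)."""
    target.extend(records)
    return target

# Hypothetical input; the blank-name row is filtered out by transform().
raw = ["Alice, 10.5", " , 3.0", "Bob, 2.25"]
store = load(transform(extract(raw)), [])
```

In a real pipeline the load step would write to a Hive or Impala table rather than a list, but the extract/transform/load separation is the same.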