Design, implement, and maintain scalable and efficient data pipelines using Spark, Python, and PySpark.
Ensure the smooth flow of data from various sources to destination systems.
Utilize SAS, Spark, Python, and PySpark for efficient data processing.
Design and implement data models to support business requirements.
Work closely with data architects to ensure data models align with organizational needs.
Implement data quality checks to ensure accuracy and consistency.
Adhere to data governance policies and maintain data integrity.
Optimize data processing and query performance using Spark, Python, and PySpark.
Identify and resolve bottlenecks in the data pipeline.
Collaborate with data scientists, analysts, and business stakeholders to understand data requirements.
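As a sketch of what the pipeline, data-quality, and transformation responsibilities above entail, here is a minimal pure-Python illustration (the function names, fields, and in-memory source/destination are hypothetical; in practice each stage would operate on Spark/PySpark DataFrames):

```python
# Minimal ETL-style pipeline sketch: extract -> quality check -> transform -> load.
# Pure-Python stand-in for the PySpark equivalent; dicts model DataFrame rows.

def extract(source_rows):
    """Pull raw records from a source system (here: an in-memory list)."""
    return list(source_rows)

def quality_check(rows, required_fields):
    """Drop rows missing required fields -- a simple accuracy/consistency gate."""
    return [r for r in rows if all(r.get(f) is not None for f in required_fields)]

def transform(rows):
    """Normalize field values before loading."""
    return [{**r, "name": r["name"].strip().title()} for r in rows]

def load(rows, destination):
    """Write validated rows to the destination (here: append to a list)."""
    destination.extend(rows)
    return len(rows)

raw = [
    {"id": 1, "name": "  alice  "},
    {"id": 2, "name": None},  # fails the quality check
    {"id": 3, "name": "bob"},
]
warehouse = []
loaded = load(transform(quality_check(extract(raw), ["id", "name"])), warehouse)
# loaded == 2; the row with a missing name was filtered out before loading
```

In a real Spark job, `quality_check` and `transform` would be `DataFrame.filter` and `withColumn` operations, and performance tuning (partitioning, caching, predicate pushdown) would address the bottlenecks mentioned above.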
Bachelor's degree in Computer Science, Information Technology, or a related field.
Proven experience as a Data Engineer, with expertise in Hadoop implementation, Scala, Spark, Python, and PySpark.
Experience with big data technologies and distributed computing.
Proficient in Spark, Python, and PySpark.
Strong problem-solving and analytical skills.
Excellent collaboration and communication skills.
Familiarity with data security and privacy best practices.
Experience with other big data technologies beyond Hadoop is a plus.
Familiarity with Connexa is good to have.