PySpark Developer - End User Environment

BGC Group Pte. Ltd.



Job Overview:


We are seeking a talented PySpark Developer with a strong background in distributed computing, data splitting, and parallel programming to join our dynamic data engineering team. The ideal candidate will have hands-on experience working with Resilient Distributed Datasets (RDD) and deep knowledge of scalable data processing frameworks.


Key Responsibilities:


· Design and implement scalable, distributed data processing pipelines using PySpark.

· Work with RDDs to efficiently manage large datasets across multiple nodes in a distributed cluster environment.

· Develop and optimize PySpark applications for data splitting, parallel processing, and transformation.

· Collaborate with data scientists and analysts to support their data preparation and processing needs.

· Optimize performance of Spark jobs by tuning partitioning, shuffling, and caching strategies.

· Ensure fault-tolerant, resilient data pipelines using PySpark’s fault-tolerance mechanisms.

· Troubleshoot and resolve issues related to distributed data processing in cloud or on-premises environments.

· Contribute to the overall architecture and design of large-scale data systems.


Qualifications:


· Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related field.

· Proven experience in PySpark development and a deep understanding of Spark’s RDD API.

· Expertise in data partitioning, parallel programming, and handling large datasets in a distributed environment.

· Strong knowledge of Spark’s DAG execution, lazy evaluation, and optimizations.

· Experience with Hadoop, HDFS, and other distributed storage systems.

· Knowledge of SQL and experience working with structured and unstructured data.

· Familiarity with cloud platforms (AWS, Azure, GCP) and distributed storage solutions is a plus.

· Strong problem-solving skills and the ability to optimize data flows for performance and scalability.


Preferred Skills:


· Experience with Spark Structured Streaming for real-time data processing.

· Proficiency in Python or Scala for data manipulation and pipeline development.

· Knowledge of data pipeline orchestration tools like Apache Airflow or Kubernetes.


Benefits:

· Opportunities for professional growth and learning in the field of big data


For interested candidates:


Kindly send your resume to [email protected], or click ‘Apply Now’.

We regret to inform that only shortlisted candidates will be contacted.


Dianne Balmaceda Antonio

R1105287

BGC Group Pte Ltd

EA 05C3053
