Key Skills:
Hadoop, Data Lake, SQL, Cloudera, Informatica
Job Objectives
Manage the Hadoop cluster across all four environments (SIT, UAT, PROD, and DR)
Key Responsibilities
Design, develop, and implement data processing pipelines to process large volumes of structured and unstructured data
Should have good knowledge of and working experience with databases and Hadoop (Hive, Impala, Kudu)
Should have good knowledge of and working experience with scripting (shell scripting, awk, quick automation for integrating third-party tools) and BMC monitoring tools
Good understanding of and knowledge in data modelling, using industry-standard data models such as FSLDM
Collaborate with data engineers, data scientists, and other stakeholders to understand requirements and translate them into technical specifications and solutions
Experience working with NoSQL databases as well as virtualized database environments is a plus
Implement data transformations, aggregations, and computations using Spark RDDs, DataFrames, and Datasets, and integrate them with Elasticsearch
Develop and maintain scalable and fault-tolerant Spark applications, adhering to industry best practices and coding standards
Troubleshoot and resolve issues related to data processing, performance, and data quality in the Spark-Elasticsearch integration
Monitor and analyze job performance metrics, identify bottlenecks, and propose optimizations in both Spark and Elasticsearch components
Prior experience developing banking applications using ETL and Hadoop is mandatory. In-depth knowledge of the technology stack at global banks is mandatory.
Flexibility to stretch and take on challenges
Communication & Interpersonal skills
Eagerness to learn and execute
Key Requirements
· Strong experience in Hadoop/Spark development
· Strong experience with data lakes (integrating different data sources into the data lake)
· SQL Stored Procedures/Queries/Functions
· Unix Scripting
· Data Architecture (Hadoop)
· Solid understanding of data modelling, indexing strategies, and query optimization
· Experience with distributed computing, parallel processing, and working with large datasets
· Familiarity with big data technologies such as Hadoop, Hive, and HDFS
· Job Scheduling in Control-M
· Strong problem-solving and analytical skills with the ability to debug and resolve complex issues
· Familiarity with version control systems (e.g., Git) and collaborative development workflows
· Excellent communication and teamwork skills with the ability to work effectively in cross-functional teams