Job Objectives
Work on Data Lake integration.
Key Responsibilities
· Design, develop, and implement data processing pipelines to process large volumes of structured and unstructured data
· Good knowledge of and hands-on experience with databases and the Hadoop ecosystem (Hive, Impala, Kudu)
· Good knowledge of and hands-on experience with scripting (shell scripting, awk, quick automation to integrate third-party tools) and BMC monitoring tools
· Good understanding of data modelling using industry-standard models such as FSLDM
· Collaborate with data engineers, data scientists, and other stakeholders to understand requirements and translate them into technical specifications and solutions
· Experience with NoSQL and virtualized database environments is a plus
· Implement data transformations, aggregations, and computations using Spark RDDs, DataFrames, and Datasets, and integrate them with Elasticsearch (see the sketch after this list)
· Develop and maintain scalable and fault-tolerant Spark applications, adhering to industry best practices and coding standards
· Troubleshoot and resolve issues related to data processing, performance, and data quality in the Spark-Elasticsearch integration
· Monitor and analyze job performance metrics, identify bottlenecks, and propose optimizations in both Spark and Elasticsearch components
· Prior experience developing banking applications using ETL and Hadoop is mandatory, as is in-depth knowledge of the technology stacks used at global banks
· Flexibility to stretch and take on challenges
· Strong communication and interpersonal skills
· Willingness to learn and execute
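
A minimal PySpark sketch of the Spark-to-Elasticsearch workflow described above. The host, table, column, and index names are hypothetical, and it assumes the elasticsearch-hadoop connector jar is on the Spark classpath:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes the elasticsearch-hadoop connector is available on the classpath.
# Host, port, table, and index names below are hypothetical.
spark = (
    SparkSession.builder
    .appName("txn-aggregation-to-es")
    .config("spark.es.nodes", "es-host.example.com")
    .config("spark.es.port", "9200")
    .enableHiveSupport()
    .getOrCreate()
)

# Hypothetical transactions table already landed in the data lake (Hive)
txns = spark.table("datalake.transactions")

# Example aggregation: daily totals and counts per account
daily = (
    txns.groupBy("account_id", F.to_date("txn_ts").alias("txn_date"))
        .agg(
            F.sum("amount").alias("total_amount"),
            F.count("*").alias("txn_count"),
        )
)

# Index the aggregate into Elasticsearch via the es-hadoop Spark SQL data source
(daily.write
      .format("org.elasticsearch.spark.sql")
      .option("es.resource", "daily_txn_summary")
      .mode("append")
      .save())
```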
Key Requirements
· 7+ years of experience with AS/400, RPG, CLP, and SQL
· Knowledge of the Aldon Change Management System (ACMS) is a must
· Experience developing Hadoop-based applications
· Experience with data lakes (integrating different data sources into the data lake); see the sketch after this list
· SQL Stored Procedures/Queries/Functions
· Unix Scripting
· Experience with distributed computing, parallel processing, and working with large datasets
· Familiarity with big data technologies such as Hadoop, Hive, and HDFS
· Job Scheduling in Control-M
· Strong problem-solving and analytical skills with the ability to debug and resolve complex issues
· Familiarity with version control systems (e.g., Git) and collaborative development workflows
· Excellent communication and teamwork skills with the ability to work effectively in cross-functional teams
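
As an illustrative sketch of the data-lake integration work listed above (table name, columns, and HDFS path are all hypothetical assumptions), a PySpark job might land a cleansed copy of a source table as partitioned Parquet in the lake:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("source-to-datalake-ingest")
    .enableHiveSupport()
    .getOrCreate()
)

# Hypothetical staging table loaded from a source system; assumed to carry
# an account_id key and a load_date column used for partitioning
staged = spark.sql("SELECT * FROM staging.customer_accounts")

# Light cleanup before landing in the curated zone of the lake
curated = (
    staged.dropDuplicates(["account_id"])
          .na.drop(subset=["account_id"])
)

# Land as partitioned Parquet under an illustrative HDFS data lake path
(curated.write
        .mode("overwrite")
        .partitionBy("load_date")
        .parquet("hdfs:///datalake/curated/customer_accounts"))
```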