Role: Data Engineer
Job level: More than 10 years of relevant experience (L4)
Key skills:
AS400, Hadoop, Data Lake, SQL, Informatica
Key Responsibilities:
· Design, develop, and implement data pipelines that process large volumes of structured and unstructured data
· Strong knowledge of and hands-on experience with databases and the Hadoop ecosystem (Hive, Impala, Kudu)
· Strong scripting skills (shell scripting, awk, quick automation for integrating third-party tools) and experience with BMC monitoring tools
· Good understanding of data modelling using industry-standard data models such as FSLDM
· Collaborate with data engineers, data scientists, and other stakeholders to understand requirements and translate them into technical specifications and solutions
· Experience with NoSQL databases and virtualized database environments is a plus
· Implement data transformations, aggregations, and computations using Spark RDDs, DataFrames, and Datasets, and integrate them with Elasticsearch (see the illustrative sketch after this list)
· Develop and maintain scalable and fault-tolerant Spark applications, adhering to industry best practices and coding standards
· Troubleshoot and resolve issues related to data processing, performance, and data quality in the Spark-Elasticsearch integration
· Monitor and analyze job performance metrics, identify bottlenecks, and propose optimizations in both Spark and Elasticsearch components
· Prior experience developing banking applications using ETL and Hadoop is mandatory, as is in-depth knowledge of the technology stack at global banks
· Flexibility to stretch and take on challenges
· Strong communication and interpersonal skills
· Willingness to learn and execute
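To illustrate the kind of Spark-to-Elasticsearch work described above, the following is a minimal PySpark sketch. It assumes a hypothetical Hive transaction table, placeholder cluster and index names, and that the elasticsearch-hadoop (elasticsearch-spark) connector is available on the classpath; it is an illustrative sample, not part of any actual codebase used in this role.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes the elasticsearch-spark connector is on the classpath, e.g. launched with:
#   spark-submit --packages org.elasticsearch:elasticsearch-spark-30_2.12:<version> job.py

spark = (
    SparkSession.builder
    .appName("txn-aggregation-to-es")  # hypothetical application name
    .getOrCreate()
)

# Hypothetical Hive source table of banking transactions in the data lake
txns = spark.table("datalake.transactions")

# Aggregate daily totals per account using the DataFrame API
daily_totals = (
    txns.groupBy("account_id", F.to_date("txn_ts").alias("txn_date"))
        .agg(
            F.sum("amount").alias("total_amount"),
            F.count(F.lit(1)).alias("txn_count"),
        )
)

# Write the aggregated result to an Elasticsearch index via the es-hadoop connector
(
    daily_totals.write
    .format("org.elasticsearch.spark.sql")
    .option("es.nodes", "es-host:9200")              # placeholder cluster address
    .option("es.resource", "daily_account_totals")   # placeholder index name
    .mode("append")
    .save()
)
```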
Key Requirements:
• 7+ years of experience with AS400, RPG, CLP, and SQL
• Knowledge of Aldon Change Management System (ACMS) is a must
• Experience developing on the Hadoop platform
• Experience with data lakes (integrating diverse data sources into the data lake)
• SQL Stored Procedures/Queries/Functions
• Unix Scripting
• Experience with distributed computing, parallel processing, and working with large datasets
• Familiarity with big data technologies such as Hadoop, Hive, and HDFS
• Job Scheduling in Control-M
• Strong problem-solving and analytical skills with the ability to debug and resolve complex issues
• Familiarity with version control systems (e.g., Git) and collaborative development workflows
• Excellent communication and teamwork skills with the ability to work effectively in cross-functional teams.