Role:
· Provide technical vision and create roadmaps to align with the long-term technology strategy
· Build data ingestion pipelines that pull data from heterogeneous sources such as RDBMS, Hadoop, flat files, REST APIs, and AWS S3
· Serve as a key member of the Hadoop Data Ingestion team, which enables the data science community to develop analytical/predictive models; implement ingestion pipelines using Hadoop ecosystem tools such as Flume, Sqoop, Hive, HDFS, PySpark, Trino, and Presto SQL (see the PySpark sketch after this list)
· Work on governance aspects of the Data Analytics applications, such as documentation, design reviews, metadata, etc
· Make extensive use of DataStage jobs and Teradata/Oracle utility scripts to perform data transformation and loading across multiple FSLDM layers
· Review and help streamline the design of big data applications, ensuring the right tools are used for the relevant use cases
· Engage users to achieve concurrence on the proposed technology solution; review solution documents with the Functional Business Analyst and Business Unit for sign-off
· Create technical documents (functional/non-functional specifications, design specifications, training manuals) for the solutions; review interface design specifications created by the development team
· Participate in the selection of products/tools via RFPs/POCs
· Provide inputs to help with the detailed estimation of projects and change requests
· Execute continuous service improvement and process improvement plans
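
For illustration, here is a minimal PySpark sketch of the kind of ingestion pipeline described above: pulling a table from an RDBMS over JDBC and landing it as a partitioned Hive table. The JDBC URL, source table, and target schema are hypothetical placeholders, not details from the role description.

```python
from pyspark.sql import SparkSession, functions as F

# Hive-enabled session so we can write managed tables directly
spark = (SparkSession.builder
         .appName("rdbms_to_hive_ingest")
         .enableHiveSupport()
         .getOrCreate())

# Read a source table over JDBC; URL, table, and credentials are placeholders
df = (spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")  # hypothetical
      .option("dbtable", "CRM.CUSTOMERS")                      # hypothetical
      .option("user", "etl_user")
      .option("password", "***")
      .option("fetchsize", "10000")
      .load())

# Stamp each row with the load date and land it as a partitioned Hive table
(df.withColumn("load_date", F.current_date())
   .write.mode("overwrite")
   .partitionBy("load_date")
   .saveAsTable("staging.customers"))                          # hypothetical
```
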
Requirements:
· 3-7 years of data engineering experience in the banking domain, including implementation of data lakes, data warehouses, data marts, lakehouses, etc
· Experience in data modeling for a bank's large-scale data warehouses and business marts on Hadoop-based databases, Teradata, Oracle, etc
· Expertise in the Big Data ecosystem, e.g. Cloudera (Hive, Impala, HBase, Ozone, Iceberg), Spark, Presto, and Kafka
· Experience with a metadata management tool such as IDMC, Axon, Watson Knowledge Catalog, or Collibra
· Expertise in designing frameworks in Java, Scala, or Python, and in building applications and utilities with these languages
· Expertise in operationalizing machine learning models, including optimizing feature pipelines, batch/API deployment, model monitoring, and implementing feedback loops (a batch-scoring sketch follows this list)
· Knowledge of building reports/dashboards with a reporting tool such as Qlik Sense or Power BI
· Expertise in integrating applications with DevOps tools
· Knowledge of building applications on MPP appliances such as Teradata, Greenplum, or Netezza is mandatory
· Domain knowledge of the banking industry, including subject areas such as Customer, Products, CASA, Cards, Loans, Trade, Treasury, General Ledger, Origination, Channels, Limits, Collaterals, Campaigns, etc
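
As a sketch of the model-operationalization work listed above, the snippet below scores a feature snapshot in batch with a persisted Spark ML pipeline and writes predictions to a table where monitoring jobs can pick them up. The model path and table names are assumptions for illustration only.

```python
from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel

spark = (SparkSession.builder
         .appName("batch_model_scoring")
         .enableHiveSupport()
         .getOrCreate())

# Load a previously trained feature+model pipeline (path is a placeholder)
model = PipelineModel.load("hdfs:///models/churn_pipeline/v3")

# Score the latest feature snapshot (table name is hypothetical)
features = spark.table("feature_store.customer_features_daily")
scored = model.transform(features)

# Persist predictions so monitoring jobs and feedback loops can consume them
(scored.select("customer_id", "prediction", "probability")
       .write.mode("append")
       .saveAsTable("analytics.churn_scores_daily"))
```
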