· Write Spark code to Extract, Transform and Load data from heterogeneous sources (CSV and JSON files, RDBMS tables, and streaming via Kafka) into Hive tables across different layers
· Write complex queries in Hive using joins, subqueries, temporary views, CTEs, and Map/Group Map operations
· Perform necessary aggregations, rank data using window functions, and remove duplicates from source data with Spark over the extracted data, as required for generating banking reports
· Perform various join operations in Spark to combine results from two or more source files and build a data warehouse
· Implement data pivoting and unpivoting, exception handling, and logging
· Perform data validation and quality checks using Python and Spark
· Read parameters and variables into Spark code from environment-specific configuration files
· Tune performance in Hive with partitioning and bucketing
· Configure run parameters for Spark jobs, such as memory settings and available cluster resources
· Perform data migration and reconciliation between various core banking systems
· Design a robust framework that eases development of product enhancements in each sprint
· Tune and monitor the performance of individual jobs in a timely manner