§ Good understanding of concurrent software systems and how to build them so that they are scalable, maintainable, and robust
§ Experience in designing application solutions in the Hadoop ecosystem
§ Deep understanding of the concepts behind Hive, HDFS, YARN, Spark, Spark SQL, Scala, and PySpark
§ Knowledge of HDFS file formats and their use cases (e.g. Parquet, ORC, SequenceFile)
§ Good knowledge of data warehousing systems
§ Experience in a scripting language (shell, Python)
§ Experience with the Hortonworks distribution and an understanding of SQL execution engines (Tez, MapReduce)
§ Java/REST services/Maven experience is a plus
§ Control-M job development; monitoring resource utilization using Grafana
§ Ability to create automation scripts in Jenkins, with experience automating builds, test frameworks, application configuration, etc.
§ Experience in implementing scalable applications with fully automated deployment and control using Bitbucket, Jenkins, Azure DevOps (ADO), etc.
§ Skillset:
o Mandatory: Big Data – Hive, HDFS, Spark, Scala, PySpark (a brief illustrative sketch follows this list).
o Good to have: schedulers (Control-M), ETL tool – Dataiku, Unix/shell scripting, knowledge of integration services such as FileIT/MQ, CI/CD tools – Jenkins, Jira, Azure DevOps (ADO) tools suite, Oracle.
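To give a flavour of the mandatory Spark/Scala/Hive/HDFS skillset, below is a minimal sketch of a typical task: querying a Hive-managed table with Spark SQL and persisting the result to HDFS in a columnar format such as Parquet. The database, table, columns, and output path are hypothetical placeholders, not part of this role description.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: read a Hive table via Spark SQL and write the result to HDFS as Parquet.
// sales_db.transactions and the output path are illustrative names only.
object SalesToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sales-to-parquet")
      .enableHiveSupport()          // requires a Hive metastore available on the cluster
      .getOrCreate()

    // Spark SQL against a Hive-managed table (hypothetical schema)
    val daily = spark.sql(
      """SELECT region, SUM(amount) AS total_amount
        |FROM sales_db.transactions
        |WHERE txn_date = current_date()
        |GROUP BY region""".stripMargin)

    // Columnar formats such as Parquet (or ORC) suit analytical scans on HDFS
    daily.write
      .mode("overwrite")
      .parquet("hdfs:///data/curated/daily_sales")

    spark.stop()
  }
}
```

The same logic could equally be expressed in PySpark; the choice between Parquet and ORC would normally follow the platform's existing conventions (e.g. ORC is common on Hortonworks/Hive-centric clusters).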