Responsibilities:
• Collaborate with business partners to understand their needs and functional requirements, translating them into Hadoop-based solutions.
• Ensure seamless integration and management of large datasets, working with HDFS, Hive, HBase, Apache Kudu, and other Hadoop ecosystem tools to implement efficient data processing workflows.
• Utilize Hadoop job schedulers such as Oozie and Rundeck to streamline job scheduling, execution, and workflow management.
• Leverage cloud data services on Azure and GCP (BigQuery, Bigtable, Dataproc) to integrate and process large-scale datasets on cloud platforms.
• Use Sqoop, Apache NiFi, and other data ingestion tools to import and export data to and from various SQL and NoSQL databases.
• Apply Big Data analytics techniques to derive actionable insights from vast and complex datasets, enabling data-driven decision-making.
• Mentor and guide junior developers and team members, providing technical expertise and support to ensure the timely and successful completion of tasks.
• Regularly communicate project status and provide solutions to meet evolving business requirements, ensuring the delivery of high-quality Big Data solutions.
Requirements:
• B.Tech. in Computer Science or Data Science
• 10+ years of hands-on experience working in the Hadoop ecosystem, with expertise in HDFS, Hive, Sqoop, Oozie, Impala, Spark, PySpark, HBase, and Apache Kudu.
• Strong hands-on experience with Azure Data Services (Blob Storage, HDInsight, Azure SQL, Delta Tables) and GCP services (BigQuery, Bigtable, Dataproc).
• Proficient with data processing tools such as Spark, Kafka, Pig, and Apache NiFi.
• Expertise in scheduling and managing jobs using Oozie, Rundeck, and cron.
• Strong experience with SQL and NoSQL databases such as Hive, MySQL, HBase, and Apache Kudu.
• Strong programming skills in Python, Java, and shell scripting to automate processes and workflows.
• Experience with version control tools like Git, TFS, and SVN.
• Experience with Cloudera and Hortonworks Hadoop distributions.
• Familiarity with BI tools for data visualization and reporting (e.g., Tableau, Power BI).
• Knowledge of complementary frameworks such as Apache Kafka, Apache Flume, and Apache Solr.
• Strong analytical and problem-solving skills with the ability to quickly adapt to new technologies and complex data challenges.
• Proven experience in leading teams and mentoring junior developers and engineers.