- Work with stakeholders to understand needs for data structure, availability, scalability, and accessibility.
- Develop tools to improve data flows between internal/external systems and the data lake/warehouse.
- Build robust and reproducible data ingest pipelines to collect, clean, harmonize, merge, and consolidate data sources.
- Understand existing data applications and infrastructure architecture.
- Build and support new data feeds for various data management layers and data lakes.
- Evaluate business needs and requirements.
- Support migration of existing data transformation jobs from Oracle and MS-SQL to Snowflake.
- Lead the migration of existing data transformation jobs from Oracle, Hive, Impala, etc. to Spark, Python on Glue, etc.
- Document processes and steps.
- Develop and maintain datasets.
- Improve data quality and efficiency.
- Lead business requirements gathering and deliver accordingly.
- Collaborate with data scientists, architects, and the wider team on data analytics projects.
- Collaborate with DevOps engineers to improve system deployment and monitoring processes.
- Bachelor's degree in computer science or a related STEM (science, technology, engineering, or mathematics) field.
- At least 8 years of strong data warehousing experience using RDBMS and non-RDBMS databases.
- At least 5 years of recent hands-on professional experience (actively coding) working as a data engineer (back-end software engineer considered).
- Professional experience working in an agile, dynamic, and customer-facing environment is required.
- Understanding of distributed systems and cloud technologies (AWS) is highly preferred.
- Understanding of data streaming and scalable data processing is preferred.
- Experience with large-scale datasets and with data lake and data warehouse technologies such as Amazon Redshift, Google BigQuery, and Snowflake. Snowflake is highly preferred.
- At least 2 years of experience with ETL (AWS Glue), Amazon S3, Amazon RDS, Amazon Kinesis, AWS Lambda, Apache Airflow, and AWS Step Functions.
- Strong knowledge of Python, UNIX shell scripting, and Spark is required.
- Understanding of RDBMS, data ingestion, data flows, data integration, etc.
- Understanding of TCP/IP network protocols including TLS/SSL, HTTP, etc.
- Understanding of AWS VPC networking and IAM access control policies.
- Technical expertise with data models, data mining and segmentation techniques.
- Experience with the full SDLC and Lean or Agile development methodologies.
- Knowledge of CI/CD and Git-based deployments.
- Ability to work in a team in a diverse, multi-stakeholder environment.
- Ability to communicate complex technology solutions to diverse audiences, namely technical, business, and management teams.
- Ability to work in a collaborative environment and coach other team members on coding practices, design principles, and implementation patterns that lead to high-quality maintainable solutions.
- Ability to work in a dynamic, agile environment within a geographically distributed team.
- Ability to focus on promptly addressing customer needs.
- Ability to work within a diverse and inclusive team.
- Technically curious, self-motivated, versatile, and solution-oriented.