- Perform data extraction, cleaning, transformation, and loading.
- Design, build, launch, and maintain efficient and reliable large-scale batch and real-time data pipelines using data processing frameworks.
- Integrate and collate data silos in a manner that is both scalable and compliant.
- Collaborate with Project Managers, Frontend Developers, UX Designers, and Data Analysts to build scalable data-driven products.
- Develop backend APIs and work with databases to support the applications.
- Bridge the gap between engineering and analytics.
- Work in an Agile environment that practices Continuous Integration and Delivery.
- Work closely with fellow developers through pair programming and the code review process.
Requirements:
- Experience building production-grade data pipelines and ETL/ELT data integrations with proper documentation
- Proficient in general data cleaning and transformation (e.g. pandas, PySpark, SQL)
- Knowledge of database design and various databases, both relational (e.g. PostgreSQL, MySQL) and non-relational (e.g. MongoDB)
- Knowledge of system design, data structures, and algorithms
- Knowledge of REST APIs and web requests/protocols in general
- Comfortable coding in at least two scripting languages (e.g. Python, SQL)
- Comfortable working in both Windows and Linux development environments
- Interest in data engineering in a big data environment using cloud platforms (e.g. AWS, Azure, Google Cloud)
- Familiar with data modelling, data access, and data storage infrastructure such as Data Marts, Data Lakes, and Data Warehouses
- Proficient in creating comprehensive data dictionaries to ensure clear and consistent data definitions.
- Skilled in designing Entity-Relationship Diagrams (ERDs) to visually represent data models and relationships.
- Capable of collaborating with stakeholders to understand business processes and develop effective data solutions.
- Familiar with BI tools such as Tableau
- Familiar with Docker containers