Key Responsibilities:
- Develop programs to extract data from data lakes and curated data layers to fulfill business objectives.
- Design and implement ETL pipelines using open-source technologies for efficient data ingestion (a minimal PySpark sketch follows this list).
- Collaborate with application and business teams to understand their requirements and design effective ETL solutions.
- Collect and analyze business and functional requirements, translating them into scalable and robust solutions that align with the overall data architecture.
- Produce detailed documentation and metadata for datasets to ensure usability and clarity.
- Participate in the end-to-end development lifecycle, including design, implementation, testing, documentation, delivery, support, and maintenance.
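As an illustration of the kind of pipeline this role builds, below is a minimal PySpark ETL sketch, assuming an S3-based data lake; the bucket paths and column names (e.g. `order_id`, `order_ts`) are illustrative placeholders, not an existing pipeline.

```python
# Minimal PySpark ETL sketch: extract raw events from an S3 data lake,
# apply basic cleansing, and write a curated Parquet layer.
# Bucket names, paths, and column names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("curated-orders-etl").getOrCreate()

# Extract: raw JSON landed in the data lake
raw = spark.read.json("s3://example-data-lake/raw/orders/")

# Transform: basic cleansing, typing, and de-duplication
curated = (
    raw.filter(F.col("order_id").isNotNull())
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .dropDuplicates(["order_id"])
)

# Load: write the curated layer, partitioned by date for downstream queries
(curated.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://example-curated-layer/orders/"))

spark.stop()
```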
Requirements:
- Strong skills in Python and Spark for data processing and transformation.
- Experience with Linux utilities and SQL for data manipulation.
- Familiarity with PySpark for effective data transformation.
- Knowledge of AWS services, including Redshift, Glue, CloudFormation, EC2, S3, and Lambda (a short boto3 sketch follows this list).
- Experience with various ETL tools and methodologies.
- Hands-on experience deploying and operating data workloads in AWS environments.
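To illustrate how these AWS services are typically wired together in this kind of role, here is a minimal sketch, assuming an S3 event notification invokes a Lambda function that starts a Glue job via boto3; the job name `curate-orders-job` and its arguments are hypothetical.

```python
# Minimal sketch: a Lambda handler that starts a Glue job when a new object
# lands in S3. The Glue job name and argument keys are hypothetical placeholders.
import json
import boto3

glue = boto3.client("glue")

def handler(event, context):
    # S3 put-event notification: extract the bucket/key that triggered this invocation
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    # Kick off the (hypothetical) curation job, passing the new object as an argument
    response = glue.start_job_run(
        JobName="curate-orders-job",
        Arguments={"--source_path": f"s3://{bucket}/{key}"},
    )
    return {"statusCode": 200, "body": json.dumps({"JobRunId": response["JobRunId"]})}
```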