We are seeking a highly motivated Data Engineering Intern to gain hands-on experience building data infrastructure. You'll develop your problem-solving skills by contributing to the design and implementation of data pipelines and storage solutions. This internship offers the opportunity to learn from experienced engineers in a fast-paced environment while contributing to an agile development cycle.
ROLES AND RESPONSIBILITIES
- Assist Data Scientists, Data Engineers, and Analysts in building and managing data products, including the SISTIC Data Warehouse/Data Lake.
- Contribute to data pipeline development by learning from experienced engineers and assisting with tasks like design, testing, and documentation.
- Learn to identify and troubleshoot issues in existing data pipelines, building the experience needed to propose future improvements.
- Gain exposure to building data pipelines by working on smaller components or contributing to specific stages.
- Support data warehouse and database maintenance under the guidance of senior engineers.
- Learn to monitor data pipelines and identify potential issues.
- Write clean and well-documented code under the supervision of experienced engineers.
- Gain experience with data ingestion techniques like scripting, web scraping, APIs, and SQL queries.
- Participate in code review sessions to learn best practices and deepen your understanding of the codebase.
SKILLS AND EXPERIENCE REQUIRED
- Strong foundation in computer science or a related field (e.g., coursework, projects)
- Core data engineering concepts: a strong desire to learn about data pipelines (ETL processes) and data storage solutions (data warehouses and data lakes). Familiarity with Google Cloud Platform is a plus.
- Strong foundation in Python and SQL: the ability to work with the data manipulation tools commonly used in data engineering.
- Interest in learning Apache Spark (PySpark): a willingness to explore the big data processing frameworks relevant to the field.
- Big data analytics: a basic understanding of the Hadoop ecosystem, including concepts like distributed storage and parallel data processing.
- Understanding of data warehousing and data lake concepts.