Job Description
The Senior Data Engineer is a seasoned software developer with strong engineering skills who is responsible for building custom, open-source-based data-ingestion and MLOps platforms. They have a deep appreciation of the complexities of data engineering, such as ingesting large or near-real-time datasets, maintaining high data quality, and automating pipelines to increase robustness and reduce the need for human intervention.
Key Responsibilities include:
- Be an effective distributed-systems implementer across the following core activities:
  - Design and develop data engineering services and their ecosystem using distributed databases (relational, columnar, graph, in-memory); orchestration (Apache Airflow); and distributed stream/batch data processing (Kafka, Kinesis, Spark).
  - Design and develop MLOps production pipelines; support data scientists and ML engineers by deploying their ML/DL models at scale while meeting SLAs on both cloud and on-premises GPU and CPU instances.
  - Design data models for mission-critical, high-volume, near-real-time/batch data; build idempotent, atomic production data pipelines to make ingestion more fault-tolerant (see the sketch after this list).
  - Design and develop intuitive, highly automated, self-service data platform functions for business users.
  - Explore, evaluate, and champion the introduction of next-generation technologies in the data-ingestion workflow. Participate in project planning and provide technical guidance on cloud architecture for data projects.
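To make the idempotency expectation above concrete, here is a minimal sketch, assuming Apache Airflow 2.4+ (named above) with the Postgres provider installed; the `warehouse` connection ID and the `events_raw`/`events_staging` tables are hypothetical. Re-running an interval deletes and reloads only that interval's rows, so retries cannot duplicate data.

```python
# Minimal sketch of an idempotent daily ingestion task (hypothetical tables and
# connection ID; assumes Airflow 2.4+ and apache-airflow-providers-postgres).
from datetime import datetime

from airflow.decorators import dag, task
from airflow.providers.postgres.hooks.postgres import PostgresHook


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def ingest_events():
    @task
    def load_partition(data_interval_start=None, data_interval_end=None):
        hook = PostgresHook(postgres_conn_id="warehouse")  # hypothetical conn ID
        # Delete-then-insert makes the load idempotent per interval, and running
        # both statements in one transaction (autocommit off) makes it atomic:
        # a crash mid-run leaves no half-written partition behind.
        hook.run(
            sql=[
                "DELETE FROM events_staging "
                "WHERE event_ts >= %(start)s AND event_ts < %(end)s",
                "INSERT INTO events_staging SELECT * FROM events_raw "
                "WHERE event_ts >= %(start)s AND event_ts < %(end)s",
            ],
            parameters={"start": data_interval_start, "end": data_interval_end},
        )

    load_partition()


ingest_events()
```

Delete-then-insert is one common way to get per-partition idempotency; depending on the target store, a MERGE/upsert or a partition swap achieves the same guarantee.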
Requirements
- A BS in Computer Science or a related discipline is required; an advanced degree in Computer Science (MS or PhD) is highly desirable.
- 3+ years of relevant industry experience in some or most of the following technical areas:
- Advanced programming skills in Python. Conversant with data structures and algorithm design.
- Experience in building data pipelines (including data collection, warehousing, processing, analysis, monitoring, and governance) using open-source data ingestion platforms.
- Intermediate-level knowledge of and experience with AWS cloud components and best practices. Good understanding of deploying data stores such as S3, Redshift, ElastiCache, PostgreSQL, and ClickHouse.
- Prior experience in modern software development is required (e.g., web frontend UI, backend API microservices, CI/CD, and Scrum/Kanban agile development). Strong grasp of object-oriented or functional programming (using, e.g., Python, Java, Scala, or C#).