The Data Engineer plays a crucial role in creating and maintaining the advanced data analytics capabilities that underpin the hospital's data-driven decision-making. The role focuses on the acquisition, storage, retrieval, and optimization of data from various sources to support clinical, operational, and research initiatives. The role also supports the work of the Group Chief Data Officer (GCDO) for Data Analytics.
Data Sourcing and Collection:
• Identifying and understanding source systems.
• Gathering requirements on frequency, volume, and types of data.
• Ensuring data privacy and compliance measures are in place.
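To illustrate the kind of privacy measure this bullet refers to, here is a minimal stdlib-only Python sketch of one-way pseudonymization of a patient identifier (the function name and salt handling are illustrative assumptions; in practice the salt would come from a secrets store and the approach would follow the hospital's compliance policy):

```python
import hashlib

def pseudonymize(identifier: str, salt: str) -> str:
    """One-way pseudonymization of an identifier via salted SHA-256.
    The salt is a hypothetical example; production salts must be
    managed in a secrets store, never hard-coded."""
    digest = hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()
    return digest[:16]  # shortened token for readability
```

The same input and salt always yield the same token, so joins across datasets still work, while the raw identifier never leaves the ingestion layer.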
Data Ingestion using Informatica IDMC and/or Spark using Python:
• Designing, implementing, and optimizing data ingestion pipelines.
• Ensuring reliability and fault tolerance of ingestion pipelines.
• Handling data transformations, if necessary, during ingestion.
• Collaborating with and managing vendors, where required, for data ingestion.
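The fault-tolerance responsibility above can be sketched in a few lines. This is a stdlib-only Python illustration (in this role the real pipelines would be built in Informatica IDMC or PySpark; the CSV-to-JSON-lines flow and all names here are hypothetical), showing retry with backoff around a load step:

```python
import csv
import json
import time
from pathlib import Path

def ingest_with_retry(source_csv: Path, target_jsonl: Path,
                      max_attempts: int = 3, backoff_s: float = 1.0) -> int:
    """Read rows from a CSV source and land them as JSON lines,
    retrying transient I/O failures for basic fault tolerance.
    Returns the number of rows ingested."""
    for attempt in range(1, max_attempts + 1):
        try:
            with source_csv.open(newline="") as src:
                rows = list(csv.DictReader(src))
            with target_jsonl.open("w") as dst:
                for row in rows:
                    dst.write(json.dumps(row) + "\n")
            return len(rows)
        except OSError:
            if attempt == max_attempts:
                raise  # surface the failure after the final attempt
            time.sleep(backoff_s * attempt)  # simple linear backoff
    return 0
```

Production pipelines would add checkpointing and idempotent writes so a retried load cannot duplicate rows.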
Data Lake Management:
• Structuring data properly in the Databricks Lakehouse to ensure it is usable and performant.
• Setting up data partitioning, indexing, and archival strategies.
• Monitoring data growth and storage consumption.
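As a rough sketch of the partitioning and archival strategies mentioned above (stdlib-only Python; the Hive-style date layout is a common lakehouse convention, and the paths and retention window are hypothetical examples, not a prescribed design):

```python
from datetime import date, timedelta

def partition_path(base: str, event_date: date) -> str:
    """Build a Hive-style year/month/day partition path for a record."""
    return (f"{base}/year={event_date.year}"
            f"/month={event_date.month:02d}/day={event_date.day:02d}")

def partitions_to_archive(partition_dates, today: date,
                          retention_days: int = 365):
    """Return partition dates older than the retention window,
    i.e. candidates for archival to cheaper storage."""
    cutoff = today - timedelta(days=retention_days)
    return [d for d in partition_dates if d < cutoff]
```

Partitioning by event date keeps queries that filter on recent data from scanning the whole table, and makes archival a matter of moving whole partition directories.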
Data Quality and Validation:
• Implementing data quality checks and validation rules during ingestion.
• Ensuring data consistency, accuracy, and integrity.
• Setting up alerts for any data anomalies or issues.
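The validation and alerting bullets above might look like the following stdlib-only Python sketch (the patient-visit fields, rules, and alert threshold are all hypothetical; real rules come from the source-system owners):

```python
def validate_row(row: dict) -> list[str]:
    """Example validation rules for a hypothetical patient-visit record.
    Returns a list of error descriptions (empty means the row passed)."""
    errors = []
    if not row.get("patient_id"):
        errors.append("missing patient_id")
    try:
        if int(row.get("length_of_stay", -1)) < 0:
            errors.append("negative length_of_stay")
    except (TypeError, ValueError):
        errors.append("non-numeric length_of_stay")
    return errors

def quality_report(rows, error_rate_alert: float = 0.05):
    """Validate a batch and flag it for alerting when the error rate
    exceeds the threshold (5% here, purely as an example)."""
    failed = [r for r in rows if validate_row(r)]
    rate = len(failed) / len(rows) if rows else 0.0
    return {"checked": len(rows), "failed": len(failed),
            "alert": rate > error_rate_alert}
```

The returned report would feed whatever alerting channel the team uses (e-mail, pager, dashboard) when `alert` is true.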
Integration with Databricks Lakehouse:
• Ensuring the ingested data is accessible, queryable, and optimized for performance in Databricks.
• Collaborating with data scientists and analysts to ensure the data meets their requirements.
Performance Tuning and Optimization:
• Monitoring ingestion pipeline performance and coordinating necessary adjustments.
• Ensuring data loads meet SLAs and optimizing as needed.
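SLA monitoring as described above can be reduced to a small timing wrapper. A stdlib-only Python sketch (the function name and SLA value are illustrative; real monitoring would record these metrics to an observability platform):

```python
import time

def run_with_sla(load_fn, sla_seconds: float) -> dict:
    """Run a data-load callable, time it, and report whether it met
    its SLA. load_fn stands in for any pipeline step."""
    start = time.monotonic()
    result = load_fn()
    elapsed = time.monotonic() - start
    return {"result": result,
            "elapsed_s": elapsed,
            "met_sla": elapsed <= sla_seconds}
```

Persisting these reports per run gives the trend data needed to spot loads drifting toward an SLA breach before they miss it.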
Security and Compliance:
• Setting up appropriate access controls and permissions.
• Ensuring data encryption at rest and in transit.
• Regularly auditing and reviewing data access patterns.
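One simple form the access-pattern review above could take (a stdlib-only Python sketch; the log shape and threshold are hypothetical, and a real audit would use the platform's own audit logs):

```python
from collections import Counter

def flag_unusual_access(access_log, threshold: int = 100):
    """Count data accesses per user and flag heavy readers for review.
    access_log is an iterable of (user, table) tuples."""
    counts = Counter(user for user, _table in access_log)
    return sorted(user for user, n in counts.items() if n > threshold)
```

Flagged users are review candidates, not violations; the audit step is deciding whether the pattern matches their role.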
Data Extraction Requests:
• Assisting with data extraction requests.
Documentation and Knowledge Transfer:
• Creating thorough documentation on data ingestion pipelines, schedules, transformations, etc.
• Conducting regular knowledge sharing sessions with other team members.
Collaboration:
• Working closely with other data engineers, data scientists, business analysts, and stakeholders.
• Communicating any changes, downtimes, or issues proactively.