As a Database Engineer, you will play a crucial role in the design, development, and maintenance of our databases, including HPC job, system monitoring, benchmarking, and scientific databases. You will collaborate with cross-functional teams of HPC domain experts, software engineers, and scientists to ensure the efficient storage, retrieval, and analysis of data. This position offers a unique opportunity to contribute to ground-breaking research for our stakeholders.
Key Responsibilities:
- Database Design and Development: Design and implement robust, scalable, and efficient databases and optimize database performance for data storage, retrieval, and analysis.
- Data Integration: Collaborate with stakeholders, NSCC consultants, and technical team members to integrate diverse datasets into the database. Implement and maintain data pipelines for seamless integration of new data sources.
- Quality Assurance: Develop and implement data quality control measures to ensure accuracy and consistency. Monitor and troubleshoot data integrity issues, implementing corrective actions as needed.
- Collaboration: Work closely with cross-functional teams of HPC domain experts, software engineers, and scientists to understand data requirements and user needs, and provide them with technical expertise and support.
- Documentation: Maintain detailed documentation of database architecture, data models, and processes, and contribute to the development of best practices for data management and storage.
- Reporting: Communicate findings through presentations and reports to NSCC, stakeholders, and collaborators.
Qualifications:
- Bachelor's or Master's degree in Computer Science or a related field.
- Proven experience in database design, development, and optimization. Experience with bioinformatics or life sciences is a plus.
- Proficiency in relevant programming languages (e.g., SQL, C/C++, Python) and experience with database management systems (e.g., MySQL, PostgreSQL).
- Familiarity with NoSQL and distributed databases (MongoDB, Elasticsearch, etcd).
- Experience with server-based and high-performance computing.
- Familiarity with bioinformatics tools and resources, and an understanding of biological data types and formats.
- Familiarity with Linux command line, system administration, scripting, and software tools.
- Familiarity with software version control processes and tools (e.g., Git, GitHub, GitLab).
- Strong problem-solving skills and the ability to work collaboratively in a dynamic team environment.