ROLES AND RESPONSIBILITIES
We are seeking a Principal Infrastructure Architect to design the high-speed network architecture within our Hypercubes. The role works closely with our R&D teams on the next generation of network solutions to support the growth of our AI infrastructure. The Principal Infrastructure Architect will also be responsible for:
Design of Supporting Facility Networks
- Structured Cabling: Develop and implement structured cabling solutions, both optical and copper, optimized for high-density and immersion cooling environments.
- Supporting Networks: Design and deploy IP networks to support BMS, SCADA, CCTV, EAC systems, and corporate networks specifically tailored for Hypercube deployments.
- Configuration and Testing: Define detailed network configuration specifications, undertake quality assurance assessments, and validate performance to meet reliability and security standards.
Design of Compute Network Fabrics
- Network Architecture: Design and implement network configurations that support high-speed, low-latency Ethernet (200GbE, 400GbE, 800GbE) and InfiniBand (NDR & XDR) networks for HPC applications.
- Optical Network Design: Develop link budgets, specify cabling suited to immersion environments, and select hardware components, such as transceivers and switches, that meet performance criteria.
- Deployment Layouts and Configuration Planning: Develop comprehensive deployment layouts, including rack elevations, cable management plans, patching schedules, and detailed configuration scripts to optimize network performance and scalability.
Supporting Infrastructure Planning
- GPU Compute: Plan and oversee the deployment of GPU nodes, ensuring optimal performance within immersion cooling systems and seamless integration with network fabrics.
- CPU / Supervisory Nodes: Design layouts and connectivity of supervisory control nodes and CPU clusters, focusing on redundancy, failover capabilities, and integration with monitoring systems.
- Storage Platforms: Architect high-throughput storage solutions, such as NVMe storage, parallel file systems and distributed storage technologies, to meet the demands of HPC workloads.
- Wide Area Networks: Design and implement external access networks, including next-generation firewalls, high-performance routers, and Wavelength Division Multiplexing (WDM) networks for high-performance interconnects to internet peering and data centre cross connects.
Project Development
- Material Schedules: Create detailed Bills of Materials for all networking and infrastructure components, considering current and future scalability requirements.
- Project Costing: Prepare project cost estimates, engage with finance teams, and ensure cost-effectiveness without compromising on quality or performance.
- Vendor Evaluation and Selection: Assess vendors and select suppliers based on technical requirement specifications, cost, and reliability.
- Construction Detailing: Collaborate with engineering and construction teams to translate network designs into practical implementations, ensuring alignment with physical infrastructure constraints and immersion cooling requirements.
- Testing & Handover: Develop and implement deployment strategies, including staging, installation procedures, and testing methodologies to validate system performance and reliability.
R&D Collaboration
- Product Development Enhancement: Engage in R&D projects to advance product offerings, focusing on efficiency technologies, networking, HPC systems, and immersion cooling solutions.
- Technology Advancement: Stay ahead of industry trends, contribute to innovation by experimenting with new technologies (e.g., silicon photonics, next-generation network protocols), and integrate viable solutions into immersion architectures.
SKILLS AND EXPERIENCE
- Minimum of 10 years of hands-on experience designing and deploying large-scale telecommunications and data centre networks, with a significant focus on HPC environments.
- Expertise in optical networking technologies, such as WDM, optical filters, optical amplifiers, multiplexers, and demultiplexers.
- Knowledge of design implications and challenges in operating compute equipment within immersion cooling environments, including material compatibility and thermal dynamics.
- Familiarity with high-performance networking hardware from vendors such as NVIDIA Mellanox, Arista, Cisco, and Juniper, and with its integration within HPC clusters.
- Understanding of high-speed storage technologies, including SSD arrays, NVMe over Fabrics, and tiered storage architectures.
- Proven ability to lead and collaborate with cross-functional teams, including engineers, technicians, and project managers.
- Exceptional ability to communicate complex technical concepts clearly and effectively to diverse audiences.
- Successful design and implementation of at least two large-scale HPC network deployments.
- Hands-on experience with networking in immersion cooling environments, addressing unique challenges related to such systems.
- Track record of developing innovative solutions that improve performance, reduce costs, or enhance scalability.