Job Description

We are seeking a skilled and motivated AI/HPC cluster engineer to design, develop, and maintain a virtualized HPC environment and then deploy it to the lab environment and the factory. This role is crucial for enabling our team to develop and test AI/HPC tools in a virtualized environment first and, once the complete process has been verified, to deploy it on the physical infrastructure.

[Key Responsibilities]

Virtual Environment:
- Build and manage a virtual AI/HPC cluster for development and testing.
- Configure and optimize networking between virtual nodes to emulate real-world HPC environments.
- Collaborate with teams to ensure the virtual cluster meets testing and performance requirements.
- Troubleshoot and resolve networking and system-level issues within the virtual environment.
- Document system configurations, workflows, and best practices for maintaining the virtual cluster.

Physical Environment:
- Implement the validation test cases on the cluster DUTs.