Job Scope:
- Drive the implementation and refinement of distributed training strategies across multi-GPU and multi-node environments.
- Apply advanced optimization algorithms and their variants (e.g., SGD, Adam, Adagrad) to accelerate training while maintaining model accuracy.
- Lead initiatives in model compression, pruning, and quantization techniques to reduce model footprint and enhance computational efficiency.
- Innovate in knowledge distillation methodologies to transfer knowledge from larger teacher models to more efficient student models.
- Optimize fine-tuning strategies, such as prompt-based tuning or parameter-efficient fine-tuning (PEFT) methods, to minimize resource requirements.
- Explore and implement mixed precision training and hardware-specific optimizations (e.g., CUDA, Tensor Cores) to fully leverage hardware acceleration.
- Manage hyperparameter tuning processes using automated tools and algorithms to achieve optimal model configurations.
- Lead cross-functional teams in applying large-model technology to practical business scenarios, and collaborate with researchers and engineers to integrate state-of-the-art research into production-ready systems.
- Design and optimize large language models, devise fine-tuning strategies, and streamline the training process.
- Explore deep learning architectures such as Seq2Seq and Transformer, and advanced techniques including Supervised Fine-Tuning (SFT), Prompt Engineering, and Soft Prompting.
- Develop systems for efficient model training and deployment, involving data preprocessing, parallel training, and resource management.
- Establish performance evaluation systems and monitor training metrics to ensure model quality and iteration efficiency.
Job Requirements:
- Bachelor's degree or higher in Computer Science, Artificial Intelligence, Mathematics, or related fields.
- At least 5 years of experience in AI, including 3 years focused on large-scale language model development and optimization, preferably with a track record of successful projects.
- Proficient in deep learning theory; experienced with PyTorch and TensorFlow; skilled in model fine-tuning and SFT.
- Strong algorithm design and optimization skills, familiar with large-scale data processing and high-performance computing.
- Demonstrated leadership, teamwork, communication, and project management skills, with the ability to track international research trends and innovate.