• Infrastructure Engineer - Post-training

    xAI, Palo Alto, CA 94301

    Job #2813095760

  • About the Role

    The post-training team at xAI transforms powerful pre-trained models into steerable, versatile systems capable of understanding and addressing real-world challenges.

    To accomplish this, we are looking for experienced AI infrastructure engineers to develop and optimize frameworks tailored for large-scale machine learning tasks, particularly in the areas of reinforcement learning and agent systems.

    The role involves building high-performance and scalable software to support cutting-edge AI research, employing advanced technologies to expand the limits of what AI can achieve with increased data and computational resources.

    Focus
    • Building efficient and user-friendly training and evaluation frameworks for model fine-tuning and reinforcement learning.
    • Building efficient and user-friendly software frameworks for large-scale agent simulation and execution.
    • Building a flexible and performant bulk inference framework to enable synthetic data generation and model-based data improvement research.
    Ideal Experience
    • Expert in developing software for large-scale distributed machine learning systems (e.g. language model training and reinforcement learning).
    • Expert in GPUs, Kubernetes, and JAX (or PyTorch).
    • Experienced in standard software engineering best practices (e.g. CI/CD), with care for code quality, testing, and performance.
    Location

    The role is based in the Bay Area (San Francisco and Palo Alto). Candidates are expected to live in or near the Bay Area or be open to relocation.

    Tech Stack
    • Python
    • JAX
    • Rust
    • CUDA & NCCL
    Interview Process

    After submitting your application, the team will review your CV and statement of exceptional work. If your application passes this stage, you will be invited to a 15-minute interview ("phone interview") during which a member of our team will ask some basic questions. If you clear the initial phone interview, you will enter the main process, which consists of four technical interviews:

    1. One coding assessment in a programming language of your choice.
    2. Two post-training infra technical sessions: These sessions assess your ability to design and implement solutions to infrastructure problems in post-training.
    3. Meet the Team: Present your past exceptional work and your vision with xAI to a small audience.

    Our goal is to finish the main process within one week. We don't rely on recruiters for assessments. Every application is reviewed by a member of our technical team. All interviews will be conducted via Google Meet.

    Annual Salary Range

    $180,000 - $440,000 USD