Disabled Vets

NVIDIA Corporation

Senior Engineering Manager - AI Research Clusters (Finance)



NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing: an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. As an NVIDIAN, you'll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

NVIDIA is at the forefront of the AI revolution, transforming industries with our latest GPU technology. Our GPUs drive unparalleled innovation, from self-driving cars to leading research in computer vision and speech recognition. As "the AI computing company," we are pushing the boundaries of what is possible in AI, big data, and deep learning.

We are looking for a dedicated Senior Engineering Leader to join our AI Research Clusters team in Santa Clara, CA. This role offers an outstanding opportunity to lead ambitious projects that use our world-class GPU technology to drive groundbreaking AI research and development. You will be responsible for ensuring that our AI infrastructure operates reliably by delivering scalable, efficient, and observable systems. You will lead efforts to build telemetry, policy enforcement, and automation tools, ensuring our infrastructure is resilient and easy to manage at scale.

What you'll be doing:

  • Lead the design and deployment of scalable storage systems optimized for AI workloads and high-performance compute clusters.
  • Drive readiness and operational enablement for upcoming hardware platforms, ensuring seamless integration and performance.
  • Coordinate the development of internal tools to enhance storage provisioning, usage traceability, and user self-service.
  • Guide the evaluation and implementation of new technologies to improve efficiency, reliability, and observability.
  • Collaborate with cross-functional teams to align storage architecture with GPU cluster requirements and evolving research needs.
  • Improve storage monitoring and metrics infrastructure to surface key insights and enable proactive management.
  • Find opportunities to modernize existing storage systems for improved quota management, compression, and automation.

What we need to see:

  • BS or equivalent experience.
  • 12+ years of overall relevant technical experience.
  • 5+ years of leadership experience.
  • Proven ability to lead engineering teams building infrastructure at scale, especially in environments combining storage and high-performance computing.
  • Deep technical knowledge in distributed storage systems, with experience improving data access patterns and platform observability.
  • Familiarity with the infrastructure deployment lifecycle, from planning and vendor engagement to rollout and operational readiness.
  • Strong understanding of how to align storage performance with compute needs and how to measure system behavior against real-world metrics.
  • Ability to guide teams through technology evaluations, balancing technical rigor with speed and pragmatism.

Ways to stand out from the crowd:

  • Experience with large-scale storage and networking systems in performance-sensitive environments such as HPC, AI, or scientific computing.
  • Success in building tools or automation for self-service, visibility, and governance in complex infrastructure environments.
  • Background in data observability and metrics correlation for infrastructure performance, cost efficiency, or capacity forecasting.
  • Experience leading teams through cross-functional technical evaluations or RFPs and turning them into successful infrastructure deployments.
  • Contributions to storage architecture improvements, including filesystem tuning, resource quota management, or data compression strategies.

NVIDIA provides competitive pay and benefits. Our skilled engineers contribute to our rapid team expansion.

The base salary range is 272,000 USD - 425,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
