
This full-time GPU Systems Engineer role focuses on designing and optimizing high-performance CUDA workloads for AI training, inference, and scientific computing. Responsibilities include developing custom CUDA kernels, tuning memory-hierarchy usage for efficiency, and integrating optimized libraries into ML frameworks such as PyTorch. The engineer will work directly with an established in-house team on long-term strategic projects, with no third-party intermediaries. The role offers significant career growth potential within a culture that values engineering excellence and mentorship. The position is fully remote for candidates located in the continental United States.