CUDA Kernel Optimizer - ML Engineer
Mercor
Full time
Software Development
Canada
Hiring from: Canada
Role Overview
Key Responsibilities
- Develop, tune, and benchmark CUDA kernels for tensor and operator workloads.
- Optimize for occupancy, memory coalescing, instruction-level parallelism, and warp scheduling.
- Profile and diagnose performance bottlenecks using Nsight Systems, Nsight Compute, and comparable tools.
- Report performance metrics, analyze speedups, and propose architectural improvements.
- Collaborate asynchronously with PyTorch Operator Specialists to integrate kernels into production frameworks.
- Produce well-documented, reproducible benchmarks and performance write-ups.
Ideal Qualifications
- Deep expertise in CUDA programming, GPU architecture, and memory optimization.
- Proven ability to achieve quantifiable performance improvements across hardware generations.
- Proficiency with mixed precision, Tensor Core usage, and low-level numerical stability considerations.
- Familiarity with frameworks like PyTorch, TensorFlow, or Triton (not required but beneficial).
- Strong communication skills and independent problem-solving ability.
- Demonstrated open-source, research, or performance benchmarking contributions.
More About the Opportunity
- Ideal for independent contractors who thrive in performance-critical, systems-level work.
- Engagements focus on measurable, high-impact kernel optimizations and scalability studies.
- Work is fully remote and asynchronous; deliverables are outcome-driven.
- Access to shared benchmarking infrastructure and reproducibility tooling via Mercor support resources.
Compensation & Contract Terms
- Typical range: $120–$250/hour, depending on scope, specialization, and results achieved. Payment is based on accepted task output rather than a flat hourly rate.
- Structured as a contract-based engagement, not an employment relationship.
- Compensation tied to measurable deliverables or agreed milestones.
- Confidentiality, IP, and NDA terms as defined per engagement.
Application Process
- Submit a brief overview of prior CUDA optimization experience, profiling results, or performance reports.
- Include links to relevant GitHub repos, papers, or benchmarks if available.
- Indicate your hourly rate, time availability, and preferred engagement length.
- Selected experts may be asked to complete a small, paid pilot kernel optimization project.
About Mercor
- Mercor connects domain experts with top AI research and technology organizations through project-based contracts.
- Contractors operate independently, with full flexibility over methods, timelines, and tools.
- Our mission is to help top engineers and researchers access frontier technical work without rigid employment structures.