
NYU Paulson Center, 181, Mercer Street, University Village, Manhattan, New York County, New York, 10012, United States
At Capital One, we are creating trustworthy and reliable AI systems, changing banking for good. For years, Capital One has been leading the industry in using machine learning to create real-time, intelligent, automated customer experiences. From informing customers about unusual charges to answering their questions in real time, our applications of AI & ML are bringing humanity and simplicity to banking.
The AI Foundations team sits at the center of bringing our vision for AI at Capital One to life. Our work touches every aspect of the research life cycle, from partnering with academia to building production systems. We work with product, technology, and business leaders to apply the state of the art in AI to our business, building world-class applied science and engineering teams with scalable, high-performance AI infrastructure.
Work model: On-site
New York, New York
Skills: Machine Learning, AI, PyTorch, AWS, Hugging Face, Lightning, Vector Databases, Deep Learning, LLM, NLP.
Education: PhD in Electrical Engineering, Computer Engineering, Computer Science, AI, Mathematics, or a related field required; or a Master's in Electrical Engineering, Computer Engineering, Computer Science, AI, Mathematics, or a related field with 8 years of experience.
PhD in Computer Science, Machine Learning, Computer Engineering, Applied Mathematics, or Electrical Engineering.
Experience with Large Language Models (LLMs), including training models from scratch (10B+ parameters, 500B+ tokens); publications at ACL, NAACL, EMNLP, NeurIPS, ICML, or ICLR; work on open source or commercial LLMs; guiding large-scale model training teams; experience with 500+ node GPU clusters; and knowledge of training optimization frameworks like DeepSpeed or NeMo.
Expertise in Behavioral Models (Geometric Deep Learning, Graph Neural Networks, Sequential Models, Multivariate Time Series), including technical leadership for large user behavior models; publications at KDD, ICML, NeurIPS, or ICLR; scaling graph models to 50M+ nodes; experience with recommender systems and production real-time/streaming environments; contributions to open source frameworks like PyTorch Geometric or DGL; and experience with datasets of 100M+ users.
Specialization in Optimization (Model Sparsification, Quantization, Training Parallelism, Gradient Checkpointing, Model Compression) with 5+ years of experience or publications.
Knowledge of Fine-tuning (Supervised, Instruction, Dialogue, Parameter Tuning), transfer learning, model adaptation, and deploying fine-tuned LLMs.
Publications on tokenization, data quality, dataset curation, or labeling; leading contributions to large open source corpora (1 trillion+ tokens); or core contributions to open source libraries for data quality and labeling.
Capital One • New York, New York