
This research internship at Tencent AI Lab in Bellevue, Washington, focuses on advancing multimodal large language models across speech, music, audio, vision, and language. Interns collaborate with senior researchers to develop novel techniques for multimodal pretraining, post-training strategies, and efficient large-model architectures. Key responsibilities include designing end-to-end systems for fully duplex conversations, enhancing memory and reasoning capabilities, and working on encoding and tokenization methods for diverse media types. The position offers an opportunity to contribute to the lab's long-term ambition of achieving artificial general intelligence while publishing results at top-tier conferences. Interns work in a collaborative environment that values creativity and intellectual flexibility. The engagement lasts three months, with the potential for extension.