We caught up with Kevin Millikin, a software engineer on the DevTools team. He’s in Salt Lake City this week to present at PyCon US, the largest annual gathering for those using and developing the open-source Python programming language.
Beyond supporting the event as sponsors and regular workshop organisers, our research teams are presenting 29 papers this year, including 10 collaborations. Here’s a brief glimpse into our upcoming oral, spotlight, and poster presentations.
We ask: “For a given compute budget, what is the optimal model size and number of training tokens?” To answer this, we train models of various sizes on varying numbers of tokens and estimate the trade-off empirically. Our main finding is that current large language models are far too large for their compute budget: they are not being trained on enough data.
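To make the trade-off concrete, here is a minimal sketch of how one might split a fixed compute budget between parameters and tokens. It assumes the common approximation that training cost is C ≈ 6·N·D FLOPs (for N parameters and D tokens) and a roughly constant tokens-per-parameter ratio near 20; both the helper name and that ratio are illustrative assumptions, not details stated in the text above.

```python
import math

def compute_optimal_allocation(flops_budget, tokens_per_param=20.0):
    """Split a FLOPs budget between model size and training tokens.

    Assumes C ~= 6 * N * D (a common forward+backward cost estimate)
    and that compute-optimal training uses about `tokens_per_param`
    tokens per parameter, i.e. D = k * N, so C = 6 * k * N**2.
    Both assumptions are rules of thumb, not exact fitted values.
    """
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: a budget of ~5.76e23 FLOPs yields roughly a 70B-parameter
# model trained on ~1.4T tokens under these assumptions.
n, d = compute_optimal_allocation(5.76e23)
print(f"params ~ {n:.3g}, tokens ~ {d:.3g}")
```

Under these assumptions, doubling the compute budget increases both the optimal model size and the optimal token count by a factor of √2, which is why a fixed-size model trained with ever more compute drifts further from the optimum.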