An empirical analysis of compute-optimal large language model training

We ask the question: “What is the optimal model size and number of training tokens for a given compute budget?” To answer it, we train models of various sizes on various numbers of tokens and estimate the trade-off empirically. Our main finding is that current large language models are far too large for their compute budget: for the compute spent, they should be smaller and trained on substantially more data.
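To make the trade-off concrete, the sketch below sweeps candidate model sizes under a fixed training-compute budget, using the common approximation that training cost is roughly C ≈ 6·N·D FLOPs for N parameters and D training tokens. The budget value and the "roughly 20 tokens per parameter" rule of thumb in the comments are illustrative assumptions, not exact figures from the paper.

```python
# Sketch: trade model size against training tokens at a fixed compute budget,
# assuming the common approximation C ~ 6 * N * D FLOPs
# (N = parameters, D = training tokens). All numbers are illustrative.

def tokens_for_budget(compute_flops: float, n_params: float) -> float:
    """Tokens affordable for a model with n_params under compute_flops."""
    return compute_flops / (6.0 * n_params)

budget = 6e23  # illustrative training budget in FLOPs

for n_params in [1e9, 10e9, 70e9, 280e9]:
    d_tokens = tokens_for_budget(budget, n_params)
    ratio = d_tokens / n_params  # tokens seen per parameter
    print(f"{n_params / 1e9:6.0f}B params -> {d_tokens / 1e12:7.2f}T tokens "
          f"({ratio:8.1f} tokens/param)")

# The paper's empirical fits suggest scaling parameters and tokens roughly
# equally with compute; a ratio of about 20 tokens per parameter is often
# quoted as a rule of thumb at this scale.
```

Running the sweep shows why very large models trained on relatively little data sit far from the compute-optimal frontier: at this budget a 280B-parameter model can only be shown a couple of tokens per parameter, while a much smaller model can be trained on far more data for the same compute.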