Page 16 of 16
1 14 15 16

Dynamic language understanding: adaptation to new knowledge in parametric and semi-parametric models

To study how semi-parametric QA models and their underlying parametric language models (LMs) adapt to evolving knowledge, we construct a new large-scale dataset, StreamingQA, with human written and generated questions asked on a given date, to be answered from 14 years of time-stamped news articles. We evaluate our models quarterly as they read new articles not seen in pre-training. We show that parametric models can be updated without full retraining, while avoiding catastrophic forgetting.

Kyrgyzstan to King’s Cross: the star baker cooking up code

My day can vary, it really depends on which phase of the project I'm on. Let’s say we want to add a feature to our product – my tasks could range from designing solutions and working with the team to find the best one, to deploying new features into production and doing maintenance. Along the way, I’ll communicate changes to our stakeholders, write docs, code and test solutions, build analytics dashboards, clean-up old code, and fix bugs.

Building a culture of pioneering responsibly

When I joined DeepMind as COO, I did so in large part because I could tell that the founders and team had the same focus on positive social impact. In fact, at DeepMind, we now champion a term that perfectly captures my own values and hopes for integrating technology into people’s daily lives: pioneering responsibly. I believe pioneering responsibly should be a priority for anyone working in tech. But I also recognise that it’s especially important when it comes to powerful, widespread technologies like artificial intelligence. AI is arguably the most impactful technology being developed today. It has the potential to benefit humanity in innumerable ways – from combating climate change to preventing and treating disease. But it’s essential that we account for both its positive and negative downstream impacts.

Open-sourcing MuJoCo

In October 2021, we announced that we acquired the MuJoCo physics simulator, and made it freely available for everyone to support research everywhere. We also committed to developing and maintaining MuJoCo as a free, open-source, community-driven project with best-in-class capabilities. Today, we’re thrilled to report that open sourcing is complete and the entire codebase is on GitHub! Here, we explain why MuJoCo is a great platform for open-source collaboration and share a preview of our roadmap going forward.

From LEGO competitions to DeepMind’s robotics lab

If you want to be at DeepMind, go for it. Apply, interview, and just try. You might not get it the first time but that doesn’t mean you can’t try again. I never thought DeepMind would accept me, and when they did, I thought it was a mistake. Everyone doubts themselves – I’ve never felt like the smartest person in the room. I’ve often felt the opposite. But I’ve learned that, despite those feelings, I do belong and I do deserve to work at a place like this. And that journey, for me, started with just trying.

Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning

In our recent paper, we explore how populations of deep reinforcement learning (deep RL) agents can learn microeconomic behaviours, such as production, consumption, and trading of goods. We find that artificial agents learn to make economically rational decisions about production, consumption, and prices, and react appropriately to supply and demand changes.

A Generalist Agent

Inspired by progress in large-scale language modelling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.

Active offline policy selection

To make RL more applicable to real-world applications like robotics, we propose using an intelligent evaluation procedure to select the policy for deployment, called active offline policy selection (A-OPS). In A-OPS, we make use of the prerecorded dataset and allow limited interactions with the real environment to boost the selection quality.

An empirical analysis of compute-optimal large language model training

We ask the question: “What is the optimal model size and number of training tokens for a given compute budget?” To answer this question, we train models of various sizes and with various numbers of tokens, and estimate this trade-off empirically. Our main finding is that the current large language models are far too large for their compute budget and are not being trained on enough data.
Page 16 of 16
1 14 15 16