My infant year as an AI researcher — Moving from physics to AI
Shortly after I left my Berkeley postdoc and joined Anthropic, I planned to write a short article, mostly as a note to myself, about my thought process behind leaving physics and joining AI research.
Yet I never found the time to write it down, due to the intense work at Anthropic :) Then last Friday (Sept. 19), I resigned from Anthropic, which gave me a week's break before joining Google DeepMind.
Why did I leave physics, and why did I choose AI?
Mostly because I wanted to find a direction with more opportunities for young people. Theoretical physics is an amazing field for training: it is intellectually challenging, deep, and requires techniques from a wide variety of fields, including math, computer science (e.g., complexity theory), and of course physics itself. Yet the field has been running out of experiments for many years. A field without experiments can be problematic in many ways: for example, it becomes hard to judge the importance of a theoretical work objectively, and hard to resolve disagreements and confusions through systematic experiments.
Then it mainly came down to AI or QC (quantum computing). Although I believe QC will become important in the future, my impression is that the bottleneck right now is mainly experimental platforms. Thus I chose AI, which turns out to be interestingly similar to physics research:
How does working on AI feel as a physicist?
In some sense, it is similar to research on thermodynamics during the 17th and 18th centuries. Back then, people didn't even know what heat was; in fact, people still believed in phlogiston theory. But this did not stop them from experimenting scientifically. For example, Boyle's law describes the relationship between pressure and volume when temperature is fixed. By designing experiments systematically, people still learned enough "laws" to guide the invention and study of the heat engine, which changed the world.
From my naive point of view, it is similar for large-scale AI models. On one hand, we still don't have a reliable theory or model describing the behavior of large neural networks. On the other hand, systematic research has started to teach us many valuable lessons, e.g., scaling laws. (And such systematic research is becoming an essential element of making constant progress at large scale.)
Why Anthropic, and why leave?
Even though I left Anthropic, I still view it as (one of) the best places for physicists (and perhaps other STEM-background PhDs) to start their journey in AI research. I joined Anthropic on Oct. 1st, 2024, when we started the research for what would later be called Claude 3.7 Sonnet. After being a physicist for many years, it was so exciting to see my research immediately impact frontier model capabilities, and to witness the way people interact with AI change as new capabilities emerged.
Yet I decided to leave, for two main reasons:
1. ~40% of the reason: I strongly disagree with the anti-China statements Anthropic has made, especially the recent public announcement in which China was called an "adversarial nation." To be clear, I believe most people at Anthropic would also disagree with such a statement; still, I don't see a way for me to stay.
2. The remaining ~60% is more complicated. Most of it involves internal Anthropic information, so I can't share it.
Time to move on!
Relative to physics, AI moves insanely fast, and looking back I am surprised by how much has happened in the past year. It was a great honor to see Claude improve from 3.7 to 4.5, and I personally learned a lot. Yet it is time to move on.
From a personal perspective, Anthropic was my first, and only, AI job, so I don't want my experience and knowledge to be biased by a single lab. (Especially because nowadays core research teams do not write papers anymore.)
So Ant, it was good with you, but it is better without you :)
I joined Google DeepMind on Sept. 29th.