How does Extropic's thermodynamic computer work?

Recently, my X.com feed was filled with posts about Extropic. They claim to have built a thermodynamic computer, orders of magnitude more efficient than current computers, with the potential to scale AI without running out of energy. I was curious, so I decided to dig a bit deeper.

Extropic makes the TSU (thermodynamic sampling unit), a chip like a CPU or a GPU, but designed to sample random data.

Traditional random sampling

With existing computers, if you want to sample some random variables, you generate a bunch of random bits (which is quite cheap), and then apply some expensive math functions so that your sample follows a distribution of your choice.

For example, if you want to sample some variables distributed according to this bell curve: Gaussian distribution

You can generate a lot of random bits, split them into two numbers u1 and u2 (each uniform between 0 and 1), and then compute your result z with the Box-Muller transform:

z = \sqrt{-2 \log(u_1)} \cdot \cos(2\pi u_2)

Which is fucking expensive.
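To make the formula concrete, here is a minimal sketch of it in plain Python. The function name box_muller and the sample count are my own choices for illustration; the only thing taken from above is the formula itself.

```python
import math
import random

def box_muller(rng: random.Random) -> float:
    """Return one standard-normal sample built from two uniform samples."""
    u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so that log(u1) is defined
    u2 = rng.random()
    # One log, one square root and one cosine per sample: this is the costly part.
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

rng = random.Random(0)
samples = [box_muller(rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(f"mean={mean:.3f} var={var:.3f}")  # should land close to 0 and 1
```

Every sample pays for a logarithm, a square root and a cosine on top of the random bits, which is where the cost comes from.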

Thermodynamic circuits

With a TSU instead, you explicitly tell your random bits to appear with the right distribution. Their values update constantly and, according to Extropic’s documentation, you can “measure them” millions to hundreds of millions of times per second.

Physically, each bit is produced by a transistor (a little circuit with an input, an output and a gate) called a pbit. When the gate is powered OFF, input and output have the same voltage (1); otherwise the output has none (0).

Because of heat, the voltage is a bit noisy (electrons are bouncing around), so if you power the gate exactly at the threshold between ON and OFF, the output is randomly 0 or 1. If you slightly increase the gate voltage, the output is more likely to be 1, and if you slightly decrease it, the output is more likely to be 0. Extropic was able to model this, so they can give you the exact gate voltage needed to get your desired probability.
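A software toy model helps me picture this. The sketch below assumes the probability of reading a 1 follows a sigmoid of the gate bias, measured relative to the threshold in arbitrary units. That sigmoid shape, the temperature parameter and all function names are my assumptions, not Extropic's published device model.

```python
import math
import random

def pbit_probability(gate_bias: float, temperature: float = 1.0) -> float:
    """Assumed response curve: P(output = 1) for a bias relative to the threshold."""
    return 1.0 / (1.0 + math.exp(-gate_bias / temperature))

def bias_for_probability(p: float, temperature: float = 1.0) -> float:
    """Invert the assumed sigmoid: the bias that gives P(output = 1) = p."""
    return temperature * math.log(p / (1.0 - p))

def read_pbit(gate_bias: float, rng: random.Random, temperature: float = 1.0) -> int:
    """Simulate one measurement of the noisy output."""
    return 1 if rng.random() < pbit_probability(gate_bias, temperature) else 0

rng = random.Random(0)
bias = bias_for_probability(0.8)  # ask for a bit that reads 1 about 80% of the time
reads = [read_pbit(bias, rng) for _ in range(50_000)]
freq = sum(reads) / len(reads)
print(f"bias={bias:.3f} observed P(1)={freq:.3f}")
```

At zero bias the toy pbit is a fair coin, and bias_for_probability plays the role of the calibration that turns a desired probability into a gate setting.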

Here is an illustration from their Inside X0 and XTR-0 article, of a pbit's voltage as a function of gate voltage:

In addition to the pbit (random bit), they offer a few other circuits that produce other types of random data:

pdit: a random integer between 0 and k (with a per-integer probability)
pmode: a random float (with a Gaussian distribution)
pmog: also a random float (with a mixture-of-Gaussians distribution)

To get z following a Gaussian distribution as before, you would simply sample a pmode. They do not go into detail about its hardware implementation, but we can assume the thermal noise (see Johnson-Nyquist noise) comes from the transistor conduction path. You would read the output as an analog voltage and the gate would not be used for thresholding.
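For intuition, here are software stand-ins for the three circuits above, using only Python's standard library. They only mimic the output distributions; the function signatures are hypothetical, and I assume a pdit with k categories numbered from 0.

```python
import random

def pdit(weights, rng):
    """Categorical sample: index i with probability weights[i] / sum(weights)."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def pmode(mean, std, rng):
    """Single Gaussian sample."""
    return rng.gauss(mean, std)

def pmog(weights, means, stds, rng):
    """Mixture of Gaussians: pick a component, then sample from it."""
    k = pdit(weights, rng)
    return pmode(means[k], stds[k], rng)

rng = random.Random(0)
n = 50_000
dit_samples = [pdit([1, 2, 1], rng) for _ in range(n)]
mog_samples = [pmog([0.5, 0.5], [-2.0, 2.0], [0.5, 0.5], rng) for _ in range(n)]
share_of_ones = dit_samples.count(1) / n
mog_mean = sum(mog_samples) / n
print(f"P(pdit = 1) ~ {share_of_ones:.3f}, pmog mean ~ {mog_mean:.3f}")
```

The pmog stand-in is literally a pdit feeding a pmode, which is one way to read "mixture of Gaussians".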

Composing thermodynamic “bits”

Ok, but what if you want to sample from a much more complex distribution?

Illustration of a complex distribution

Well, there is an algorithm called Gibbs sampling that turns one hard, multi-variable sampling problem into many easy, single-variable steps. Each step updates one variable by looking at the current values of the variables it is connected to, setting its odds accordingly, taking a fresh sample, and then sharing that new value back to its neighbors.
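Here is a minimal sketch of that loop on the smallest possible example: two coupled binary variables that prefer to agree. The energy function, the coupling value J and the names are my own illustrative choices. The nice thing about this toy case is that the exact answer is known, so we can check the sampler against it.

```python
import math
import random

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def gibbs_sweep(spins, couplings, rng):
    """Update every variable once, in place. couplings[i] maps neighbor j -> J_ij."""
    for i in range(len(spins)):
        # Look at the current values of the connected variables...
        field = sum(j_ij * spins[j] for j, j_ij in couplings[i].items())
        # ...set the odds accordingly (for P(s) proportional to exp(sum J_ij s_i s_j))...
        p_up = sigmoid(2.0 * field)
        # ...and take a fresh sample that the neighbors will see on their turn.
        spins[i] = 1 if rng.random() < p_up else -1

J = 0.5
couplings = [{1: J}, {0: J}]  # variable 0 talks to variable 1 and vice versa
rng = random.Random(0)
spins = [1, -1]
n_sweeps = 100_000
agree = 0
for _ in range(n_sweeps):
    gibbs_sweep(spins, couplings, rng)
    agree += spins[0] == spins[1]
agree_freq = agree / n_sweeps
exact = sigmoid(2.0 * J)  # exact probability that the two variables agree
print(f"sampled={agree_freq:.3f} exact={exact:.3f}")
```

Nothing in gibbs_sweep ever looks at the full joint distribution: each update only needs the neighbors' current values, which is what makes the method a good fit for local hardware.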

On the TSU, this is implemented as a grid of cells, each made of one sampler circuit (pbit, pdit, etc.) and one conditioner circuit, which computes the conditional probability of the variable given the values of its neighbors.

To update many cells safely in parallel, the wiring is arranged like a checkerboard (a two-colorable graph). All black cells update together, then all white cells. In general you can use more colors (any valid graph coloring works), but that just means more phases and fewer cells per phase (about 1/k of the cells for k colors).

In practice this should yield strong parallelism unless every variable depends on every other.
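The checkerboard schedule can be sketched in software like this. It is a toy simulation on an 8x8 grid of +1/-1 cells with nearest-neighbor coupling; the grid size, coupling strength and function names are assumptions of mine, and the inner loops run sequentially here even though every cell of one color could be updated at the same time.

```python
import math
import random

NEIGHBOR_OFFSETS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def neighbors(r, c, n_rows, n_cols):
    """Yield the grid coordinates adjacent to cell (r, c)."""
    for dr, dc in NEIGHBOR_OFFSETS:
        rr, cc = r + dr, c + dc
        if 0 <= rr < n_rows and 0 <= cc < n_cols:
            yield rr, cc

def checkerboard_sweep(grid, coupling, rng):
    """One full Gibbs sweep in two phases: color 0 cells, then color 1 cells."""
    n_rows, n_cols = len(grid), len(grid[0])
    for color in (0, 1):
        # Cells of the current color only read cells of the other color,
        # so the updates inside one phase do not depend on each other.
        for r in range(n_rows):
            for c in range(n_cols):
                if (r + c) % 2 != color:
                    continue
                field = sum(grid[rr][cc] for rr, cc in neighbors(r, c, n_rows, n_cols))
                p_up = 1.0 / (1.0 + math.exp(-2.0 * coupling * field))
                grid[r][c] = 1 if rng.random() < p_up else -1

rng = random.Random(0)
grid = [[rng.choice((-1, 1)) for _ in range(8)] for _ in range(8)]
for _ in range(100):
    checkerboard_sweep(grid, coupling=0.3, rng=rng)
print("\n".join("".join("#" if v == 1 else "." for v in row) for row in grid))
```

The color of a cell is just (row + column) % 2, and a cell's four neighbors always have the opposite color, which is exactly why the two-phase update is safe.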

Practical use cases

Extropic details ideas on how we could use TSUs to perform generative AI tasks. I didn’t dig into this part, but I got interested in another type of problem that we could solve more efficiently than with current technology.

It felt a bit like this sci-fi book “Theft of Fire” by Devon Eriksen where humans rely on old alien technology.

Optimization problem

Imagine a factory with two shifts: Day and Night. We need to assign n employees to the two shifts. Some employees don’t get along and create conflicts, so we want to avoid putting them on the same shift.

The associated decision problem (is there an assignment with at most k clashes?) is NP-complete; it is a form of the Max-Cut problem. This means that even though no polynomial-time (aka easy) algorithm is known to solve it, we can verify very efficiently whether a given solution is valid. Any problem in NP can be reduced to this problem, so if we could solve it efficiently, we could solve any problem in NP efficiently.

The TSU doesn’t decrease the complexity, but it lets us try shitloads (or as we say in French, “tétrachiées”) of potential solutions very efficiently.
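The "verify very efficiently" part is easy to show: checking an assignment is a single pass over the list of conflicting pairs. The data below is made up for illustration (four employees, four conflicts), and count_clashes is a name I chose.

```python
def count_clashes(shifts, conflicts):
    """Number of conflicting pairs that ended up on the same shift."""
    return sum(1 for a, b in conflicts if shifts[a] == shifts[b])

# Hypothetical example: employees 0, 1 and 2 all dislike each other,
# and employee 3 does not get along with employee 2.
conflicts = [(0, 1), (1, 2), (0, 2), (2, 3)]
shifts = ["Day", "Night", "Night", "Day"]
clashes = count_clashes(shifts, conflicts)
print(clashes)  # 1: only employees 1 and 2 share a shift
```

With three people who all dislike each other and only two shifts, one clash is unavoidable, so this assignment is as good as it gets.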

Translating English to TSU language

The standard algorithm looks like this:

We randomly decide for each employee if they are on the Day or Night shift.
We then check, for each employee, the people they would clash with on Day or Night, and decide whether to update their shift with a biased random choice (if they mostly get along with the Day people, they are more likely to end up on Day).
After each round, we compute the total clash score and repeat while making the randomness “less random”.

We repeat this process until we find a good enough solution.
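The steps above can be sketched in a few lines of plain Python. To be clear, this is not the implementation linked in this post: the function names, the linear schedule for beta (the knob that makes the randomness "less random") and the ring-shaped example are all my own assumptions.

```python
import math
import random

def count_clashes(shifts, conflicts):
    """Number of conflicting pairs on the same shift (+1 = Day, -1 = Night)."""
    return sum(1 for a, b in conflicts if shifts[a] == shifts[b])

def anneal_shifts(n, conflicts, rng, n_rounds=200, beta_start=0.1, beta_end=5.0):
    enemies = [[] for _ in range(n)]
    for a, b in conflicts:
        enemies[a].append(b)
        enemies[b].append(a)
    # Step 1: random initial assignment.
    shifts = [rng.choice((1, -1)) for _ in range(n)]
    best, best_score = list(shifts), count_clashes(shifts, conflicts)
    for t in range(n_rounds):
        # Higher beta means a more decisive, "less random" choice.
        beta = beta_start + (beta_end - beta_start) * t / (n_rounds - 1)
        # Step 2: biased random update of each employee.
        for i in range(n):
            field = sum(shifts[j] for j in enemies[i])  # > 0: more enemies on Day
            p_day = 1.0 / (1.0 + math.exp(2.0 * beta * field))
            shifts[i] = 1 if rng.random() < p_day else -1
        # Step 3: score the round and remember the best assignment seen so far.
        score = count_clashes(shifts, conflicts)
        if score < best_score:
            best, best_score = list(shifts), score
    return best, best_score

# Hypothetical example: 10 employees in a ring, each clashing with both neighbors.
n = 10
conflicts = [(i, (i + 1) % n) for i in range(n)]
best, best_score = anneal_shifts(n, conflicts, random.Random(0))
print(best, best_score)
```

An even ring can be split with zero clashes by alternating Day and Night, so the score this prints tells you how close the random search got.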

This can be mapped to the TSU hardware using an Ising energy-based model. An implementation with their THRML library is available here. I also included a “naive” implementation for reference. As I don’t have access to actual Extropic hardware, I could only test the THRML implementation with GPU acceleration.

I got 3.9 seconds for the THRML implementation and 0.055 seconds for the naive implementation.

Not surprisingly, even with GPU acceleration, running code designed for a different architecture is far slower than the already slow naive version. I will try to get access to the actual device and update this post.
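For readers who want the Ising mapping spelled out, here is one common way to write it. The notation is mine (s_i = +1 for Day, s_i = -1 for Night, C the set of conflicting pairs, beta the inverse temperature), and it is a sketch of the general idea rather than the exact model used in the linked code.

```latex
% Energy: each conflicting pair on the same shift contributes +1, otherwise -1
E(s) = \sum_{(i,j) \in C} s_i s_j

% The clash count is an affine function of the energy, so minimizing one minimizes the other
\mathrm{clashes}(s) = \frac{|C| + E(s)}{2}

% Low-energy (low-clash) assignments are sampled more often
P(s) \propto e^{-\beta E(s)}

% Conditional used by each Gibbs update: only the enemies of employee i matter
P(s_i = +1 \mid s_{-i}) = \frac{1}{1 + \exp\left(2\beta \sum_{j : (i,j) \in C} s_j\right)}
```

The last line is the "biased random choice": the more enemies of employee i are on Day, the larger the sum, and the less likely i is to be placed on Day. Raising beta over time is what makes the randomness "less random".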

Conclusion

Apparently AI will need a lot of electricity. Sam Altman said it will need so much that there is no way to scale it without a breakthrough. This might be yet another strategy to appeal to investors, but he personally invested $385 million in nuclear fusion, so it sounds like he is serious.

For now, inference accounts for 80 to 90% of AI energy consumption, and Extropic claims we could reformulate most of its generative process as a multi-step denoising sequence with ~10,000x energy efficiency gains.

This sounds plausible but is far from certain. What is certain, however, is that even in the most optimistic scenario, we are still going to need tétrachiées of watt-hours.

GPT is already better than most people in most domains, I use it all day, and yet it consumes less than 1% of the world’s electricity. AGI could consume nothing and still cause massive increases in electricity demand.