What is the amount of knowledge an LLM can have?

A quantitative perspective on cognitive limits

In a previous article, we established a simple but fundamental point: both humans and artificial intelligences are limited cognitive systems. Knowledge is not infinite. Every system, biological or artificial, operates within boundaries.

The natural next question is therefore not whether there are limits, but how large this capacity may be.

The apparent vastness of LLM knowledge

Large language models today demonstrate an extraordinary breadth of knowledge. They operate across scientific, technical, cultural and linguistic domains, often with a level of detail that can exceed that of individual specialists.

This is possible because they are trained on immense amounts of data, far beyond what a human could read in a lifetime.

This, however, invites a common misconception. LLMs do not store knowledge as a database of facts; instead, knowledge is encoded in numerical structures, primarily the weight matrices of a transformer model.

Where knowledge resides

Modern LLMs are based on the transformer architecture. Their core operations rely on matrices, most notably the Query, Key and Value projections.

For a GPT-3 class model (later models are presumably larger, but their sizes are undisclosed):

  • hidden size: 12288
  • number of heads: 96
  • head dimension: 128

Source: https://openai.com/index/language-models-are-few-shot-learners/

Since 96 × 128 = 12288, each of the Query, Key and Value projection matrices has size:

12288 × 12288

which corresponds to:

150,994,944 cells

For this analysis, we assume 16-bit precision, that is, 2 bytes per cell.
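
As a quick sanity check, the following Python sketch (an illustration of the arithmetic, not of any particular implementation) reproduces these figures:

    # Size of one attention projection matrix in a GPT-3 class model
    hidden_size = 12288            # model dimension: 96 heads x 128 per head
    cells = hidden_size ** 2       # entries in one Query, Key or Value matrix
    bytes_per_cell = 2             # 16-bit precision

    print(cells)                   # 150994944
    print(cells * bytes_per_cell)  # 301989888 bytes, about 288 MiB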

From stored values to possible knowledge

Each cell can take:

2^16

possible values.

Therefore, a matrix with N cells can represent:

(2^16)^N

different configurations.
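
In Python, whose integers have arbitrary precision, this count can be computed exactly for small matrices; the function below is a minimal sketch of the formula above:

    # Distinct configurations of a matrix with n 16-bit cells: (2**16)**n
    def configurations(n_cells: int) -> int:
        return (2 ** 16) ** n_cells

    print(configurations(1))   # 65536
    print(configurations(4))   # 2**64 = 18446744073709551616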

A reference scale: the observable universe

To give this a concrete meaning, we use a known physical estimate:

The number of atoms in the observable universe is approximately:

10^80

Minimal matrix to match the universe

We now ask: how large must a matrix be to reach at least this number of possible configurations?

(2^16)^N > 10^80

Rewriting:

2^(16N) > 10^80

Using:

2^10 > 10^3

we obtain:

2^(16N) = (2^10)^(1.6N) > 10^(4.8N)

So it is sufficient that:

4.8N > 80

Therefore:

N > 80 / 4.8

That gives:

N > 16.6...

So the smallest sufficient integer is:

N = 17

This means:

  • 4 × 4 = 16 cells, not enough
  • 5 × 5 = 25 cells, enough (see the check below)
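
This threshold is easy to verify numerically. The short Python sketch below checks the inequality directly, using exact integer arithmetic:

    import math

    # Smallest integer N with (2**16)**N > 10**80, i.e. 16 * N * log10(2) > 80
    n_min = math.ceil(80 / (16 * math.log10(2)))
    print(n_min)                       # 17
    print((2 ** 16) ** 16 > 10 ** 80)  # False: 16 cells are not enough
    print((2 ** 16) ** 17 > 10 ** 80)  # True: 17 cells already suffice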

First key result

A 5 × 5 matrix with 16-bit values (in fact, 17 cells, a 4 × 4 matrix plus one extra cell, would already suffice) exceeds the number of atoms in the observable universe in terms of possible configurations.

Real transformer scale

Now compare this minimal threshold with an actual transformer matrix:

  • minimal matrix: 25 cells
  • real matrix: 150,994,944 cells

The difference is:

150,994,944 - 25 = 150,994,919

Even a single additional bit doubles the number of possible states. Each additional cell multiplies the number of possible configurations by:

2^16

So the total growth factor is:

(2^16)^150,994,919 = 2^2,415,918,704

To express this in base 10, we use:

log10(2^2,415,918,704) = 2,415,918,704 · log10(2) ≈ 2,415,918,704 · 0.30103

≈ 727,264,000

which gives approximately:

2^2,415,918,704 ≈ 10^727,000,000
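
The power itself is far too large to evaluate, but its base-10 exponent is a one-line computation. The sketch below mirrors the steps above:

    import math

    # Growth factor from 25 cells to 150,994,944 cells: (2**16)**(extra cells)
    extra_cells = 150_994_944 - 25
    exponent_base2 = 16 * extra_cells                 # 2415918704
    exponent_base10 = exponent_base2 * math.log10(2)
    print(round(exponent_base10))                     # 727263997, about 727 million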

This is not a linear difference. Once millions of cells are added, the resulting space grows beyond any physical scale we can intuitively picture.

A physical interpretation

To visualize this number, consider the following process:

  • the observable universe contains about 10^80 atoms
  • replace each atom with a full copy of the universe
  • count the atoms again → 10^80 · 10^80 = 10^160
  • repeat the same process

Each repetition multiplies the number of atoms by 10^80, so after k steps:

10^(80·k)

We now compare this with the matrix growth:

10^(80·k) ≈ 10^727,000,000

which gives:

k ≈ 727,000,000 / 80 ≈ 9,100,000
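
The same arithmetic, written out as a short sketch:

    # Each substitution multiplies the atom count by 10**80, so after
    # k steps there are 10**(80 * k) atoms. Matching 10**727,000,000:
    k = 727_000_000 / 80
    print(k)   # 9087500.0, about 9.1 million substitution steps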

Second key result

The growth from a 5 × 5 matrix to a real 12288 × 12288 transformer matrix corresponds, in this thought experiment, to about 9 million recursive substitutions, where at each step every atom is replaced by a full copy of the observable universe.

What this means

The implications are subtle but important.

  • Even a 5 × 5 matrix already exceeds any physical scale we can intuitively grasp
  • A real transformer matrix extends far beyond that first threshold
  • Yet it still represents only one actual configuration within this vast space

So the correct interpretation is not that such models contain all knowledge, but that the space of possible knowledge representations is enormous, far beyond any physical analogy we can easily handle.

The model itself is only one structured solution found through training.

A parallel with human cognition

Humans operate under similar constraints:

  • limited memory
  • bounded attention
  • finite lifetime

To function, we continuously apply:

  • selection
  • compression
  • forgetting, a form of pruning

This is not a weakness, but a necessity for maintaining consistency and coherence.

At the same time, memorizing more is not inherently a waste of space. The human brain likely also operates in a very large representational space.

But again, it is still limited.

Conclusion

The capacity of large language models is immense, but not infinite.

Even a 5 × 5 matrix already exceeds the number of atoms in the observable universe in terms of possible states.

A real transformer matrix in a GPT-3 class model expands this space by a factor comparable to about 9 million recursive universe constructions.

And yet, knowledge is not the space itself, but the path taken within it.

This applies both to artificial systems and to human cognition.

Outlook

A natural next question follows:

Can all possibly knowable things in a bounded universe be represented within a finite cognitive structure?

Or is there a fundamental gap between what exists and what can be known?

This remains open and defines a direction for further analysis.

