What is the amount of knowledge an LLM can have?

A quantitative perspective on cognitive limits

In a previous article, we established a simple but fundamental point: both humans and artificial intelligences are limited cognitive systems. Knowledge is not infinite. Every system, biological or artificial, operates within boundaries.

The natural next question is therefore not whether there are limits, but how large this capacity may be.

The apparent vastness of LLM knowledge

Large language models today demonstrate an extraordinary breadth of knowledge. They operate across scientific, technical, cultural and linguistic domains, often with a level of detail that can exceed that of individual specialists.

This is possible because they are trained on immense amounts of data, far beyond what a human could read in a lifetime.

This, however, invites a common misconception. LLMs do not store knowledge as a database of facts; instead, knowledge is encoded in numerical structures, primarily the weight matrices of a transformer model.

Where knowledge resides

Modern LLMs are based on the transformer architecture. Their core operations rely on matrices, most notably the Query, Key and Value projections.

For a GPT-3 class model (later models are presumably larger, but their sizes are undisclosed):

  • hidden size: 12288
  • number of heads: 96
  • head dimension: 128

Source: https://openai.com/index/language-models-are-few-shot-learners/

Since 96 × 128 = 12288, each of the Query, Key and Value projection matrices has size:

12288 × 12288

which corresponds to:

150,994,944 cells

For this analysis, we assume 16-bit precision, that is, 2 bytes per cell.
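
As a quick sanity check, the following Python sketch (an illustration of the arithmetic, not of any particular implementation) reproduces these figures:

    # Size of one attention projection matrix in a GPT-3 class model
    hidden_size = 12288            # model dimension: 96 heads x 128 per head
    cells = hidden_size ** 2       # entries in one Query, Key or Value matrix
    bytes_per_cell = 2             # 16-bit precision

    print(cells)                   # 150994944
    print(cells * bytes_per_cell)  # 301989888 bytes, about 288 MiB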

From stored values to possible knowledge

Each cell can take:

2^16

possible values.

Therefore, a matrix with N cells can represent:

(2^16)^N

different configurations.
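
In Python, whose integers have arbitrary precision, this count can be computed exactly for small matrices; the function below is a minimal sketch of the formula above:

    # Distinct configurations of a matrix with n 16-bit cells: (2**16)**n
    def configurations(n_cells: int) -> int:
        return (2 ** 16) ** n_cells

    print(configurations(1))   # 65536
    print(configurations(4))   # 2**64 = 18446744073709551616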

A reference scale: the observable universe

To give this a concrete meaning, we use a known physical estimate:

The number of atoms in the observable universe is approximately:

10^80

Minimal matrix to match the universe

We now ask: how large must a matrix be to reach at least this number of possible configurations?

(2^16)^N > 10^80

Rewriting:

2^(16N) > 10^80

Using:

2^10 > 10^3

we obtain:

2^(16N) = (2^10)^(1.6N) > 10^(4.8N)

So it is sufficient that:

4.8N > 80

Therefore:

N > 80 / 4.8

That gives:

N > 16.6...

So the smallest sufficient integer is:

N = 17

This means:

  • 4 × 4 = 16 cells, not enough
  • 5 × 5 = 25 cells, enough (see the check below)
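
This threshold is easy to verify numerically. The short Python sketch below checks the inequality directly, using exact integer arithmetic:

    import math

    # Smallest integer N with (2**16)**N > 10**80, i.e. 16 * N * log10(2) > 80
    n_min = math.ceil(80 / (16 * math.log10(2)))
    print(n_min)                       # 17
    print((2 ** 16) ** 16 > 10 ** 80)  # False: 16 cells are not enough
    print((2 ** 16) ** 17 > 10 ** 80)  # True: 17 cells already suffice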

First key result

A 5 × 5 matrix with 16-bit values (in fact, 17 cells, a 4 × 4 matrix plus one extra cell, would already suffice) exceeds the number of atoms in the observable universe in terms of possible configurations.

Real transformer scale

Now compare this minimal threshold with an actual transformer matrix:

  • minimal matrix: 25 cells
  • real matrix: 150,994,944 cells

The difference is:

150,994,944 - 25 = 150,994,919

Even a single additional bit doubles the number of possible states. Each additional cell multiplies the number of possible configurations by:

2^16

So the total growth factor is:

(2^16)^150,994,919 = 2^2,415,918,704

To express this in base 10, we use:

log10(2^2,415,918,704) = 2,415,918,704 · log10(2) ≈ 2,415,918,704 · 0.30103

≈ 727,264,000

which gives approximately:

2^2,415,918,704 ≈ 10^727,000,000
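
The power itself is far too large to evaluate, but its base-10 exponent is a one-line computation. The sketch below mirrors the steps above:

    import math

    # Growth factor from 25 cells to 150,994,944 cells: (2**16)**(extra cells)
    extra_cells = 150_994_944 - 25
    exponent_base2 = 16 * extra_cells                 # 2415918704
    exponent_base10 = exponent_base2 * math.log10(2)
    print(round(exponent_base10))                     # 727263997, about 727 million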

This is not a linear difference. Once millions of cells are added, the resulting space grows beyond any physical scale we can intuitively picture.

A physical interpretation

To visualize this number, consider the following process:

  • the observable universe contains about 10^80 atoms
  • replace each atom with a full copy of the universe
  • count the atoms again → 10^80 · 10^80 = 10^160
  • repeat the same process

Each repetition multiplies the number of atoms by 10^80, so after k steps:

10^(80·k)

We now compare this with the matrix growth:

10^(80·k) ≈ 10^727,000,000

which gives:

k ≈ 727,000,000 / 80 ≈ 9,100,000
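
The same arithmetic, written out as a short sketch:

    # Each substitution multiplies the atom count by 10**80, so after
    # k steps there are 10**(80 * k) atoms. Matching 10**727,000,000:
    k = 727_000_000 / 80
    print(k)   # 9087500.0, about 9.1 million substitution steps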

Second key result

The growth from a 5 × 5 matrix to a real 12288 × 12288 transformer matrix corresponds, in this thought experiment, to about 9 million recursive substitutions, where at each step every atom is replaced by a full copy of the observable universe.

What this means

The implications are subtle but important.

  • Even a 5 × 5 matrix already exceeds any physical scale we can intuitively grasp
  • A real transformer matrix extends far beyond that first threshold
  • Yet it still represents only one actual configuration within this vast space

So the correct interpretation is not that such models contain all knowledge, but that the space of possible knowledge representations is enormous, far beyond any physical analogy we can easily handle.

The model itself is only one structured solution found through training.

A parallel with human cognition

Humans operate under similar constraints:

  • limited memory
  • bounded attention
  • finite lifetime

To function, we continuously apply:

  • selection
  • compression
  • forgetting, a form of pruning

This is not a weakness, but a necessity for maintaining consistency and coherence.

At the same time, memorizing more is not inherently a waste of space. The human brain likely also operates in a very large representational space.

But again, it is still limited.

Conclusion

The capacity of large language models is immense, but not infinite.

Even a 5 × 5 matrix already exceeds the number of atoms in the observable universe in terms of possible states.

A real transformer matrix in a GPT-3 class model expands this space by a factor comparable to about 9 million recursive universe constructions.

And yet, knowledge is not the space itself, but the path taken within it.

This applies both to artificial systems and to human cognition.

Outlook

A natural next question follows:

Can all possibly knowable things in a bounded universe be represented within a finite cognitive structure?

Or is there a fundamental gap between what exists and what can be known?

This remains open and defines a direction for further analysis.

