A quantitative perspective on cognitive limits
In a previous article, we established a simple but fundamental point: both humans and artificial intelligences are limited cognitive systems. Knowledge is not infinite. Every system, biological or artificial, operates within boundaries.
The natural next question is therefore not whether there are limits, but how large this capacity may be.
The apparent vastness of LLM knowledge
Large language models today demonstrate an extraordinary breadth of knowledge. They operate across scientific, technical, cultural and linguistic domains, often with a level of detail that can exceed that of individual specialists.
This is possible because they are trained on immense amounts of data, far beyond what a human could read in a lifetime.
However, this breadth invites a common misconception: LLMs do not store knowledge as a database. Instead, knowledge is encoded in numerical structures, primarily the weight matrices of a transformer model.
Where knowledge resides
Modern LLMs are based on the transformer architecture. Their core operations rely on matrices, most notably the Query, Key and Value projections.
For a GPT-3.5 class model (later models are larger, but their exact sizes are undisclosed):
- hidden size: 12288
- number of heads: 96
- head dimension: 128
Source: https://openai.com/index/language-models-are-few-shot-learners/
Since 96 × 128 = 12288, each of the Query, Key and Value matrices has size:
12288 × 12288
which corresponds to:
150,994,944 cells
For this analysis, we assume 16-bit precision, that is, 2 bytes per cell.
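The figures above can be checked with a few lines of Python; the 2 bytes per cell follow from the 16-bit precision assumed here:

```python
# Cell count and memory footprint of one 12288 x 12288 projection matrix,
# using the dimensions quoted above (96 heads x 128 head dimension).
hidden_size = 96 * 128           # = 12288
cells = hidden_size ** 2         # = 150,994,944 cells
bytes_per_cell = 2               # 16-bit precision assumed above
memory_bytes = cells * bytes_per_cell

print(cells)                     # 150994944
print(memory_bytes / 2**20)      # 288.0 MiB for a single matrix
```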
From stored values to possible knowledge
Each cell can take:
2^16
possible values.
Therefore, a matrix with N cells can represent:
(2^16)^N
different configurations.
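Since Python integers have arbitrary precision, this count can be written down exactly rather than approximated:

```python
# Number of distinct configurations of an N-cell matrix with 16-bit cells:
# each cell independently takes 2**16 values, so the total is (2**16) ** N.
def configurations(n_cells: int) -> int:
    return (2 ** 16) ** n_cells

print(configurations(1))   # 65536
print(configurations(4))   # (2**16)**4 = 2**64 = 18446744073709551616
```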
A reference scale: the observable universe
To give this a concrete meaning, we use a known physical estimate:
The number of atoms in the observable universe is approximately:
10^80
Minimal matrix to match the universe
We now ask: how large must a matrix be to reach at least this number of possible configurations?
(2^16)^N > 10^80
Rewriting:
2^(16N) > 10^80
Using:
2^10 > 10^3
we obtain:
2^(16N) = (2^10)^(1.6N) > 10^(4.8N)
So it is sufficient that:
4.8N > 80
Therefore:
N > 80 / 4.8
That gives:
N > 16.6...
So, rounding up to an integer:
N ≥ 17
This means:
- 4 × 4 = 16 cells: not enough
- 5 × 5 = 25 cells: enough
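Because Python handles arbitrary-precision integers, this threshold can be verified exactly, without relying on the logarithmic bound:

```python
# Exact integer check of the threshold derived above:
# (2**16)**N must exceed 10**80, the estimated number of atoms
# in the observable universe.
ATOMS = 10 ** 80

assert (2 ** 16) ** 16 < ATOMS   # a 4 x 4 matrix (16 cells) is not enough
assert (2 ** 16) ** 17 > ATOMS   # 17 cells already suffice
assert (2 ** 16) ** 25 > ATOMS   # so a 5 x 5 matrix (25 cells) exceeds it
```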
First key result
A 5 × 5 matrix with 16-bit values already exceeds the number of atoms in the observable universe in terms of possible configurations (in fact, 17 cells, just one more than a 4 × 4, are already enough).
Real transformer scale
Now compare this minimal threshold with an actual transformer matrix:
- minimal matrix: 25 cells
- real matrix: 150,994,944 cells
The difference is:
150,994,944 - 25 = 150,994,919
Even a single additional bit doubles the number of possible states. Each additional cell multiplies the number of possible configurations by:
2^16
So the total growth factor is:
(2^16)^150,994,919 = 2^2,415,918,704
To express this in base 10, we use:
log10(2^2,415,918,704) = 2,415,918,704 · log10(2) ≈ 2,415,918,704 · 0.30103
≈ 727,264,000
which gives approximately:
2^2,415,918,704 ≈ 10^727,000,000
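The decimal exponent can be recomputed directly from the cell counts given above:

```python
import math

# Decimal exponent of the growth factor (2**16) ** 150,994,919:
# the extra cells are the real matrix minus the 5 x 5 threshold.
extra_cells = 150_994_944 - 25       # = 150,994,919
bits = 16 * extra_cells              # = 2,415,918,704 extra bits
exponent10 = bits * math.log10(2)    # decimal exponent of the growth factor

print(f"{exponent10:,.0f}")          # roughly 727 million
```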
This is not a linear difference. Once millions of cells are added, the resulting space grows beyond any physical scale we can intuitively picture.
A physical interpretation
To visualize this number, consider the following process:
- the observable universe contains about 10^80 atoms
- replace each atom with a full copy of the observable universe
- count the atoms again: 10^80 · 10^80 = 10^160
- repeat the same process

Each repetition multiplies the number of atoms by 10^80, so after k steps the count is:
10^(80·k)
We now compare this with the matrix growth:
10^(80·k) ≈ 10^727,000,000
which gives:
k โ 727,000,000 / 80 โ 9,100,000
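The same estimate of k, computed from the exact exponent rather than the rounded 727,000,000:

```python
import math

# Number of "replace every atom with a whole universe" steps needed to
# match the growth factor: each step multiplies the atom count by 10**80.
bits = 16 * (150_994_944 - 25)       # extra bits in the real matrix
exponent10 = bits * math.log10(2)    # ~7.27e8 decimal digits
k = exponent10 / 80                  # steps of x10**80 each

print(f"{k:,.0f}")                   # roughly 9.1 million steps
```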
Second key result
The growth from a 5 × 5 matrix to a real 12288 × 12288 transformer matrix corresponds, in this thought experiment, to about 9 million recursive substitutions, where at each step every atom is replaced by a full copy of the observable universe.
What this means
The implications are subtle but important.
- Even a 5 × 5 matrix already exceeds any physical scale we can intuitively grasp
- A real transformer matrix extends far beyond that first threshold
- Yet it still represents only one actual configuration within this vast space
So the correct interpretation is not that such models contain all knowledge, but that the space of possible knowledge representations is enormous, far beyond any physical analogy we can easily handle.
The model itself is only one structured solution found through training.
A parallel with human cognition
Humans operate under similar constraints:
- limited memory
- bounded attention
- finite lifetime
To function, we continuously apply:
- selection
- compression
- forgetting (a form of pruning)
This is not a weakness, but a necessity for maintaining consistency and coherence.
At the same time, memorizing more is not inherently a waste of space. The human brain likely also operates in a very large representational space.
But again, it is still limited.
Conclusion
The capacity of large language models is immense, but not infinite.
Even a 5 × 5 matrix already exceeds the number of atoms in the observable universe in terms of possible states.
A real transformer matrix in a GPT-3.5 class model expands this space by a factor comparable to about 9 million recursive universe constructions.
And yet, knowledge is not the space itself, but the path taken within it.
This applies both to artificial systems and to human cognition.
Outlook
A natural next question follows:
Can all possibly knowable things in a bounded universe be represented within a finite cognitive structure?
Or is there a fundamental gap between what exists and what can be known?
This remains open and defines a direction for further analysis.
