Information, Energy and Intelligence

Understanding Limits Through the Inaccessible Game

Neil D. Lawrence

Cambridge Philosophical Society - David MacKay Memorial Meeting, Cambridge University Engineering Department

Perpetual Motion and Superintelligence

  • 1925: Promises of perpetual motion cars
  • 2025: Promises of superintelligence singularity
  • Same fundamental impossibility?

Why Perpetual Motion Failed

Second Law of Thermodynamics:

\[\frac{\text{d}H}{\text{d}t} \geq 0\]

  • Entropy always increases
  • No motion without entropy production
  • No work without energy input

An Equivalent Statement for Intelligence?

Maxwell’s Demon:

  • “Intelligent” entity that appears to violate the 2nd law
  • Resolution: Landauer’s principle
  • Information erasure requires energy

Implication:

  • Intelligence has thermodynamic cost
  • Information processing has physical limits

In Memory of David MacKay

David MacKay (1967-2016)

  • Information theory and inference
  • Neural networks and learning algorithms
  • Sustainable energy and physical limits
  • Cut through hype with careful reasoning

David’s Approach:

  • Start with fundamental principles
  • Build a rigorous mathematical framework
  • Apply to real systems
  • Use numbers to test claims

Today: Apply this to information & energy

Information, Energy and Fundamental Limits

Information Theory and Thermodynamics

  • Information theory quantifies uncertainty and information
  • Core concepts inspired by thermodynamic ideas
  • Information entropy \(\leftrightarrow\) Thermodynamic entropy
    • Free energy minimization common in both domains

Entropy

  • Entropy \[ S(X) = -\sum_X \rho(X) \log \rho(X) \]

  • In thermodynamics the same expression is multiplied by Boltzmann’s constant, \(k_B\)

Exponential Family

  • Exponential family: \[ \rho(Z) = h(Z) \exp\left(\boldsymbol{\theta}^\top T(Z) - A(\boldsymbol{\theta})\right) \]

  • Entropy is, \[ S(Z) = A(\boldsymbol{\theta}) - E_\rho\left[\boldsymbol{\theta}^\top T(Z) + \log h(Z)\right] \]

  • Where \[ E_\rho\left[T(Z)\right] = \nabla_\boldsymbol{\theta}A(\boldsymbol{\theta}) \] because \(A(\boldsymbol{\theta})\) is log partition function.

  • \(A(\boldsymbol{\theta})\) operates as a cumulant generating function for \(\rho(Z)\).
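
A minimal numerical sketch of these identities, assuming a single Bernoulli variable with natural parameter \(\theta\) and base measure \(h(Z) = 1\) (the names `A`, `theta` are just this example’s): finite differences of the log partition function recover the mean and variance of \(T(Z)\).

```python
import numpy as np

def A(theta):
    """Log partition function of a Bernoulli in natural parameters."""
    return np.log1p(np.exp(theta))

theta, eps = 0.7, 1e-4
p1 = np.exp(theta - A(theta))        # p(z = 1)

dA = (A(theta + eps) - A(theta - eps)) / (2 * eps)            # first derivative
d2A = (A(theta + eps) - 2 * A(theta) + A(theta - eps)) / eps**2  # second derivative

print(p1, dA)                        # E[T(Z)] = grad A: both ~0.668
print(p1 * (1 - p1), d2A)            # Var[T(Z)] = second cumulant: both ~0.222
```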

Available Energy

  • Available energy: \[ A(\boldsymbol{\theta}) \]
  • Internal energy: \[ U(\boldsymbol{\theta}) = A(\boldsymbol{\theta}) + T S(\boldsymbol{\theta}) \]

  • Traditional relationship \[ A = U - TS \]
  • Legendre transformation of entropy

Work through Measurement

  • Split system \(Z\) into two parts:
    • Variables \(X\) - stochastically evolving
    • Memory \(M\) - low entropy partition

Joint Entropy Decomposition

  • Joint entropy can be decomposed \[ S(Z) = S(X,M) = S(X|M) + S(M) = S(X) - I(X;M) + S(M) \]

  • Mutual information \(I(X;M)\) connects information and energy

Measurement and Available Energy

  • Measurement changes system entropy by \(-I(X;M)\)

  • Increases available energy

  • Difference in available energy: \[ \Delta A = A(X) - A(X|M) = I(X;M) \]

  • Can recover \(k_B T \cdot I(X;M)\) in work from the system

Information to Work Conversion

  • Maxwell’s demon thought experiment in practice
  • Information gain \(I(X;M)\) can be converted to work
  • Maximum extractable work: \(W_{\text{max}} = k_B T \cdot I(X;M)\)
  • Measurement creates a non-equilibrium state
  • Information is a physical resource
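
A small sketch of the conversion, assuming a uniform binary variable \(X\) read into a memory \(M\) through a hypothetical symmetric error rate `eps`: it computes \(I(X;M)\) in nats and the corresponding \(W_{\text{max}} = k_B T \cdot I(X;M)\) at room temperature.

```python
import numpy as np

k_B, T = 1.380649e-23, 300.0          # Boltzmann constant (J/K), room temperature

def mutual_info(p_joint):
    """I(X;M) in nats from a joint probability table."""
    px = p_joint.sum(axis=1, keepdims=True)
    pm = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return float(np.sum(p_joint[mask] * np.log(p_joint[mask] / (px * pm)[mask])))

eps = 0.05                             # hypothetical measurement error rate
p_joint = 0.5 * np.array([[1 - eps, eps],
                          [eps, 1 - eps]])   # uniform X, noisy memory M

I = mutual_info(p_joint)
print(f"I(X;M) = {I:.3f} nats")        # ~0.49 nats (log 2 if eps = 0)
print(f"W_max  = {k_B * T * I:.2e} J") # maximum extractable work
```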

Information-Theoretic Limits on Intelligence

Thermodynamics limits mechanical engines

Information theory limits information engines

Same kind of fundamental constraint

What Intelligent Systems Must Do

Intelligence Requires:

  • Acquiring information (sensing)
  • Storing information (memory)
  • Processing information (computation)
  • Erasing information (memory management)
  • Acting on information (output)

Each has thermodynamic cost

The Landauer Bound on Computation

Landauer’s Principle:

Erasing 1 bit requires: \(Q \geq k_BT\log 2\)

  • Not engineering limitation
  • Fundamental thermodynamic bound
  • Entropy must go somewhere

At room temperature: \(\sim 3 \times 10^{-21}\) Joules/bit

Implications for Computation

Brain-Scale Computation:

\(10^{15}\)-\(10^{18}\) ops/sec (Lawrence, 2017; Moravec, 1999; Sandberg and Bostrom, 2008)

\(\sim 10^{15}\) ops/sec running 1 year at 300K:

Landauer bound: \(\sim 100\) Joules (minimum)

But also need entropy production for:

  • Data acquisition
  • Data movement
  • Actual computation
  • Real (non-ideal) dissipation

Human brain: \(\sim 6 \times 10^8\) J/year (\(10^6 \times\) Landauer)
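
The arithmetic behind these figures, as a sketch (the \(10^{15}\) ops/sec and the ~20 W metabolic budget are the estimates quoted above, not exact values; the ratio comes out at the order of magnitude stated).

```python
import math

k_B, T = 1.380649e-23, 300.0            # J/K, room temperature
per_bit = k_B * T * math.log(2)         # Landauer minimum per erased bit
print(f"{per_bit:.2e} J/bit")           # ~2.9e-21 J

ops, year = 1e15, 3.15e7                # brain-scale ops/sec, seconds per year
landauer_year = per_bit * ops * year
print(f"Landauer bound: {landauer_year:.0f} J/year")   # ~90 J

brain = 20 * year                       # ~20 W metabolic budget
print(f"Brain: {brain:.1e} J/year, ratio ~{brain / landauer_year:.0e}")
```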

Fisher Information Bounds on Learning

Fisher Information Bounds:

Cramér-Rao: \(\text{Var}(\hat{\boldsymbol{\theta}}) \geq G^{-1}\)

  • Learning rate bounded by \(G\)
  • Some directions hard to learn (small eigenvalues)
  • Information topography constrains learning

Can’t learn faster than information geometry allows

Embodiment as Necessity, Not Limitation

Embodiment = Information Topography:

Physical substrate determines \(G(\boldsymbol{\theta})\)

  • Silicon ≠ neurons ≠ quantum systems
  • Each has different channels/bottlenecks
  • Each has different energy costs
  • Each has different bandwidths

No substrate = no intelligence!

Why Superintelligence Claims Fail

Superintelligence Violates:

  1. Fisher bounds (can’t learn infinitely fast)
  2. Physical storage limits (finite memory)
  3. Landauer bounds (computation costs energy)
  4. Entropy production (can’t be perfectly efficient)

Same as perpetual motion:

Violates fundamental physical law

The Inaccessible Game

Foundations: The Four Axioms

Baez-Fritz-Leinster Characterization of Information Loss

Baez et al. (2011):

  • Entropy from category theory
  • Three axioms uniquely determine information loss
  • No probability needed initially

The Three Axioms

Axiom 1: Functoriality \[F(f \circ g) = F(f) + F(g)\]

  • Information loss is additive
  • Compose processes → add losses

Convex Linearity

Axiom 2: Convex Linearity \[F(\lambda f \oplus (1-\lambda)g) = \lambda F(f) + (1-\lambda)F(g)\]

  • Probabilistic mixture of processes
  • Linear in probability weights

Continuity

Axiom 3: Continuity

  • Small change in process
  • Small change in information loss
  • \(F(f)\) continuous in \(f\)

The Main Result

Theorem:

Three axioms \(\Rightarrow\) unique form: \[F(f) = c(H(p) - H(q))\]

  • Information loss = scaled entropy difference
  • Shannon entropy emerges from axioms
  • No other measure satisfies all three

The Fourth Axiom: Information Conservation

Physical Analogy:

  • Isolated chamber
  • Mass conserved: \(\sum m_i = \text{const}\)
  • Energy conserved: \(\sum E_i = \text{const}\)
  • Information conserved?

Statement of the Axiom

Axiom 4: Information Conservation \[ \sum_{i=1}^N h_i = C \]

  • \(h_i\) = marginal entropy of variable \(i\)
  • \(C\) = conservation constant
  • Total information conserved
  • Information can redistribute

Why Marginal Entropies?

Why Marginal? \[ H(\mathbf{x}) = \sum_{i} h_i - I(\mathbf{x}) \]

  • Conserve: \(\sum h_i = C\)
  • \(I\) (multi-information) can change
  • \(H\) (joint entropy) can change
  • Variables can correlate/decorrelate
  • Total capacity \(\sum h_i\) fixed

Exchangeability

Exchangeability:

  • Consider any finite subset
  • Constraint applies equally to all
  • No special variables
  • Can handle infinite systems
  • Different from Bayesian exchangeability

Physical Interpretation

Physical Picture:

  • \(h_i\) = information capacity of variable \(i\)
  • \(\sum h_i = C\) = total fixed capacity
  • \(I\) = correlation/structure (can vary)
  • Information flows but total conserved
  • Can “buy correlations” with capacity

Why This Creates “Inaccessibility”

Creates “Inaccessibility”:

Baez: Info gained = entropy change

Conservation: \(\sum h_i = C\) (constant)

Observer: \(\Delta(\sum h_i) = 0\) → learns nothing!

  • Internal dynamics hidden
  • Information barrier
  • System informationally isolated
  • → Constrained dynamics (L3)

The Four Axioms Together

Four Axioms Together:

Baez (1-3):

  • Functoriality
  • Convex linearity
  • Continuity
  • → Entropy measures information

New (4):

  • Information conservation: \(\sum h_i = C\)
  • → Constrained dynamics

Next (L3): Derive dynamics from axioms

The Inaccessible Game

The Inaccessible Game:

  • System isolated from observation
  • External observer cannot extract information
  • Internal state is inaccessible
  • Zero-player game with information-theoretic rules

Why “Inaccessible”?

Why “Inaccessible”?

From BLF axioms: Info gained = \(H(p) - H(q)\)

Our axiom: \(\sum h_i = C\) (constant)

\[\Delta(\sum h_i) = 0 \Rightarrow \text{observer learns nothing!}\]

What Makes It a Game?

Game Characteristics:

  • Zero-player game
  • State = probability distribution \(p(\mathbf{x}|\boldsymbol{\theta})\)
  • Rule = maximize entropy production
  • Constraint = \(\sum h_i = C\)
  • Dynamics = emerge from information geometry

Connection to Physical Reality

Physical Connections:

  • GENERIC structure emerges
  • Energy \(\leftrightarrow\) Entropy equivalence
  • Landauer’s principle derivable
  • Bridge between information and physics

Information Dynamics

The Conservation Law

The \(I + H = C\) Structure

The Fourth Axiom: \[ \sum_{i=1}^N h_i = C \]

What does this conservation imply for dynamics?

Multi-Information: Measuring Correlation

Multi-Information: \[ I = \sum_{i=1}^N h_i - H \]

  • \(I = 0\): Independent variables
  • \(I > 0\): Correlated variables
  • Larger \(I\) = more correlation

Measures “shared information”
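
A quick check of the decomposition on a toy example, assuming a hypothetical joint distribution over two binary variables: the sum of marginal entropies equals \(H + I\) exactly.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

p = np.array([[0.4, 0.1],            # hypothetical joint over two binary vars
              [0.1, 0.4]])

h1 = entropy(p.sum(axis=1))          # marginal entropy of x1
h2 = entropy(p.sum(axis=0))          # marginal entropy of x2
H = entropy(p.ravel())               # joint entropy
I = h1 + h2 - H                      # multi-information (mutual info for N = 2)

print(h1 + h2, H + I)                # equal: sum of marginals = H + I = C
```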

The Information Action Principle: \(I + H = C\)

Information Action Principle: \[ I + H = C \]

Conserved quantity splits into two parts

Analogy to classical mechanics:

  • Energy: \(T + V = E\)
  • Information: \(I + H = C\)

Physical Analogy

| Classical Mechanics | Information System |
|---|---|
| Kinetic energy \(T\) | Joint entropy \(H\) |
| Potential energy \(V\) | Multi-information \(I\) |
| Conservation: \(T + V = E\) | Conservation: \(H + I = C\) |

System “rolls downhill” from correlation to entropy

The Information Relaxation Principle

Information Relaxation:

  • Second law: Entropy increases (\(\dot{H} > 0\))
  • Conservation: \(I + H = C\) (constant)
  • Therefore: Correlation decreases (\(\dot{I} < 0\))

Physical intuition:

  • Compressed spring \(\rightarrow\) released energy
  • Correlated variables \(\rightarrow\) independent variables
  • Potential \(\rightarrow\) kinetic

Visualisation: Relaxation Dynamics

Information Relaxation Dynamics

Key Features:

  • Multi-information \(I\) decreases (correlations break)
  • Joint entropy \(H\) increases (disorder grows)
  • Conservation: \(I + H = C\) (black line)
  • Marginals: \(h_1\), \(h_2\) constant (inaccessible!)

Internal reorganisation invisible to external observer

Connection to Marginal Entropy Conservation

Key Insight:

Conservation \(\sum h_i = C\) \(\iff\) \(I + H = C\)

  • External view: Marginals constant (inaccessible)
  • Internal view: \(I \leftrightarrow H\) (dynamic redistribution)

Dynamics = trading correlation for entropy

Why This Matters for Dynamics

Implications:

  1. Direction: High \(I\) \(\rightarrow\) High \(H\)
  2. Constraint: Only paths with \(I + H = C\) allowed
  3. Coordinates: \(I\) and \(H\) are natural
  4. Variational principle: Derive dynamics from conservation

Information Relaxation

From Information Relaxation to Maximum Entropy Production

Question: How exactly does the system relax?

Answer: Through maximum entropy production

The Direction of Time: Entropy Increases

Second Law: \[ \dot{H} \geq 0 \]

Conservation: \[ I + H = C \]

Therefore: \[ \dot{I} \leq 0 \]

Correlations must decrease

Maximum Entropy Production Principle

Maximum Entropy Production (MEP):

  • Subject to constraints
  • Maximize \(\dot{H}\)
  • Steepest path to equilibrium

Observed across physics:

  • Thermodynamics (Beretta, Ziegler)
  • Fluid mechanics
  • Self-organising systems

Natural Parameters and the Entropy Gradient

Entropy in Natural Parameters: \[ H(\boldsymbol{\theta}) = A(\boldsymbol{\theta}) - \boldsymbol{\theta}^\top \nabla A(\boldsymbol{\theta}) \]

Gradient (steepest increase): \[ \nabla_{\boldsymbol{\theta}} H = -G(\boldsymbol{\theta})\boldsymbol{\theta} \]

Fisher information emerges

The MEP Dynamics

MEP Dynamics: \[ \dot{\boldsymbol{\theta}} = -G(\boldsymbol{\theta})\boldsymbol{\theta} \]

  • Gradient ascent on entropy
  • Fisher metric determines the flow
  • Automatically preserves \(\sum h_i\) (for the right structure)

Natural dynamics from information geometry
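
A minimal sketch of this flow for a single Bernoulli variable, so \(G\) reduces to the scalar \(\sigma(\theta)(1-\sigma(\theta))\): forward-Euler steps of \(\dot{\theta} = -G\theta\) drive the state to the maximum-entropy point \(\theta = 0\), \(H = \log 2\). Step size and iteration count are illustrative.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def entropy(theta):
    """Entropy of a Bernoulli with natural parameter theta."""
    p = sigmoid(theta)
    return float(-(p * np.log(p) + (1 - p) * np.log(1 - p)))

theta, dt = 2.0, 0.1                 # start far from equilibrium
for _ in range(1000):
    G = sigmoid(theta) * (1 - sigmoid(theta))   # scalar Fisher information
    theta += dt * (-G * theta)                  # MEP step: gradient ascent on H
print(theta, entropy(theta))         # theta -> 0, H -> log 2 ~ 0.693 (maximum)
```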

Why This Is the Unique Dynamics

Why This Dynamics?

  1. Steepest ascent (Euclidean, not natural gradient)
  2. Maximizes entropy production rate
  3. Conserves marginal entropies (special case)
  4. Matches thermodynamic steepest ascent

Determined by information relaxation + MEP

The Information Relaxation Picture

Information Relaxation:

Start: High \(I\), low \(H\) (correlated, tense)

\(\dot{\boldsymbol{\theta}} = -G\boldsymbol{\theta}\) (MEP)

End: Low \(I\), high \(H\) (independent, relaxed)

Throughout: \(\sum h_i = C\) (inaccessible to observer)

Connection to Physical Intuition

Physical Analogy: Gas Diffusion

| Gas Molecules | Information System |
|---|---|
| Concentrated in corner | High correlation (\(I\)) |
| Diffuse throughout room | Entropy increases (\(H \uparrow\)) |
| Uniform distribution | Low correlation (independent) |
| Conservation: energy | Conservation: \(\sum h_i\) |

Same principle, different space

Preview: Constrained Gradient Flow

Next Steps:

  • Lecture 4: Adding Lagrangian constraints
  • Lecture 5: Poisson structure (conservation)
  • Lecture 8: Full GENERIC framework

MEP + constraints = complete dynamics

Constrained Maximum Entropy Production

Information Relaxation Principle:

Among all paths with \(\sum h_i = C\):

\[\text{Follow path that maximizes } \dot{H}\]

Steepest entropy ascent on constraint surface

Unconstrained vs Constrained Dynamics

Unconstrained MEP:

\[\dot{\boldsymbol{\theta}} = \nabla H = -G(\boldsymbol{\theta})\boldsymbol{\theta}\]

  • Euclidean gradient ascent on entropy
  • Flows to \(\boldsymbol{\theta} = \mathbf{0}\) (max entropy)
  • No constraint enforcement

Adding the Constraint

Lagrangian Formulation:

\[\mathscr{L}(\boldsymbol{\theta}, \nu) = -H + \nu\left(\sum h_i - C\right)\]

  • \(\nu\): Lagrange multiplier
  • Enforces constraint
  • Projects onto tangent space

The Constrained Dynamics

Constrained Dynamics:

\[\dot{\boldsymbol{\theta}} = -G\boldsymbol{\theta} + \nu(\tau) \mathbf{a}\]

where \(\mathbf{a} = \nabla\left(\sum_i h_i\right)\)

Constraint maintenance: \[\mathbf{a}^\top \dot{\boldsymbol{\theta}} = 0\]

Solving for the Lagrange Multiplier

Solution:

\[\nu(\tau) = \frac{\mathbf{a}^\top G\boldsymbol{\theta}}{\|\mathbf{a}\|^2}\]

Projection Form:

\[\dot{\boldsymbol{\theta}} = -\Pi_\parallel G\boldsymbol{\theta}\]

where \(\Pi_\parallel\) projects onto tangent space
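
A numerical sketch of the projection, with a randomly generated positive definite \(G\) and a stand-in constraint gradient `a` (both hypothetical, purely to exercise the formulas): the multiplier from the previous slide makes the flow tangent (\(\mathbf{a}^\top\dot{\boldsymbol{\theta}} = 0\)) while the entropy production \(\dot{H} = -\boldsymbol{\theta}^\top G \dot{\boldsymbol{\theta}}\) stays non-negative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical positive definite Fisher matrix, state, and constraint gradient
B = rng.standard_normal((3, 3))
G = B @ B.T + np.eye(3)
theta = rng.standard_normal(3)
a = rng.standard_normal(3)

nu = a @ (G @ theta) / (a @ a)       # Lagrange multiplier from the slide
theta_dot = -G @ theta + nu * a      # constrained MEP dynamics

print(a @ theta_dot)                 # ~0: flow stays tangent to the constraint
H_dot = -(theta @ (G @ theta_dot))   # dH/dt = (grad H)^T theta_dot, grad H = -G theta
print(H_dot >= 0)                    # entropy production remains non-negative
```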

Physical Interpretation

Physical Picture:

  • \(-G\boldsymbol{\theta}\): natural entropy gradient
  • \(\nu\mathbf{a}\): constraint force
  • \(\nu\) small: flow naturally tangent
  • \(\nu\) large: flow fights constraint

Balance between geometry and constraint

Emergence of Physical Structure

GENERIC: Reversible and Irreversible Dynamics

The GENERIC Framework

What We’ve Seen Emerge

Lecture 5: Energy conservation \(\rightarrow\) antisymmetric \(A\)

Lecture 6-7: Linearisation \(\rightarrow\) \(M = S + A\) split

Question: Is this structure universal?

Answer: YES! \(\rightarrow\) GENERIC framework

Historical Context: Non-Equilibrium Thermodynamics

Non-Equilibrium Challenge (1980s-90s)

Real systems are both:

  • Reversible (mechanics, conservation laws)
  • Irreversible (thermodynamics, dissipation)

Examples:

  • Fluids: momentum conservation + viscosity
  • Reactions: kinetics + diffusion
  • Materials: elasticity + plasticity

Solution: GENERIC framework (Grmela and Öttinger, 1997; Öttinger and Grmela, 1997)

What Problem Does GENERIC Solve?

Two Worlds?

| Classical Mechanics | Classical Thermodynamics |
|---|---|
| Time reversible | Time irreversible |
| Energy conserved | Entropy increases |
| Antisymmetric operators | Symmetric operators |
| Poisson structure | Dissipation |

Problem: Real systems do both.

Pendulum with friction: Angular momentum (reversible) + heat loss (irreversible)

The GENERIC Answer: Coexistence Requires Structure

GENERIC Answer

Reversible + Irreversible can coexist

Requirements:

  1. Consistent energy & entropy
  2. Second law: \(\dot{S} \geq 0\)
  3. Conserved quantities respected
  4. Constraints (Casimirs) obeyed

Key: Can’t add arbitrarily \(\rightarrow\) need degeneracy conditions

Remarkable: In typical GENERIC, degeneracy conditions are HARD to satisfy (must engineer carefully)

Our approach (L1-7): Degeneracy conditions emerge automatically! ✓

(Axioms \(\rightarrow\) geometry \(\rightarrow\) thermodynamic consistency)

Why “GENERIC” Matters for Information Dynamics

Why GENERIC for Information Dynamics?

Structure we derived (L1-7) = Structure physicists discovered (GENERIC)

Deep connection:

  • Information dynamics = thermodynamics (Shannon/Jaynes)
  • Information dynamics = dynamical system (constraints)
  • GENERIC = inevitable consequence of combining both

Our system: \(\dot{\boldsymbol{\theta}} = -G\boldsymbol{\theta} + \nu \mathbf{a}\)

  • \(G\): Fisher information (friction/dissipation)
  • \(\nu \mathbf{a}\): Constraint dynamics (reversible structure)

We’ve been building GENERIC from scratch!

Preview: Structure of the GENERIC Equation

Coming Up: The GENERIC Equation

\[\dot{x} = L(x) \nabla E(x) + M(x) \nabla S(x)\]

  • \(L\): Poisson (antisymmetric, reversible)
  • \(M\): Friction (symmetric, irreversible)
  • \(E\): Energy (conserved by \(L\))
  • \(S\): Entropy (increased by \(M\))

Our information dynamics:

  • \(G \leftrightarrow M\) (Fisher = friction)
  • Constraints \(\leftrightarrow L\) (structure)
  • \(\sum h_i = C\) (Casimirs)

Structure we built = GENERIC!

The GENERIC Equation

The GENERIC Equation

\[\dot{x} = L(x) \nabla E(x) + M(x) \nabla S(x)\]

Components:

  • \(x\): System state
  • \(E\): Energy functional
  • \(S\): Entropy functional
  • \(L\): Poisson operator (reversible)
  • \(M\): Friction operator (irreversible)

Simple form \(\rightarrow\) Deep structure!

The Poisson Operator \(L(x)\)

Poisson Operator \(L(x)\) (Reversible part)

Properties:

  1. Antisymmetric: \(\langle \nabla F, L \nabla G \rangle = -\langle \nabla G, L \nabla F \rangle\)
    • Time reversible
  2. Jacobi identity: \(\{F, \{G, H\}\} + \text{cyclic} = 0\)
    • Lie algebra (Lecture 5!)
  3. Conserves energy: \(\langle \nabla E, L \nabla E \rangle = 0\)
    • \(\frac{\text{d}E}{\text{d}t}\big|_L = 0\)

Recall L5: This IS Hamiltonian/Poisson structure!

The Friction Operator \(M(x)\)

Friction Operator \(M(x)\) (Irreversible part)

Properties:

  1. Symmetric: \(\langle \nabla F, M \nabla G \rangle = \langle \nabla G, M \nabla F \rangle\)
    • Onsager reciprocity
  2. Positive semi-definite: \(\langle \nabla F, M \nabla F \rangle \geq 0\)
    • Entropy increases: \(\dot{S}|_M \geq 0\)
  3. Conserves energy: \(\langle \nabla E, M \nabla S \rangle = 0\)
    • First degeneracy condition

Recall L7: This is our symmetric part \(S\)!

The Degeneracy Conditions

Degeneracy Conditions (Coupling)

Condition 1: \(M \nabla E = 0\)

  • Friction doesn’t change total energy
  • Only redistributes it

Condition 2: \(L \nabla S = 0\)

  • Hamiltonian flow doesn’t change entropy
  • All entropy change from dissipation

Consequences:

  • First law: \(\frac{\text{d}E}{\text{d}t} = 0\) ✓
  • Second law: \(\frac{\text{d}S}{\text{d}t} = \langle \nabla S, M \nabla S \rangle \geq 0\)

Without these \(\rightarrow\) thermodynamics violated.

Casimir Functions and Constraints

Casimir Functions \(C_i(x)\)

\[L \nabla C_i = 0 \quad \text{AND} \quad M \nabla C_i = 0\]

  • “Super-conserved” (both parts preserve)
  • Fundamental constraints

Examples:

  • Momentum (mechanics)
  • Circulation (fluids)
  • Charge (electromagnetism)
  • \(\sum h_i = C\) (information)

Effect: Stratify state space into symplectic leaves

(Recall L5: Casimirs from symmetries)

Why This Structure?

Why This Structure?

GENERIC is the most general structure that allows:

  • Time-reversal (reversible part)
  • Second law (irreversible part)
  • Energy conservation (overall)
  • Casimirs (constraints)

Not a choice \(\rightarrow\) Consequence of physics!

Our result: GENERIC emerged from information axioms

  • Axioms (L2) + MEP (L3) + Constraints (L4)
  • → GENERIC structure (L5-7)

GENERIC = deep physical principle, not modeling trick

A Worked Example: Damped Harmonic Oscillator

Example: Damped Harmonic Oscillator

State: \(x = (q,p)\), Energy: \(E = \frac{p^2}{2m} + \frac{1}{2}kq^2\)

\[L = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad M = \begin{pmatrix} 0 & 0 \\ 0 & \gamma \end{pmatrix}\]

GENERIC: \[\dot{q} = \frac{p}{m}, \quad \dot{p} = -kq - \gamma\beta\frac{p}{m}\]

Result: \(m\ddot{q} = -kq - \gamma\beta\dot{q}\)

Damped oscillator from GENERIC! (Reversible + irreversible)
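
A sketch integrating this example, assuming the entropy gradient \(\nabla S = (0, -\beta p/m)\) that reproduces the slide’s equations (in a fuller treatment it would come from the entropy of an attached bath); parameter values are illustrative.

```python
import numpy as np

m, k, gamma, beta = 1.0, 1.0, 0.2, 1.0       # illustrative parameters
q, p, dt = 1.0, 0.0, 1e-3

L = np.array([[0.0, 1.0], [-1.0, 0.0]])      # Poisson (reversible) operator
M = np.array([[0.0, 0.0], [0.0, gamma]])     # friction (irreversible) operator

E0 = p**2 / (2 * m) + k * q**2 / 2
for _ in range(50_000):
    grad_E = np.array([k * q, p / m])        # gradient of E = p^2/2m + kq^2/2
    grad_S = np.array([0.0, -beta * p / m])  # assumed entropy gradient (see text)
    dq, dp = L @ grad_E + M @ grad_S         # GENERIC: xdot = L grad E + M grad S
    q, p = q + dt * dq, p + dt * dp

E1 = p**2 / (2 * m) + k * q**2 / 2
print(E0, E1)    # mechanical energy decays: dissipated as heat by the M term
```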

Automatic Degeneracy

Automatic Degeneracy Conditions

The GENERIC Degeneracy Requirements

Standard GENERIC Degeneracies:

  1. \(A\nabla H = 0\) (antisymmetric conserves entropy)
  2. \(S\nabla E = 0\) (symmetric conserves energy)

Problem: Hard to construct \(A\), \(S\) satisfying both.

Usually requires careful hand-crafting

First Degeneracy: Automatic from Tangency

First Degeneracy (Automatic):

Constraint maintenance: \(\mathbf{a}^\top\dot{\boldsymbol{\theta}} = 0\)

\(\Rightarrow\) Dynamics tangent to surface

\(\Rightarrow\) Antisymmetric part conserves entropy

\[\boxed{A\nabla H = 0 \text{ holds automatically}}\]

By construction, not by assumption!

Second Degeneracy: From Constraint Gradient

Second Degeneracy (From Constraint):

Our constraint: \(\sum h_i = C\)

\[S\nabla\left(\sum h_i\right) = 0\]

Marginal entropy plays role of energy

In thermodynamic limit: \(\nabla\left(\sum h_i\right) \parallel \nabla E\)

Energy-Entropy Equivalence

Thermodynamic Limit:

\[\nabla\left(\sum h_i\right) \parallel \nabla E\]

Therefore: \[S\mathbf{a} = 0 \Leftrightarrow S\nabla E = 0\]

Information framework \(\rightarrow\) classical thermodynamics

Why This Matters

Why Automatic Matters:

  1. No guesswork—structure emerges
  2. Global validity—true everywhere
  3. Information-first foundation
  4. GENERIC as fundamental principle

Flips the usual derivation: \[\text{Information axioms} \Rightarrow \text{Thermodynamics}\]

Example: Harmonic Oscillator GENERIC Dynamics

Harmonic Oscillator with Thermalisation

  • Position \(x\) and momentum \(p\) variables
  • Constraint: \(h(X) + h(P) = C\) (entropy conservation)
  • Dynamics: \(\dot{\boldsymbol{\theta}} = -G(\boldsymbol{\theta})\boldsymbol{\theta} + \nu(\boldsymbol{\theta})\mathbf{a}(\boldsymbol{\theta})\)

Key Features:

  • Entropy conservation: \(h(X) + h(P) = C\) maintained
  • Thermalisation: hot and cold degrees of freedom exchange energy until equilibrium
  • GENERIC decomposition: \(M = S + A\)
  • Equilibrium: Equipartition theorem satisfied

Computational Validation: Three Binary Variables

System: 3 binary variables

  • Parameters: \(\boldsymbol{\theta} = (\theta_1, \theta_2, \theta_3, \theta_{12}, \theta_{13}, \theta_{23})\)
  • Constraint: \(\sum_i h_i = C\) (sum of marginal entropies)
  • Dynamics: Constrained MEP with GENERIC structure

Validation Results:

  • ✓ Constraint maintained (\(< 10^{-12}\) error)
  • ✓ GENERIC decomposition \(M = S + A\) at all points
  • ✓ Regime ratio \(\|A\|/\|S\|\) varies with geometry
  • ✓ No universal “regime” for all parameter space
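
Not the validation code itself, but a sketch of the bookkeeping it requires, assuming \(\pm 1\)-valued spins and the six-parameter sufficient statistics listed above: enumerate the eight joint states, normalise, and sum the marginal entropies to obtain the conserved \(C\).

```python
import numpy as np
from itertools import product

def marginal_entropy_sum(theta):
    """Sum of marginal entropies, sum_i h_i, for three coupled binary spins."""
    t1, t2, t3, t12, t13, t23 = theta
    states = np.array(list(product([-1, 1], repeat=3)))   # 8 joint states
    logits = (t1 * states[:, 0] + t2 * states[:, 1] + t3 * states[:, 2]
              + t12 * states[:, 0] * states[:, 1]
              + t13 * states[:, 0] * states[:, 2]
              + t23 * states[:, 1] * states[:, 2])
    p = np.exp(logits - logits.max())                     # p propto exp(theta^T T)
    p /= p.sum()
    total = 0.0
    for i in range(3):
        p_i = np.array([p[states[:, i] == s].sum() for s in (-1, 1)])
        total += float(-np.sum(p_i * np.log(p_i)))
    return total

theta = np.array([0.3, -0.2, 0.1, 0.5, 0.0, -0.4])  # illustrative parameters
print(marginal_entropy_sum(theta))   # the quantity the dynamics must conserve
```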

Information Topography

Fisher Information as Conductance Tensor

The Electrical Circuit Analogy

Kirchhoff Networks:

  • Local charge conservation: \(\sum_j I_{ij} = 0\)
  • Ohm’s law: \(I_{ij} = g_{ij}(V_i - V_j)\)
  • Fixed conductances \(g_{ij}\)
  • Linear equations → steady state

Information Networks are Different

Information Conservation: \[\sum_{i=1}^n \log([G^{-1}]_{ii}) = C\]

  • Nonlocal: Every variable coupled through \(G^{-1}\)
  • Nonlinear: Logarithm and matrix inversion
  • Global: Changing \(\theta_i\) affects all \(h_j\)

Dynamic Information Topography

Dynamic Topography:

  • \(G(\boldsymbol{\theta})\) changes with state
  • Not fixed conductances!
  • Analogous to memristive networks
  • “Conductances” and “voltages” co-evolve

Information Channels and Bottlenecks

Channel Capacity:

  • Large \(\lambda_i\) (eigenvalues of \(G\)): easy information flow
  • Small \(\lambda_i\): information bottlenecks
  • Eigenvectors: flow directions

Generalised Ohm’s Law: \[\dot{\boldsymbol{\theta}} = -G(\boldsymbol{\theta})\boldsymbol{\theta} + \nu \mathbf{a}\]

Why “Topography”?

Information Topography:

  • Geography: terrain shapes water flow
  • Information: \(G(\boldsymbol{\theta})\) shapes information flow
  • Formalises the Atomic Human metaphor
  • Mathematical teeth for an intuitive concept

Formalising Information Topography

From Metaphor to Mathematics:

Atomic Human: “Information topography” = intuitive concept

Inaccessible Game: Fisher information = formal definition

Mathematical Definition

Definition:

Information Topography = \((G(\boldsymbol{\theta}), \mathcal{M})\)

where \(G(\boldsymbol{\theta}) = \nabla^2A\)

Determines:

  • Distances between distributions
  • Flow directions (geodesics)
  • Channel capacities (eigenvalues)
  • Bottlenecks (small eigenvalues)

How It Constrains Information Movement

Three Constraints:

  1. Metric: \(\text{d}s^2 = \text{d}\boldsymbol{\theta}^\top G \text{d}\boldsymbol{\theta}\)
    • Small eigenvalues = narrow passages
  2. Gradient: \(\nabla H = -G\boldsymbol{\theta}\)
    • Fisher info determines slope
  3. Conservation: \(\dot{\boldsymbol{\theta}} = -\Pi_\parallel G\boldsymbol{\theta}\)
    • Projection onto constraint surface

Dynamic Topography

Dynamic Evolution:

\[\boldsymbol{\theta}(t) \rightarrow G(\boldsymbol{\theta}(t)) \rightarrow \dot{\boldsymbol{\theta}}(t) \rightarrow \boldsymbol{\theta}(t+\text{d}t)\]

  • Topography shapes flow
  • Flow changes state
  • State reshapes topography
  • Landscape evolves as you move!

Fisher Information as Geometry

From last section: \[ G(\boldsymbol{\theta}) = \nabla^2 A(\boldsymbol{\theta}) = \mathrm{Cov}_{\boldsymbol{\theta}}[T(\mathbf{x})] \]

Now: What does this mean geometrically?

The Statistical Manifold

Statistical Manifold:

  • Each point \(\boldsymbol{\theta}\) = a probability distribution
  • Space of all distributions = curved manifold
  • Fisher information = metric (ruler) on this space
  • Measures “closeness” between distributions

Information Distance

\[ \text{d}s^2 = \text{d}\boldsymbol{\theta}^\top G(\boldsymbol{\theta}) \text{d}\boldsymbol{\theta} \]

  • Measures information distance between distributions
  • Larger \(G\) = distributions more distinguishable
  • Smaller \(G\) = distributions harder to tell apart

Connection to Statistical Estimation

Cramér-Rao Bound: \[ \text{cov}(\hat{\boldsymbol{\theta}}) \succeq G^{-1}(\boldsymbol{\theta}) \]

  • \(G^{-1}\) = best possible estimator covariance
  • High \(G\) → small \(G^{-1}\) → tight estimation
  • Low \(G\) → large \(G^{-1}\) → loose estimation
  • Geometric picture: \(G^{-1}\) is the “error ellipsoid”
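
A Monte Carlo illustration of the bound for the simplest case, a Gaussian mean with known variance (all numbers illustrative): the Fisher information for \(n\) samples is \(n/\sigma^2\), and the sample mean attains \(G^{-1}\) exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, trials = 0.0, 2.0, 50, 20_000

cr_bound = sigma**2 / n              # Cramér-Rao bound: G^{-1} with G = n / sigma^2

# Empirical variance of the sample-mean estimator over many repetitions
estimates = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
print(estimates.var(), cr_bound)     # both ~0.08: the bound is attained
```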

Why This Matters for Dynamics

Two Roles of Fisher Information:

  1. Metric → defines distances between distributions
  2. In the gradient → \(\nabla H = -G(\boldsymbol{\theta})\boldsymbol{\theta}\)

\[ \dot{\boldsymbol{\theta}} = \nabla H = -G(\boldsymbol{\theta})\boldsymbol{\theta} \]

Examples Revisited

Gaussian: Geometry of Covariance

Gaussian: \(G(\boldsymbol{\theta}) = \Sigma\)

  • Information metric = covariance
  • \(G^{-1} = \Sigma^{-1}\) = precision
  • Information ellipsoid = probability ellipsoid
  • Special to Gaussians in natural parameters

Categorical: Simplex Geometry

Categorical: \[ G_{ij} = \delta_{ij}\pi_i - \pi_i\pi_j \]

  • Defines the probability simplex geometry
  • Center of simplex: balanced information
  • Corners: concentrated information
  • Metric captures curvature
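
A three-line check of this metric at an illustrative point \(\boldsymbol{\pi}\) on the simplex: the all-ones vector is a null direction (probabilities must sum to one), and the remaining eigenvalues describe the local geometry.

```python
import numpy as np

pi = np.array([0.5, 0.3, 0.2])             # illustrative simplex point
G = np.diag(pi) - np.outer(pi, pi)         # G_ij = delta_ij pi_i - pi_i pi_j

print(G @ np.ones(3))                      # ~0: all-ones is a null direction
print(np.linalg.eigvalsh(G))               # one zero eigenvalue, two positive
```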

Information Geometry: The Big Picture

Information Geometry:

  • Fisher metric → Riemannian geometry
  • Exponential families → dually flat structure
  • Geodesics → shortest paths between distributions
  • Zero curvature → special “flat” structure
  • Key for constrained dynamics later

Why This Matters for The Inaccessible Game

Three Roles in TIG:

  1. Gradient flow metric: appears in \(\dot{\boldsymbol{\theta}} = -G\boldsymbol{\theta}\)
  2. Information distance: measures distinguishability
  3. Emergence indicator: structure changes signal regimes

Fisher information as geometry → key to everything

Connecting Information to Energy

The Thermodynamic Limit

Energy-Entropy Equivalence in the Thermodynamic Limit

The Energy-Entropy Question

Question:

Does \(\nabla\left(\sum h_i\right) \parallel \nabla E\) ?

Connects information to thermodynamics

Conditions for Equivalence

Scaling Requirements:

Along order parameter \(m\):

  • \(\nabla_m I = \mathscr{O}(1)\) — intensive
  • \(\nabla_m H = \mathscr{O}(n)\) — extensive
  • \(\nabla_m\left(\sum h_i\right) = \mathscr{O}(n)\) — extensive

As \(n \to \infty\): intensive correction negligible

Asymptotic Parallelism

In Thermodynamic Limit:

\[\nabla_m\left(\sum h_i\right) = \underbrace{\nabla_m H}_{\mathscr{O}(n)} + \underbrace{\nabla_m I}_{\mathscr{O}(1)}\]

\[\Rightarrow \nabla_m\left(\sum h_i\right) \parallel \nabla_m H\]

Intensive correction vanishes relative to extensive term

Connecting to Energy

Energy Definition:

Choose \(E(\mathbf{x}) = -\boldsymbol{\alpha}^\top T(\mathbf{x})\) with \(\boldsymbol{\theta} = -\beta\boldsymbol{\alpha}\)

\[\Rightarrow \nabla E = \frac{\nabla H}{\beta}\]

Result: \[\boxed{\nabla E \parallel \nabla H \parallel \nabla\left(\sum h_i\right)}\]

\(\beta\) emerges as inverse temperature

When Does This Hold?

Requirements:

  1. Macroscopic order parameter exists (e.g. magnetism)
  2. Finite correlation length (\(\xi < \infty\))
  3. Translation invariance
  4. Large system (\(n \to \infty\))

Not all systems satisfy these

Implications

Implications:

  • Energy = Information (in limit)
  • Temperature emerges from geometry
  • Landauer derivable
  • “It from bit” justified

Reverses usual logic: \[\text{Information theory} \Rightarrow \text{Thermodynamics}\]

Analytical Validation: Curie-Weiss Model

Curie-Weiss Model

  • \(n\) interacting spins: \(E = -\frac{J}{2n}(\sum_i x_i)^2\)
  • Phase transition at \(T_c = J\)
  • Mean-field exact in limit \(n \to \infty\)
  • Tests equivalence across phase boundary

Validation Results:

Disordered phase (\(T > T_c\)): \(m \approx 0\), \(|\nabla_m I| \approx 0\)

Ordered phase (\(T < T_c\)): \(m \neq 0\), \(|\nabla_m I| \gg 0\)

Confirms theorem predictions!
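
A sketch of the mean-field solution behind these results, using the standard Curie-Weiss self-consistency \(m = \tanh(\beta J m)\) (iteration counts and temperatures are illustrative): the magnetisation vanishes above \(T_c = J\) and becomes nonzero below it.

```python
import numpy as np

def magnetisation(beta, J=1.0, m0=0.5, iters=500):
    """Iterate the mean-field self-consistency m = tanh(beta * J * m)."""
    m = m0
    for _ in range(iters):
        m = np.tanh(beta * J * m)
    return m

for T in [2.0, 1.5, 1.0, 0.8, 0.5]:      # critical temperature T_c = J = 1
    print(T, magnetisation(1.0 / T))      # m ~ 0 above T_c, |m| > 0 below
                                          # (convergence is slow exactly at T_c)
```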

GENERIC and Thermodynamics

GENERIC as Generalized Thermodynamics

Hierarchy of Thermodynamics

Classical: Equilibrium only, no dynamics

Linear irreversible: Near equilibrium, linear response

GENERIC: Full dynamics, far from equilibrium

\[\text{Classical} \subset \text{Linear} \subset \text{GENERIC}\]

GENERIC = completion of thermodynamics!

The Laws of Thermodynamics in GENERIC

Laws in GENERIC

0th Law: Equilibrium transitivity (uniqueness)

1st Law: Energy conserved \[\frac{\text{d}E}{\text{d}t} = 0\]

  • From antisymmetry + degeneracy condition 1

2nd Law: Entropy increases \[\frac{\text{d}S}{\text{d}t} \geq 0\]

  • From degeneracy condition 2 + positive semi-definiteness

Laws = consequences of GENERIC structure!

Onsager Reciprocity Relations

Onsager Reciprocity

Near equilibrium: flux = response × force

\[J = L X\]

GENERIC: \(M\) is symmetric → \(L_{ij} = L_{ji}\)

Onsager reciprocity = consequence of GENERIC!

(Not separate postulate, follows from structure)

Historical: Onsager (1931) → GENERIC (1997)

Now understood as special case!

Entropy Production

Entropy Production

\[\frac{\text{d}S}{\text{d}t} = \sigma_S = \langle \nabla S, M \nabla S \rangle \geq 0\]

Properties:

  • Non-negative (always)
  • Zero at equilibrium
  • Measures irreversibility

Information dynamics: \[\sigma_S = \boldsymbol{\theta}^\top G^2 \boldsymbol{\theta}\]

Fisher \(G\) = rate of entropy production!

(Recall Lecture 3: Maximum entropy production)

Free Energy and Dissipation

Free Energy Dissipation

Free energy: \(\mathcal{F} = E - TS\)

Rate of change: \[\frac{\text{d}\mathcal{F}}{\text{d}t} = -T\sigma_S \leq 0\]

Free energy decreases → equilibrium at minimum

Information dynamics: \[\mathcal{F}(\boldsymbol{\theta}) = -A(\boldsymbol{\theta}) + \boldsymbol{\theta}^\top \mathbb{E}[T]\]

Dynamics = gradient descent on \(\mathcal{F}\) (under constraints)

Fluctuation-Dissipation Relations

Fluctuation-Dissipation

Theorem: Response \(\propto\) Equilibrium fluctuations

\[\chi_{ij} \propto \frac{\langle \delta x_i \delta x_j \rangle}{k_B T}\]

GENERIC: \(M\) governs both dissipation and fluctuations

Information dynamics: \[G = \text{Cov}(T_i, T_j)\]

Fisher (dissipation) \(\leftrightarrow\) Covariance (fluctuations)

Direct manifestation of theorem!

Maximum Entropy Production Principle

Maximum Entropy Production

Principle: Non-equilibrium steady states maximize \(\dot{S}\)

Information dynamics (L3): \[\dot{S} = \max \{-\boldsymbol{\theta}^\top G\dot{\boldsymbol{\theta}} : a^\top\dot{\boldsymbol{\theta}} = 0\}\]

MEPP emerges when:

  1. \(M\) related to entropy Hessian (Fisher!)
  2. Constraints via Lagrange multipliers
  3. No external driving

GENERIC explains when/why MEPP applies!

Connection to Non-Equilibrium Statistical Mechanics

Microscopic Foundations

GENERIC from:

  • Liouville equation (phase space)
  • BBGKY hierarchy (reductions)
  • Projection operators (Zwanzig-Mori)

Coarse-graining:

  • Fine → coarse: lose information
  • Reversible → irreversible
  • \(L\): Preserves structure
  • \(M\): Captures dissipation from unobserved degrees of freedom

Information dynamics = coarse-grained stat-mech!

Landauer’s Principle

Landauer’s Principle from the Inaccessible Game

Information Erasure as a Process

Bit Erasure:

  • Variable \(x_i \in \{0,1\}\)
  • Reset to standard state: \(x_i \rightarrow 0\)
  • Ensemble perspective: initial state random
  • Marginal entropy decreases: \(\Delta h(X_i) = -\log 2\)

Conservation Requires Redistribution

Conservation Constraint: \[\sum_{j \neq i} \Delta h(X_j) = +\log 2\]

Antisymmetric Part \(A\):

  • Reversible shuffling only
  • Moves information to other variables
  • Not true erasure!

True Erasure Requires Dissipation

True Erasure:

Must increase \(H\) (2nd law) with \(\sum h_i = C\)

\[\Rightarrow \Delta I = \Delta(\sum h_i) - \Delta H < 0\]

  • Breaks correlations
  • Requires dissipative part \(S\)
  • Cannot be purely reversible

Energy Cost from Energy-Entropy Equivalence

Energy-Entropy Equivalence:

In thermodynamic limit: \(\beta \langle E \rangle \approx \sum_i h_i\)

Erasure Cost: \[\Delta \langle E \rangle = -\frac{\log 2}{\beta} = -k_BT\log 2\]

Energy must be removed from system

Dissipation Bound

Landauer’s Principle Emerges From:

  1. Marginal entropy conservation
  2. GENERIC structure (\(S\) vs \(A\))
  3. Energy-entropy equivalence

\[\boxed{Q_{\text{dissipated}} \geq k_BT\log 2}\]

Implications for Information Engines

  • Information engines must overcome thermal noise
  • Related threshold for:
    • Information erasure
    • Information transmission
  • Temperature sets fundamental noise floor

Implications for Intelligence

Why Superintelligence is Like Perpetual Motion

Superintelligence as Perpetual Motion

Perpetual Motion: Violates thermodynamics

Superintelligence Singularity: Violates information bounds

Same pattern of impossible promises

The Thermodynamic Constraint

Perpetual Motion Fails:

  • 2nd law: entropy increases
  • 1st law: energy conserved
  • Efficiency limited by temperature
  • Fundamental, not engineering limits

The Information-Theoretic Constraint

Superintelligence Fails:

  • Landauer: erasure costs energy
  • Conservation: can’t create information
  • Fisher bounds: finite channel capacity
  • GENERIC: dissipation unavoidable
  • Fundamental information limits

The Recursive Self-Improvement Fallacy

Recursive Self-Improvement:

“AI makes itself smarter → makes itself better at getting smarter → runaway growth”

But requires:

  • Learning (Fisher-limited)
  • Memory (physically limited)
  • Computation (Landauer-limited)
  • Erasure (dissipative)

Embodiment as Thermodynamic Necessity

Embodiment = Information Topography:

Physical substrate → Fisher information \(G(\boldsymbol{\theta})\)

\(G\) determines:

  • Information flow rates
  • Channel capacities
  • Energy requirements

Why the Hype Persists

Why the Hype?

  • Confuse capability with unbounded intelligence
  • Ignore thermodynamic costs
  • Mistake scaling for fundamental progress
  • Economic incentives for bold claims

Same reasons perpetual motion had investors!

The Limits of Enhancement

Transhumanism

Conclusions

From Four Axioms:

  1. Functoriality (Baez)
  2. Convex linearity (Baez)
  3. Continuity (Baez)
  4. Information isolation (new)

We Derive:

  • GENERIC structure
  • Energy-entropy equivalence
  • Landauer’s principle
  • Limits on intelligence

Key Messages:

  • Information theory → Thermodynamics (not reverse!)
  • GENERIC emerges automatically from axioms
  • Superintelligence violates information bounds
  • Embodiment is necessity, not limitation

“It from bit” realized

David MacKay’s Legacy

David MacKay’s Approach:

  • Start with fundamentals
  • Build rigorous framework
  • Let mathematics reveal truth
  • Use reason to cut through hype

This work continues that tradition

Open Questions

Open Questions:

  • Exponential families
  • Initial state of the game
  • Global Poisson structure

Much to explore!

Information Engines: Intelligence as Energy Efficiency

  • Information can be converted to available energy
  • Simple systems that exploit this are “information engines”
  • This provides our first model of intelligence

Measurement as a Thermodynamic Process: Information-Modified Second Law

  • Measurement is a thermodynamic process
  • Maximum extractable work: \(W_\text{ext} \leq -\Delta\mathcal{F} + k_BTI(X;M)\)
  • Information acquisition creates work potential

\[ I(X;M) = \sum_{x,m} \rho(x,m) \log \frac{\rho(x,m)}{\rho(x)\rho(m)} \]

Efficacy of Feedback Control

Channel Coding Perspective on Memory

  • Memory acts as an information channel
  • Channel capacity limited by memory size: \(C \leq n\) bits
  • Relates to Ashby’s Law of Requisite Variety and the information bottleneck

Decomposition into Past and Future

Model Approximations and Thermodynamic Efficiency

  • Perfect models require infinite resources
  • Intelligence balances measurement against energy efficiency
  • Bounded rationality as thermodynamic necessity

Markov Blanket

  • Split system into past/present (\(X_0\)) and future (\(X_1\))
  • Memory \(M\) creates Markov separation when \(I(X_0;X_1|M) = 0\)
  • Efficient memory minimizes information loss

At What Scales Does this Apply?

  • Equipartition theorem: \(kT/2\) energy per degree of freedom
  • Information storage is a small perturbation in large systems
  • Most relevant at microscopic scales

Small-Scale Biochemical Systems and Information Processing

  • Microscopic biological systems operate where information matters
  • Molecular machines exploit thermal fluctuations
  • Information processing enables work extraction

Molecular Machines as Information Engines

  • ATP synthase, kinesin, photosynthetic apparatus
  • Convert environmental information to useful work
  • Example: ATP synthase uses ~3-4 protons per ATP

ATP Synthase: Nature’s Rotary Engine

Thanks!

References

Baez, J.C., Fritz, T., Leinster, T., 2011. A characterization of entropy in terms of information loss. Entropy 13, 1945–1957. https://doi.org/10.3390/e13111945
Grmela, M., Öttinger, H.C., 1997. Dynamics and thermodynamics of complex fluids. I. Development of a general formalism. Physical Review E 56, 6620–6632. https://doi.org/10.1103/PhysRevE.56.6620
Lawrence, N.D., 2017. Living together: Mind and machine intelligence. arXiv.
Moravec, H., 1999. Robot: Mere machine to transcendent mind. Oxford University Press, New York.
Öttinger, H.C., Grmela, M., 1997. Dynamics and thermodynamics of complex fluids. II. Illustrations of a general formalism. Physical Review E 56, 6633–6655. https://doi.org/10.1103/PhysRevE.56.6633
Sandberg, A., Bostrom, N., 2008. Whole brain emulation: A roadmap (Technical Report No. 2008-3). Future of Humanity Institute, Oxford University.