Information, Energy and Intelligence

What Emerges from Internal Adjudicability?

Neil D. Lawrence

Cambridge Philosophical Society - David MacKay Memorial Meeting, Cambridge University Engineering Department

In Memory of David MacKay

David MacKay (1967-2016) * Information theory and inference * Neural networks and learning algorithms * Sustainable energy and physical limits * Cut through hype with careful reasoning

How many Lightbulbs?

This Talk

David’s Approach: * Start with fundamental principles * Build rigorous mathematical framework * Apply to real systems * Use numbers to test claims

Today: Ideas on applying this to information, energy and intelligence

Playful

  • David was a playful person.
  • Physics puzzles
  • Ultimate Frisbee
    • Ultimate’s “spirit of the game”: players adjudicate the game themselves
  • Create a self-adjudicating zero-player information game

The No-Barber Principle

Russell’s Barber Paradox:

The barber shaves all those, and only those, who do not shave themselves

Does the barber shave themselves?

Paradox: Definition includes itself in scope

The Munchkin Provision

Munchkin Card Game (Jackson-munchkin01?):

Rules may be inconsistent

Resolution: “Loud arguments, with owner having last word”

For foundations: Need something better!

No external referee for mathematics

No External Adjudicators

No-Barber Principle:

Rules must be internally adjudicable

Forbidden: * External observer * Pre-specified outcome space * Privileged decomposition * External time parameter

No appeal to structure outside the game

Entropic Exchangeability

Entropic Exchangeability:

Admissible rules must: 1. Use only reduced descriptions 2. Be relabeling-invariant 3. Not require global distinguishability

What This Excludes

Violations of No-Barber: * Partial conservation (some variables isolated) → privileges variables * Time-varying \(C\) → needs external clock * Observer-relative isolation → needs external observer * Probabilistic isolation → needs external measure

All smuggle in external structure

Foundations: Information Loss and Entropy

The Three Axioms

Baez-Fritz-Leinster Characterization of Information Loss

Baez et al. (2011): * Entropy from category theory * Three axioms uniquely determine information loss * No probability needed initially

The Three Axioms

Axiom 1: Functoriality \[F(f \circ g) = F(f) + F(g)\]

  • Information loss is additive
  • Compose processes → add losses

Convex Linearity

Axiom 2: Convex Linearity \[F(\lambda f \oplus (1-\lambda)g) = \lambda F(f) + (1-\lambda)F(g)\]

  • Probabilistic mixture of processes
  • Linear in probability weights

Continuity

Axiom 3: Continuity

  • Small change in process
  • Small change in information loss
  • \(F(f)\) continuous in \(f\)

The Main Result

Theorem:

Three axioms \(\Rightarrow\) unique form: \[F(f) = c(H(p) - H(q))\]

  • Information loss = scaled entropy difference
  • Shannon entropy emerges from axioms
  • No other measure satisfies all three
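
A minimal numerical sketch (not from the talk, with toy distributions and \(c = 1\)): the Baez–Fritz–Leinster information loss of a deterministic coarse-graining of a finite distribution is the entropy drop \(H(p) - H(q)\), and losses add under composition.

```python
# Sketch: information loss F(f) = H(p) - H(q) for a map f sending a finite
# distribution p to its pushforward q, plus a check that losses add under
# composition (functoriality), taking the scale constant c = 1.
import numpy as np

def entropy(p):
    """Shannon entropy in nats, ignoring zero entries."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def pushforward(p, f, m):
    """Push p on {0..len(p)-1} through f into {0..m-1}."""
    q = np.zeros(m)
    for i, pi in enumerate(p):
        q[f(i)] += pi
    return q

p = np.array([0.1, 0.2, 0.3, 0.4])
f = lambda i: i // 2          # merge outcomes {0,1} and {2,3}
g = lambda j: 0               # merge everything

q = pushforward(p, f, 2)
r = pushforward(q, g, 1)

loss_f = entropy(p) - entropy(q)
loss_g = entropy(q) - entropy(r)
loss_gf = entropy(p) - entropy(r)
print(loss_f + loss_g, loss_gf)   # equal: F(g o f) = F(g) + F(f)
```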

Information Isolation: Selected by No-Barber

Information Isolation: Selected, Not Assumed

Earlier: Fourth axiom (assumed)

Now: Selected by no-barber principle

Not independent assumption

Why This Selection?

Alternatives Violate No-Barber:

  • Partial conservation → privileges variables
  • Time-varying \(C(t)\) → needs external clock
  • Observer-relative → needs external observer
  • Probabilistic → needs external measure

All require external structure

The Unique Internal Choice

Why \(\sum h_i = C\)?

Properties: * Exchangeable across subsystems * Extensive (scales with \(n\)) * Internal (reduced descriptions only) * Time-independent (constant \(C\)) * Observer-independent (absolute)

Strongest constraint without external structure

Implication: Not “Just Another Axiom”

Conceptual Shift:

Not: “Here’s a fourth axiom”

But: “Internal adjudicability forces this”

\[\text{No-barber principle} \Rightarrow \sum_i h_i = C\]

Derived necessity, not arbitrary choice

Constraints vs Selections

Two Kinds of Claims:

  • Constraints: required to avoid external structure
  • Selections: internally motivated choices (not yet unique)
  • Open: which selections are actually forced?

Smuggled Outcomes: Shannon vs von Neumann

Smuggled Outcomes:

  • Shannon needs labelled outcomes / measure structure (FinProb in category theory)
  • That labelling is not in the game’s internal language
  • von Neumann entropy is basis-free / algebraic (C* algebra in category theory)

The Inaccessible Game

The Inaccessible Game

The Inaccessible Game: * System isolated from observation * External observer cannot extract information * Internal state is inaccessible * Zero-player game with information-theoretic rules

Why “Inaccessible?”

Why “Inaccessible?”

From the BFL axioms: information gained = \(H(p) - H(q)\)

Our axiom: \(\sum h_i = C\) (constant)

\[\Delta(\sum h_i) = 0 \Rightarrow \text{observer learns nothing!}\]

What Makes It a Game?

Game Characteristics: * Zero-player game * State = probability distribution \(p(\mathbf{x}|\boldsymbol{\theta})\) * Rule = maximize entropy production * Constraint = \(\sum h_i = C\) * Dynamics = emerge from information geometry

Connection to Physical Reality

Physical Connections: * GENERIC structure emerges * Energy ↔︎ Entropy equivalence * Landauer’s principle derivable * Bridge between information and physics

Information Dynamics

The Conservation Law

The \(I + H = C\) Structure

The Fourth Axiom: \[ \sum_{i=1}^N h_i = C \]

What does this conservation imply for dynamics?

Multi-Information: Measuring Correlation

Multi-Information: \[ I = \sum_{i=1}^N h_i - H \]

  • \(I = 0\): Independent variables
  • \(I > 0\): Correlated variables
  • Larger \(I\) = more correlation

Measures “shared information”
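
A small sketch of this quantity for a discrete joint distribution; the array shapes and the two-bit examples are illustrative choices, not from the talk.

```python
# Sketch: multi-information I = sum_i h_i - H for a discrete joint distribution
# stored as an array p[x1, x2, ...].
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def multi_information(p_joint):
    H = entropy(p_joint.ravel())                  # joint entropy
    h_sum = 0.0
    for axis in range(p_joint.ndim):
        other = tuple(a for a in range(p_joint.ndim) if a != axis)
        h_sum += entropy(p_joint.sum(axis=other))  # marginal entropy h_i
    return h_sum - H

# Two perfectly correlated fair bits: I = log 2 > 0
p_corr = np.array([[0.5, 0.0], [0.0, 0.5]])
# Two independent fair bits: I = 0
p_ind = np.full((2, 2), 0.25)
print(multi_information(p_corr), multi_information(p_ind))
```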

The Information Action Principle: \(I + H = C\)

Information Action Principle: \[ I + H = C \]

Conserved quantity splits into two parts

Analogy to classical mechanics:

  • Energy: \(T + V = E\)
  • Information: \(I + H = C\)

Physical Analogy

Classical Mechanics           | Information System
Kinetic energy \(T\)          | Joint entropy \(H\)
Potential energy \(V\)        | Multi-information \(I\)
Conservation: \(T + V = E\)   | Conservation: \(H + I = C\)

System “rolls downhill” from correlation to entropy

The Information Relaxation Principle

Information Relaxation:

  • Second law: Entropy increases (\(\dot{H} > 0\))
  • Conservation: \(I + H = C\) (constant)
  • Therefore: Correlation decreases (\(\dot{I} < 0\))

Physical intuition:

  • Compressed spring \(\rightarrow\) released energy
  • Correlated variables \(\rightarrow\) independent variables
  • Potential \(\rightarrow\) kinetic

Visualisation: Relaxation Dynamics

Information Relaxation Dynamics

Key Features:

  • Multi-information \(I\) decreases (correlations break)
  • Joint entropy \(H\) increases (disorder grows)
  • Conservation: \(I + H = C\) (black line)
  • Marginals: \(h_1\), \(h_2\) constant (inaccessible!)

Internal reorganisation invisible to external observer

Connection to Marginal Entropy Conservation

Key Insight:

Conservation \(\sum h_i = C\) \(\iff\) \(I + H = C\)

  • External view: Marginals constant (inaccessible)
  • Internal view: \(I \leftrightarrow H\) (dynamic redistribution)

Dynamics = trading correlation for entropy
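
A concrete check of this equivalence, as a sketch using a two-variable Gaussian (an assumed example): with unit marginal variances the marginal entropies \(h_i\) stay fixed while the correlation trades multi-information \(I\) for joint entropy \(H\), with \(I + H = \sum_i h_i\) throughout.

```python
# Sketch: for a multivariate Gaussian with covariance Sigma,
#   h_i = 0.5*log(2*pi*e*Sigma_ii),  H = 0.5*log((2*pi*e)^n det Sigma),
#   I = sum_i h_i - H.
# Fixed unit diagonals mean the marginals h_i are constant while I <-> H trade.
import numpy as np

def gaussian_entropies(Sigma):
    n = Sigma.shape[0]
    h = 0.5 * np.log(2 * np.pi * np.e * np.diag(Sigma))                  # marginals
    H = 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(Sigma))     # joint
    return h, H, h.sum() - H

for rho in [0.9, 0.5, 0.0]:       # correlation "relaxing" towards independence
    Sigma = np.array([[1.0, rho], [rho, 1.0]])
    h, H, I = gaussian_entropies(Sigma)
    print(f"rho={rho}: sum h_i={h.sum():.4f}, H={H:.4f}, I={I:.4f}, I+H={I+H:.4f}")
```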

Why This Matters for Dynamics

Implications:

  1. Direction: High \(I\) \(\rightarrow\) High \(H\)
  2. Constraint: Only paths with \(I + H = C\) allowed
  3. Coordinates: \(I\) and \(H\) are natural
  4. Variational principle: Derive dynamics from conservation

Information Relaxation

From Information Relaxation to Maximum Entropy Production

Question: How exactly does the system relax?

Answer: Through maximum entropy production

The Direction of Time: Entropy Increases

Second Law: \[ \dot{H} \geq 0 \]

Conservation: \[ I + H = C \]

Therefore: \[ \dot{I} \leq 0 \]

Correlations must decrease

Maximum Entropy Production Principle

Maximum Entropy Production (MEP):

  • Subject to constraints
  • Maximize \(\dot{H}\)
  • Steepest path to equilibrium

Observed across physics:

  • Thermodynamics (Beretta, Ziegler)
  • Fluid mechanics
  • Self-organising systems

Natural Parameters and the Entropy Gradient

Entropy in Natural Parameters: \[ H(\boldsymbol{\theta}) = \mathcal{A}(\boldsymbol{\theta}) - \boldsymbol{\theta}^\top \nabla \mathcal{A}(\boldsymbol{\theta}) \]

Gradient (steepest increase): \[ \nabla_{\boldsymbol{\theta}} H = -G(\boldsymbol{\theta})\boldsymbol{\theta} \]

Fisher information emerges

The MEP Dynamics

MEP Dynamics: \[ \dot{\boldsymbol{\theta}} = -G(\boldsymbol{\theta})\boldsymbol{\theta} \]

  • Gradient ascent on entropy
  • Fisher metric determines the flow
  • Automatically preserves \(\sum h_i\) (for right structure)

Natural dynamics from information geometry
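
A sketch of this flow for a categorical distribution in natural parameters (an assumed example; the last logit is pinned to zero to give a minimal parameterisation, and the base measure is counting measure so \(H = \mathcal{A} - \boldsymbol{\theta}^\top\nabla\mathcal{A}\) holds exactly). Explicit Euler steps of \(\dot{\boldsymbol{\theta}} = -G\boldsymbol{\theta}\) drive the distribution towards uniformity and the entropy upwards.

```python
# Sketch: maximum-entropy-production flow theta_dot = -G(theta) theta for a
# categorical exponential family, where A(theta) = logsumexp,
# G = diag(pi) - pi pi^T over the free logits, and H = A - theta^T grad A.
import numpy as np

def probs(theta):
    logits = np.append(theta, 0.0)            # last logit pinned to zero
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()

def fisher(theta):
    pi = probs(theta)[:-1]                    # sufficient stats: first K-1 one-hots
    return np.diag(pi) - np.outer(pi, pi)

def entropy(theta):
    p = probs(theta)
    return -np.sum(p * np.log(p))

theta0 = np.array([2.0, -1.0, 0.5])           # start away from uniform (low entropy)
theta, dt = theta0.copy(), 0.05
for _ in range(200):
    theta = theta + dt * (-fisher(theta) @ theta)   # Euclidean ascent on H

print(entropy(theta0), entropy(theta))        # entropy has increased
# The flow heads towards theta = 0 (uniform, maximum entropy), and along the way
# dH/dt = theta^T G^2 theta >= 0, the entropy-production rate quoted later.
```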

Why This Is the Unique Dynamics

Why This Dynamics?

  1. Steepest ascent (Euclidean, not natural gradient)
  2. Maximizes entropy production rate
  3. Conserves marginal entropies (special case)
  4. Matches thermodynamic steepest ascent

Determined by information relaxation + MEP

The Information Relaxation Picture

Information Relaxation:

Start: High \(I\), low \(H\) (correlated, tense)

\(\dot{\boldsymbol{\theta}} = -G\boldsymbol{\theta}\) (MEP)

End: Low \(I\), high \(H\) (independent, relaxed)

Throughout: \(\sum h_i = C\) (inaccessible to observer)

Connection to Physical Intuition

Physical Analogy: Gas Diffusion

Gas Molecules                 | Information System
Concentrated in corner        | High correlation (\(I\))
Diffuse throughout room       | Entropy increases (\(H\) ↑)
Uniform distribution          | Low correlation (independent)
Conservation: energy          | Conservation: \(\sum h_i\)

Same principle, different space

Preview: Constrained Gradient Flow

Next Steps:

  • Lecture 4: Adding Lagrangian constraints
  • Lecture 5: Poisson structure (conservation)
  • Lecture 8: Full GENERIC framework

MEP + constraints = complete dynamics

Constrained Maximum Entropy Production

Information Relaxation Principle:

Among all paths with \(\sum h_i = C\):

\[\text{Follow path that maximizes } \dot{H}\]

Steepest entropy ascent on constraint surface

Unconstrained vs Constrained Dynamics

Unconstrained MEP:

\[\dot{\boldsymbol{\theta}} = \nabla H = -G(\boldsymbol{\theta})\boldsymbol{\theta}\]

  • Euclidean gradient ascent on entropy (in natural parameters)
  • Flows to \(\boldsymbol{\theta} = \mathbf{0}\) (max entropy)
  • No constraint enforcement

Adding the Constraint

Lagrangian Formulation:

\[\mathscr{L}(\boldsymbol{\theta}, \nu) = -H + \nu\left(\sum h_i - C\right)\]

  • \(\nu\): Lagrange multiplier
  • Enforces constraint
  • Projects onto tangent space

The Constrained Dynamics

Constrained Dynamics:

\[\dot{\boldsymbol{\theta}} = -G\boldsymbol{\theta} + \nu(\tau) \mathbf{a}\]

where \(\mathbf{a} = \nabla\left(\sum_i h_i\right)\)

Constraint maintenance: \[\mathbf{a}^\top \dot{\boldsymbol{\theta}} = 0\]

Solving for the Lagrange Multiplier

Solution:

\[\nu(\tau) = \frac{\mathbf{a}^\top G\boldsymbol{\theta}}{\|\mathbf{a}\|^2}\]

Projection Form:

\[\dot{\boldsymbol{\theta}} = -\Pi_\parallel G\boldsymbol{\theta}\]

where \(\Pi_\parallel\) projects onto tangent space
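
A sketch of one step of the constrained flow under this update; `fisher` and `constraint_grad` are placeholder callables for \(G(\boldsymbol{\theta})\) and \(\mathbf{a} = \nabla(\sum_i h_i)\), and the toy stand-ins below are purely illustrative, not the real marginal-entropy gradient.

```python
# Sketch: one constrained step theta_dot = -G theta + nu a with
# nu = a^T G theta / |a|^2, which keeps a^T theta_dot = 0.
import numpy as np

def constrained_step(theta, fisher, constraint_grad, dt=0.01):
    G = fisher(theta)
    a = constraint_grad(theta)
    nu = (a @ (G @ theta)) / (a @ a)       # Lagrange multiplier
    theta_dot = -G @ theta + nu * a        # equivalently -Pi_parallel G theta
    assert abs(a @ theta_dot) < 1e-9       # tangency: constraint maintained
    return theta + dt * theta_dot

# Hypothetical stand-ins, for illustration only:
fisher_toy = lambda th: np.diag([1.0, 2.0, 3.0])
constraint_toy = lambda th: np.ones_like(th)
theta = np.array([1.0, -0.5, 0.25])
print(constrained_step(theta, fisher_toy, constraint_toy))
```

The same update can be read as \(-\Pi_\parallel G\boldsymbol{\theta}\) with \(\Pi_\parallel = I - \mathbf{a}\mathbf{a}^\top/\|\mathbf{a}\|^2\).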

Physical Interpretation

Physical Picture:

  • \(-G\boldsymbol{\theta}\): natural entropy gradient
  • \(\nu\mathbf{a}\): constraint force
  • \(\nu\) small: flow naturally tangent
  • \(\nu\) large: flow fights constraint

Balance between geometry and constraint

Entropy Time (Internal Clock)

Entropy Time:

  • Avoid an externally supplied clock
  • Use entropy production to parameterise flow
  • Scale/offset are just unit conventions

Emergent Structure: GENERIC

What is GENERIC?

The GENERIC Framework

What We’ve Seen Emerge

Lecture 5: Energy conservation \(\rightarrow\) antisymmetric \(A\)

Lecture 6-7: Linearisation \(\rightarrow\) \(M = S + A\) split

Question: Is this structure universal?

Answer: YES! \(\rightarrow\) GENERIC framework

Historical Context: Non-Equilibrium Thermodynamics

Non-Equilibrium Challenge (1980s-90s)

Real systems are both: * Reversible (mechanics, conservation laws) * Irreversible (thermodynamics, dissipation)

Examples: * Fluids: momentum conservation + viscosity * Reactions: kinetics + diffusion * Materials: elasticity + plasticity

Solution: GENERIC framework (Grmela and Öttinger (1997), Öttinger and Grmela (1997))

What Problem Does GENERIC Solve?

Two Worlds?

Classical Mechanics           | Classical Thermodynamics
Time reversible               | Time irreversible
Energy conserved              | Entropy increases
Antisymmetric ops             | Symmetric ops
Poisson structure             | Dissipation

Problem: Real systems do both.

Pendulum with friction: Angular momentum (reversible) + heat loss (irreversible)

The GENERIC Answer: Coexistence Requires Structure

GENERIC Answer

Reversible + Irreversible can coexist

Requirements: 1. Consistent energy & entropy 2. Second law: \(\dot{S} \geq 0\) 3. Conserved quantities respected 4. Constraints (Casimirs) obeyed

Key: Can’t add arbitrarily \(\rightarrow\) need degeneracy conditions

Remarkable: In typical GENERIC, degeneracy conditions are HARD to satisfy (must engineer carefully)

Our approach (L1-7): Degeneracy conditions emerge automatically! ✓

(Axioms \(\rightarrow\) geometry \(\rightarrow\) thermodynamic consistency)

Why “GENERIC” Matters for Information Dynamics

Why GENERIC for Information Dynamics?

Structure we derived (L1-7) = Structure physicists discovered (GENERIC)

Deep connection: * Information dynamics = thermodynamics (Shannon/Jaynes) * Information dynamics = dynamical system (constraints) * GENERIC = inevitable consequence of combining both

Our system: \(\dot{\boldsymbol{\theta}} = -G\boldsymbol{\theta} + \nu \mathbf{a}\) * \(G\): Fisher information (friction/dissipation) * \(\nu \mathbf{a}\): Constraint dynamics (reversible structure)

We’ve been building GENERIC from scratch!

Preview: Structure of the GENERIC Equation

Coming Up: The GENERIC Equation

\[\dot{x} = L(x) \nabla E(x) + M(x) \nabla S(x)\]

  • \(L\): Poisson (antisymmetric, reversible)
  • \(M\): Friction (symmetric, irreversible)
  • \(E\): Energy (conserved by \(L\))
  • \(S\): Entropy (increased by \(M\))

Our information dynamics: * \(G \leftrightarrow M\) (Fisher = friction) * Constraints \(\leftrightarrow L\) (structure) * \(\sum h_i = C\) (Casimirs)

Structure we built = GENERIC!

The GENERIC Equation

The GENERIC Equation

\[\dot{x} = L(x) \nabla E(x) + M(x) \nabla S(x)\]

Components: * \(x\): System state * \(E\): Energy functional * \(S\): Entropy functional * \(L\): Poisson operator (reversible) * \(M\): Friction operator (irreversible)

Simple form \(\rightarrow\) Deep structure!

The Poisson Operator \(L(x)\)

Poisson Operator \(L(x)\) (Reversible part)

Properties:

  1. Antisymmetric: \(\langle \nabla F, L \nabla G \rangle = -\langle \nabla G, L \nabla F \rangle\)
    • Time reversible
  2. Jacobi identity: \(\{F, \{G, H\}\} + \text{cyclic} = 0\)
    • Lie algebra (Lecture 5!)
  3. Conserves energy: \(\langle \nabla E, L \nabla E \rangle = 0\)
    • \(\frac{\text{d}E}{\text{d}t}|_L = 0\)

Recall L5: This IS Hamiltonian/Poisson structure!

The Friction Operator \(M(x)\)

Friction Operator \(M(x)\) (Irreversible part)

Properties:

  1. Symmetric: \(\langle \nabla F, M \nabla G \rangle = \langle \nabla G, M \nabla F \rangle\)
    • Onsager reciprocity
  2. Positive semi-definite: \(\langle \nabla F, M \nabla F \rangle \geq 0\)
    • Entropy increases: \(\dot{S}|_M \geq 0\)
  3. Conserves energy: \(\langle \nabla E, M \nabla S \rangle = 0\)
    • First degeneracy condition

Recall L7: This is our symmetric part \(S\)!

The Degeneracy Conditions

Degeneracy Conditions (Coupling)

Condition 1: \(M \nabla E = 0\) * Friction doesn’t change total energy * Only redistributes it

Condition 2: \(L \nabla S = 0\) * Hamiltonian flow doesn’t change entropy * All entropy change from dissipation

Consequences: * First law: \(\frac{\text{d}E}{\text{d}t} = 0\) ✓ * Second law: \(\frac{\text{d}S}{\text{d}t} = \langle \nabla S, M \nabla S \rangle \geq 0\)

Without these \(\rightarrow\) thermodynamics violated.
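
A finite-dimensional sketch of a GENERIC step with both degeneracy conditions checked at run time. The instantiation is an assumed construction (a damped oscillator with an added internal entropy variable, a standard textbook-style choice), not the two-variable version on the later slide; \(M\) is built so that \(M\nabla E = 0\) and \(L\nabla S = 0\) hold exactly.

```python
# Sketch: x_dot = L grad E + M grad S for state x = (q, p, s), with
# E = p^2/(2m) + k q^2/2 + T0*s and S = s. The particular M below is an
# assumption chosen so the degeneracy conditions hold.
import numpy as np

m, k, gamma, T0 = 1.0, 1.0, 0.2, 1.0     # assumed toy parameters

def grad_E(x):
    q, p, s = x
    return np.array([k * q, p / m, T0])

def grad_S(x):
    return np.array([0.0, 0.0, 1.0])

def L_op(x):
    return np.array([[0.0, 1.0, 0.0],
                     [-1.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0]])

def M_op(x):
    q, p, s = x
    return gamma * np.array([[0.0, 0.0, 0.0],
                             [0.0, T0, -p / m],
                             [0.0, -p / m, p**2 / (m**2 * T0)]])

def generic_step(x, dt=1e-3):
    gE, gS = grad_E(x), grad_S(x)
    assert np.allclose(L_op(x) @ gS, 0.0)   # reversible part leaves S unchanged
    assert np.allclose(M_op(x) @ gE, 0.0)   # friction part leaves E unchanged
    return x + dt * (L_op(x) @ gE + M_op(x) @ gS)

x = np.array([1.0, 0.0, 0.0])
energy = lambda x: 0.5 * k * x[0]**2 + x[1]**2 / (2 * m) + T0 * x[2]
E0 = energy(x)
for _ in range(20000):
    x = generic_step(x)
print(E0, energy(x), x[2])   # E ~conserved (up to Euler error), entropy s has grown
```

With this choice the kinetic energy lost to friction flows into the internal variable \(s\): \(E\) stays constant while \(S = s\) increases, which is exactly the first-law/second-law bookkeeping enforced by the degeneracy conditions.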

Casimir Functions and Constraints

Casimir Functions \(C_i(x)\)

\[L \nabla C_i = 0 \quad \text{AND} \quad M \nabla C_i = 0\]

  • “Super-conserved” (both parts preserve)
  • Fundamental constraints

Examples: * Momentum (mechanics) * Circulation (fluids) * Charge (electromagnetism) * \(\sum h_i = C\) (information)

Effect: Stratify state space into symplectic leaves

(Recall L5: Casimirs from symmetries)

Why This Structure?

Why This Structure?

GENERIC is the most general structure that allows: * Time-reversal (reversible part) * Second law (irreversible part)
* Energy conservation (overall) * Casimirs (constraints)

Not a choice \(\rightarrow\) Consequence of physics!

Our result: GENERIC emerged from info axioms * Axioms (L2) + MEP (L3) + Constraints (L4) * \(\rightarrow\) GENERIC structure (L5-7)

GENERIC = deep physical principle, not modeling trick

A Worked Example: Damped Harmonic Oscillator

Example: Damped Harmonic Oscillator

State: \(x = (q,p)\), Energy: \(E = \frac{p^2}{2m} + \frac{1}{2}kq^2\)

\[L = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad M = \begin{pmatrix} 0 & 0 \\ 0 & \gamma \end{pmatrix}\]

GENERIC: \[\dot{q} = \frac{p}{m}, \quad \dot{p} = -kq - \gamma\beta\frac{p}{m}\]

Result: \(m\ddot{q} = -kq - \gamma\beta\dot{q}\)

Damped oscillator from GENERIC! (Reversible + irreversible)
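
A quick numerical check of the reduced equations above, as a sketch with assumed unit \(m\), \(k\) and a combined friction coefficient \(\gamma\beta = 0.2\) (the values are illustrative): the oscillation decays and the mechanical energy falls, the signature of the irreversible part.

```python
# Sketch: integrate q_dot = p/m, p_dot = -k q - (gamma*beta) p/m and check that
# E = p^2/(2m) + k q^2/2 decreases.
m, k, gb = 1.0, 1.0, 0.2          # gb stands in for the combined gamma*beta
q, p, dt = 1.0, 0.0, 1e-3
energy = lambda q, p: p**2 / (2 * m) + 0.5 * k * q**2

E_start = energy(q, p)
for _ in range(50_000):
    q = q + dt * p / m
    p = p + dt * (-k * q - gb * p / m)
print(E_start, energy(q, p))      # mechanical energy dissipated by the friction term
```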

Automatic Degeneracy

Automatic Degeneracy Conditions

The GENERIC Degeneracy Requirements

Standard GENERIC Degeneracies:

  1. \(A\nabla H = 0\) (antisymmetric conserves entropy)
  2. \(S\nabla E = 0\) (symmetric conserves energy)

Problem: Hard to construct \(A\), \(S\) satisfying both.

Usually requires careful hand-crafting

First Degeneracy: Automatic from Tangency

First Degeneracy (Automatic):

Constraint maintenance: \(\mathbf{a}^\top\dot{\boldsymbol{\theta}} = 0\)

\(\Rightarrow\) Dynamics tangent to surface

\(\Rightarrow\) Antisymmetric part conserves entropy

\[\boxed{A\nabla H = 0 \text{ holds automatically}}\]

By construction, not by assumption!

Second Degeneracy: From Constraint Gradient

Second Degeneracy (From Constraint):

Our constraint: \(\sum h_i = C\)

\[S\nabla\left(\sum h_i\right) = 0\]

Marginal entropy plays role of energy

In thermodynamic limit: \(\nabla\left(\sum h_i\right) \parallel \nabla E\)

Energy-Entropy Equivalence

Thermodynamic Limit:

\[\nabla\left(\sum h_i\right) \parallel \nabla E\]

Therefore: \[S\mathbf{a} = 0 \Leftrightarrow S\nabla E = 0\]

Information framework \(\rightarrow\) classical thermodynamics

Why This Matters

Why Automatic Matters:

  1. No guesswork—structure emerges
  2. Global validity—true everywhere
  3. Information-first foundation
  4. GENERIC as fundamental principle

Flips the usual derivation: \[\text{Information axioms} \Rightarrow \text{Thermodynamics}\]

Information Topography

Fisher Information as Conductance Tensor

The Electrical Circuit Analogy

Kirchhoff Networks: * Local charge conservation: \(\sum_j I_{ij} = 0\) * Ohm’s law: \(I_{ij} = g_{ij}(V_i - V_j)\) * Fixed conductances \(g_{ij}\) * Linear equations → steady state

Information Networks are Different

Information Conservation: \[\sum_{i=1}^n \log([G^{-1}]_{ii}) = C\]

  • Nonlocal: Every variable coupled through \(G^{-1}\)
  • Nonlinear: Logarithm and matrix inversion
  • Global: Changing \(\theta_i\) affects all \(h_j\)

Dynamic Information Topography

Dynamic Topography: * \(G(\boldsymbol{\theta})\) changes with state * Not fixed conductances! * Analogous to memristive networks * “Conductances” and “voltages” co-evolve

Information Channels and Bottlenecks

Channel Capacity: * Large \(\lambda_i\): easy information flow * Small \(\lambda_i\): information bottlenecks * Eigenvectors: flow directions

Generalised Ohm’s Law: \[\dot{\boldsymbol{\theta}} = -G(\boldsymbol{\theta})\boldsymbol{\theta} + \nu \mathbf{a}\]

Why “Topography?”

Information Topography: * Geography: terrain shapes water flow * Information: \(G(\boldsymbol{\theta})\) shapes information flow * Formalizes Atomic Human metaphor * Mathematical teeth for intuitive concept

Formalising Information Topography

From Metaphor to Mathematics:

Atomic Human: “Information topography” = intuitive concept

Inaccessible Game: Fisher information = formal definition

Mathematical Definition

Definition:

Information Topography = \((G(\boldsymbol{\theta}), \mathcal{M})\)

where \(G(\boldsymbol{\theta}) = \nabla^2A\)

Determines: * Distances between distributions * Flow directions (geodesics) * Channel capacities (eigenvalues) * Bottlenecks (small eigenvalues)

How It Constrains Information Movement

Three Constraints:

  1. Metric: \(\text{d}s^2 = \text{d}\boldsymbol{\theta}^\top G \text{d}\boldsymbol{\theta}\)
    • Small eigenvalues = narrow passages
  2. Gradient: \(\nabla H = -G\boldsymbol{\theta}\)
    • Fisher info determines slope
  3. Conservation: \(\dot{\boldsymbol{\theta}} = -\Pi_\parallel G\boldsymbol{\theta}\)
    • Projection onto constraint surface

Dynamic Topography

Dynamic Evolution:

\[\boldsymbol{\theta}(t) \rightarrow G(\boldsymbol{\theta}(t)) \rightarrow \dot{\boldsymbol{\theta}}(t) \rightarrow \boldsymbol{\theta}(t+\text{d}t)\]

  • Topography shapes flow
  • Flow changes state
  • State reshapes topography
  • Landscape evolves as you move!

Fisher Information as Geometry

From last section: \[ G(\boldsymbol{\theta}) = \nabla^2 \mathcal{A}(\boldsymbol{\theta}) = \mathrm{Cov}_{\boldsymbol{\theta}}[T(\mathbf{x})] \] * Now: What does this mean geometrically?

The Statistical Manifold

Statistical Manifold: * Each point \(\boldsymbol{\theta}\) = a probability distribution * Space of all distributions = curved manifold * Fisher information = metric (ruler) on this space * Measures “closeness” between distributions

Information Distance

\[ \text{d}s^2 = \text{d}\boldsymbol{\theta}^\top G(\boldsymbol{\theta}) \text{d}\boldsymbol{\theta} \] * Measures information distance between distributions * Larger \(G\) = distributions more distinguishable * Smaller \(G\) = distributions harder to tell apart
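
A sketch (assumed categorical example) relating this line element to distinguishability: for a small natural-parameter perturbation \(\text{d}\boldsymbol{\theta}\), the KL divergence between the two distributions is approximately \(\tfrac{1}{2}\,\text{d}\boldsymbol{\theta}^\top G\,\text{d}\boldsymbol{\theta}\).

```python
# Sketch: KL(p_theta || p_{theta + d_theta}) ~ 0.5 * d_theta^T G(theta) d_theta
# for a categorical distribution in natural parameters (last logit pinned to 0).
import numpy as np

def probs(theta):
    logits = np.append(theta, 0.0)
    p = np.exp(logits - logits.max())
    return p / p.sum()

def fisher(theta):
    pi = probs(theta)[:-1]
    return np.diag(pi) - np.outer(pi, pi)

theta = np.array([0.3, -0.7])
d_theta = 1e-3 * np.array([1.0, 2.0])

p, q = probs(theta), probs(theta + d_theta)
kl = np.sum(p * np.log(p / q))
half_ds2 = 0.5 * d_theta @ fisher(theta) @ d_theta
print(kl, half_ds2)        # agree to third order in |d_theta|
```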

Connection to Statistical Estimation

Cramér-Rao Bound: \[ \text{cov}(\hat{\boldsymbol{\theta}}) \succeq G^{-1}(\boldsymbol{\theta}) \] * \(G^{-1}\) = best possible estimator covariance * High \(G\) → small \(G^{-1}\) → tight estimation * Low \(G\) → large \(G^{-1}\) → loose estimation * Geometric picture: \(G^{-1}\) is “error ellipsoid”
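
A quick Monte Carlo sketch of the Cramér-Rao picture in one dimension (assumed setup: estimating the mean of a unit-variance Gaussian, for which the per-sample Fisher information is \(G = 1\)): the sample-mean estimator sits at the bound \(G^{-1}/n\).

```python
# Sketch: empirical variance of the sample mean vs the Cramer-Rao limit 1/n.
import numpy as np

rng = np.random.default_rng(0)
mu_true, n, trials = 1.5, 50, 20_000
estimates = rng.normal(mu_true, 1.0, size=(trials, n)).mean(axis=1)
print(estimates.var(), 1.0 / n)   # empirical variance ~ Cramer-Rao bound
```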

Why This Matters for Dynamics

Two Roles of Fisher Information: 1. Metric → defines distances between distributions 2. In gradient → \(\nabla H = -G(\boldsymbol{\theta})\boldsymbol{\theta}\)

\[ \dot{\boldsymbol{\theta}} = \nabla H = -G(\boldsymbol{\theta})\boldsymbol{\theta} \]

Examples Revisited

Gaussian: Geometry of Covariance

Gaussian: \(G(\boldsymbol{\theta}) = \Sigma\) * Information metric = covariance * \(G^{-1} = \Sigma^{-1}\) = precision
* Information ellipsoid = probability ellipsoid * Special to Gaussians in natural parameters

Categorical: Simplex Geometry

Categorical: \[ G_{ij} = \delta_{ij}\pi_i - \pi_i\pi_j \] * Defines probability simplex geometry * Center of simplex: balanced information * Corners: concentrated information * Metric captures curvature
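
A sketch verifying the stated formula: the categorical Fisher information is the covariance of the one-hot sufficient statistics (the probabilities and sample size below are arbitrary choices for illustration).

```python
# Sketch: G_ij = delta_ij pi_i - pi_i pi_j equals Cov of one-hot statistics.
import numpy as np

pi = np.array([0.5, 0.3, 0.2])
G_formula = np.diag(pi) - np.outer(pi, pi)

rng = np.random.default_rng(1)
samples = rng.choice(len(pi), size=200_000, p=pi)
one_hot = np.eye(len(pi))[samples]
G_mc = np.cov(one_hot, rowvar=False, bias=True)
print(np.max(np.abs(G_formula - G_mc)))   # small Monte Carlo error
```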

Information Geometry: The Big Picture

Information Geometry: * Fisher metric → Riemannian geometry * Exponential families → dually flat structure * Geodesics → shortest paths between distributions * Zero curvature → special “flat” structure * Key for constrained dynamics later

Why This Matters for The Inaccessible Game

Three Roles in TIG: 1. Gradient flow metric: appears in \(\dot{\boldsymbol{\theta}} = -G\boldsymbol{\theta}\) 2. Information distance: measures distinguishability 3. Emergence indicator: structure changes signal regimes

Fisher information as geometry → key to everything

Connecting Information to Energy

The Thermodynamic Limit

Energy-Entropy Equivalence in the Thermodynamic Limit

The Energy-Entropy Question

Question:

Does \(\nabla\left(\sum h_i\right) \parallel \nabla E\) ?

Connects information to thermodynamics

Conditions for Equivalence

Scaling Requirements:

Along order parameter \(m\): * \(\nabla_m I = \mathscr{O}(1)\) — intensive * \(\nabla_m H = \mathscr{O}(n)\) — extensive
* \(\nabla_m\left(\sum h_i\right) = \mathscr{O}(n)\) — extensive

As \(n \to \infty\): intensive correction negligible

Asymptotic Parallelism

In Thermodynamic Limit:

\[\nabla_m\left(\sum h_i\right) = \underbrace{\nabla_m H}_{\mathscr{O}(n)} + \underbrace{\nabla_m I}_{\mathscr{O}(1)}\]

\[\Rightarrow \nabla_m\left(\sum h_i\right) \parallel \nabla_m H\]

Intensive correction vanishes relative to extensive term

Connecting to Energy

Energy Definition:

Choose \(E(\mathbf{x}) = -\boldsymbol{\alpha}^\top T(\mathbf{x})\) with \(\boldsymbol{\theta} = -\beta\boldsymbol{\alpha}\)

\[\Rightarrow \nabla E = \frac{\nabla H}{\beta}\]

Result: \[\boxed{\nabla E \parallel \nabla H \parallel \nabla\left(\sum h_i\right)}\]

\(\beta\) emerges as inverse temperature

When Does This Hold?

Requirements:

  1. Macroscopic order parameter exists (e.g. magnetism)
  2. Finite correlation length (\(\xi < \infty\))
  3. Translation invariance
  4. Large system (\(n \to \infty\))

Not all systems satisfy these

Implications

Implications:

  • Energy = Information (in limit)
  • Temperature emerges from geometry
  • Landauer derivable
  • “It from bit” justified

Reverses usual logic: \[\text{Information theory} \Rightarrow \text{Thermodynamics}\]

GENERIC and Thermodynamics

GENERIC as Generalized Thermodynamics

Hierarchy of Thermodynamics

Classical: Equilibrium only, no dynamics

Linear irreversible: Near equilibrium, linear response

GENERIC: Full dynamics, far from equilibrium

\[\text{Classical} \subset \text{Linear} \subset \text{GENERIC}\]

GENERIC = completion of thermodynamics!

The Laws of Thermodynamics in GENERIC

Laws in GENERIC

0th Law: Equilibrium transitivity (uniqueness)

1st Law: Energy conserved \[\frac{\text{d}E}{\text{d}t} = 0\] * From antisymmetry + degeneracy 1

2nd Law: Entropy increases \[\frac{\text{d}S}{\text{d}t} \geq 0\] * From degeneracy 2 + positive semi-definite

Laws = consequences of GENERIC structure!

Onsager Reciprocity Relations

Onsager Reciprocity

Near equilibrium: flux = response × force

\[J = L X\]

GENERIC: \(M\) is symmetric → Onsager coefficients \(L_{ij} = L_{ji}\)

Onsager reciprocity = consequence of GENERIC!

(Not separate postulate, follows from structure)

Historical: Onsager (1931) → GENERIC (1997)

Now understood as special case!

Entropy Production

Entropy Production

\[\frac{\text{d}S}{\text{d}t} = \sigma_S = \langle \nabla S, M \nabla S \rangle \geq 0\]

Properties: * Non-negative (always) * Zero at equilibrium * Measures irreversibility

Information dynamics: \[\sigma_S = \boldsymbol{\theta}^\top G^2 \boldsymbol{\theta}\]

Fisher \(G\) = rate of entropy production!

(Recall Lecture 3: Maximum entropy production)

Free Energy and Dissipation

Free Energy Dissipation

Free energy: \(\mathcal{F} = E - TS\)

Rate of change: \[\frac{\text{d}\mathcal{F}}{\text{d}t} = -T\sigma_S \leq 0\]

Free energy decreases → equilibrium at minimum

Information dynamics: \[\mathcal{F}(\boldsymbol{\theta}) = -A(\boldsymbol{\theta}) + \boldsymbol{\theta}^\top \mathbb{E}[T]\]

Dynamics = gradient descent on \(\mathcal{F}\) (under constraints)

Fluctuation-Dissipation Relations

Fluctuation-Dissipation

Theorem: Response \(\propto\) Equilibrium fluctuations

\[\chi_{ij} \propto \frac{\langle \delta x_i \delta x_j \rangle}{k_B T}\]

GENERIC: \(M\) governs both dissipation and fluctuations

Information dynamics: \[G(\boldsymbol{\theta}) = \text{Cov}(T_i, T_j)\]

Fisher (dissipation) \(\leftrightarrow\) Covariance (fluctuations)

Direct manifestation of theorem!

Maximum Entropy Production Principle

Maximum Entropy Production

Principle: Non-equilibrium steady states maximize \(\dot{S}\)

Information dynamics (L3): \[\dot{S} = \max \{-\boldsymbol{\theta}^\top G\dot{\boldsymbol{\theta}} : a^\top\dot{\boldsymbol{\theta}} = 0\}\]

MEPP emerges when: 1. \(M\) related to entropy Hessian (Fisher!) 2. Constraints via Lagrange multipliers 3. No external driving

GENERIC explains when/why MEPP applies!

Connection to Non-Equilibrium Statistical Mechanics

Microscopic Foundations

GENERIC from: * Liouville equation (phase space) * BBGKY hierarchy (reductions) * Projection operators (Zwanzig-Mori)

Coarse-graining: * Fine → coarse: lose information * Reversible → irreversible * \(L\): Preserves structure * \(M\): Captures dissipation from unobserved DOF

Information dynamics = coarse-grained stat-mech!

Landauer’s Principle

Landauer’s Principle from the Inaccessible Game

Information Erasure as a Process

Bit Erasure: * Variable \(x_i \in \{0,1\}\) * Reset to standard state: \(x_i \rightarrow 0\) * Ensemble perspective: initial state random * Marginal entropy decreases: \(\Delta h(X_i) = -\log 2\)

Conservation Requires Redistribution

Conservation Constraint: \[\sum_{j \neq i} \Delta h(X_j) = +\log 2\]

Antisymmetric Part \(A\): * Reversible shuffling only * Moves information to other variables * Not true erasure!

True Erasure Requires Dissipation

True Erasure:

Must increase \(H\) (2nd law) with \(\sum h_i = C\)

\[\Rightarrow \Delta I = \Delta(\sum h_i) - \Delta H < 0\]

  • Breaks correlations
  • Requires dissipative part \(S\)
  • Cannot be purely reversible

Energy Cost from Energy-Entropy Equivalence

Energy-Entropy Equivalence:

In thermodynamic limit: \(\beta \langle E \rangle \approx \sum_i h_i\)

Erasure Cost: \[\Delta \langle E \rangle = -\frac{\log 2}{\beta} = -k_BT\log 2\]

Energy must be removed from system

Dissipation Bound

Landauer’s Principle Emerges From:

  1. Marginal entropy conservation
  2. GENERIC structure (\(S\) vs \(A\))
  3. Energy-entropy equivalence

\[\boxed{Q_{\text{dissipated}} \geq k_BT\log 2}\]

Implications for Information Engines

  • Information engines must overcome thermal noise
  • Related threshold for:
    • Information erasure
    • Information transmission
  • Temperature sets fundamental noise floor

Implications

Information-Theoretic Limits

Information-Theoretic Limits on Intelligence

Thermodynamics limits mechanical engines

Information theory limits information engines

Same kind of fundamental constraint

What Intelligent Systems Must Do

Intelligence Requires: * Acquiring information (sensing) * Storing information (memory) * Processing information (computation)
* Erasing information (memory mgmt) * Acting on information (output)

Each has thermodynamic cost

The Landauer Bound on Computation

Landauer’s Principle:

Erasing 1 bit requires: \(Q \geq k_BT\log 2\)

  • Not engineering limitation
  • Fundamental thermodynamic bound
  • Entropy must go somewhere

At room temperature: \(\sim 3 \times 10^{-21}\) Joules/bit

Implications for Computation

Brain-Scale Computation:

\(10^{15}\)-\(10^{18}\) ops/sec (Lawrence, 2017; Moravec, 1999; Sandberg and Bostrom, 2008)

\(\sim 10^{15}\) ops/sec running 1 year at 300K:

Landauer bound: \(\sim 100\) Joules (minimum)

But also need entropy production for: * Data acquisition * Data movement * Actual computation * Real (non-ideal) dissipation

Human brain: \(\sim 6 \times 10^8\) J/year (\(10^6 \times\) Landauer)
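
The arithmetic behind these numbers, as a sketch (assuming one bit erased per operation, \(10^{15}\) ops/sec, \(T = 300\,\text{K}\), and the commonly quoted \(\sim 20\,\text{W}\) brain power figure, which is an assumption here consistent with the slide's \(6 \times 10^8\) J/year):

```python
# Sketch: Landauer cost per bit and the brain-scale comparison from the slide.
import math

k_B = 1.380649e-23                      # Boltzmann constant, J/K
T = 300.0                               # room temperature, K
per_bit = k_B * T * math.log(2)         # ~3e-21 J per erased bit

ops_per_s = 1e15                        # brain-scale estimate used on the slide
seconds_per_year = 3.15e7
landauer_year = per_bit * ops_per_s * seconds_per_year   # ~90 J, i.e. "~100 Joules"
brain_year = 20.0 * seconds_per_year                     # assumed ~20 W brain

print(per_bit, landauer_year, brain_year / landauer_year)  # gap of ~10^6-10^7
```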

Fisher Information Bounds on Learning

Fisher Information Bounds:

Cramér-Rao: \(\text{Var}(\hat{\boldsymbol{\theta}}) \geq G^{-1}\)

  • Learning rate bounded by \(G\)
  • Some directions hard to learn (small eigenvalues)
  • Information topography constrains learning

Can’t learn faster than information geometry allows

Embodiment as Necessity, Not Limitation

Embodiment = Information Topography:

Physical substrate determines \(G(\boldsymbol{\theta})\)

  • Silicon ≠ neurons ≠ quantum systems
  • Each has different channels/bottlenecks
  • Each has different energy costs
  • Each has different bandwidths

No substrate = no intelligence!

Why Superintelligence Claims Fail

Superintelligence Violates:

  1. Fisher bounds (can’t learn infinitely fast)
  2. Physical storage limits (finite memory)
  3. Landauer bounds (computation costs energy)
  4. Entropy production (can’t be perfectly efficient)

Same as perpetual motion:

Violates fundamental physical law

A Thought on Intelligence

Perpetual Motion and Superintelligence

  • 1925: Promises of perpetual motion cars
  • 2025: Promises of superintelligence singularity
  • Same fundamental impossibility?

Why Perpetual Motion Failed

Second Law of Thermodynamics:

\[\frac{\text{d}H}{\text{d}t} \geq 0\]

  • Entropy always increases
  • No motion without entropy production
  • No work without energy input

An Equivalent Statement for Intelligence?

Maxwell’s Demon: * “Intelligent” entity that violates 2nd law * Resolution: Landauer’s principle * Information erasure requires energy

Implication: * Intelligence has thermodynamic cost * Information processing has physical limits

Superintelligence as Perpetual Motion

Perpetual Motion: Violates thermodynamics

Superintelligence Singularity: Violates information bounds

Same pattern of impossible promises

The Thermodynamic Constraint

Perpetual Motion Fails: * 2nd law: entropy increases * 1st law: energy conserved
* Efficiency limited by temperature * Fundamental, not engineering limits

The Information-Theoretic Constraint

Superintelligence Fails: * Landauer: erasure costs energy * Conservation: can’t create information * Fisher bounds: finite channel capacity * GENERIC: dissipation unavoidable * Fundamental information limits

The Recursive Self-Improvement Fallacy

Recursive Self-Improvement:

“AI makes itself smarter → makes itself better at getting smarter → runaway growth”

But requires: * Learning (Fisher-limited) * Memory (physically limited) * Computation (Landauer-limited) * Erasure (dissipative)

Embodiment as Thermodynamic Necessity

Embodiment = Information Topography:

Physical substrate → Fisher information \(G(\boldsymbol{\theta})\)

\(G\) determines: * Information flow rates * Channel capacities * Energy requirements

Why the Hype Persists

Why the Hype? * Confuse capability with unbounded intelligence * Ignore thermodynamic costs * Mistake scaling for fundamental progress * Economic incentives for bold claims

Same reasons perpetual motion had investors!

Conclusions

From Internal Adjudicability:

No-barber principle

\(\Downarrow\)

Information isolation: \(\sum h_i = C\)

\(\Downarrow\)

  • GENERIC structure emerges
  • Energy-entropy equivalence
  • Landauer’s principle
  • Information bounds

Broader Relevance?

Implications for Theory Construction

A Thought:

Scientific theories = games against nature?

Question: Should theory rules avoid external adjudicators?

Constraint on how we construct theories

What Would This Mean?

Common External Appeals: * Pre-existing spacetime * External simultaneity * Privileged coordinates * External observers * System/environment split

No-barber asks: What if these must emerge?

Not a Grand Claim

Not claiming: * Reality = inaccessible game * All theories need no-barber * This solves foundations

Offering: * Principled constraint * Exploration of consequences * Mathematical illumination

Question, not conclusion

David MacKay’s Legacy

MacKay’s Approach:

  • Make assumptions explicit
  • Explore consequences rigorously
  • Let mathematics reveal structure
  • Use reasoning to illuminate constraints

This work continues that tradition

What This Selects

No-Barber Selects:

  1. \(\sum h_i = C\) (information isolation)
  2. Von Neumann entropy (outcome-independent)
  3. Max entropy production (internal ordering)
  4. Qutrit substrate (\(d=3\) optimal)
  5. Countable system (no arbitrary \(N\))

According to (Lawrence-nobarber26?).

Open Questions

Open Questions:

  • Formalise the no-barber principle
  • What is the stage / game board / space on which the game plays out?

Much to explore

Thanks!

References

Baez, J.C., Fritz, T., Leinster, T., 2011. A characterization of entropy in terms of information loss. Entropy 13, 1945–1957. https://doi.org/10.3390/e13111945
Grmela, M., Öttinger, H.C., 1997. Dynamics and thermodynamics of complex fluids. I. Development of a general formalism. Physical Review E 56, 6620–6632. https://doi.org/10.1103/PhysRevE.56.6620
Lawrence, N.D., 2017. Living together: Mind and machine intelligence. arXiv.
Moravec, H., 1999. Robot: Mere machine to transcendent mind. Oxford University Press, New York.
Öttinger, H.C., Grmela, M., 1997. Dynamics and thermodynamics of complex fluids. II. Illustrations of a general formalism. Physical Review E 56, 6633–6655. https://doi.org/10.1103/PhysRevE.56.6633
Sandberg, A., Bostrom, N., 2008. Whole brain emulation: A roadmap (Technical Report No. 2008-3). Future of Humanity Institute, Oxford University.