Understanding Limits Through the Inaccessible Game
Cambridge Philosophical Society - David MacKay Memorial Meeting, Cambridge University Engineering Department
Second Law of Thermodynamics:
\[\frac{\text{d}H}{\text{d}t} \geq 0\]
Maxwell’s Demon: * “Intelligent” entity that appears to violate the 2nd law * Resolution: Landauer’s principle * Information erasure requires energy
Implication: * Intelligence has thermodynamic cost * Information processing has physical limits
David MacKay (1967-2016) * Information theory and inference * Neural networks and learning algorithms * Sustainable energy and physical limits * Cut through hype with careful reasoning
David’s Approach: * Start with fundamental principles * Build rigorous mathematical framework * Apply to real systems * Use numbers to test claims
Today: Apply this to information & energy
Entropy \[ S(X) = -\sum_{x} \rho(x) \log \rho(x) \]
In thermodynamics, the entropy is multiplied by Boltzmann’s constant, \(k_B\)
Where \[ E_\rho\left[T(Z)\right] = \nabla_{\boldsymbol{\theta}}A(\boldsymbol{\theta}) \] because \(A(\boldsymbol{\theta})\) is the log partition function, which acts as a cumulant generating function for \(\rho(Z)\).
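This cumulant-generating property is easy to check numerically. A minimal sketch for a Bernoulli variable in natural parameters (the choice of Bernoulli and the value \(\theta = 0.7\) are illustrative, not from the lecture): the first derivative of \(A\) recovers the mean \(E[T]\) and the second derivative recovers the variance.

```python
import math

# Bernoulli in natural parameters: A(theta) = log(1 + e^theta),
# sufficient statistic T(x) = x. Derivatives of A give the cumulants.
def A(theta):
    return math.log(1.0 + math.exp(theta))

theta = 0.7
p = math.exp(theta) / (1.0 + math.exp(theta))   # E[T] in closed form

eps1, eps2 = 1e-6, 1e-4
dA = (A(theta + eps1) - A(theta - eps1)) / (2 * eps1)               # ~ E[T]
d2A = (A(theta + eps2) - 2 * A(theta) + A(theta - eps2)) / eps2**2  # ~ Var[T]
```

The second derivative \(A''(\theta) = p(1-p)\) is also the (scalar) Fisher information, anticipating the role of \(G(\boldsymbol{\theta})\) later.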
Joint entropy can be decomposed \[ S(Z) = S(X,M) = S(X|M) + S(M) = S(X) - I(X;M) + S(M) \]
Mutual information \(I(X;M)\) connects information and energy
Measurement changes system entropy by \(-I(X;M)\)
Increases available energy
Difference in available energy: \[ \Delta A = A(X) - A(X|M) = I(X;M) \]
Can recover \(k_B T \cdot I(X;M)\) in work from the system
Thermodynamics limits mechanical engines
Information theory limits information engines
Same kind of fundamental constraint
Intelligence Requires: * Acquiring information (sensing) * Storing information (memory) * Processing information (computation) * Erasing information (memory mgmt) * Acting on information (output)
Each has thermodynamic cost
Landauer’s Principle:
Erasing 1 bit requires: \(Q \geq k_BT\log 2\)
At room temperature: \(\sim 3 \times 10^{-21}\) Joules/bit
Brain-Scale Computation:
\(10^{15}\)-\(10^{18}\) ops/sec (Lawrence, 2017; Moravec, 1999; Sandberg and Bostrom, 2008)
\(\sim 10^{15}\) ops/sec running 1 year at 300K:
Landauer bound: \(\sim 100\) Joules (minimum)
But also need entropy production for: * Data acquisition * Data movement * Actual computation * Real (non-ideal) dissipation
Human brain: \(\sim 6 \times 10^8\) J/year (\(10^6 \times\) Landauer)
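The numbers on these slides can be reproduced with a few lines of arithmetic. A sketch, assuming one bit erased per operation, \(10^{15}\) ops/sec, and \(\sim 3.15 \times 10^{7}\) seconds per year (all as stated above):

```python
import math

k_B = 1.380649e-23        # Boltzmann's constant, J/K
T = 300.0                 # room temperature, K
landauer_per_bit = k_B * T * math.log(2)   # ~3e-21 J per bit erased

ops_per_sec = 1e15        # lower end of brain-scale estimates
seconds_per_year = 3.15e7
landauer_year = landauer_per_bit * ops_per_sec * seconds_per_year  # ~100 J

brain_year = 6e8          # human brain, J/year (~20 W continuous)
ratio = brain_year / landauer_year   # ~10^6-10^7 above the bound
```

In the style of David MacKay's "numbers, not adjectives": the brain operates millions of times above the Landauer floor, so the bound constrains principle, not current practice.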
Fisher Information Bounds:
Cramér-Rao: \(\text{Var}(\hat{\boldsymbol{\theta}}) \geq G^{-1}\)
Can’t learn faster than information geometry allows
Embodiment = Information Topography:
Physical substrate determines \(G(\boldsymbol{\theta})\)
No substrate = no intelligence!
Superintelligence Claims: * Same status as perpetual motion * Violate fundamental physical law
Baez et al. (2011): * Entropy from category theory * Three axioms uniquely determine information loss * No probability needed initially
Axiom 1: Functoriality \[F(f \circ g) = F(f) + F(g)\]
Axiom 2: Convex Linearity \[F(\lambda f \oplus (1-\lambda)g) = \lambda F(f) + (1-\lambda)F(g)\]
Axiom 3: Continuity
Theorem:
Three axioms \(\Rightarrow\) unique form: \[F(f) = c(H(p) - H(q))\]
Physical Analogy: * Isolated chamber * Mass conserved: \(\sum m_i = \text{const}\) * Energy conserved: \(\sum E_i = \text{const}\) * Information conserved?
Axiom 4: Information Conservation \[ \sum_{i=1}^N h_i = C \] * \(h_i\) = marginal entropy of variable \(i\) * \(C\) = conservation constant * Total information conserved * Information can redistribute
Why Marginal? \[ H(\mathbf{x}) = \sum_{i} h_i - I(\mathbf{x}) \] * Conserve: \(\sum h_i = C\) * \(I\) (multi-information) can change * \(H\) (joint entropy) can change * Variables can correlate/decorrelate * Total capacity \(\sum h_i\) fixed
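The decomposition \(H(\mathbf{x}) = \sum_i h_i - I(\mathbf{x})\) can be verified on a toy joint distribution. A sketch with three fully correlated binary variables (an illustrative choice: each marginal carries 1 bit, the joint only 1 bit, so the multi-information is 2 bits):

```python
import math

# Joint over three binary variables: perfectly correlated copies.
joint = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

def entropy(dist):
    """Shannon entropy in bits of a dict-valued distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

H_joint = entropy(joint)
h = []
for i in range(3):
    marg = {}
    for x, p in joint.items():
        marg[x[i]] = marg.get(x[i], 0.0) + p
    h.append(entropy(marg))

I_multi = sum(h) - H_joint   # multi-information: 3 - 1 = 2 bits
```

Here \(\sum h_i = 3\) bits is the "capacity"; the system could decorrelate (raising \(H\) toward 3 bits, driving \(I\) toward 0) without changing the marginals.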
Exchangeability: * Consider any finite subset * Constraint applies equally to all * No special variables * Can handle infinite systems * Different from Bayesian exchangeability
Physical Picture: * \(h_i\) = information capacity of variable \(i\) * \(\sum h_i = C\) = total fixed capacity * \(I\) = correlation/structure (can vary) * Information flows but total conserved * Can “buy correlations” with capacity
Creates “Inaccessibility”:
Baez: Info gained = entropy change
Conservation: \(\sum h_i = C\) (constant)
Observer: \(\Delta(\sum h_i) = 0\) → learns nothing!
Four Axioms Together:
Baez (1-3): * Functoriality * Convex linearity * Continuity * → Entropy measures information
New (4): * Information conservation: \(\sum h_i = C\) * → Constrained dynamics
Next (L3): Derive dynamics from axioms
The Inaccessible Game: * System isolated from observation * External observer cannot extract information * Internal state is inaccessible * Zero-player game with information-theoretic rules
Why “Inaccessible”?
From BLF axioms: Info gained = \(H(p) - H(q)\)
Our axiom: \(\sum h_i = C\) (constant)
\[\Delta(\sum h_i) = 0 \Rightarrow \text{observer learns nothing!}\]
Game Characteristics: * Zero-player game * State = probability distribution \(p(\mathbf{x}|\boldsymbol{\theta})\) * Rule = maximize entropy production * Constraint = \(\sum h_i = C\) * Dynamics = emerge from information geometry
Physical Connections: * GENERIC structure emerges * Energy ↔︎ Entropy equivalence * Landauer’s principle derivable * Bridge between information and physics
The Fourth Axiom: \[ \sum_{i=1}^N h_i = C \]
What does this conservation imply for dynamics?
Multi-Information: \[ I = \sum_{i=1}^N h_i - H \]
Measures “shared information”
Information Action Principle: \[ I + H = C \]
Conserved quantity splits into two parts
Analogy to classical mechanics:
| Classical Mechanics | Information System |
|---|---|
| Kinetic energy \(T\) | Joint entropy \(H\) |
| Potential energy \(V\) | Multi-information \(I\) |
| Conservation: \(T + V = E\) | Conservation: \(H + I = C\) |
System “rolls downhill” from correlation to entropy
Information Relaxation:
Physical intuition:
Key Features:
Internal reorganisation invisible to external observer
Key Insight:
Conservation \(\sum h_i = C\) \(\iff\) \(I + H = C\)
Dynamics = trading correlation for entropy
Implications:
Question: How exactly does the system relax?
Answer: Through maximum entropy production
Second Law: \[ \dot{H} \geq 0 \]
Conservation: \[ I + H = C \]
Therefore: \[ \dot{I} \leq 0 \]
Correlations must decrease
Maximum Entropy Production (MEP):
Observed across physics:
Entropy in Natural Parameters: \[ H(\boldsymbol{\theta}) = \mathcal{A}(\boldsymbol{\theta}) - \boldsymbol{\theta}^\top \nabla \mathcal{A}(\boldsymbol{\theta}) \]
Gradient (steepest increase): \[ \nabla_{\boldsymbol{\theta}} H = -G(\boldsymbol{\theta})\boldsymbol{\theta} \]
Fisher information emerges
MEP Dynamics: \[ \dot{\boldsymbol{\theta}} = -G(\boldsymbol{\theta})\boldsymbol{\theta} \]
Natural dynamics from information geometry
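A numerical sanity check of both identities on this slide, again for the scalar Bernoulli family (an illustrative choice; \(\theta_0 = 1.5\) and the step size are arbitrary): the finite-difference gradient of \(H(\theta) = A(\theta) - \theta A'(\theta)\) matches \(-G(\theta)\theta\), and Euler steps of the MEP flow drive the entropy up toward \(\log 2\).

```python
import math

def A(t): return math.log(1 + math.exp(t))            # log partition function
def mean(t): return math.exp(t) / (1 + math.exp(t))   # A'(theta) = E[T]
def G(t):                                              # Fisher info A''(theta)
    q = mean(t)
    return q * (1 - q)
def H(t): return A(t) - t * mean(t)                    # entropy in natural params

theta0 = 1.5
eps = 1e-5
dH = (H(theta0 + eps) - H(theta0 - eps)) / (2 * eps)
grad_formula = -G(theta0) * theta0                     # should match dH

# Euler steps of theta_dot = -G(theta) * theta: entropy rises toward log 2
t, hs = theta0, []
for _ in range(200):
    hs.append(H(t))
    t += 0.1 * (-G(t) * t)
```

The flow pushes \(\theta \to 0\), i.e. toward the maximum-entropy (uniform) distribution, exactly as the "information relaxation" picture describes.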
Why This Dynamics?
Determined by information relaxation + MEP
Information Relaxation:
Start: High \(I\), low \(H\) (correlated, tense)
↓ \(\dot{\boldsymbol{\theta}} = -G\boldsymbol{\theta}\) (MEP)
End: Low \(I\), high \(H\) (independent, relaxed)
Throughout: \(\sum h_i = C\) (inaccessible to observer)
Physical Analogy: Gas Diffusion
| Gas Molecules | Information System |
|---|---|
| Concentrated in corner | High correlation (\(I\)) |
| Diffuse throughout room | Entropy increases (\(H \uparrow\)) |
| Uniform distribution | Low correlation (independent) |
| Conservation: energy | Conservation: \(\sum h_i\) |
Same principle, different space
Next Steps:
MEP + constraints = complete dynamics
Information Relaxation Principle:
Among all paths with \(\sum h_i = C\):
\[\text{Follow path that maximizes } \dot{H}\]
Steepest entropy ascent on constraint surface
Unconstrained MEP:
\[\dot{\boldsymbol{\theta}} = \nabla H = -G(\boldsymbol{\theta})\boldsymbol{\theta}\]
Lagrangian Formulation:
\[\mathscr{L}(\boldsymbol{\theta}, \nu) = -H + \nu\left(\sum h_i - C\right)\]
Constrained Dynamics:
\[\dot{\boldsymbol{\theta}} = -G\boldsymbol{\theta} + \nu(\tau) \mathbf{a}\]
where \(\mathbf{a} = \nabla\left(\sum_i h_i\right)\)
Constraint maintenance: \[\mathbf{a}^\top \dot{\boldsymbol{\theta}} = 0\]
Solution:
\[\nu(\tau) = \frac{\mathbf{a}^\top G\boldsymbol{\theta}}{\|\mathbf{a}\|^2}\]
Projection Form:
\[\dot{\boldsymbol{\theta}} = -\Pi_\parallel G\boldsymbol{\theta}\]
where \(\Pi_\parallel\) projects onto tangent space
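The Lagrange-multiplier solution can be checked directly: with \(\nu = \mathbf{a}^\top G\boldsymbol{\theta} / \|\mathbf{a}\|^2\), the constrained velocity is exactly tangent to the constraint surface. A sketch with an arbitrary SPD matrix standing in for the Fisher information (the matrices and vectors here are random test data, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
Gm = B @ B.T + 4 * np.eye(4)       # SPD stand-in for Fisher matrix G(theta)
theta = rng.standard_normal(4)
a = rng.standard_normal(4)         # gradient of the constraint sum_i h_i

nu = (a @ Gm @ theta) / (a @ a)    # Lagrange multiplier nu(tau)
theta_dot = -Gm @ theta + nu * a   # constrained MEP dynamics

residual = a @ theta_dot           # constraint maintenance: should be ~0
```

This is the projection form in action: subtracting \(\nu\mathbf{a}\) removes the component of \(-G\boldsymbol{\theta}\) normal to the constraint surface.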
Physical Picture:
Balance between geometry and constraint
What We’ve Seen Emerge
Lecture 5: Energy conservation \(\rightarrow\) antisymmetric \(A\)
Lecture 6-7: Linearisation \(\rightarrow\) \(M = S + A\) split
Question: Is this structure universal?
Answer: YES! \(\rightarrow\) GENERIC framework
Non-Equilibrium Challenge (1980s-90s)
Real systems are both: * Reversible (mechanics, conservation laws) * Irreversible (thermodynamics, dissipation)
Examples: * Fluids: momentum conservation + viscosity * Reactions: kinetics + diffusion * Materials: elasticity + plasticity
Solution: GENERIC framework (Grmela and Öttinger, 1997; Öttinger and Grmela, 1997)
Two Worlds?
| Classical Mechanics | Classical Thermodynamics |
|---|---|
| Time reversible | Time irreversible |
| Energy conserved | Entropy increases |
| Antisymmetric ops | Symmetric ops |
| Poisson structure | Dissipation |
Problem: Real systems do both.
Pendulum with friction: Angular momentum (reversible) + heat loss (irreversible)
GENERIC Answer
Reversible + Irreversible can coexist
Requirements: 1. Consistent energy & entropy 2. Second law: \(\dot{S} \geq 0\) 3. Conserved quantities respected 4. Constraints (Casimirs) obeyed
Key: Can’t add arbitrarily \(\rightarrow\) need degeneracy conditions
Remarkable: In typical GENERIC, degeneracy conditions are HARD to satisfy (must engineer carefully)
Our approach (L1-7): Degeneracy conditions emerge automatically! ✓
(Axioms \(\rightarrow\) geometry \(\rightarrow\) thermodynamic consistency)
Why GENERIC for Information Dynamics?
Structure we derived (L1-7) = Structure physicists discovered (GENERIC)
Deep connection: * Information dynamics = thermodynamics (Shannon/Jaynes) * Information dynamics = dynamical system (constraints) * GENERIC = inevitable consequence of combining both
Our system: \(\dot{\boldsymbol{\theta}} = -G\boldsymbol{\theta} + \nu \mathbf{a}\) * \(G\): Fisher information (friction/dissipation) * \(\nu \mathbf{a}\): Constraint dynamics (reversible structure)
We’ve been building GENERIC from scratch!
Coming Up: The GENERIC Equation
\[\dot{x} = L(x) \nabla E(x) + M(x) \nabla S(x)\]
Our information dynamics: * \(G \leftrightarrow M\) (Fisher = friction) * Constraints \(\leftrightarrow L\) (structure) * \(\sum h_i = C\) (Casimirs)
Structure we built = GENERIC!
The GENERIC Equation
\[\dot{x} = L(x) \nabla E(x) + M(x) \nabla S(x)\]
Components: * \(x\): System state * \(E\): Energy functional * \(S\): Entropy functional * \(L\): Poisson operator (reversible) * \(M\): Friction operator (irreversible)
Simple form \(\rightarrow\) Deep structure!
Poisson Operator \(L(x)\) (Reversible part)
Properties: 1. Antisymmetric: \(\langle \nabla F, L \nabla G \rangle = -\langle \nabla G, L \nabla F \rangle\) * Time reversible
Recall L5: This IS Hamiltonian/Poisson structure!
Friction Operator \(M(x)\) (Irreversible part)
Properties:
Recall L7: This is our symmetric part \(S\)!
Degeneracy Conditions (Coupling)
Condition 1: \(M \nabla E = 0\) * Friction doesn’t change total energy * Only redistributes it
Condition 2: \(L \nabla S = 0\) * Hamiltonian flow doesn’t change entropy * All entropy change from dissipation
Consequences: * First law: \(\frac{\text{d}E}{\text{d}t} = 0\) ✓ * Second law: \(\frac{\text{d}S}{\text{d}t} = \langle \nabla S, M \nabla S \rangle \geq 0\) ✓
Without these \(\rightarrow\) thermodynamics violated.
Casimir Functions \(C_i(x)\)
\[L \nabla C_i = 0 \quad \text{AND} \quad M \nabla C_i = 0\]
Examples: * Momentum (mechanics) * Circulation (fluids) * Charge (electromagnetism) * \(\sum h_i = C\) (information)
Effect: Stratify state space into symplectic leaves
(Recall L5: Casimirs from symmetries)
Why This Structure?
GENERIC is the most general structure that allows: * Time-reversal (reversible part) * Second law (irreversible part) * Energy conservation (overall) * Casimirs (constraints)
Not a choice \(\rightarrow\) Consequence of physics!
Our result: GENERIC emerged from info axioms * Axioms (L2) + MEP (L3) + Constraints (L4) * \(\rightarrow\) GENERIC structure (L5-7)
GENERIC = deep physical principle, not modeling trick
Example: Damped Harmonic Oscillator
State: \(x = (q,p)\), Energy: \(E = \frac{p^2}{2m} + \frac{1}{2}kq^2\)
\[L = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad M = \begin{pmatrix} 0 & 0 \\ 0 & \gamma \end{pmatrix}\]
GENERIC: \[\dot{q} = \frac{p}{m}, \quad \dot{p} = -kq - \gamma\beta\frac{p}{m}\]
Result: \(m\ddot{q} = -kq - \gamma\beta\dot{q}\)
Damped oscillator from GENERIC! (Reversible + irreversible)
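The damped-oscillator equations above are simple enough to integrate directly. A sketch with illustrative parameter values (\(m = k = 1\), \(\gamma = 0.5\), \(\beta = 1\), explicit Euler with a small step): the friction operator drains the Hamiltonian energy, as the GENERIC split predicts.

```python
# Damped harmonic oscillator: q_dot = p/m, p_dot = -k q - gamma*beta*p/m
m, k, gamma, beta = 1.0, 1.0, 0.5, 1.0
q, p = 1.0, 0.0
dt = 1e-3

E0 = p**2 / (2 * m) + 0.5 * k * q**2   # initial energy
for _ in range(20000):                  # integrate to t = 20
    dq = p / m
    dp = -k * q - gamma * beta * p / m
    q += dt * dq
    p += dt * dp
E_end = p**2 / (2 * m) + 0.5 * k * q**2
```

With \(\gamma = 0\) the \(L\) part alone would conserve \(E\); switching on \(M\) (here the single entry \(\gamma\)) makes the energy decay monotonically toward the minimum.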
Standard GENERIC Degeneracies:
Problem: Hard to construct \(A\), \(S\) satisfying both.
Usually requires careful hand-crafting
First Degeneracy (Automatic):
Constraint maintenance: \(\mathbf{a}^\top\dot{\boldsymbol{\theta}} = 0\)
\(\Rightarrow\) Dynamics tangent to surface
\(\Rightarrow\) Antisymmetric part conserves entropy
\[\boxed{A\nabla H = 0 \text{ holds automatically}}\]
By construction, not by assumption!
Second Degeneracy (From Constraint):
Our constraint: \(\sum h_i = C\)
\[S\nabla\left(\sum h_i\right) = 0\]
Marginal entropy plays role of energy
In thermodynamic limit: \(\nabla\left(\sum h_i\right) \parallel \nabla E\)
Thermodynamic Limit:
\[\nabla\left(\sum h_i\right) \parallel \nabla E\]
Therefore: \[S\mathbf{a} = 0 \Leftrightarrow S\nabla E = 0\]
Information framework \(\rightarrow\) classical thermodynamics
Why Automatic Matters:
Flips the usual derivation: \[\text{Information axioms} \Rightarrow \text{Thermodynamics}\]
Harmonic Oscillator with Thermalisation
Key Features:
System: 3 binary variables
Validation Results:
Kirchhoff Networks: * Local charge conservation: \(\sum_j I_{ij} = 0\) * Ohm’s law: \(I_{ij} = g_{ij}(V_i - V_j)\) * Fixed conductances \(g_{ij}\) * Linear equations → steady state
Information Conservation: \[\sum_{i=1}^n \log([G^{-1}]_{ii}) = C\]
Dynamic Topography: * \(G(\boldsymbol{\theta})\) changes with state * Not fixed conductances! * Analogous to memristive networks * “Conductances” and “voltages” co-evolve
Channel Capacity: * Large \(\lambda_i\): easy information flow * Small \(\lambda_i\): information bottlenecks * Eigenvectors: flow directions
Generalised Ohm’s Law: \[\dot{\boldsymbol{\theta}} = -G(\boldsymbol{\theta})\boldsymbol{\theta} + \nu \mathbf{a}\]
Information Topography: * Geography: terrain shapes water flow * Information: \(G(\boldsymbol{\theta})\) shapes information flow * Formalizes Atomic Human metaphor * Mathematical teeth for intuitive concept
From Metaphor to Mathematics:
Atomic Human: “Information topography” = intuitive concept
Inaccessible Game: Fisher information = formal definition
Definition:
Information Topography = \((G(\boldsymbol{\theta}), \mathcal{M})\)
where \(G(\boldsymbol{\theta}) = \nabla^2A\)
Determines: * Distances between distributions * Flow directions (geodesics) * Channel capacities (eigenvalues) * Bottlenecks (small eigenvalues)
Three Constraints:
Dynamic Evolution:
\[\boldsymbol{\theta}(t) \rightarrow G(\boldsymbol{\theta}(t)) \rightarrow \dot{\boldsymbol{\theta}}(t) \rightarrow \boldsymbol{\theta}(t+\text{d}t)\]
From last section: \[ G(\boldsymbol{\theta}) = \nabla^2 \mathcal{A}(\boldsymbol{\theta}) = \mathrm{Cov}_{\boldsymbol{\theta}}[T(\mathbf{x})] \] * Now: What does this mean geometrically?
Statistical Manifold: * Each point \(\boldsymbol{\theta}\) = a probability distribution * Space of all distributions = curved manifold * Fisher information = metric (ruler) on this space * Measures “closeness” between distributions
\[ \text{d}s^2 = \text{d}\boldsymbol{\theta}^\top G(\boldsymbol{\theta}) \text{d}\boldsymbol{\theta} \] * Measures information distance between distributions * Larger \(G\) = distributions more distinguishable * Smaller \(G\) = distributions harder to tell apart
Cramér-Rao Bound: \[ \text{cov}(\hat{\boldsymbol{\theta}}) \succeq G^{-1}(\boldsymbol{\theta}) \] * \(G^{-1}\) = best possible estimator covariance * High \(G\) → small \(G^{-1}\) → tight estimation * Low \(G\) → large \(G^{-1}\) → loose estimation * Geometric picture: \(G^{-1}\) is “error ellipsoid”
Two Roles of Fisher Information: 1. Metric → defines distances between distributions 2. In gradient → \(\nabla H = -G(\boldsymbol{\theta})\boldsymbol{\theta}\)
\[ \dot{\boldsymbol{\theta}} = \nabla H = -G(\boldsymbol{\theta})\boldsymbol{\theta} \]
Gaussian: \(G(\boldsymbol{\theta}) = \Sigma\) * Information metric = covariance * \(G^{-1} = \Sigma^{-1}\) = precision * Information ellipsoid = probability ellipsoid * Special to Gaussians in natural parameters
Categorical: \[ G_{ij} = \delta_{ij}\pi_i - \pi_i\pi_j \] * Defines probability simplex geometry * Center of simplex: balanced information * Corners: concentrated information * Metric captures curvature
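The categorical metric on this slide is just the covariance of the one-hot sufficient statistics, consistent with \(G = \mathrm{Cov}_{\boldsymbol{\theta}}[T(\mathbf{x})]\) from earlier. A sketch with an arbitrary illustrative \(\boldsymbol{\pi}\):

```python
import numpy as np

pi = np.array([0.2, 0.3, 0.5])                 # categorical probabilities

# Formula from the slide: G_ij = delta_ij pi_i - pi_i pi_j
G_formula = np.diag(pi) - np.outer(pi, pi)

# Covariance of one-hot sufficient statistics T(x) = e_x:
# E[T_i T_j] = delta_ij pi_i, E[T_i] = pi_i
G_cov = np.diag(pi) - np.outer(pi, pi == pi) * np.outer(pi, np.ones(3)) * 0 \
        + (np.diag(pi) - np.outer(pi, pi)) - (np.diag(pi) - np.outer(pi, pi))
G_cov = np.diag(pi) - np.outer(pi, pi)         # Cov[T] = E[TT^T] - E[T]E[T]^T
```

Note \(G\) is singular along the simplex normal (\(\sum_i \pi_i = 1\)), which is why the metric lives on the simplex rather than the full ambient space.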
Information Geometry: * Fisher metric → Riemannian geometry * Exponential families → dually flat structure * Geodesics → shortest paths between distributions * Zero curvature → special “flat” structure * Key for constrained dynamics later
Three Roles in TIG: 1. Gradient flow metric: appears in \(\dot{\boldsymbol{\theta}} = -G\boldsymbol{\theta}\) 2. Information distance: measures distinguishability 3. Emergence indicator: structure changes signal regimes
Fisher information as geometry → key to everything
Question:
Does \(\nabla\left(\sum h_i\right) \parallel \nabla E\) ?
Connects information to thermodynamics
Scaling Requirements:
Along order parameter \(m\): * \(\nabla_m I = \mathscr{O}(1)\) — intensive * \(\nabla_m H = \mathscr{O}(n)\) — extensive * \(\nabla_m\left(\sum h_i\right) = \mathscr{O}(n)\) — extensive
As \(n \to \infty\): intensive correction negligible
In Thermodynamic Limit:
\[\nabla_m\left(\sum h_i\right) = \underbrace{\nabla_m H}_{\mathscr{O}(n)} + \underbrace{\nabla_m I}_{\mathscr{O}(1)}\]
\[\Rightarrow \nabla_m\left(\sum h_i\right) \parallel \nabla_m H\]
Intensive correction vanishes relative to extensive term
Energy Definition:
Choose \(E(\mathbf{x}) = -\boldsymbol{\alpha}^\top T(\mathbf{x})\) with \(\boldsymbol{\theta} = -\beta\boldsymbol{\alpha}\)
\[\Rightarrow \nabla E = \frac{\nabla H}{\beta}\]
Result: \[\boxed{\nabla E \parallel \nabla H \parallel \nabla\left(\sum h_i\right)}\]
\(\beta\) emerges as inverse temperature
Requirements:
Not all systems satisfy these
Implications:
Reverses usual logic: \[\text{Information theory} \Rightarrow \text{Thermodynamics}\]
Curie-Weiss Model
Validation Results:
Disordered phase (\(T > T_c\)): * \(m \approx 0\), \(|\nabla_m I| \approx 0\) ✓
Ordered phase (\(T < T_c\)): * \(m \neq 0\), \(|\nabla_m I| \gg 0\) ✓
Confirms theorem predictions!
Hierarchy of Thermodynamics
Classical: Equilibrium only, no dynamics
Linear irreversible: Near equilibrium, linear response
GENERIC: Full dynamics, far from equilibrium
\[\text{Classical} \subset \text{Linear} \subset \text{GENERIC}\]
GENERIC = completion of thermodynamics!
Laws in GENERIC
0th Law: Equilibrium transitivity (uniqueness)
1st Law: Energy conserved \[\frac{\text{d}E}{\text{d}t} = 0\] * From antisymmetry + degeneracy 1
2nd Law: Entropy increases \[\frac{\text{d}S}{\text{d}t} \geq 0\] * From degeneracy 2 + positive semi-definite
Laws = consequences of GENERIC structure!
Onsager Reciprocity
Near equilibrium: flux = response × force
\[J = L X\]
GENERIC: \(M\) is symmetric → \(L_{ij} = L_{ji}\)
Onsager reciprocity = consequence of GENERIC!
(Not separate postulate, follows from structure)
Historical: Onsager (1931) → GENERIC (1997)
Now understood as special case!
Entropy Production
\[\frac{\text{d}S}{\text{d}t} = \sigma_S = \langle \nabla S, M \nabla S \rangle \geq 0\]
Properties: * Non-negative (always) * Zero at equilibrium * Measures irreversibility
Information dynamics: \[\sigma_S = \boldsymbol{\theta}^\top G^2 \boldsymbol{\theta}\]
Fisher \(G\) = rate of entropy production!
(Recall Lecture 3: Maximum entropy production)
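The identity \(\sigma_S = \boldsymbol{\theta}^\top G^2 \boldsymbol{\theta}\) follows from the chain rule along the MEP flow: \(\dot{H} = \nabla H \cdot \dot{\boldsymbol{\theta}} = (-G\boldsymbol{\theta})^\top(-G\boldsymbol{\theta}) = \|G\boldsymbol{\theta}\|^2\). A sketch verifying this with a random SPD stand-in for \(G\) (illustrative test data):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3))
Gm = B @ B.T + np.eye(3)          # SPD stand-in for Fisher matrix
theta = rng.standard_normal(3)

theta_dot = -Gm @ theta           # unconstrained MEP flow
grad_H = -Gm @ theta              # gradient of entropy in natural params

sigma = theta @ Gm @ Gm @ theta   # claimed entropy production theta^T G^2 theta
dH_dt = grad_H @ theta_dot        # chain rule: dH/dt along the flow
```

Non-negativity is automatic since \(\sigma_S = \|G\boldsymbol{\theta}\|^2\), and \(\sigma_S = 0\) only at \(\boldsymbol{\theta} = 0\), the equilibrium.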
Free Energy Dissipation
Free energy: \(\mathcal{F} = E - TS\)
Rate of change: \[\frac{\text{d}\mathcal{F}}{\text{d}t} = -T\sigma_S \leq 0\]
Free energy decreases → equilibrium at minimum
Information dynamics: \[\mathcal{F}(\boldsymbol{\theta}) = -A(\boldsymbol{\theta}) + \boldsymbol{\theta}^\top \mathbb{E}[T]\]
Dynamics = gradient descent on \(\mathcal{F}\) (under constraints)
Fluctuation-Dissipation
Theorem: Response \(\propto\) Equilibrium fluctuations
\[\chi_{ij} \propto \frac{\langle \delta x_i \delta x_j \rangle}{k_B T}\]
GENERIC: \(M\) governs both dissipation and fluctuations
Information dynamics: \[G = \text{Cov}(T_i, T_j)\]
Fisher (dissipation) \(\leftrightarrow\) Covariance (fluctuations)
Direct manifestation of theorem!
Maximum Entropy Production
Principle: Non-equilibrium steady states maximize \(\dot{S}\)
Information dynamics (L3): \[\dot{S} = \max \{-\boldsymbol{\theta}^\top G\dot{\boldsymbol{\theta}} : \mathbf{a}^\top\dot{\boldsymbol{\theta}} = 0\}\]
MEPP emerges when: 1. \(M\) related to entropy Hessian (Fisher!) 2. Constraints via Lagrange multipliers 3. No external driving
GENERIC explains when/why MEPP applies!
Microscopic Foundations
GENERIC from: * Liouville equation (phase space) * BBGKY hierarchy (reductions) * Projection operators (Zwanzig-Mori)
Coarse-graining: * Fine → coarse: lose information * Reversible → irreversible * \(L\): Preserves structure * \(M\): Captures dissipation from unobserved DOF
Information dynamics = coarse-grained stat-mech!
Bit Erasure: * Variable \(x_i \in \{0,1\}\) * Reset to standard state: \(x_i \rightarrow 0\) * Ensemble perspective: initial state random * Marginal entropy decreases: \(\Delta h(X_i) = -\log 2\)
Conservation Constraint: \[\sum_{j \neq i} \Delta h(X_j) = +\log 2\]
Antisymmetric Part \(A\): * Reversible shuffling only * Moves information to other variables * Not true erasure!
True Erasure:
Must increase \(H\) (2nd law) with \(\sum h_i = C\)
\[\Rightarrow \Delta I = \Delta(\sum h_i) - \Delta H < 0\]
Energy-Entropy Equivalence:
In thermodynamic limit: \(\beta \langle E \rangle \approx \sum_i h_i\)
Erasure Cost: \[\Delta \langle E \rangle = -\frac{\log 2}{\beta} = -k_BT\log 2\]
Energy must be removed from system
Landauer’s Principle Emerges From:
\[\boxed{Q_{\text{dissipated}} \geq k_BT\log 2}\]
Memory \(\equiv\) Communication through time
Storage is transmission to future
Both limited by thermal noise
At Landauer’s limit: \(E = k_BT\)
Gives \(\frac{S}{N} = 1\)
Results in a capacity of \(\frac{1}{2}\) bit per channel use
Fundamental connection between energy and information
Perpetual Motion: Violates thermodynamics
Superintelligence Singularity: Violates information bounds
Same pattern of impossible promises
Perpetual Motion Fails: * 2nd law: entropy increases * 1st law: energy conserved * Efficiency limited by temperature * Fundamental, not engineering limits
Superintelligence Fails: * Landauer: erasure costs energy * Conservation: can’t create information * Fisher bounds: finite channel capacity * GENERIC: dissipation unavoidable * Fundamental information limits
Recursive Self-Improvement:
“AI makes itself smarter → makes itself better at getting smarter → runaway growth”
But requires: * Learning (Fisher-limited) * Memory (physically limited) * Computation (Landauer-limited) * Erasure (dissipative)
Embodiment = Information Topography:
Physical substrate → Fisher information \(G(\boldsymbol{\theta})\)
\(G\) determines: * Information flow rates * Channel capacities * Energy requirements
Why the Hype? * Confuse capability with unbounded intelligence * Ignore thermodynamic costs * Mistake scaling for fundamental progress * Economic incentives for bold claims
Same reasons perpetual motion had investors!
From Four Axioms:
We Derive: * GENERIC structure * Energy-entropy equivalence * Landauer’s principle * Limits on intelligence
Key Messages:
“It from bit” realized
David MacKay’s Approach:
This work continues that tradition
Open Questions:
Much to explore!
Mutual information (definition): \[ I(X;M) = \sum_{x,m} \rho(x,m) \log \frac{\rho(x,m)}{\rho(x)\rho(m)} \]
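This definition can be evaluated directly for a measurement channel. A sketch, assuming a uniform bit \(X\) and a measurement \(M\) that flips it with probability 0.1 (an illustrative channel, not from the lecture): the result is \(1 - H_b(0.1) \approx 0.53\) bits, the information the demon's measurement actually captures.

```python
import math

# Noisy measurement: X uniform bit, M = X flipped with probability 0.1
flip = 0.1
joint = {(x, m): 0.5 * ((1 - flip) if m == x else flip)
         for x in (0, 1) for m in (0, 1)}

# Marginals rho(x) and rho(m)
px = {x: sum(p for (xx, mm), p in joint.items() if xx == x) for x in (0, 1)}
pm = {m: sum(p for (xx, mm), p in joint.items() if mm == m) for m in (0, 1)}

# Mutual information in bits, straight from the definition
I_xm = sum(p * math.log2(p / (px[x] * pm[m]))
           for (x, m), p in joint.items() if p > 0)
```

By the earlier slides, this is also the maximum work (in units of \(k_B T \ln 2\) per bit) that the measurement makes extractable.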