Information Engines

Exploring Connections Between Intelligence and Thermodynamics

Neil D. Lawrence

Departmental Seminar, Department of Computer Science, University of Manchester

Hydrodynamica

Entropy Billiards

Entropy:

Maxwell’s Demon

Entropy:

Information Theory and Thermodynamics

  • Information theory quantifies uncertainty and information
  • Core concepts inspired by thermodynamic ideas
  • Information entropy \(\leftrightarrow\) Thermodynamic entropy
    • Free energy minimization common in both domains

Entropy

  • Entropy \[ S(X) = -\sum_X \rho(X) \log \rho(X) \]

  • In thermodynamics the sum is multiplied by Boltzmann’s constant, \(k_B\)

Exponential Family

  • Exponential family: \[ \rho(Z) = h(Z) \exp\left(\boldsymbol{\theta}^\top T(Z) - A(\boldsymbol{\theta})\right) \]

  • The entropy is \[ S(Z) = A(\boldsymbol{\theta}) - E_\rho\left[\boldsymbol{\theta}^\top T(Z) + \log h(Z)\right] \]

  • Where \[ E_\rho\left[T(Z)\right] = \nabla_{\boldsymbol{\theta}}A(\boldsymbol{\theta}) \] because \(A(\boldsymbol{\theta})\) is the log-partition function.

  • \(A(\boldsymbol{\theta})\) operates as a cumulant generating function for \(\rho(Z)\) (see the Bernoulli sketch below).
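
As a concrete illustration (my own, not from the slides): a Bernoulli variable has exponential family form with \(h(Z)=1\), \(T(Z)=Z\) and \(A(\theta)=\log(1+e^{\theta})\), so derivatives of \(A\) should reproduce the mean and variance. A minimal Python check using finite differences:

```python
import numpy as np

def A(theta):
    """Log-partition function of a Bernoulli distribution in natural parameters."""
    return np.log1p(np.exp(theta))

theta = 0.7                        # natural parameter, theta = log(p / (1 - p))
p = 1.0 / (1.0 + np.exp(-theta))   # corresponding mean parameter

eps = 1e-5
dA = (A(theta + eps) - A(theta - eps)) / (2 * eps)               # first cumulant
d2A = (A(theta + eps) - 2 * A(theta) + A(theta - eps)) / eps**2  # second cumulant

print(dA, p)             # E[T(Z)] = grad A(theta): both ~0.668
print(d2A, p * (1 - p))  # Cov[T(Z)] = second derivative of A: both ~0.222
```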

Available Energy

  • Available energy: \[ A(\boldsymbol{\theta}) \]
  • Internal energy: \[ U(\boldsymbol{\theta}) = A(\boldsymbol{\theta}) + T S(\boldsymbol{\theta}) \]

  • Traditional relationship \[ A = U - TS \]
  • Legendre transformation of entropy

Work through Measurement

  • Split system \(Z\) into two parts:
    • Variables \(X\) - stochastically evolving
    • Memory \(M\) - low entropy partition

Joint Entropy Decomposition

  • Joint entropy can be decomposed \[ S(Z) = S(X,M) = S(X|M) + S(M) = S(X) - I(X;M) + S(M) \]

  • Mutual information \(I(X;M)\) connects information and energy (see the numerical check below)
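
A minimal numerical check of this decomposition, using a made-up \(2\times 2\) joint distribution (an illustrative sketch, not part of the talk):

```python
import numpy as np

# A made-up joint distribution rho(x, m) over two binary variables (rows: x, columns: m).
rho = np.array([[0.4, 0.1],
                [0.1, 0.4]])

def H(p):
    """Shannon entropy in nats, ignoring zero entries."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

S_XM = H(rho.ravel())        # joint entropy S(X, M)
S_X = H(rho.sum(axis=1))     # marginal entropy S(X)
S_M = H(rho.sum(axis=0))     # marginal entropy S(M)
I_XM = S_X + S_M - S_XM      # mutual information I(X; M)
S_X_given_M = S_XM - S_M     # conditional entropy S(X | M)

# Both forms of S(X, M) = S(X|M) + S(M) = S(X) - I(X;M) + S(M) agree.
print(S_XM, S_X_given_M + S_M, S_X - I_XM + S_M)
```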

Measurement and Available Energy

  • Measurement changes system entropy by \(-I(X;M)\)

  • Increases available energy

  • Difference in available energy: \[ \Delta A = A(X) - A(X|M) = I(X;M) \]

  • Can recover \(k_B T \cdot I(X;M)\) in work from the system

Information to Work Conversion

  • Maxwell’s demon thought experiment in practice
  • Information gain \(I(X;M)\) can be converted to work
  • Maximum extractable work: \(W_\text{max} = k_B T \cdot I(X;M)\)
  • Measurement creates a non-equilibrium state
  • Information is a physical resource (see the numerical example below)
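
To put a rough number on this (my own back-of-envelope figure, not from the slides): at room temperature one bit of mutual information is worth at most \(k_B T \ln 2\) of extractable work.

```python
import numpy as np

k_B = 1.380649e-23   # Boltzmann constant, J/K
T = 300.0            # room temperature, K
I_bits = 1.0         # one bit of mutual information

W_max = k_B * T * np.log(2) * I_bits   # bound in joules (ln 2 converts bits to nats)
print(W_max)                           # ~2.87e-21 J per bit at 300 K
```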

The Animal Game

  • Intelligence as optimal uncertainty reduction
    • 20 Questions game as intuitive example
    • Binary search exemplifies optimal strategy
  • Information gain measures question quality
    • Wordle as a more complex example

The 20 Questions Paradigm

Entropy Reduction and Decisions

  • Entropy before question: \(S(X)\)
  • Entropy after answer: \(S(X|M)\)
  • Information gain: \(I(X;M) = S(X) - S(X|M)\)
  • Optimal decisions maximise \(I(X;M)\) per unit cost (see the sketch below)
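
A sketch of scoring a yes/no question by its expected information gain over a toy candidate set; the animals and the question predicates are invented for illustration:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Uniform prior over a toy set of candidate animals.
animals = ["cat", "dog", "eagle", "salmon", "frog", "ant", "whale", "owl"]
prior = np.full(len(animals), 1 / len(animals))

def information_gain(answers):
    """Expected entropy reduction, S(X) - S(X|M), for a yes/no question."""
    answers = np.asarray(answers)
    gain = entropy(prior)
    for a in (True, False):
        mask = answers == a
        p_a = prior[mask].sum()
        if p_a > 0:
            gain -= p_a * entropy(prior[mask] / p_a)
    return gain

# "Is it a bird?" gives an uneven 2/6 split; an even 4/4 split is the binary-search question.
is_bird = np.array([a in ("eagle", "owl") for a in animals])
even_split = np.array([i < 4 for i in range(len(animals))])
print(information_gain(is_bird))     # ~0.81 bits
print(information_gain(even_split))  # exactly 1 bit
```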

Thermodynamic Parallels

  • Intelligence requires work to reduce uncertainty
  • Thermodynamic work reduces physical entropy
  • Both operate under resource constraints
  • Both bound by fundamental efficiency limits

Information Engines: Intelligence as Energy Efficiency

  • Information can be converted to available energy
  • Simple systems that exploit this are “information engines”
  • This provides our first model of intelligence

Measurement as a Thermodynamic Process: Information-Modified Second Law

  • Measurement is a thermodynamic process
  • Maximum extractable work: \(W_\text{ext} \leq -\Delta\mathcal{F} + k_B T\, I(X;M)\)
  • Information acquisition creates work potential

\[ I(X;M) = \sum_{x,m} \rho(x,m) \log \frac{\rho(x,m)}{\rho(x)\rho(m)}, \]

Efficacy of Feedback Control

Channel Coding Perspective on Memory

  • Memory acts as an information channel
  • Channel capacity limited by memory size: \(C \leq n\) bits
  • Relates to Ashby’s Law of Requisite Variety and the information bottleneck

Decomposition into Past and Future

Model Approximations and Thermodynamic Efficiency

  • Perfect models require infinite resources
  • Intelligence balances measurement against energy efficiency
  • Bounded rationality as thermodynamic necessity

Markov Blanket

  • Split system into past/present (\(X_0\)) and future (\(X_1\))
  • Memory \(M\) creates Markov separation when \(I(X_0;X_1|M) = 0\)
  • Efficient memory minimizes information loss

At What Scales Does this Apply?

  • Equipartition theorem: \(k_B T/2\) energy per degree of freedom
  • Information storage is a small perturbation in large systems
  • Most relevant at microscopic scales (see the rough comparison below)
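
A rough comparison of the two energy scales involved (my own numbers, not the slides’): the thermal energy per degree of freedom and the maximum work per stored bit are of the same order, so information only matters when the number of stored bits is comparable to the number of thermal degrees of freedom.

```python
import numpy as np

k_B = 1.380649e-23   # Boltzmann constant, J/K
T = 300.0            # K

per_dof = 0.5 * k_B * T         # equipartition energy per degree of freedom
per_bit = k_B * T * np.log(2)   # maximum work per bit of information

print(per_dof, per_bit, per_bit / per_dof)  # ~2.07e-21 J, ~2.87e-21 J, ratio ~1.39
```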

Small-Scale Biochemical Systems and Information Processing

  • Microscopic biological systems operate where information matters
  • Molecular machines exploit thermal fluctuations
  • Information processing enables work extraction

Molecular Machines as Information Engines

  • ATP synthase, kinesin, photosynthetic apparatus
  • Convert environmental information to useful work
  • Example: ATP synthase uses ~3-4 protons per ATP

ATP Synthase: Nature’s Rotary Engine

Jaynes’ World

  • Zero-player game implementing entropy game
  • Distribution \(\rho(Z)\) over state space \(Z\)
  • State space partitioned into observables \(X\) and memory \(M\)
  • Entropy bounded: \(0 \leq S(Z) \leq N\)

Jaynes’ World

  • Unlike animal game (which reduces entropy), Jaynes’ World maximizes entropy

  • System evolves by ascending the entropy gradient \(S(Z)\)

  • Animal game: max uncertainty → min uncertainty

  • Jaynes’ World: min uncertainty → max uncertainty

  • Thought experiment: looking backward from any point

  • Game appears to come from minimal entropy configuration (“origin”)

  • Game appears to move toward maximal entropy configuration (“end”)

\[ \rho(Z) = h(Z) \exp(\boldsymbol{\theta}^\top T(Z) - A(\boldsymbol{\theta})), \] where \(h(Z)\) is the base measure, \(T(Z)\) are the sufficient statistics, \(A(\boldsymbol{\theta})\) is the log-partition function, and \(\boldsymbol{\theta}\) are the natural parameters of the distribution.

Exponential Family

  • Jaynes showed that entropy optimization leads to exponential family distributions. \[\rho(Z) = h(Z) \exp(\boldsymbol{\theta}^\top T(Z) - A(\boldsymbol{\theta}))\]
  • \(h(Z)\): base measure
  • \(T(Z)\): sufficient statistics
  • \(A(\boldsymbol{\theta})\): log-partition function
  • \(\boldsymbol{\theta}\): natural parameters

Information Geometry

  • System evolves within information geometry framework
  • Entropy gradient (for a constant base measure) \[ \nabla_{\boldsymbol{\theta}}S(Z) = \mathbf{g} = -\nabla^2_{\boldsymbol{\theta}} A(\boldsymbol{\theta}(M))\,\boldsymbol{\theta}(M) \]

Fisher Information Matrix

  • Fisher information matrix \[ G(\boldsymbol{\theta}) = \nabla^2_{\boldsymbol{\theta}} A(\boldsymbol{\theta}) = \text{Cov}[T(Z)] \]
  • Important: We use gradient ascent, not natural gradient

Gradient Ascent

  • Gradient step \[ \Delta \boldsymbol{\theta} = \eta\, \mathbf{g} \]
  • Natural gradient step \[ \Delta \boldsymbol{\theta} = \eta\, G(\boldsymbol{\theta})^{-1} \mathbf{g} \]

Markovian Decomposition

  • \(X\) divided into past/present \(X_0\) and future \(X_1\)

  • Conditional mutual information: \[ I(X_0; X_1 | M) = \sum_{x_0,x_1,m} p(x_0,x_1,m) \log \frac{p(x_0,x_1|m)}{p(x_0|m)p(x_1|m)} \]

  • Measures dependency between past and future given memory state

  • Perfect Markovianity: \(I(X_0; X_1 | M) = 0\)

  • Memory variables capture all dependencies between past and future

  • Tension between Markovianity and minimal entropy creates an uncertainty principle (see the numerical sketch below)
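
A hedged numerical sketch of the conditional mutual information above, using a randomly generated joint distribution over three binary variables (the distribution is arbitrary, chosen only to exercise the formula):

```python
import numpy as np

def H(p):
    """Shannon entropy in nats, ignoring zero entries."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Random joint p(x0, x1, m) over three binary variables, axes ordered (x0, x1, m).
p = np.random.default_rng(0).dirichlet(np.ones(8)).reshape(2, 2, 2)

# I(X0; X1 | M) = S(X0, M) + S(X1, M) - S(X0, X1, M) - S(M)
S_x0m = H(p.sum(axis=1).ravel())
S_x1m = H(p.sum(axis=0).ravel())
S_all = H(p.ravel())
S_m = H(p.sum(axis=(0, 1)))

print(S_x0m + S_x1m - S_all - S_m)  # zero iff X0 and X1 are independent given M
```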

System Evolution

Start State

  • Low entropy, near lower bound
  • Highly structured information in \(M\)
  • Strong temporal dependencies (high non-Markovian component)
  • Precise values of \(\boldsymbol{\theta}\) come at the cost of uncertainty in other parameter characteristics
  • Uncertainty principle balances precision vs. capacity

End State

  • Maximum entropy, approaching upper bound \(N\)
  • Zeno’s paradox: \(\nabla_{\boldsymbol{\theta}}S \approx 0\)
  • Primarily Markovian dynamics
  • Steady state with no further entropy increase possible

Key Point

  • Both minimal and maximal entropy distributions belong to exponential family
  • This is a direct consequence of Jaynes’ entropy optimization principle
  • System evolves by gradient ascent in natural parameters
  • Uncertainty principle governs the balance between precision and capacity

Histogram Game

Four Bin Histogram Entropy Game

Two-Bin Histogram Example

  • Simplest example: Two-bin system
  • States represented by probability \(p\) (with \(1-p\) in the second bin)

Entropy

  • Entropy \[ S(p) = -p\log p - (1-p)\log(1-p) \]
  • Maximum entropy at \(p = 0.5\)
  • Minimal entropy at \(p = 0\) or \(p = 1\)

Natural Gradients vs Steepest Ascent

Steepest ascent: \[ \Delta \theta_{\text{steepest}} = \eta \frac{\text{d}S}{\text{d}\theta} = \eta\, p(1-p)\left(\log(1-p) - \log p\right) \] Fisher information: \[ G(\theta) = p(1-p) \] Natural gradient: \[ \Delta \theta_{\text{natural}} = \eta\left(\log(1-p) - \log p\right) \]
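
A minimal sketch comparing the two updates on the two-bin system, starting near a low-entropy corner (the step size, start point and iteration count are arbitrary choices of mine):

```python
import numpy as np

def p_of(theta):
    return 1.0 / (1.0 + np.exp(-theta))   # bin probability as a function of theta

def grad_S(theta):
    p = p_of(theta)
    return p * (1 - p) * (np.log(1 - p) - np.log(p))   # dS/dtheta

eta = 0.5
theta_steep = theta_nat = 4.0              # start near a low-entropy corner, p ~ 0.98
for _ in range(20):
    theta_steep += eta * grad_S(theta_steep)              # steepest ascent step
    p = p_of(theta_nat)
    theta_nat += eta * (np.log(1 - p) - np.log(p))        # natural gradient step

print(p_of(theta_steep), p_of(theta_nat))  # natural gradient is much closer to p = 0.5
```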

Four-Bin Saddle Point Example

  • Four-bin system creates 3D parameter space
  • Saddle points appear where:
    • Gradient is zero
    • Some directions increase entropy
    • Other directions decrease entropy
  • Information reservoirs form in critically slowed directions (see the numerical probe below)
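
A hedged sketch (not from the slides) of how one might probe the four-bin entropy surface numerically: parameterise the histogram by three natural parameters, evaluate the entropy Hessian at a candidate point, and look for mixed-sign eigenvalues, the signature of a saddle.

```python
import numpy as np

def entropy(theta):
    """Entropy of a four-bin categorical with natural parameters (theta1..3, 0)."""
    logits = np.append(theta, 0.0)
    p = np.exp(logits - np.max(logits))
    p /= p.sum()
    return -np.sum(p * np.log(p))

def hessian(f, theta, eps=1e-4):
    """Numerical Hessian by central differences."""
    n = len(theta)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.eye(n)[i] * eps, np.eye(n)[j] * eps
            H[i, j] = (f(theta + e_i + e_j) - f(theta + e_i - e_j)
                       - f(theta - e_i + e_j) + f(theta - e_i - e_j)) / (4 * eps**2)
    return H

theta = np.array([2.0, -2.0, 0.0])   # an arbitrary probe point in parameter space
eigs = np.linalg.eigvalsh(hessian(entropy, theta))
print(eigs)  # mixed signs at a zero-gradient point would indicate a saddle
```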

Saddle Point Example

Saddle Points

Saddle Point Seeking Behaviour

Gradient Flow and Least Action Principles

  • Steepest ascent of entropy ≈ Path of least action
  • System follows geodesics in information geometry

Information-Theoretic Action

  • Action integral \[ \mathcal{A} = \int L(\theta, \dot{\theta}) \text{d}t \]
  • Information-theoretic Lagrangian \[ L = \frac{1}{2}\dot{\theta}^\top G(\theta)\dot{\theta} - S(\theta) \]
  • cf. Frieden (1998); a numerical sketch of the action follows.
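
A discretised evaluation of this action for the two-bin system along a straight-line path in \(\theta\); the path, horizon and step count are arbitrary illustrative choices of mine.

```python
import numpy as np

def S(theta):
    p = 1.0 / (1.0 + np.exp(-theta))
    return -p * np.log(p) - (1 - p) * np.log(1 - p)   # two-bin entropy

def G(theta):
    p = 1.0 / (1.0 + np.exp(-theta))
    return p * (1 - p)                                 # Fisher information

# Straight-line path in the natural parameter from a low-entropy start to theta = 0.
ts = np.linspace(0.0, 1.0, 200)
thetas = 4.0 * (1.0 - ts)
dt = ts[1] - ts[0]
theta_dot = np.gradient(thetas, dt)

# L = (1/2) theta_dot G(theta) theta_dot - S(theta), integrated with a Riemann sum.
L = 0.5 * theta_dot * G(thetas) * theta_dot - S(thetas)
print(np.sum(L) * dt)
```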

Gradient Flow and Least Action Path

Uncertainty Principle

  • Information reservoir variables (\(M\)) map to natural parameters \(\boldsymbol{\theta}(M)\)
  • Challenge: Need both precision in parameters and capacity for information

Capacity \(\leftrightarrow\) Precision Paradox

  • Fundamental trade-off emerges:
  • \(\Delta\boldsymbol{\theta}(M) \cdot \Delta c(M) \geq k\)
  • Cannot simultaneously have perfect precision and maximum capacity

Quantum vs Classical Information Reservoirs

  • Near origin: “Quantum-like” information processing
    • Wave-like encoding, non-local correlations
    • Uncertainty principle nearly saturated
  • Higher entropy: Transition to “classical” behavior
    • From wave-like to particle-like information storage
    • Local rather than distributed encoding

Visualising the Parameter-Capacity Uncertainty Principle

  • Uncertainty principle: \(\Delta\theta \cdot \Delta c \geq k\)
  • Minimal uncertainty states form ellipses in phase space
  • Quantum-like properties emerge from information constraints
  • Different uncertainty states visualized as probability distributions

Visualisation of the Uncertainty Principle

Conceptual Framework

Conclusion

Unifying Perspectives on Intelligence

  • Intelligence through multiple lenses:
    • Entropy game: Intelligence as optimal questioning
    • Information engines: Intelligence as energy-efficient computation
    • Least action: Intelligence as path optimization
    • Schrödinger’s bridge: Intelligence as probability transport
  • Jaynes’ World: an initial attempt to bridge these different views

A Unified View of Intelligence Through Information

  • Converging perspectives on intelligence:

    • Efficient entropy reduction (Entropy Game)
    • Energy-efficient information processing (Information Engines)
    • Path optimization in information space (Least Action)
    • Optimal probability transport (Schrödinger’s Bridge)
  • Unified core: Intelligence as optimal information processing

  • Implications:

    • Fundamental limits on intelligence
    • New metrics for AI systems
    • Principled approach to cognitive modeling
    • Information-theoretic approaches to learning

Research Directions

  • Open questions:
    • Information-theoretic intelligence metrics
    • Physical limits of intelligent systems
    • Connections to quantum information theory
    • Practical algorithms based on these principles
    • Biological implementations of information engines
  • Applications:
    • Active learning systems
    • Energy-efficient AI
    • Robust decision-making under uncertainty
    • Cognitive architectures

Thanks!

References

Boltzmann, L., n.d. Über die Beziehung zwischen dem zweiten Hauptsatze der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung, respective den Sätzen über das Wärmegleichgewicht. Sitzungsberichte der Kaiserlichen Akademie der Wissenschaften. Mathematisch-Naturwissenschaftliche Classe. Abt. II LXXVI, 373–435.
Eddington, A.S., 1929. The nature of the physical world. Dent (London). https://doi.org/10.2307/2180099
Frieden, B.R., 1998. Physics from Fisher information: A unification. Cambridge University Press, Cambridge, UK. https://doi.org/10.1017/CBO9780511622670
Mikhailov, G.K., n.d. Daniel Bernoulli, Hydrodynamica (1738).
Sharp, K., Matschinsky, F., 2015. Translation of Ludwig Boltzmann’s paper “on the relationship between the second fundamental theorem of the mechanical theory of heat and probability calculations regarding the conditions for thermal equilibrium.” Entropy 17, 1971–2009. https://doi.org/10.3390/e17041971