An Entropy-Based Information Game
Neil D. Lawrence
Sorrento Meeting
<canvas id="multiball-canvas" width="700" height="500" style="border:1px solid black;display:block;width:100%"></canvas>
<div>Velocity-bin entropy: <output id="multiball-entropy"></output></div>
<div id="multiball-histogram-canvas" style="width:100%;height:250px"></div>
<canvas id="maxwell-canvas" width="700" height="500" style="border:1px solid black;display:block;width:100%"></canvas>
<div>Velocity-bin entropy: <output id="maxwell-entropy"></output></div>
<div id="maxwell-histogram-canvas" style="width:100%;height:250px"></div>
Entropy \[ S(X) = -\sum_X \rho(X) \log \rho(X) \]
In thermodynamics this is preceded by Boltzmann's constant, \(k_B\)
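A minimal sketch of how such a velocity-bin entropy can be computed from particle speeds (NumPy, the bin count, and the helper name are assumptions, not the demo's actual code):

```python
import numpy as np

def velocity_bin_entropy(speeds, n_bins=20):
    """Shannon entropy of a binned speed distribution (hypothetical helper)."""
    counts, _ = np.histogram(speeds, bins=n_bins)
    p = counts / counts.sum()      # empirical distribution over velocity bins
    p = p[p > 0]                   # drop empty bins: 0 log 0 -> 0
    return -np.sum(p * np.log(p))

# Example: speeds of 1000 particles with Gaussian velocity components
rng = np.random.default_rng(0)
speeds = np.linalg.norm(rng.normal(size=(1000, 2)), axis=1)
print(velocity_bin_entropy(speeds))
```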
Where \[ E_\rho\left[T(Z)\right] = \nabla_\boldsymbol{\theta}A(\boldsymbol{\theta}) \] because \(A(\boldsymbol{\theta})\), the log-partition function, operates as a cumulant generating function for \(\rho(Z)\).
Joint entropy can be decomposed \[ S(Z) = S(X,M) = S(X|M) + S(M) = S(X) - I(X;M) + S(M) \]
Mutual information \(I(X;M)\) connects information and energy
Measurement changes system entropy by \(-I(X;M)\)
Increases available energy
Difference in available energy: \[ \Delta A = A(X) - A(X|M) = I(X;M) \]
Can recover \(k_B T \cdot I(X;M)\) in work from the system
\[ I(X;M) = \sum_{x,m} \rho(x,m) \log \frac{\rho(x,m)}{\rho(x)\rho(m)}, \]
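A small numerical check of this decomposition and of \(I(X;M)\); the 2×2 joint distribution below is illustrative, not from the talk:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Illustrative joint distribution rho(x, m) over two states each
rho = np.array([[0.40, 0.10],
                [0.10, 0.40]])

S_XM = entropy(rho.ravel())            # S(X, M)
S_X = entropy(rho.sum(axis=1))         # S(X)
S_M = entropy(rho.sum(axis=0))         # S(M)
S_X_given_M = S_XM - S_M               # S(X|M) by the chain rule
I_XM = S_X + S_M - S_XM                # I(X; M)

# S(X, M) = S(X|M) + S(M) = S(X) - I(X; M) + S(M)
print(np.isclose(S_XM, S_X_given_M + S_M), np.isclose(S_XM, S_X - I_XM + S_M))
```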
Jaynes (1957): Statistical mechanics as inference with incomplete information
Maximum entropy principle: most honest description of what we know
Avoids unwarranted assumptions beyond available data
Die example: average result constrained to 4.5 instead of the fair-die value 3.5
Constraints: normalization \(\sum_{i=1}^{6} p_i = 1\) and mean \(\sum_{i=1}^{6} i\, p_i = 4.5\)
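A sketch of Jaynes' die example solved numerically, assuming the maximum-entropy solution takes the exponential-family form \(p_i \propto \exp(\lambda i)\) and using bisection on the mean constraint (the helper name and bracket are illustrative):

```python
import numpy as np

def maxent_die(target_mean=4.5):
    """Maximum-entropy distribution over faces 1..6 with a fixed mean.

    The solution has exponential-family form p_i ∝ exp(lam * i); solve for
    the natural parameter lam by bisection on the mean constraint."""
    faces = np.arange(1, 7)

    def mean_for(lam):
        w = np.exp(lam * faces)
        return (w / w.sum()) @ faces

    lo, hi = -10.0, 10.0              # bracket for the natural parameter
    for _ in range(100):              # mean_for is monotone in lam
        mid = 0.5 * (lo + hi)
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    w = np.exp(lam * faces)
    return w / w.sum()

p = maxent_die(4.5)
print(np.round(p, 4), "mean =", round(float(p @ np.arange(1, 7)), 3))
```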
Unlike animal game (which reduces entropy), Jaynes’ World maximizes entropy
System evolves by ascending the gradient of the entropy \(S(Z)\)
Animal game: max uncertainty → min uncertainty
Jaynes’ World: min uncertainty → max uncertainty
Thought experiment: looking backward from any point
Game appears to come from minimal entropy configuration (“origin”)
Game appears to move toward maximal entropy configuration (“end”)
\[ \rho(Z) = h(Z) \exp(\boldsymbol{\theta}^\top T(Z) - A(\boldsymbol{\theta})), \] where \(h(Z)\) is the base measure, \(T(Z)\) are the sufficient statistics, \(A(\boldsymbol{\theta})\) is the log-partition function, and \(\boldsymbol{\theta}\) are the natural parameters of the distribution.
In regions where parameters are weakly coupled to entropy change (low \(\boldsymbol{\theta}^\top \nabla_\boldsymbol{\theta} S[\rho_\boldsymbol{\theta}]\)), perceived time flows slowly.
At critical points where parameters become orthogonal to the entropy gradient (\(\boldsymbol{\theta}^\top \nabla_\boldsymbol{\theta} S[\rho_\boldsymbol{\theta}] \approx 0\)), the time parameterization approaches a singularity, indicating phase transitions in the system’s information structure.
Steepest ascent in the natural parameter: \[ \Delta \theta_{\text{steepest}} = \eta \frac{\text{d}S}{\text{d}\theta} = \eta\, p(1-p)(\log(1-p) - \log p). \]
Fisher information: \[ G(\theta) = p(1-p) \]
Natural gradient: \[ \Delta \theta_{\text{natural}} = \eta\, G(\theta)^{-1}\frac{\text{d}S}{\text{d}\theta} = \eta(\log(1-p) - \log p) \]
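A minimal sketch of the two update rules, assuming \(p = \sigma(\theta)\) is a Bernoulli probability in its natural parameter \(\theta\); the step size and starting point are arbitrary:

```python
import numpy as np

def entropy_gradients(theta):
    """Steepest and natural gradients of the Bernoulli entropy w.r.t. theta."""
    p = 1.0 / (1.0 + np.exp(-theta))
    steepest = p * (1 - p) * (np.log(1 - p) - np.log(p))   # dS/dtheta
    fisher = p * (1 - p)                                    # G(theta)
    natural = steepest / fisher                             # G(theta)^{-1} dS/dtheta
    return steepest, natural

# Ascend the entropy from a low-entropy start; p converges to 1/2
theta, eta = 4.0, 0.5
for _ in range(50):
    _, nat = entropy_gradients(theta)
    theta += eta * nat
print(theta, 1.0 / (1.0 + np.exp(-theta)))   # theta -> 0, p -> 0.5
```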
\(X\) divided into past/present \(X_0\) and future \(X_1\)
Same slow modes that induce spatial modularity also mediate temporal dependencies
Conditional mutual information: \[ I(X_0; X_1 | M) = \sum_{x_0,x_1,m} p(x_0,x_1,m) \log \frac{p(x_0,x_1|m)}{p(x_0|m)p(x_1|m)} \]
Measures dependency between past and future given memory state
Perfect Markovianity: \(I(X_0; X_1 | M) = 0\)
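A sketch of evaluating \(I(X_0; X_1 \mid M)\) on a small discrete joint; the distribution below is constructed to be conditionally independent given \(M\), so the value should be numerically zero, illustrating perfect Markovianity:

```python
import numpy as np

def conditional_mi(p):
    """I(X0; X1 | M) for a joint array p[x0, x1, m]."""
    p_m = p.sum(axis=(0, 1))        # p(m)
    p_x0_m = p.sum(axis=1)          # p(x0, m)
    p_x1_m = p.sum(axis=0)          # p(x1, m)
    cmi = 0.0
    for x0 in range(p.shape[0]):
        for x1 in range(p.shape[1]):
            for m in range(p.shape[2]):
                if p[x0, x1, m] > 0:
                    cmi += p[x0, x1, m] * np.log(
                        p[x0, x1, m] * p_m[m] / (p_x0_m[x0, m] * p_x1_m[x1, m]))
    return cmi

# Joint with X0 and X1 independent given M: p(x0, x1, m) = p(x0|m) p(x1|m) p(m)
p_m = np.array([0.5, 0.5])
p_x0_given_m = np.array([[0.9, 0.1], [0.2, 0.8]])   # rows indexed by m
p_x1_given_m = np.array([[0.7, 0.3], [0.4, 0.6]])
p = np.einsum('mi,mj,m->ijm', p_x0_given_m, p_x1_given_m, p_m)
print(conditional_mi(p))   # ~0: memory renders past and future independent
```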
Slow modes serve a dual purpose: inducing spatial modularity and mediating temporal dependencies
Eigenvalue spectrum determines both spatial and temporal structure
Fundamental tension between \(\mathcal{C}(M)\), \(\mathcal{S}(X|M)\), and \(\mathcal{T}(X_0, X_1|M)\) (see the uncertainty relation below)
Creates uncertainty principle with necessary trade-offs
Three fundamental properties with mathematical formulations:
Mathematical uncertainty relation: \(\mathcal{C}(M) \cdot \mathcal{S}(X|M) \cdot \mathcal{T}(X_0, X_1|M) \geq k\)
Rigorous mathematical definitions:
Imposes fundamental limits on simultaneous optimization
Markov property emerges when slow modes capture all temporal dependencies
Mathematical tradeoffs between properties:
Memory \(\equiv\) Communication through time
Storage is transmission to future
Both limited by thermal noise
At Landauer’s limit: \(E = k_B T\)
Gives \(\frac{S}{N} = 1\)
Results in a capacity of \(\frac{1}{2}\) bit per channel use (worked step below)
Fundamental connection between energy and information
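As a brief worked step, assuming the per-use Gaussian channel capacity \(C = \tfrac{1}{2}\log_2(1 + S/N)\): \[ C = \tfrac{1}{2}\log_2\left(1 + 1\right) = \tfrac{1}{2}\ \text{bit per channel use}. \]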
\[ p(X|M) \approx \prod_i p(X_i|M_{\text{pa}(i)}) \]
\[ M_Z(t) = E[e^{t \cdot Z}] = \exp(A(\boldsymbol{\theta}+t) - A(\boldsymbol{\theta})). \]
\[ K_Z(t) = \log M_Z(t) = A(\boldsymbol{\theta}+t) - A(\boldsymbol{\theta}). \]
\[ \left.\frac{\partial^2}{\partial t_i^2} K_Z(t)\right|_{t=0} = \frac{\partial^2 A(\boldsymbol{\theta})}{\partial \theta_i^2}. \]
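A quick finite-difference check of this identity for the Bernoulli log-partition function \(A(\theta) = \log(1 + e^{\theta})\); the evaluation point and step size are arbitrary:

```python
import numpy as np

def A(theta):
    """Log-partition function of a Bernoulli in its natural parameter."""
    return np.log(1.0 + np.exp(theta))

theta, eps = 0.7, 1e-3
p = 1.0 / (1.0 + np.exp(-theta))        # mean of the sufficient statistic

# Finite-difference derivatives of the log-partition function
dA = (A(theta + eps) - A(theta - eps)) / (2 * eps)               # ~ E[T(Z)] = p
d2A = (A(theta + eps) - 2 * A(theta) + A(theta - eps)) / eps**2  # ~ Var[T(Z)] = p(1-p)

print(dA, p)             # first cumulant: mean
print(d2A, p * (1 - p))  # second cumulant: variance
```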
Converging perspectives on intelligence:
Unified core: Intelligence as optimal information processing
Implications: