
The Inaccessible Game

$$\newcommand{\tk}[1]{} \newcommand{\Amatrix}{\mathbf{A}} \newcommand{\KL}[2]{\text{KL}\left( #1\,\|\,#2 \right)} \newcommand{\Kaast}{\kernelMatrix_{\mathbf{ \ast}\mathbf{ \ast}}} \newcommand{\Kastu}{\kernelMatrix_{\mathbf{ \ast} \inducingVector}} \newcommand{\Kff}{\kernelMatrix_{\mappingFunctionVector \mappingFunctionVector}} \newcommand{\Kfu}{\kernelMatrix_{\mappingFunctionVector \inducingVector}} \newcommand{\Kuast}{\kernelMatrix_{\inducingVector \bf\ast}} \newcommand{\Kuf}{\kernelMatrix_{\inducingVector \mappingFunctionVector}} \newcommand{\Kuu}{\kernelMatrix_{\inducingVector \inducingVector}} \newcommand{\Kuui}{\Kuu^{-1}} \newcommand{\Qaast}{\mathbf{Q}_{\bf \ast \ast}} \newcommand{\Qastf}{\mathbf{Q}_{\ast \mappingFunction}} \newcommand{\Qfast}{\mathbf{Q}_{\mappingFunctionVector \bf \ast}} \newcommand{\Qff}{\mathbf{Q}_{\mappingFunctionVector \mappingFunctionVector}} \newcommand{\aMatrix}{\mathbf{A}} \newcommand{\aScalar}{a} \newcommand{\aVector}{\mathbf{a}} \newcommand{\acceleration}{a} \newcommand{\bMatrix}{\mathbf{B}} \newcommand{\bScalar}{b} \newcommand{\bVector}{\mathbf{b}} \newcommand{\basisFunc}{\phi} \newcommand{\basisFuncVector}{\boldsymbol{ \basisFunc}} \newcommand{\basisFunction}{\phi} \newcommand{\basisLocation}{\mu} \newcommand{\basisMatrix}{\boldsymbol{ \Phi}} \newcommand{\basisScalar}{\basisFunction} \newcommand{\basisVector}{\boldsymbol{ \basisFunction}} \newcommand{\activationFunction}{\phi} \newcommand{\activationMatrix}{\boldsymbol{ \Phi}} \newcommand{\activationScalar}{\basisFunction} \newcommand{\activationVector}{\boldsymbol{ \basisFunction}} \newcommand{\bigO}{\mathcal{O}} \newcommand{\binomProb}{\pi} \newcommand{\cMatrix}{\mathbf{C}} \newcommand{\cbasisMatrix}{\hat{\boldsymbol{ \Phi}}} \newcommand{\cdataMatrix}{\hat{\dataMatrix}} \newcommand{\cdataScalar}{\hat{\dataScalar}} \newcommand{\cdataVector}{\hat{\dataVector}} \newcommand{\centeredKernelMatrix}{\mathbf{ \MakeUppercase{\centeredKernelScalar}}} \newcommand{\centeredKernelScalar}{b} 
\newcommand{\centeredKernelVector}{\centeredKernelScalar} \newcommand{\centeringMatrix}{\mathbf{H}} \newcommand{\chiSquaredDist}[2]{\chi_{#1}^{2}\left(#2\right)} \newcommand{\chiSquaredSamp}[1]{\chi_{#1}^{2}} \newcommand{\conditionalCovariance}{\boldsymbol{ \Sigma}} \newcommand{\coregionalizationMatrix}{\mathbf{B}} \newcommand{\coregionalizationScalar}{b} \newcommand{\coregionalizationVector}{\mathbf{ \coregionalizationScalar}} \newcommand{\covDist}[2]{\text{cov}_{#2}\left(#1\right)} \newcommand{\covSamp}[1]{\text{cov}\left(#1\right)} \newcommand{\covarianceScalar}{c} \newcommand{\covarianceVector}{\mathbf{ \covarianceScalar}} \newcommand{\covarianceMatrix}{\mathbf{C}} \newcommand{\covarianceMatrixTwo}{\boldsymbol{ \Sigma}} \newcommand{\croupierScalar}{s} \newcommand{\croupierVector}{\mathbf{ \croupierScalar}} \newcommand{\croupierMatrix}{\mathbf{ \MakeUppercase{\croupierScalar}}} \newcommand{\dataDim}{p} \newcommand{\dataIndex}{i} \newcommand{\dataIndexTwo}{j} \newcommand{\dataMatrix}{\mathbf{Y}} \newcommand{\dataScalar}{y} \newcommand{\dataSet}{\mathcal{D}} \newcommand{\dataStd}{\sigma} \newcommand{\dataVector}{\mathbf{ \dataScalar}} \newcommand{\decayRate}{d} \newcommand{\degreeMatrix}{\mathbf{ \MakeUppercase{\degreeScalar}}} \newcommand{\degreeScalar}{d} \newcommand{\degreeVector}{\mathbf{ \degreeScalar}} \newcommand{\diag}[1]{\text{diag}\left(#1\right)} \newcommand{\diagonalMatrix}{\mathbf{D}} \newcommand{\diff}[2]{\frac{\text{d}#1}{\text{d}#2}} \newcommand{\diffTwo}[2]{\frac{\text{d}^2#1}{\text{d}#2^2}} \newcommand{\displacement}{x} \newcommand{\displacementVector}{\textbf{\displacement}} \newcommand{\distanceMatrix}{\mathbf{ \MakeUppercase{\distanceScalar}}} \newcommand{\distanceScalar}{d} \newcommand{\distanceVector}{\mathbf{ \distanceScalar}} \newcommand{\eigenvaltwo}{\ell} \newcommand{\eigenvaltwoMatrix}{\mathbf{L}} \newcommand{\eigenvaltwoVector}{\mathbf{l}} \newcommand{\eigenvalue}{\lambda} \newcommand{\eigenvalueMatrix}{\boldsymbol{ \Lambda}} 
\newcommand{\eigenvalueVector}{\boldsymbol{ \lambda}} \newcommand{\eigenvector}{\mathbf{ \eigenvectorScalar}} \newcommand{\eigenvectorMatrix}{\mathbf{U}} \newcommand{\eigenvectorScalar}{u} \newcommand{\eigenvectwo}{\mathbf{v}} \newcommand{\eigenvectwoMatrix}{\mathbf{V}} \newcommand{\eigenvectwoScalar}{v} \newcommand{\entropy}[1]{\mathcal{H}\left(#1\right)} \newcommand{\errorFunction}{E} \newcommand{\expDist}[2]{\left\langle#1\right\rangle_{#2}} \newcommand{\expSamp}[1]{\left\langle#1\right\rangle} \newcommand{\expectation}[1]{\left\langle #1 \right\rangle } \newcommand{\expectationDist}[2]{\left\langle #1 \right\rangle _{#2}} \newcommand{\expectedDistanceMatrix}{\mathcal{D}} \newcommand{\eye}{\mathbf{I}} \newcommand{\fantasyDim}{r} \newcommand{\fantasyMatrix}{\mathbf{ \MakeUppercase{\fantasyScalar}}} \newcommand{\fantasyScalar}{z} \newcommand{\fantasyVector}{\mathbf{ \fantasyScalar}} \newcommand{\featureStd}{\varsigma} \newcommand{\gammaCdf}[3]{\mathcal{GAMMA CDF}\left(#1|#2,#3\right)} \newcommand{\gammaDist}[3]{\mathcal{G}\left(#1|#2,#3\right)} \newcommand{\gammaSamp}[2]{\mathcal{G}\left(#1,#2\right)} \newcommand{\gaussianDist}[3]{\mathcal{N}\left(#1|#2,#3\right)} \newcommand{\gaussianSamp}[2]{\mathcal{N}\left(#1,#2\right)} \newcommand{\uniformDist}[3]{\mathcal{U}\left(#1|#2,#3\right)} \newcommand{\uniformSamp}[2]{\mathcal{U}\left(#1,#2\right)} \newcommand{\given}{|} \newcommand{\half}{\frac{1}{2}} \newcommand{\heaviside}{H} \newcommand{\hiddenMatrix}{\mathbf{ \MakeUppercase{\hiddenScalar}}} \newcommand{\hiddenScalar}{h} \newcommand{\hiddenVector}{\mathbf{ \hiddenScalar}} \newcommand{\identityMatrix}{\eye} \newcommand{\inducingInputScalar}{z} \newcommand{\inducingInputVector}{\mathbf{ \inducingInputScalar}} \newcommand{\inducingInputMatrix}{\mathbf{Z}} \newcommand{\inducingScalar}{u} \newcommand{\inducingVector}{\mathbf{ \inducingScalar}} \newcommand{\inducingMatrix}{\mathbf{U}} \newcommand{\inlineDiff}[2]{\text{d}#1/\text{d}#2} \newcommand{\inputDim}{q} 
\newcommand{\inputMatrix}{\mathbf{X}} \newcommand{\inputScalar}{x} \newcommand{\inputSpace}{\mathcal{X}} \newcommand{\inputVals}{\inputVector} \newcommand{\inputVector}{\mathbf{ \inputScalar}} \newcommand{\iterNum}{k} \newcommand{\kernel}{\kernelScalar} \newcommand{\kernelMatrix}{\mathbf{K}} \newcommand{\kernelScalar}{k} \newcommand{\kernelVector}{\mathbf{ \kernelScalar}} \newcommand{\kff}{\kernelScalar_{\mappingFunction \mappingFunction}} \newcommand{\kfu}{\kernelVector_{\mappingFunction \inducingScalar}} \newcommand{\kuf}{\kernelVector_{\inducingScalar \mappingFunction}} \newcommand{\kuu}{\kernelVector_{\inducingScalar \inducingScalar}} \newcommand{\lagrangeMultiplier}{\lambda} \newcommand{\lagrangeMultiplierMatrix}{\boldsymbol{ \Lambda}} \newcommand{\lagrangian}{L} \newcommand{\laplacianFactor}{\mathbf{ \MakeUppercase{\laplacianFactorScalar}}} \newcommand{\laplacianFactorScalar}{m} \newcommand{\laplacianFactorVector}{\mathbf{ \laplacianFactorScalar}} \newcommand{\laplacianMatrix}{\mathbf{L}} \newcommand{\laplacianScalar}{\ell} \newcommand{\laplacianVector}{\mathbf{ \ell}} \newcommand{\latentDim}{q} \newcommand{\latentDistanceMatrix}{\boldsymbol{ \Delta}} \newcommand{\latentDistanceScalar}{\delta} \newcommand{\latentDistanceVector}{\boldsymbol{ \delta}} \newcommand{\latentForce}{f} \newcommand{\latentFunction}{u} \newcommand{\latentFunctionVector}{\mathbf{ \latentFunction}} \newcommand{\latentFunctionMatrix}{\mathbf{ \MakeUppercase{\latentFunction}}} \newcommand{\latentIndex}{j} \newcommand{\latentScalar}{z} \newcommand{\latentVector}{\mathbf{ \latentScalar}} \newcommand{\latentMatrix}{\mathbf{Z}} \newcommand{\learnRate}{\eta} \newcommand{\lengthScale}{\ell} \newcommand{\rbfWidth}{\ell} \newcommand{\likelihoodBound}{\mathcal{L}} \newcommand{\likelihoodFunction}{L} \newcommand{\locationScalar}{\mu} \newcommand{\locationVector}{\boldsymbol{ \locationScalar}} \newcommand{\locationMatrix}{\mathbf{M}} \newcommand{\variance}[1]{\text{var}\left( #1 \right)} 
\newcommand{\mappingFunction}{f} \newcommand{\mappingFunctionMatrix}{\mathbf{F}} \newcommand{\mappingFunctionTwo}{g} \newcommand{\mappingFunctionTwoMatrix}{\mathbf{G}} \newcommand{\mappingFunctionTwoVector}{\mathbf{ \mappingFunctionTwo}} \newcommand{\mappingFunctionVector}{\mathbf{ \mappingFunction}} \newcommand{\scaleScalar}{s} \newcommand{\mappingScalar}{w} \newcommand{\mappingVector}{\mathbf{ \mappingScalar}} \newcommand{\mappingMatrix}{\mathbf{W}} \newcommand{\mappingScalarTwo}{v} \newcommand{\mappingVectorTwo}{\mathbf{ \mappingScalarTwo}} \newcommand{\mappingMatrixTwo}{\mathbf{V}} \newcommand{\maxIters}{K} \newcommand{\meanMatrix}{\mathbf{M}} \newcommand{\meanScalar}{\mu} \newcommand{\meanTwoMatrix}{\mathbf{M}} \newcommand{\meanTwoScalar}{m} \newcommand{\meanTwoVector}{\mathbf{ \meanTwoScalar}} \newcommand{\meanVector}{\boldsymbol{ \meanScalar}} \newcommand{\mrnaConcentration}{m} \newcommand{\naturalFrequency}{\omega} \newcommand{\neighborhood}[1]{\mathcal{N}\left( #1 \right)} \newcommand{\neilurl}{http://inverseprobability.com/} \newcommand{\noiseMatrix}{\boldsymbol{ E}} \newcommand{\noiseScalar}{\epsilon} \newcommand{\noiseVector}{\boldsymbol{ \epsilon}} \newcommand{\noiseStd}{\sigma} \newcommand{\norm}[1]{\left\Vert #1 \right\Vert} \newcommand{\normalizedLaplacianMatrix}{\hat{\mathbf{L}}} \newcommand{\normalizedLaplacianScalar}{\hat{\ell}} \newcommand{\normalizedLaplacianVector}{\hat{\mathbf{ \ell}}} \newcommand{\numActive}{m} \newcommand{\numBasisFunc}{m} \newcommand{\numComponents}{m} \newcommand{\numComps}{K} \newcommand{\numData}{n} \newcommand{\numFeatures}{K} \newcommand{\numHidden}{h} \newcommand{\numInducing}{m} \newcommand{\numLayers}{\ell} \newcommand{\numNeighbors}{K} \newcommand{\numSequences}{s} \newcommand{\numSuccess}{s} \newcommand{\numTasks}{m} \newcommand{\numTime}{T} \newcommand{\numTrials}{S} \newcommand{\outputIndex}{j} \newcommand{\paramVector}{\boldsymbol{ \theta}} \newcommand{\parameterMatrix}{\boldsymbol{ \Theta}} 
\newcommand{\parameterScalar}{\theta} \newcommand{\parameterVector}{\boldsymbol{ \parameterScalar}} \newcommand{\partDiff}[2]{\frac{\partial#1}{\partial#2}} \newcommand{\precisionScalar}{j} \newcommand{\precisionVector}{\mathbf{ \precisionScalar}} \newcommand{\precisionMatrix}{\mathbf{J}} \newcommand{\pseudotargetScalar}{\widetilde{y}} \newcommand{\pseudotargetVector}{\mathbf{ \pseudotargetScalar}} \newcommand{\pseudotargetMatrix}{\mathbf{ \widetilde{Y}}} \newcommand{\rank}[1]{\text{rank}\left(#1\right)} \newcommand{\rayleighDist}[2]{\mathcal{R}\left(#1|#2\right)} \newcommand{\rayleighSamp}[1]{\mathcal{R}\left(#1\right)} \newcommand{\responsibility}{r} \newcommand{\rotationScalar}{r} \newcommand{\rotationVector}{\mathbf{ \rotationScalar}} \newcommand{\rotationMatrix}{\mathbf{R}} \newcommand{\sampleCovScalar}{s} \newcommand{\sampleCovVector}{\mathbf{ \sampleCovScalar}} \newcommand{\sampleCovMatrix}{\mathbf{s}} \newcommand{\scalarProduct}[2]{\left\langle{#1},{#2}\right\rangle} \newcommand{\sign}[1]{\text{sign}\left(#1\right)} \newcommand{\sigmoid}[1]{\sigma\left(#1\right)} \newcommand{\singularvalue}{\ell} \newcommand{\singularvalueMatrix}{\mathbf{L}} \newcommand{\singularvalueVector}{\mathbf{l}} \newcommand{\sorth}{\mathbf{u}} \newcommand{\spar}{\lambda} \newcommand{\trace}[1]{\text{tr}\left(#1\right)} \newcommand{\BasalRate}{B} \newcommand{\DampingCoefficient}{C} \newcommand{\DecayRate}{D} \newcommand{\Displacement}{X} \newcommand{\LatentForce}{F} \newcommand{\Mass}{M} \newcommand{\Sensitivity}{S} \newcommand{\basalRate}{b} \newcommand{\dampingCoefficient}{c} \newcommand{\mass}{m} \newcommand{\sensitivity}{s} \newcommand{\springScalar}{\kappa} \newcommand{\springVector}{\boldsymbol{ \kappa}} \newcommand{\springMatrix}{\boldsymbol{ \mathcal{K}}} \newcommand{\tfConcentration}{p} \newcommand{\tfDecayRate}{\delta} \newcommand{\tfMrnaConcentration}{f} \newcommand{\tfVector}{\mathbf{ \tfConcentration}} \newcommand{\velocity}{v} \newcommand{\sufficientStatsScalar}{g} 
\newcommand{\sufficientStatsVector}{\mathbf{ \sufficientStatsScalar}} \newcommand{\sufficientStatsMatrix}{\mathbf{G}} \newcommand{\switchScalar}{s} \newcommand{\switchVector}{\mathbf{ \switchScalar}} \newcommand{\switchMatrix}{\mathbf{S}} \newcommand{\tr}[1]{\text{tr}\left(#1\right)} \newcommand{\loneNorm}[1]{\left\Vert #1 \right\Vert_1} \newcommand{\ltwoNorm}[1]{\left\Vert #1 \right\Vert_2} \newcommand{\onenorm}[1]{\left\vert#1\right\vert_1} \newcommand{\twonorm}[1]{\left\Vert #1 \right\Vert} \newcommand{\vScalar}{v} \newcommand{\vVector}{\mathbf{v}} \newcommand{\vMatrix}{\mathbf{V}} \newcommand{\varianceDist}[2]{\text{var}_{#2}\left( #1 \right)} \newcommand{\vecb}[1]{\left(#1\right):} \newcommand{\weightScalar}{w} \newcommand{\weightVector}{\mathbf{ \weightScalar}} \newcommand{\weightMatrix}{\mathbf{W}} \newcommand{\weightedAdjacencyMatrix}{\mathbf{A}} \newcommand{\weightedAdjacencyScalar}{a} \newcommand{\weightedAdjacencyVector}{\mathbf{ \weightedAdjacencyScalar}} \newcommand{\onesVector}{\mathbf{1}} \newcommand{\zerosVector}{\mathbf{0}} $$
at Information Theory Seminar, Centre for Mathematical Sciences (MR5), University of Cambridge on May 20, 2026
Neil D. Lawrence, University of Cambridge

Abstract

In this talk we will explore a zero-player game based on an information isolation constraint. The dynamics of the game emerge from a “no-barber” selection principle that prohibits external structure. The aim is for the game to avoid impredicative-style inconsistencies. Motivated by the selection principle we will derive a “selected” trajectory in the game that consists of a second-order constrained maximum entropy production along the information geometry.

The Munchkin Provision


Without such consistency, we would require what we might call a “Munchkin provision.” In the Munchkin card game (Jackson, 2001), it is acknowledged that the cards and rules may be inconsistent. Their resolution?

Any other disputes should be settled by loud arguments, with the owner of the game having the last word.

Munchkin Rules (Jackson, 2001)

While this works for card games, it’s unsatisfying for foundational mathematics. We want our game to be internally consistent, not requiring an external referee to resolve paradoxes.

Figure: The Munchkin card game has both cards and rules. The game explicitly acknowledges that this can lead to inconsistencies which should be resolved by the game owner.

A Tautology

Self-governing systems cannot refer to external arbitration.

While this is a tautology, we’re going to try to suggest how to formalise this notion through information theory.

Foundations: Information Loss and Entropy

The Inaccessible Game Setup


Inspired by the no-barber principle, we set up the game in a way that attempts to avoid “external structure.” To do this, the first two things we need are:

1. A representation of information loss.
2. A prohibition on information exchange with the game.

How do we obtain a representation of information loss without including external structure? We use the axiomatic frameworks of Baez et al. (2011) and Parzygnat (2022). They characterise entropy through category-theoretic frameworks that depend on three axioms. Slight differences in the axioms lead to different conclusions. Baez et al. conclude that the difference in Shannon entropy before and after a process is applied characterises information loss. Parzygnat is inspired by Baez et al. but reformulates the axioms around a different categorical object, which implies von Neumann entropy.

In the game (Lawrence, 2025) we introduce information conservation based on these measures of information loss.

Thermalisation from Different Initial Conditions


This simulation places exactly nine billiard balls on a 3×3 grid, each coloured according to its position. The 3×3 histogram grid tracks, for each ball, the cumulative 2-D velocity distribution \((v_x, v_y)\) it has visited since the last reset.

The entropy \(H(v_x, v_y)\) for each ball is shown in the top-left of its panel. At the start, when all balls move identically, every panel shows a single bright dot near zero entropy. As elastic collisions redistribute energy the dots spread outward, tracing the Maxwell–Boltzmann circle, and entropy climbs toward its maximum.

The coloured dot in the top-right corner of each panel matches the ball’s colour on the main canvas, making it easy to follow individual balls.

Use the Display dropdown to switch between the 2D joint distribution \(p(v_x, v_y)\) (heatmap) and the two 1D marginals \(p(v_x)\) and \(p(v_y)\) overlaid as bar charts. Both marginals are expected to converge to the same symmetric distribution; the coloured bars show \(p(v_x)\) (ball colour) and the dark outline shows \(p(v_y)\). The entropy labels \(H_x\) and \(H_y\) confirm that the two components thermalise at the same rate.
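The per-ball entropy readout described above can be sketched in a few lines of Python. This is a minimal illustration, not the simulation’s actual code: the bin count and velocity range are assumed values, and the “before” and “at equilibrium” samples stand in for the simulation’s actual velocity histories.

```python
import numpy as np

def joint_velocity_entropy(vx, vy, bins=16, extent=3.0):
    """Estimate H(v_x, v_y) in nats by binning velocity samples in 2-D."""
    counts, _, _ = np.histogram2d(vx, vy, bins=bins,
                                  range=[[-extent, extent], [-extent, extent]])
    p = counts / counts.sum()
    p = p[p > 0]                     # 0 log 0 = 0 convention
    return float(-np.sum(p * np.log(p)))

rng = np.random.default_rng(0)
n = 10_000
# Before thermalisation: every sample shares one velocity, one occupied bin.
h_start = joint_velocity_entropy(np.zeros(n), -np.ones(n))
# At equilibrium: independent Gaussian components (2-D Maxwell-Boltzmann).
v = rng.standard_normal((2, n))
h_eq = joint_velocity_entropy(v[0], v[1])
print(h_start, h_eq)  # entropy rises from 0 toward the equilibrium value
```

The single bright dot at the start corresponds to `h_start` being zero; the spread-out equilibrium blob corresponds to the much larger `h_eq`.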

Use the Initialisation dropdown to choose how the balls start:

| Option | Description |
|---|---|
| From top ↓ | All balls move downward at the same speed |
| From bottom ↑ | All balls move upward |
| From left → | All balls move rightward |
| From right ← | All balls move leftward |
| Clockwise ↻ | Each ball moves tangentially clockwise around the canvas centre |
| Counter-CW ↺ | Each ball moves tangentially counter-clockwise |

For the four directional cases all nine histograms start at the same point, yet rapidly diverge and then converge to the same circular distribution. For the propeller cases adjacent balls start with very different velocity directions — the corner and edge balls even start at nearly opposite velocities — and yet all nine panels converge to the same equilibrium blob.

Notice that this system is ergodic: in the long run, the distribution of each ball’s velocity is independent of the initial conditions and identical for all balls, even though the path to equilibrium differs.


Figure: Nine billiard balls on a 3×3 grid. The histogram grid tracks each ball’s cumulative \((v_x, v_y)\) velocity distribution. Entropy per ball rises from near zero (single bright dot at the initial velocity) to the Maxwell–Boltzmann value as collisions thermalise the gas. Use the Initialisation dropdown to compare directional starts (all balls move the same way) with propeller starts (adjacent balls move in opposite directions): all initial conditions converge to the same equilibrium, demonstrating ergodicity.

Figure: Samples from independent Gaussian variables that represent horizontal and vertical velocities when our system is at equilibrium.

Sampling Two Dimensional Variables


Figure: Samples from correlated Gaussian variables that represent vertical and horizontal velocity.

Jaynes and Maximum Entropy


Figure: Ed Jaynes who developed the maximum entropy principle

Maximum Entropy Motivation


Ed Jaynes (Jaynes, 1957) proposed a foundation for statistical mechanics based on information theory. Jaynes recast the problem of assigning probabilities in statistical mechanics as a problem of inference with incomplete information.

A central problem in statistical mechanics is assigning initial probabilities when our knowledge is incomplete. The canonical example is this: if we know only the average energy of a system, what probability distribution should we use? Jaynes argued that we should use the distribution that maximises entropy subject to the constraints of our knowledge.

Jaynes illustrated the approach with a simple example. Suppose a die has been tossed many times, with an average result of 4.5 rather than the 3.5 expected for a fair die. What probability assignment \(P_n\) (\(n=1,2,\ldots,6\)) should we make for the next toss?

We need to satisfy two constraints \[\begin{align} \sum_{n=1}^6 P_n &= 1 \\ \sum_{n=1}^6 n P_n &= 4.5 \end{align}\]

Many distributions could satisfy these constraints, but which one makes the fewest unwarranted assumptions? Jaynes argued that we should choose the distribution that is maximally noncommittal with respect to the missing information, that is, the one that maximises the entropy, \[\begin{align} S_I = -\sum_{i} p_i \log p_i \end{align}\] This principle leads to the exponential family of distributions, which in statistical mechanics gives us the canonical ensemble and other familiar distributions.

Die Roll Simulation


This simulation illustrates the maximum entropy principle through Jaynes’ dice example (Jaynes, 1957). A fair die has expected outcome 3.5; the Jaynes example asks: if we know only that the average outcome is 4.5, what probability distribution \(P_n\) over the six faces should we assign?

The answer is the maximum-entropy distribution subject to the constraint \(\sum_{n=1}^6 n P_n = 4.5\), which belongs to the exponential family: \[\begin{align} P_n = \frac{e^{\lambda n}}{Z(\lambda)}, \qquad Z(\lambda) = \sum_{n=1}^6 e^{\lambda n} \end{align}\] where \(\lambda > 0\) is chosen so the mean constraint is satisfied. This avoids any unwarranted assumption beyond the available data.
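Since the mean of \(P_n \propto e^{\lambda n}\) is monotonically increasing in \(\lambda\), the multiplier can be found by simple bisection. A minimal sketch (the bracket \([0, 5]\) is an assumed interval that comfortably contains the root):

```python
import math

def mean_for(lam):
    """Mean of P_n proportional to exp(lam * n) over n = 1..6."""
    w = [math.exp(lam * n) for n in range(1, 7)]
    return sum(n * wn for n, wn in zip(range(1, 7), w)) / sum(w)

# Bisection: mean_for is increasing in lam, so bracket and halve.
lo, hi = 0.0, 5.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if mean_for(mid) < 4.5:
        lo = mid
    else:
        hi = mid
lam = 0.5 * (lo + hi)

P = [math.exp(lam * n) for n in range(1, 7)]
Z = sum(P)
P = [p / Z for p in P]
print(lam)  # ≈ 0.37
print(P)    # probabilities tilt monotonically toward the higher faces
```

The resulting distribution raises the probability of the higher faces just enough to meet the mean constraint, while remaining as uniform as the constraint allows.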


Figure: Interactive die-roll simulation. Click the die or press Roll to sample from the configured distribution. The histogram shows empirical relative frequencies (coloured bars) overlaid on the theoretical probabilities (dashed outlines). Use the sliders to set arbitrary outcome weights, or click a preset to load the uniform distribution (mean 3.5), the Jaynes maximum-entropy distribution (mean 4.5), or a low-biased distribution (mean 2).

The General Maximum-Entropy Formalism


For a more general case, suppose a quantity \(x\) can take values \((x_1, x_2, \ldots, x_n)\) and we know the average values of several functions \(f_k(x)\). The problem is to find the probability assignment \(p_i = p(x_i)\) that satisfies \[\begin{align} \sum_{i=1}^n p_i &= 1 \\ \sum_{i=1}^n p_i f_k(x_i) &= \langle f_k(x) \rangle = F_k \quad k=1,2,\ldots,m \end{align}\] and maximises the entropy \(S_I = -\sum_{i=1}^n p_i \log p_i\).

Using Lagrange multipliers, the solution is the generalised canonical distribution, \[\begin{align} p_i = \frac{\exp(-\lambda_1 f_1(x_i) - \ldots - \lambda_m f_m(x_i))}{Z(\lambda_1,\ldots,\lambda_m)} \end{align}\] where \(Z(\lambda_1,\ldots,\lambda_m)\) is the partition function, \[\begin{align} Z(\lambda_1,\ldots,\lambda_m) = \sum_{i=1}^n \exp(-\lambda_1 f_1(x_i) - \ldots - \lambda_m f_m(x_i)) \end{align}\] The Lagrange multipliers \(\lambda_k\) are determined by the constraints, \[\begin{align} \langle f_k \rangle = -\frac{\partial}{\partial \lambda_k}\log Z(\lambda_1,\ldots,\lambda_m) \quad k=1,2,\ldots,m. \end{align}\] The maximum attainable entropy is \[\begin{align} (S_I)_{max} = \log Z + \sum_{k=1}^m \lambda_k \langle f_k \rangle. \end{align}\]
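The identities above can be checked numerically. The sketch below uses two illustrative constraint functions, \(f_1(x) = x\) and \(f_2(x) = x^2\), on six states with arbitrarily chosen multipliers (all specific values are assumptions for the demonstration): the moments \(\langle f_k \rangle\) match \(-\partial \log Z / \partial \lambda_k\), and the entropy of the canonical distribution matches \(\log Z + \sum_k \lambda_k \langle f_k \rangle\).

```python
import numpy as np

x = np.arange(1.0, 7.0)              # states x_i (six of them)
f = np.stack([x, x**2])              # constraint functions f_1, f_2
lam = np.array([0.2, -0.05])         # arbitrary Lagrange multipliers

def log_Z(lam):
    # log partition function log Z(lambda_1, lambda_2)
    return np.log(np.sum(np.exp(-f.T @ lam)))

p = np.exp(-f.T @ lam - log_Z(lam))  # generalised canonical distribution
moments = f @ p                      # <f_k> under p

# <f_k> = -d log Z / d lambda_k, via central finite differences
eps = 1e-6
grad = np.array([-(log_Z(lam + eps * e) - log_Z(lam - eps * e)) / (2 * eps)
                 for e in np.eye(2)])

# (S_I)_max = log Z + sum_k lambda_k <f_k>
S = -np.sum(p * np.log(p))
print(moments, grad)                 # these two agree
print(S, log_Z(lam) + lam @ moments) # and so do these
```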


Exponential Family

This mirrors a broadly used representation in statistics known as the exponential family.

\[ p(X|\boldsymbol{\theta}) = \exp\left(\sum_i \theta_i T_i(X) - \phi(\boldsymbol{\theta})\right) \] where the natural parameters play the role of the Lagrange multipliers, \(\theta_i = -\lambda_i\), and the sufficient statistics \(T_i(X)\) correspond to the constraint functions \(f_i(X)\).

Figure: Two independent Gaussians for the \(x\) and \(y\) velocity of a ball.

Figure: A correlated Gaussian for the \(x\) and \(y\) velocity of a ball. If all balls were correlated in this way, this would imply that the whole box is moving towards the upper right or bottom left.

Figure: An anti-correlated Gaussian for the \(x\) and \(y\) velocity of a ball. If all balls were anti-correlated in this way, this would imply that the whole box is moving towards the upper left or bottom right.

The Classical Observer


Figure: Here the observer is monitoring the movements of the particles. We’ve plotted the velocities alongside the 1 standard deviation contour of their theoretical distribution.

The Classical Observer - Correlated

Figure: Again the observer is monitoring the movements of the particles, but here their motion is correlated (\(\rho=0.95\)).

The Classical Observer - Anti-correlated

Figure: Again the observer is monitoring the movements of the particles, but now their motion is anti-correlated (\(\rho=-0.95\)).

Back to self-adjudication

The No-Barber Principle


In 1901 Bertrand Russell introduced a paradox: if a barber shaves everyone in the village who does not shave themselves, does the barber shave themselves? The paradox arises when a definition quantifies over a totality that includes the defining rule itself.

We propose a similar constraint for the inaccessible game: the foundational rules must not refer to anything outside themselves for adjudication or reference. In other words, there can be no external structure. We call this the “no-barber principle.”

The no-barber principle says that admissible rules must be internally adjudicable: they depend only on quantities definable from within the system’s internal language, without requiring e.g. an external observer to define the co-ordinates or a privileged decomposition.

The Classical Observer - Inaccessible


Figure: Here the observer is blocked from monitoring anything inside the system.

When we don’t know what’s going on inside, we can’t express outcomes in the way we could with an observer. But we can still express entropies. This highlights an interesting characteristic of entropies. If we don’t express the probability directly, but just work with the entropies themselves, it feels like we can assess the bounds of possibility without directly expressing what’s going on.

Entropy and Impossibility

While we don’t see the underlying probability, we can capture a class of different distributions by considering the mapping to the system entropy.

Think of entropy as a scoring system: every probability distribution gets a number measuring its uncertainty. Once you have that, you can line them up from least to most uncertain — which gives you a natural ordering.1

We denote marginal entropy of the \(i\)th variable by \(h_i\). We denote the joint entropy of the entire system by \(H\).

Two Candidates for Information Loss


The inaccessible game uses a formal notion of information loss in order to impose information isolation. Two category-theoretic characterisations exist. Baez et al. (2011) work in the classical category \(\textsf{FinProb}\) of finite probability spaces and stochastic maps, characterising information loss via Shannon entropy. Parzygnat (2022) works in the noncommutative category \(\textsf{NCFinProb}\) of finite-dimensional quantum probability spaces and unital completely positive maps, characterising information loss via von Neumann entropy. Both give rigorous accounts of information loss, but they correspond to very different underlying mathematical structures.

In \(\textsf{FinProb}\), probability distributions live on finite sets and information loss is characterised by the decrease in Shannon entropy. The result of Baez et al. (2011) is that three axioms (functoriality, convex linearity, and continuity) pin down information loss as proportional to the entropy difference \(H(p) - H(q)\) along a stochastic map \(p \to q\).
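For a deterministic coarse-graining (a measure-preserving map in \(\textsf{FinProb}\)), the information loss is exactly this entropy drop. A small sketch (the distribution and the choice of which outcomes to merge are illustrative):

```python
import numpy as np

def shannon(p):
    """Shannon entropy in nats, with the 0 log 0 = 0 convention."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# A deterministic coarse-graining: merge outcomes {1,2} and {3,4}.
# As a map of distributions it is a column-stochastic 2x4 matrix.
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
p = np.array([0.1, 0.2, 0.3, 0.4])
q = A @ p                             # pushforward distribution
info_loss = shannon(p) - shannon(q)   # entropy drop H(p) - H(q)
print(q, info_loss)                   # loss is positive: merging forgets detail
```

Merging outcomes discards the information that distinguished them, and the entropy difference quantifies exactly how much.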

\(\textsf{FinProb}\) is a cartesian category. This means every object \(A\) is equipped with a canonical diagonal map \(\Delta_A : A \to A \times A\). Such a map duplicates its input freely, with no additional postulates required.

In \(\textsf{NCFinProb}\), states are density matrices on finite-dimensional Hilbert spaces and channels are unital completely positive maps. Parzygnat (2022) characterises information loss via von Neumann entropy with the analogous result.

\(\textsf{NCFinProb}\) is a symmetric monoidal category. The tensor product \(\otimes\) combines systems, but there is no canonical copying map \(A \to A \otimes A\). Any duplication must be explicitly postulated — it cannot arise from the categorical structure alone. This is the mathematical expression of the quantum no-cloning theorem.

Applying the No-Barber Principle

Lawvere (1969) showed that Russell’s barber paradox shares a common categorical structure with Gödel’s incompleteness theorem, Turing’s halting problem, and Cantor’s theorem. The structure requires five ingredients: rules, a universal evaluator, internalisation of syntax, self-application, and a twist. The self-application step requires feeding the same object into two slots simultaneously. In categorical terms this is precisely the diagonal map \(A \to A \times A\).

In a cartesian category this map exists for free, making the paradox constructible internally. In a symmetric monoidal category with no canonical copying map, self-application cannot be formed without additional non-standard structure. The Lawvere diagonalisation — and with it the class of impredicative paradoxes it generates — cannot be derived internally.

The no-barber principle requires that the internal language of the game does not permit constructions whose resolution requires external adjudication. \(\textsf{FinProb}\), being cartesian, admits the diagonal constructions that can generate impredicative circularity. \(\textsf{NCFinProb}\), lacking canonical copying, blocks those constructions at the structural level.

The no-barber principle selects \(\textsf{NCFinProb}\) over \(\textsf{FinProb}\) as the foundational language for information loss in the game. This is independent of but consistent with the selection in Lawrence (2026), where von Neumann entropy is independently required because the game’s origin demands zero joint entropy with positive marginal entropies (a configuration achievable via entanglement but impossible under Shannon entropy).

Configurations Without Visibility


The inaccessible game proceeds without an external observer who can read off the state. However, regardless of the current configuration of the system, we can ask: is there a property of the system we can use?

The idea is that we use the entropy (von Neumann entropy) as a potential which gives us the system flow. This makes it a property of each configuration.

Entropy Orders the Configurations

Entropy assigns a non-negative real number to every configuration. This gives us a map \[ S : \{\text{configurations}\} \longrightarrow [0, \infty), \] and we can pull back the standard order on \(\mathbb{R}\) to get an ordering on configurations: \[ \rho \leq \rho' \quad \Longleftrightarrow \quad S(\rho) \leq S(\rho'). \] This relation is reflexive (\(\rho \leq \rho\)) and transitive (if \(\rho \leq \rho'\) and \(\rho' \leq \rho''\) then \(\rho \leq \rho''\)). It is not antisymmetric in general: two distinct density matrices \(\rho \neq \rho'\) can share the same entropy value \(S(\rho) = S(\rho')\). A relation that is reflexive and transitive but not necessarily antisymmetric is called a preorder, and a set equipped with a preorder is called a proset (preordered set).

So the configurations of the inaccessible game form a natural proset under von Neumann entropy.

Within the proset, the configurations that share an entropy value form equivalence classes: \(\rho \sim \rho'\) iff \(S(\rho) = S(\rho')\). Each equivalence class is an isoentropy class, i.e. a manifold of configurations all carrying the same entropic content.

If we pass to the quotient, i.e. treat isoentropy configurations as a single object, the resulting collection of equivalence classes \([\rho]\) is antisymmetric: \([\rho] \leq [\rho']\) and \([\rho'] \leq [\rho]\) together imply \([\rho] = [\rho']\). This quotient is a partially ordered set (poset). Because \(S\) maps into \(\mathbb{R}\), which is totally ordered, the quotient poset is a chain (total order): every two equivalence classes are comparable.

Even without seeing inside the system, we can hypothesise that this chain of entropy levels is well defined.

Figure: Many configurations (density matrices \(\rho\)) map under von Neumann entropy \(S\) to a single real number. Configurations with the same entropy value are isoentropy; they form an equivalence class. The quotient is a totally ordered chain of entropy levels.

We can picture the structure as a ladder: each rung corresponds to an entropy level \(S = c\), and multiple configurations sit at the same rung. Moving up the ladder means increasing entropy, more mixed, less structured. Moving down means decreasing entropy, more ordered, more pure. The bottom rung \(S = 0\) corresponds to a pure state.

This picture does not require us to know which configuration the system is in at any rung, only that the system sits somewhere on the ladder. We can think of dynamics in the inaccessible game as movement along this ladder.

Figure: The entropy ladder: each rung is an isoentropy class. Multiple configurations sit at the same rung. Dynamics move the system up the ladder (entropy increase) subject to the marginal entropy conservation constraint.

The functor \(S: \textsf{NCFinProb} \to (\mathbb{R}_{\geq 0}, \leq)\) that assigns von Neumann entropy to each configuration is the formal expression of this structure. It maps the category of configurations to the poset of non-negative reals, and it is the unique (up to rescaling) continuous, functorial measure of information loss in \(\textsf{NCFinProb}\) (Parzygnat, 2022).

Energy

Energy Constraints


The Conservation Law

The \(I + H = C\) Structure


We have established four axioms, with the fourth axiom stating that the sum of marginal entropies is conserved, \[ \sum_{i=1}^N h_i = C. \] This conservation law is the heart of The Inaccessible Game, but to understand its dynamical implications, we need to rewrite it in a more revealing form.

Multi-Information: Measuring Correlation

The multi-information (or total correlation), introduced by Watanabe (1960), measures how much the variables in a system are correlated. It is defined as, \[ I = \sum_{i=1}^N h_i - H, \] where \(H\) is the joint entropy of the full system: \[ H = -\sum_{\mathbf{x}} p(\mathbf{x}) \log p(\mathbf{x}). \]

The multi-information has a nice interpretation:

  • \(I = 0\): The variables are completely independent. The joint entropy equals the sum of marginal entropies.
  • \(I > 0\): The variables are correlated. Some information is “shared” between variables, so the joint entropy is less than the sum of marginals.
  • \(I\) is maximal: The variables are maximally correlated (in the extreme case, deterministically related).

Multi-information is always non-negative (\(I \geq 0\)) and measures how much knowing one variable tells you about others.
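Both properties, non-negativity and the defining identity \(I + H = \sum_i h_i\), can be checked numerically. A minimal sketch (the random joint distribution is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy_bits(p):
    """Shannon entropy in bits of a probability vector."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A random joint distribution over two binary variables
p = rng.random((2, 2))
p /= p.sum()

h1 = entropy_bits(p.sum(axis=1))  # marginal entropy of X1
h2 = entropy_bits(p.sum(axis=0))  # marginal entropy of X2
H = entropy_bits(p.ravel())       # joint entropy
I = h1 + h2 - H                   # multi-information

print(I >= -1e-12)                 # True: multi-information is non-negative
print(np.isclose(I + H, h1 + h2))  # True: the defining identity
```

For two variables the multi-information reduces to the mutual information, which is why it vanishes exactly when the variables are independent.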

Using the definition of multi-information, we can rewrite our conservation law. From \(I = \sum_{i=1}^N h_i - H\), we have: \[ \sum_{i=1}^N h_i = I + H. \] Therefore, the fourth axiom \(\sum_{i=1}^N h_i = C\) becomes: \[ I + H = C. \]

This is an information conservation principle. It says that multi-information plus joint entropy is conserved. This equation sits behind the dynamics of the Inaccessible Game.

This equation has the same structure as energy conservation in classical mechanics. In physics, total energy is conserved and splits into two parts, \[ T + V = E, \] where \(T\) is kinetic energy and \(V\) is potential energy.

The analogy for The Inaccessible Game is:

  • Multi-information \(I\) plays the role of potential energy. It represents “stored” correlation structure. High \(I\) means variables are tightly coupled, like a compressed spring.
  • Joint entropy \(H\) plays the role of kinetic energy. It represents “dispersed” or “free” information. High \(H\) means the probability distribution is spread out, with maximal uncertainty.

Just as a classical system evolves from high potential energy to high kinetic energy (a ball rolling down a hill), the idea in the Inaccessible Game will be that the information system evolves from high correlation (high \(I\)) to high entropy (high \(H\)).

Information Relaxation

The \(I + H = C\) structure suggests a relaxation principle: systems naturally evolve from states of high correlation (high \(I\), low \(H\)) toward states of low correlation (low \(I\), high \(H\)).

Why? Our inspiration is the second law of thermodynamics, which tells us that entropy increases. If we want to introduce dynamics in the game, increasing entropy provides an obvious way to do that. Since \(I + H = C\) is constant, if \(H\) increases, \(I\) must decrease. The system breaks down correlations to increase entropy.

This is analogous to how physical systems relax from non-equilibrium states (low \(T\), high \(V\)) to equilibrium (high \(T\), low \(V\)). A compressed spring releases its stored energy. A hot object in a cold room disperses its energy. In information systems, correlated structure dissipates into entropy.

Consider a simple two-variable system with binary variables \(X_1\) and \(X_2\):

High correlation state (high \(I\), low \(H\)): \[ p(X_1=0, X_2=0) = 0.5, \quad p(X_1=1, X_2=1) = 0.5 \] The variables are perfectly correlated. Marginal entropies: \(h_1 = h_2 = 1\) bit. Joint entropy: \(H = 1\) bit. Multi-information: \(I = 1 + 1 - 1 = 1\) bit.

Low correlation state (low \(I\), high \(H\)): \[ p(X_1, X_2) = 0.25 \text{ for all four combinations} \] The variables are independent. Marginal entropies: \(h_1 = h_2 = 1\) bit. Joint entropy: \(H = 2\) bits. Multi-information: \(I = 1 + 1 - 2 = 0\) bits.
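These worked numbers can be verified directly. A small sketch re-deriving them (the helper `entropies` is illustrative, not part of the original text):

```python
import numpy as np

def entropies(p):
    """h1, h2, H, I (in bits) for a 2x2 joint distribution p[x1, x2]."""
    def h(q):
        q = q[q > 0]
        return -np.sum(q * np.log2(q))
    h1 = h(p.sum(axis=1))
    h2 = h(p.sum(axis=0))
    H = h(p.ravel())
    return h1, h2, H, h1 + h2 - H

# High-correlation state: perfectly correlated bits (h1 = h2 = 1, H = 1, I = 1)
p_corr = np.array([[0.5, 0.0], [0.0, 0.5]])

# Low-correlation state: independent uniform bits (h1 = h2 = 1, H = 2, I = 0)
p_indep = np.full((2, 2), 0.25)

for p in (p_corr, p_indep):
    h1, h2, H, I = entropies(p)
    print(f"h1={h1:.1f}, h2={h2:.1f}, H={H:.1f}, I={I:.1f}, I+H={I+H:.1f}")
```

Both states print \(I + H = 2.0\) bits, consistent with the conservation law.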

The system relaxes from the first state to the second, conserving \(I + H = 2\) bits throughout. Let’s visualise this relaxation:

import numpy as np

# Entropies (in bits) for a joint distribution over two binary variables
def compute_binary_entropies(p00, p01, p10, p11):
    def entropy(probs):
        probs = np.array([p for p in probs if p > 0])
        return -np.sum(probs * np.log2(probs))
    h1 = entropy([p00 + p01, p10 + p11])  # marginal entropy of X1
    h2 = entropy([p00 + p10, p01 + p11])  # marginal entropy of X2
    H = entropy([p00, p01, p10, p11])     # joint entropy
    return h1, h2, H, h1 + h2 - H         # last entry is multi-information I

# One simple relaxation path: interpolate from the perfectly correlated state
# (alpha = 0) to the independent state (alpha = 1). The marginals stay uniform
# along this path, so h1 = h2 = 1 bit and I + H = 2 bits throughout.
def relaxation_path(alpha):
    p00 = p11 = 0.5 - 0.25 * alpha
    p01 = p10 = 0.25 * alpha
    return p00, p01, p10, p11

# Generate relaxation trajectory
n_steps = 100
alphas = np.linspace(0, 1, n_steps)

h1_vals, h2_vals, H_vals, I_vals = [], [], [], []

for alpha in alphas:
    p00, p01, p10, p11 = relaxation_path(alpha)
    h1, h2, H, I = compute_binary_entropies(p00, p01, p10, p11)
    h1_vals.append(h1)
    h2_vals.append(h2)
    H_vals.append(H)
    I_vals.append(I)

h1_vals = np.array(h1_vals)
h2_vals = np.array(h2_vals)
H_vals = np.array(H_vals)
I_vals = np.array(I_vals)
C_vals = I_vals + H_vals  # Should be constant (2 bits)

Figure: Left: Multi-information \(I\) decreases as joint entropy \(H\) increases, conserving \(I + H = C\). The colored regions show how the conserved quantity splits between correlation (red) and entropy (blue). Right: Marginal entropies remain constant throughout, making the system inaccessible to external observation.

The visualisation shows the trade-off: as the system relaxes, correlation structure (multi-information) is converted into entropy. The total \(I + H = C\) remains constant (black dashed line), but the system evolves from a state dominated by correlation to one dominated by entropy.

The marginal entropies \(h_1\) and \(h_2\) stay constant throughout this evolution. An external observer measuring only marginal entropies would see no change—the system is informationally isolated, hence “inaccessible.”

Long Story Short

Building on these ideas, some interesting conclusions emerge. The marginal entropy constraint leads to GENERIC-like dynamics (Grmela and Öttinger, 1997; Öttinger, 2005).

When characterising the origin of the game, a shift is forced from Shannon entropy to von Neumann entropy (Neumann, 1932). In retrospect the shift feels natural if we take an algebraic view of quantum probability, where outcomes are no longer primitive. This is consistent with the inaccessible nature of the game.

The game is inspired by the nice connections between inference and thermodynamics explored by E. T. Jaynes (Jaynes, 1957), but the dynamics play out through the framework of information geometry (Amari and Nagaoka, 2000), which makes many of the (normally complicated) calculations around Riemannian geometry relatively straightforward.

Pendulum Animation

import numpy as np
# Simple pendulum: trading potential ↔ kinetic energy
# State: angle θ and angular velocity ω
# Energy: E = (1/2)mL²ω² + mgL(1-cos(θ))

# Parameters
g = 9.81  # gravitational acceleration (m/s^2)
L = 1.0   # pendulum length (m)
m = 1.0   # bob mass (kg)

def pendulum_energy(theta, omega):
    # Total energy: kinetic (1/2) m L^2 omega^2 plus potential m g L (1 - cos theta)
    return 0.5 * m * L**2 * omega**2 + m * g * L * (1.0 - np.cos(theta))

# Initial conditions: release from angle, zero velocity
theta0 = np.pi/3  # 60 degrees
omega0 = 0.0
initial_state = np.array([theta0, omega0])
initial_energy = pendulum_energy(theta0, omega0)

# Simulate using Störmer-Verlet method (symplectic integrator, keeps energy error bounded)
dt = 0.02
t_max = 5.0
num_steps = int(t_max / dt)

# Arrays to store trajectory
times = np.linspace(0, t_max, num_steps)
trajectory = np.zeros((num_steps, 2))
energies = np.zeros(num_steps)

# Integrate using Störmer-Verlet (symplectic, time-reversible)
# Half-step omega, full-step theta, half-step omega; keeps the energy error small and bounded
theta, omega = initial_state
for i in range(num_steps):
    trajectory[i] = [theta, omega]
    energies[i] = pendulum_energy(theta, omega)
    omega_half = omega - 0.5 * (g/L) * np.sin(theta) * dt
    theta = theta + omega_half * dt
    omega = omega_half - 0.5 * (g/L) * np.sin(theta) * dt

# Verify energy conservation
energy_drift = np.abs(energies - initial_energy).max()
print(f"Maximum energy drift: {energy_drift/initial_energy*100:.2f}%")

Figure: Pendulum energy conservation: the pendulum (left) trades potential and kinetic energy while total energy (red line, right) remains constant. Green shows kinetic energy, orange shows potential energy.

This pendulum simulation demonstrates:

  1. Energy formula: \(E = \frac{1}{2}mL^2\omega^2 + mgL(1-\cos\theta)\) (kinetic + potential)

  2. Dynamics from energy: The equation \(\frac{\text{d}\omega}{\text{d}t} = -\frac{g}{L}\sin\theta\) follows from the energy function via the Hamiltonian structure

  3. Trading energy: Watch kinetic (green) and potential (orange) trade off while total (red) stays constant

  4. Geometric structure: The antisymmetric structure we’ll study ensures this conservation automatically

  5. Störmer-Verlet integrator: The simulation uses a symplectic, time-reversible integrator. Each step splits the velocity update into two half-steps either side of the position update, preserving the Hamiltonian structure and keeping the energy error small and bounded (it oscillates rather than growing with time).

The animation shows the pendulum swinging with the energy plot demonstrating near-perfect conservation.

One of the nice results of Lawrence (2025) is that in certain thermodynamic limits marginal entropy conservation manifests as energy conservation. So in these (meta-stable) regions one can use Jaynes’ maximum entropy approach to determine the stationary distribution.

Intelligence

Perpetual Motion and Superintelligence


Imagine a world in 1925 where the automobile is already transforming society, but big promises are being made for things to come. The stock market is soaring, the 1918 pandemic is forgotten. And every major automobile manufacturer is investing heavily in the promise that they will each be the first to produce a car that needs no fuel: a perpetual motion machine.

Well, of course that didn’t happen. But I sometimes wonder if what we’re seeing today 100 years later is the modern equivalent of that. In 2025 billions are being invested in promises of superintelligence and artificial general intelligence that will transform everything.

We know why perpetual motion is impossible: the second law of thermodynamics tells us that entropy always increases. So we can’t have motion without entropy production. No matter how clever the design, you cannot extract energy from nothing, and you cannot create a closed system that does useful work indefinitely without an external energy source.

How might we make an equivalent statement for the bizarre claims around superintelligence? Some inspiration comes from Maxwell’s demon, an “intelligent” entity that appears to operate against the laws of thermodynamics. The demon is instructive because it suggests that for the second law to hold there must be a relationship between the demon’s decisions and thermodynamic entropy.

One of the resolutions comes from Landauer’s principle, the notion that erasure of information requires heat dissipation. This suggests there are fundamental information-theoretic constraints on intelligent systems, just as there are thermodynamic constraints on engines.

I’ve no doubt that AI technologies will transform our world just as much as the automobile has. But I also have no doubt that the promise of superintelligence is just as silly as the promise of perpetual motion. The inaccessible game provides one way of understanding why.

Information-Theoretic Limits

The hope is that this framework might reveal limits on information processing systems, including intelligent systems.

Information-Theoretic Limits on Intelligence


Just as the second law of thermodynamics places fundamental limits on mechanical engines, no matter how cleverly designed, the idea is that information theory places fundamental limits on information engines, no matter how cleverly implemented.

What Intelligent Systems Must Do

Any intelligent system, whether biological or artificial, must perform certain fundamental operations:

  1. Acquire information from its environment (sensing, observation)
  2. Store information about the world (memory)
  3. Process information to make decisions (computation)
  4. Erase information to make room for new data (memory management)
  5. Act on the world using the processed information

Each of these operations has information-theoretic costs that cannot be eliminated by clever engineering.

Landauer’s Principle

Landauer’s principle (Landauer, 1961) establishes that erasing one bit of information requires dissipating at least \(k_BT\log 2\) of energy as heat, where \(k_B\) is Boltzmann’s constant and \(T\) is temperature.

This isn’t an engineering limitation; it’s a fundamental consequence of the second law. To reset a bit to a standard state (say, always 0) requires reducing its entropy from 1 bit to 0 bits. That entropy must go somewhere, and it ends up as heat in the environment.
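To get a feel for the scale, the bound is easy to evaluate numerically. A back-of-envelope sketch (the choice of room temperature, 300 K, and the gigabyte example are illustrative assumptions):

```python
import numpy as np

k_B = 1.380649e-23   # Boltzmann constant in J/K (exact SI value)
T = 300.0            # room temperature in K (an assumed illustrative value)

# Landauer bound: minimum heat dissipated per erased bit, k_B * T * ln(2)
E_bit = k_B * T * np.log(2)
print(f"Landauer bound at {T:.0f} K: {E_bit:.2e} J per bit")

# For scale: erasing one gigabyte (8e9 bits) dissipates at least this much heat
print(f"Per gigabyte erased: {E_bit * 8e9:.2e} J")
```

The per-bit cost is around \(3 \times 10^{-21}\) J, many orders of magnitude below what current hardware dissipates per operation, which is why the bound constrains principle rather than present practice.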

This doesn’t mean AI can’t be powerful or transformative — internal combustion engines transformed the world despite thermodynamic limits. But it does mean there are hard bounds on what’s possible, and claims that ignore these bounds are as unrealistic as promises of perpetual motion.

The perpetual motion analogy provides an accessible way to think about claims of unbounded intelligence.

Open Questions

Many questions remain:

  1. Can we formalize the no-barber principle more rigorously?
  2. What is the right internal notion of “stage”/sample space for the game?
  3. When are selections actually forced (vs design degrees of freedom)?
  4. Can this be extended beyond symmetric configurations / beyond the origin?
  5. What other structures emerge from internal adjudicability?

These point toward future work at the intersection of information theory, geometry, and foundations.

Thanks!

For more information on these subjects and more you might want to check the following resources.

References

Amari, S., Nagaoka, H., 2000. Information geometry and its applications. Springer.
Baez, J.C., Fritz, T., Leinster, T., 2011. A characterization of entropy in terms of information loss. Entropy 13, 1945–1957. https://doi.org/10.3390/e13111945
Grmela, M., Öttinger, H.C., 1997. Dynamics and thermodynamics of complex fluids. I. Development of a general formalism. Physical Review E 56, 6620–6632. https://doi.org/10.1103/PhysRevE.56.6620
Jackson, S., 2001. Munchkin. Steve Jackson Games.
Jaynes, E.T., 1957. Information theory and statistical mechanics. Physical Review 106, 620–630. https://doi.org/10.1103/PhysRev.106.620
Landauer, R., 1961. Irreversibility and heat generation in the computing process. IBM Journal of Research and Development 5, 183–191. https://doi.org/10.1147/rd.53.0183
Lawrence, N.D., 2026. The origin of the inaccessible game. https://doi.org/10.48550/arXiv.2601.12576
Lawrence, N.D., 2025. The inaccessible game. https://doi.org/10.48550/arXiv.2511.06795
Lawvere, F.W., 1969. Diagonal arguments and cartesian closed categories, in: Category Theory, Homology Theory and Their Applications II, Lecture Notes in Mathematics. Springer, Berlin, pp. 134–145. https://doi.org/10.1007/BFb0080769
Neumann, J. von, 1932. Mathematische grundlagen der quantenmechanik. Springer, Berlin.
Öttinger, H.C., 2005. Beyond equilibrium thermodynamics. Wiley-Interscience, Hoboken, NJ. https://doi.org/10.1002/0471727903
Parzygnat, A.J., 2022. A functorial characterization of von Neumann entropy. Cahiers de Topologie et Géométrie Différentielle Catégoriques 63, 89–128.
Watanabe, S., 1960. Information theoretical analysis of multivariate correlation. IBM Journal of Research and Development 4, 66–82. https://doi.org/10.1147/rd.41.0066

  1. More formally entropy defines a functor from the category of finite probability spaces to the poset category \((\mathbb{R}, \leq)\), assigning to each object its Shannon entropy (Baez et al., 2011).↩︎