# Physics 434, 2015: Introduction to Information theory

The references (Nemenman, 2012) and (Levchenko and Nemenman, 2014) will be useful reading.

- Setting up the problem: how do we measure the information transmitted by a biological signaling system?
  - The dose-response curve examples in (Levchenko and Nemenman, 2014) show that this should be a property of the entire joint distribution of the signal and the response.
  - Information is the difference between the uncertainty before and after the measurement.
  - How do we define the uncertainty?

- Shannon's axioms and the derivation of entropy: if a variable $x$ is observed from a distribution $P(x)$, then the amount of information $S$ we gain from this observation must obey the following properties.
  - For a uniform distribution, the measure of information grows with the cardinality of the distribution.
  - The measure of information must be a continuous function of the distribution.
  - The measure of information is additive. That is, for a fine graining of $x$ into $(x, y)$, we should have $S[P(x, y)] = S[P(x)] + \sum_x P(x)\, S[P(y|x)]$.

- Up to a multiplicative constant, the measure of information is then $S[x] = -\sum_x P(x) \log P(x)$, which is also called the Boltzmann-Shannon entropy. We fix the constant by defining the entropy of a uniform binary distribution to be 1. Then $S[x] = -\sum_x P(x) \log_2 P(x)$. The entropy is then measured in *bits*.
- Meaning of entropy: an entropy of 1 bit means that we have gained enough information to answer one yes-or-no (binary) question about the variable $x$.
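- Properties of entropy (non-negative, bounded, concave):
  - $0 \le S[x] \le \log_2 K$, where $K$ is the cardinality of the distribution. Moreover, the first inequality becomes an equality iff the variable is deterministic (that is, one event has a probability of 1), and the second inequality is an equality iff the distribution is uniform.
  - Entropy is a concave function of the distribution: the entropy of a mixture of two distributions is at least as large as the same mixture of their entropies.
  - Entropies of independent variables add: if $P(x, y) = P(x) P(y)$, then $S[x, y] = S[x] + S[y]$.
  - Entropy is an extensive quantity: for a joint distribution $P(x_1, \dots, x_N)$, we can define an entropy *rate*, $\lim_{N \to \infty} S[x_1, \dots, x_N] / N$.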
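
The entropy definition and its properties can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `entropy_bits` and all example distributions are illustrative choices, not part of the course material.

```python
import math


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete distribution given as a list of probabilities."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Zero-probability events contribute nothing (p log p -> 0 as p -> 0).
    return sum(-p * math.log2(p) for p in probs if p > 0)


# Normalization: a uniform binary distribution carries exactly 1 bit.
fair_coin = entropy_bits([0.5, 0.5])        # 1.0
# Bounds: 0 for a deterministic variable, log2(K) for a uniform one.
certain = entropy_bits([1.0, 0.0, 0.0])     # 0.0
uniform4 = entropy_bits([0.25] * 4)         # 2.0 = log2(4)
biased = entropy_bits([0.9, 0.1])           # strictly between 0 and 1 bit

# Additivity under fine graining (Shannon's classic example):
# the second outcome of a fair coin is split with probabilities 2/3 and 1/3.
fine = entropy_bits([1 / 2, 1 / 3, 1 / 6])
coarse = entropy_bits([1 / 2, 1 / 2]) + 0.5 * entropy_bits([2 / 3, 1 / 3])
fine_graining_gap = fine - coarse

# Entropies of independent variables add.
p_x = [0.9, 0.1]
p_y = [0.2, 0.3, 0.5]
joint = [px * py for px in p_x for py in p_y]
additive_gap = entropy_bits(joint) - (entropy_bits(p_x) + entropy_bits(p_y))

# Concavity: the entropy of a mixture is at least the mixture of the entropies.
p, q, lam = [0.9, 0.1], [0.2, 0.8], 0.3
mixture = [lam * a + (1 - lam) * b for a, b in zip(p, q)]
concavity_gap = entropy_bits(mixture) - (
    lam * entropy_bits(p) + (1 - lam) * entropy_bits(q)
)

print(fair_coin, certain, uniform4, biased)
print(fine_graining_gap, additive_gap, concavity_gap)
```

The two additivity gaps should vanish up to floating-point rounding, and the concavity gap should be positive.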

- Differential entropy: a continuous variable $x$ can be discretized with a step $\Delta x$, and then the entropy is $S[x] \approx -\int dx\, P(x) \log_2 P(x) - \log_2 \Delta x$. This formally diverges at fine discretization ($\Delta x \to 0$): we need infinitely many bits to fully specify a continuous variable. The integral term in the above expression, $-\int dx\, P(x) \log_2 P(x)$, is called the *differential entropy*, and whenever we write $S[x]$ for continuous variables, we mean the differential entropy.
- Entropy of a normal distribution with variance $\sigma^2$ is $S[x] = \frac{1}{2} \log_2 (2 \pi e \sigma^2)$.
- Multivariate entropy is defined with summation/integration of log-probability over multiple variables, cf. entropy rate above: $S[x, y] = -\sum_{x, y} P(x, y) \log_2 P(x, y)$.
- Conditional entropy is defined as the averaged log-probability of a conditional distribution: $S[x|y] = -\sum_{x, y} P(x, y) \log_2 P(x|y)$.
- Mutual information: what if we want to know about a variable $x$, but are instead measuring a variable $y$? How much do we learn about $x$ then? This is given by the difference of the entropies of $x$ before and after the measurement: $I[x; y] = S[x] - S[x|y]$.
- Meaning of mutual information: a mutual information of 1 bit between two variables means that by querying one of them as much as possible, we can get one bit of information about the other.
- Properties of mutual information:
  - Limits: $0 \le I[x; y] \le \min(S[x], S[y])$. Note that the first inequality becomes an equality iff the two variables are completely statistically independent.
  - Mutual information is well-defined for continuous variables.
  - Reparameterization invariance: for any invertible $u(x)$ and $v(y)$, the following is true: $I[u; v] = I[x; y]$.
  - Data processing inequality: for $P(x, y, z) = P(x) P(y|x) P(z|y)$, $I[x; z] \le \min(I[x; y], I[y; z])$. That is, information cannot be created in a transformation of a variable, whether deterministic or probabilistic.
  - Information rate: information is also an extensive quantity, so it makes sense to define an information rate, $\lim_{N \to \infty} I[x_1, \dots, x_N; y_1, \dots, y_N] / N$.
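
The limits and the data processing inequality can be illustrated on a small discrete example. The sketch below assumes a fair binary input passed through two binary symmetric channels in sequence; the helper names (`mutual_information_bits`, `joint_from_channel`, `compose`, `binary_symmetric_channel`) and the flip probabilities 0.1 and 0.2 are hypothetical choices made for illustration only.

```python
import math


def entropy_bits(probs):
    """Shannon entropy, in bits, of a list of probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def mutual_information_bits(joint):
    """Mutual information in bits, where joint[i][j] = P(x=i, y=j)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    info = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                info += p * math.log2(p / (p_x[i] * p_y[j]))
    return info


def joint_from_channel(p_in, channel):
    """Joint P(in, out) from an input distribution and channel[i][j] = P(out=j | in=i)."""
    return [[p_in[i] * c for c in channel[i]] for i in range(len(p_in))]


def compose(first, second):
    """Channel obtained by applying `first` and then `second` (a matrix product)."""
    return [
        [sum(first[i][k] * second[k][j] for k in range(len(second)))
         for j in range(len(second[0]))]
        for i in range(len(first))
    ]


def binary_symmetric_channel(flip):
    return [[1 - flip, flip], [flip, 1 - flip]]


# Limits: independent variables share 0 bits; perfectly correlated fair bits share 1 bit.
i_independent = mutual_information_bits([[0.25, 0.25], [0.25, 0.25]])  # 0.0
i_identical = mutual_information_bits([[0.5, 0.0], [0.0, 0.5]])        # 1.0

# Markov chain x -> y -> z built from two noisy binary channels.
p_x = [0.5, 0.5]
x_to_y = binary_symmetric_channel(0.1)
y_to_z = binary_symmetric_channel(0.2)
i_xy = mutual_information_bits(joint_from_channel(p_x, x_to_y))
i_xz = mutual_information_bits(joint_from_channel(p_x, compose(x_to_y, y_to_z)))

print(i_independent, i_identical)
print(i_xy, i_xz, entropy_bits(p_x))
```

In this example the second noisy step can only lose information, so `i_xz` comes out smaller than `i_xy`, and both stay below the 1 bit of entropy of the input.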

- Mutual information of a bivariate normal with a correlation coefficient $\rho$ is $I[x; y] = -\frac{1}{2} \log_2 (1 - \rho^2)$.
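- For Gaussian variables $r = g (s + \eta)$, where $s$ is the signal, $r$ is the response, and $\eta$ is the noise referred to the input, $I[s; r] = \frac{1}{2} \log_2 \left( 1 + \sigma_s^2 / \sigma_\eta^2 \right)$.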
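
The Gaussian results quoted in these notes can be sanity-checked numerically. The sketch below makes two assumptions that should be read as such: the discretized entropy is approximated by binning a zero-mean normal density with step `dx`, and the signaling model is taken to be $r = g(s + \eta)$ with independent zero-mean Gaussian $s$ and $\eta$, for which the squared correlation between $s$ and $r$ is $\rho^2 = \sigma_s^2 / (\sigma_s^2 + \sigma_\eta^2)$ regardless of the gain $g$. All function names are illustrative.

```python
import math


def gaussian_pdf(x, sigma):
    """Density of a zero-mean normal distribution with standard deviation sigma."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))


def discretized_gaussian_entropy(sigma, dx, n_sigmas=10.0):
    """Entropy in bits of a zero-mean normal variable binned with step dx."""
    n_bins = int(n_sigmas * sigma / dx)
    probs = [gaussian_pdf(k * dx, sigma) * dx for k in range(-n_bins, n_bins + 1)]
    norm = sum(probs)  # very close to 1; renormalize to remove the truncation error
    return sum(-(p / norm) * math.log2(p / norm) for p in probs if p > 0)


def gaussian_differential_entropy(sigma):
    """Differential entropy of a normal distribution: (1/2) log2(2 pi e sigma^2)."""
    return 0.5 * math.log2(2 * math.pi * math.e * sigma * sigma)


def bivariate_normal_information(rho):
    """Mutual information of a bivariate normal: -(1/2) log2(1 - rho^2)."""
    return -0.5 * math.log2(1 - rho ** 2)


def gaussian_channel_information(var_s, var_eta):
    """Mutual information for r = g (s + eta): (1/2) log2(1 + var_s / var_eta)."""
    return 0.5 * math.log2(1 + var_s / var_eta)


# Discretized entropy = differential entropy - log2(dx), up to small corrections.
sigma, dx = 1.0, 0.01
binned = discretized_gaussian_entropy(sigma, dx)
predicted = gaussian_differential_entropy(sigma) - math.log2(dx)

# A signal-to-noise ratio of 3 gives (1/2) log2(4) = 1 bit, and the two formulas agree.
var_s, var_eta = 3.0, 1.0
rho = math.sqrt(var_s / (var_s + var_eta))
i_channel = gaussian_channel_information(var_s, var_eta)
i_bivariate = bivariate_normal_information(rho)

print(binned, predicted)
print(i_channel, i_bivariate)
```

Halving `dx` should raise the binned entropy by about one bit, which is the $-\log_2 \Delta x$ divergence, while the mutual information depends only on the ratio $\sigma_s^2 / \sigma_\eta^2$.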