Physics 434, 2012: Lectures 10-11
In these lectures, we cover some background on information theory. A good physics-style introduction to this subject can be found in the upcoming book by Bialek (Bialek 2010). A very nice, and probably still the best, introduction to information theory as a theory of communication is (Shannon and Weaver, 1949). A standard and very good textbook on information theory is (Cover and Thomas, 2006).
- We would like to characterize how much information is transmitted by a cellular signaling pathway, say the NF-κB pathway depicted on the right (Cheong et al., 2011), or in E. coli transcription (Guet et al., 2002; Ziv et al., 2007), as shown on the left. What characteristics of the system should we measure in order to be able to quantify this? Specifically, do we need:
- , only?
- , , and , only?
- $P(r|s)$ for all $s$ only?
- $P(r|s)$ for all $s$ and $P(s)$, that is, the entire $P(s,r)$?
- For transmitting information through a synthetic transcriptional circuit in E. coli (Guet et al., 2002) -- see picture on the board -- which of the following quantities might constrain the mutual information between the chemical signal and the expressed reporter response?
- The mean molecular copy number of the reporter molecule.
- The mean molecular copy number of the other, non-reporter genes.
- The probability distribution of the input signals.
- Setting up the problem: How do we measure information transmitted by a biological signaling system?
- Shannon's axioms and the derivation of entropy: if a variable $x$ is observed from a distribution $P(x)$, then the amount of information we gain from this observation must obey the following properties.
- If the cardinality of the distribution grows and the distribution is uniform, then the measure of information grows as well.
- The measure of information must be a continuous function of the distribution.
- The measure of information is additive. That is, for a fine graining of $x$ into $(x, y)$, we should have $S[P(x,y)] = S[P(x)] + \sum_x P(x)\, S[P(y|x)]$.
- Up to a multiplicative constant, the measure of information is then $S[x] = -\sum_x P(x) \log P(x)$, which is also called the Boltzmann-Shannon entropy. We fix the constant by defining the entropy of a uniform binary distribution to be 1. Then $S[x] = -\sum_x P(x) \log_2 P(x)$, and the entropy is measured in bits.
- Meaning of entropy: Entropy of 1 bit means that we have gained enough information to answer one yes or no (binary) question about the variable $x$.
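The definition and the normalization above can be checked numerically. The following is a minimal sketch, not part of the original notes; the function name `entropy_bits` and the example distributions are illustrative assumptions.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete distribution given as a list of probabilities."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p = 0 are skipped, since p*log(p) -> 0 as p -> 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_coin = entropy_bits([0.5, 0.5])   # uniform binary distribution: the 1-bit normalization
certain = entropy_bits([1.0, 0.0])     # deterministic variable: nothing is learned
uniform8 = entropy_bits([1 / 8] * 8)   # uniform over 8 outcomes: three yes/no questions
biased = entropy_bits([0.9, 0.1])      # a biased coin carries less than 1 bit

print(fair_coin, certain, uniform8, biased)
```

The uniform eight-outcome case illustrates the "yes or no question" reading: three binary questions are enough to single out one of eight equally likely outcomes.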
- Properties of entropy (non-negative, bounded, concave):
- $0 \le S[x] \le \log_2 K$, where $K$ is the cardinality of the distribution. Moreover, the first inequality becomes an equality iff the variable is deterministic (that is, one event has a probability of 1), and the second inequality is an equality iff the distribution is uniform.
- Entropy is a concave function of the distribution: mixing two distributions never decreases the entropy below the average of their entropies.
- Entropies of independent variables add.
- Entropy is an extensive quantity: for a joint distribution $P(x_1, \dots, x_N)$, we can define an entropy rate $S_0 = \lim_{N\to\infty} S[x_1, \dots, x_N]/N$.
- Differential entropy: a continuous variable $x$ can be discretized with a step $\Delta x$, and then the entropy is $S[x] \approx -\int dx\, P(x) \log_2 P(x) - \log_2 \Delta x$. This formally diverges at fine discretization: we need infinitely many bits to fully specify a continuous variable. The integral in the above expression is called the differential entropy, and whenever we write $S[x]$ for continuous variables, we mean the differential entropy.
- Entropy of a normal distribution with variance $\sigma^2$ is $S = \frac{1}{2} \log_2 (2\pi e \sigma^2)$.
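The relation between the entropy of a discretized variable and the differential entropy can be illustrated for a Gaussian. This is a hedged sketch under stated assumptions: a zero-mean Gaussian is binned with step `dx` over a finite window of plus or minus ten standard deviations, and the helper names are hypothetical. The binned entropy should approach the differential entropy of the normal distribution minus log2 of the bin width as the bins shrink.

```python
import math

def gaussian_cdf(x, sigma):
    """Cumulative distribution function of a zero-mean Gaussian."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def discretized_entropy_bits(sigma, dx, span=10.0):
    """Entropy (bits) of a zero-mean Gaussian binned with step dx over +/- span*sigma."""
    n_bins = int(round(2 * span * sigma / dx))
    left = -span * sigma
    total = 0.0
    for i in range(n_bins):
        a = left + i * dx
        # Exact probability mass of the bin [a, a + dx).
        mass = gaussian_cdf(a + dx, sigma) - gaussian_cdf(a, sigma)
        if mass > 0:
            total -= mass * math.log2(mass)
    return total

def differential_entropy_bits(sigma):
    """Differential entropy (bits) of a Gaussian with standard deviation sigma."""
    return 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)

sigma = 1.0
for dx in (0.5, 0.1, 0.01):
    s_disc = discretized_entropy_bits(sigma, dx)
    s_pred = differential_entropy_bits(sigma) - math.log2(dx)
    print(f"dx={dx}: discretized={s_disc:.4f} bits, predicted={s_pred:.4f} bits")
```

Each tenfold refinement of the grid adds about log2(10) bits, which is the divergence mentioned above.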
- Multivariate entropy is defined with summation/integration of log-probability over multiple variables, cf. entropy rate above.
- Conditional entropy is defined as the negative averaged log-probability of a conditional distribution: $S[x|y] = -\sum_{x,y} P(x,y) \log_2 P(x|y)$.
- Mutual information: what if we want to know about a variable $x$, but instead are measuring a variable $y$? How much are we learning about $x$ then? This is given by the difference of entropies of $x$ before and after the measurement: $I[x;y] = S[x] - S[x|y]$.
- Meaning of mutual information: mutual information of 1 bit between two variables means that by querying one of them as much as possible, we can get one bit of information about the other.
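As an illustration of the definition, mutual information can be computed directly from a joint probability table as the entropy of one variable minus its conditional entropy given the other. The sketch below is not from the original notes; the function names and the three example tables (a noisy binary channel, independent bits, and a perfect copy) are assumptions chosen for illustration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information_bits(joint):
    """I[x;y] = S[x] - S[x|y] for a table joint[i][j] = P(x_i, y_j)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    # S[x|y] = -sum_{x,y} P(x,y) log2 P(x|y), with P(x|y) = P(x,y) / P(y).
    s_x_given_y = -sum(
        p_xy * math.log2(p_xy / p_y[j])
        for row in joint
        for j, p_xy in enumerate(row)
        if p_xy > 0
    )
    return entropy_bits(p_x) - s_x_given_y

flip = 0.1  # assumed probability that the measurement y misreports the fair bit x
noisy = [[0.5 * (1 - flip), 0.5 * flip],
         [0.5 * flip, 0.5 * (1 - flip)]]
independent = [[0.25, 0.25], [0.25, 0.25]]
perfect_copy = [[0.5, 0.0], [0.0, 0.5]]

i_noisy = mutual_information_bits(noisy)
i_independent = mutual_information_bits(independent)
i_copy = mutual_information_bits(perfect_copy)
print(i_noisy, i_independent, i_copy)
```

The independent table gives zero information and the perfect copy gives the full 1 bit of the fair input; the noisy measurement falls in between.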
- Properties of mutual information
- Limits: $0 \le I[x;y] \le \min(S[x], S[y])$. Note that the first inequality becomes an equality iff the two variables are completely statistically independent.
- Mutual information is well-defined for continuous variables.
- Reparameterization invariance: for any invertible $x' = f(x)$ and $y' = g(y)$, the following is true: $I[x;y] = I[x';y']$.
- Data processing inequality: For $P(x,y,z) = P(x) P(y|x) P(z|y)$, $I[x;z] \le I[x;y]$. That is, information cannot get created in a transformation of a variable, whether deterministic or probabilistic.
- Information rate: Information is also an extensive quantity, so that it makes sense to define an information rate $I_0 = \lim_{N\to\infty} I[x_1, \dots, x_N; y_1, \dots, y_N]/N$.
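The data processing inequality listed among the properties above can be illustrated with a small numerical example. In this sketch, which is an assumption-laden illustration rather than part of the lecture, a fair bit passes through two binary symmetric channels in sequence, with assumed flip probabilities 0.1 and 0.2. Mutual information is computed here with the equivalent form that sums P(x,y) log2 of P(x,y)/(P(x)P(y)) over the table.

```python
import math

def mutual_information_bits(joint):
    """Mutual information (bits) from a table joint[i][j] = P(a_i, b_j)."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    return sum(
        p * math.log2(p / (p_a[i] * p_b[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )

def push_through(joint_xy, channel):
    """Joint P(x, z) obtained from P(x, y) and channel[j][k] = P(z_k | y_j)."""
    n_z = len(channel[0])
    return [
        [sum(row[j] * channel[j][k] for j in range(len(row))) for k in range(n_z)]
        for row in joint_xy
    ]

def binary_symmetric_channel(flip):
    """Conditional table P(output | input) for a channel that flips a bit with probability flip."""
    return [[1 - flip, flip], [flip, 1 - flip]]

p_x = [0.5, 0.5]
first = binary_symmetric_channel(0.1)    # x -> y
second = binary_symmetric_channel(0.2)   # y -> z
joint_xy = [[p_x[i] * first[i][j] for j in range(2)] for i in range(2)]
joint_xz = push_through(joint_xy, second)

i_xy = mutual_information_bits(joint_xy)
i_xz = mutual_information_bits(joint_xz)
print(f"I[x;y] = {i_xy:.4f} bits, I[x;z] = {i_xz:.4f} bits")
```

The second, noisy step can only lose information about the input, so the information between input and final output does not exceed that between input and intermediate variable.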
- Mutual information of a bivariate normal with a correlation coefficient $\rho$ is $I = -\frac{1}{2} \log_2 (1 - \rho^2)$.
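- For Gaussian variables $r = g (s + \eta)$, where $s$ is the signal, $r$ is the response, and $\eta$ is the noise referred to the input, $I[s;r] = \frac{1}{2} \log_2 \left(1 + \sigma_s^2/\sigma_\eta^2\right)$.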
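The two Gaussian results can be cross-checked against each other. The sketch below assumes a linear Gaussian channel in which the response is a gain times the sum of the signal and independent input-referred noise; the gain, the standard deviations, the sample size, and all function names are illustrative assumptions. It estimates the correlation coefficient from samples, converts it to information with the bivariate normal formula, and compares the result with the signal-to-noise expression.

```python
import math
import random

def gaussian_mi_from_rho(rho):
    """Mutual information (bits) of a bivariate normal with correlation coefficient rho."""
    return -0.5 * math.log2(1.0 - rho ** 2)

def gaussian_channel_mi(var_signal, var_noise):
    """Mutual information (bits) for Gaussian signal plus independent Gaussian input noise."""
    return 0.5 * math.log2(1.0 + var_signal / var_noise)

def sample_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)                       # fixed seed for reproducibility
gain, sigma_s, sigma_noise = 2.0, 1.0, 0.5   # assumed illustrative values
signal = [rng.gauss(0.0, sigma_s) for _ in range(50000)]
response = [gain * (s + rng.gauss(0.0, sigma_noise)) for s in signal]

rho_hat = sample_correlation(signal, response)
i_estimated = gaussian_mi_from_rho(rho_hat)
i_exact = gaussian_channel_mi(sigma_s ** 2, sigma_noise ** 2)
print(f"estimated: {i_estimated:.3f} bits, exact: {i_exact:.3f} bits")
```

The gain drops out of the result, as expected from reparameterization invariance: rescaling the response does not change how much it tells us about the signal.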