<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://nemenmanlab.org/~ilya/index.php?action=history&amp;feed=atom&amp;title=Physics_380%2C_2011%3A_Lecture_13</id>
	<title>Physics 380, 2011: Lecture 13 - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://nemenmanlab.org/~ilya/index.php?action=history&amp;feed=atom&amp;title=Physics_380%2C_2011%3A_Lecture_13"/>
	<link rel="alternate" type="text/html" href="https://nemenmanlab.org/~ilya/index.php?title=Physics_380,_2011:_Lecture_13&amp;action=history"/>
	<updated>2026-05-17T09:39:58Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.31.0</generator>
	<entry>
		<id>https://nemenmanlab.org/~ilya/index.php?title=Physics_380,_2011:_Lecture_13&amp;diff=329&amp;oldid=prev</id>
		<title>Ilya: 1 revision imported</title>
		<link rel="alternate" type="text/html" href="https://nemenmanlab.org/~ilya/index.php?title=Physics_380,_2011:_Lecture_13&amp;diff=329&amp;oldid=prev"/>
		<updated>2018-07-04T16:28:41Z</updated>

		<summary type="html">&lt;p&gt;1 revision imported&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;1&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;1&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 16:28, 4 July 2018&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-notice&quot; lang=&quot;en&quot;&gt;&lt;div class=&quot;mw-diff-empty&quot;&gt;(No difference)&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</summary>
		<author><name>Ilya</name></author>
		
	</entry>
	<entry>
		<id>https://nemenmanlab.org/~ilya/index.php?title=Physics_380,_2011:_Lecture_13&amp;diff=328&amp;oldid=prev</id>
		<title>nemenman&gt;Ilya: /* Main lecture */</title>
		<link rel="alternate" type="text/html" href="https://nemenmanlab.org/~ilya/index.php?title=Physics_380,_2011:_Lecture_13&amp;diff=328&amp;oldid=prev"/>
		<updated>2011-11-03T13:40:31Z</updated>

		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Main lecture&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;{{PHYS380-2011}}&lt;br /&gt;
This is the last lecture in the information theory block.&lt;br /&gt;
&lt;br /&gt;
==In-class presentation==&lt;br /&gt;
Xiang Cheng presents the article by Pedraza and van Oudenaarden, 2005.&lt;br /&gt;
&lt;br /&gt;
==Warmup question==&lt;br /&gt;
I am showing you a diagram of cellular information processing pathways of a certain kind (MAPK pathways -- more on them later), courtesy of Jim Faeder (U Pittsburgh). We know how to measure the amount of information that travels along these pathways -- we need to estimate the mutual information between the quantities of interest. But can information-theoretic ideas also help us understand which of these pathways contribute more to the information processing than others and, therefore, deserve our attention first?&lt;br /&gt;
&lt;br /&gt;
==Main lecture==&lt;br /&gt;
#Information theory provides a measure for characterizing the quality of input-output relations. In addition, through the data processing inequality, it also provides a way of unambiguously reducing the dimensionality of the modeled biological system.&lt;br /&gt;
#*Indeed, say we have a large-dimensional signal &amp;lt;math&amp;gt;\vec{s}&amp;lt;/math&amp;gt; and response &amp;lt;math&amp;gt;\vec{r}&amp;lt;/math&amp;gt;. There's a certain mutual information between these, &amp;lt;math&amp;gt;I[\vec{s};\vec{r}]=I_0&amp;lt;/math&amp;gt;. If we propose a reduction of the signal and response to &amp;lt;math&amp;gt;f=f(\vec{s}),g=g(\vec{r})&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;I[f;g]\le I[\vec{s};\vec{r}]&amp;lt;/math&amp;gt; by the data processing inequality.&lt;br /&gt;
#We can, for example, answer questions like: Which inputs are informative about the outputs (and hence need to be accounted for in a model)? We omit different subsets of the inputs &amp;lt;math&amp;gt;s_i&amp;lt;/math&amp;gt;, calculate &amp;lt;math&amp;gt;I[\vec{s}-s_i;\vec{r}]=I_{\not i}&amp;lt;/math&amp;gt;, and then compute the error due to omitting this signal, &amp;lt;math&amp;gt;\Delta_i=\frac{I_0-I_{\not i}}{I_0}&amp;lt;/math&amp;gt;. Those components that have a small &amp;lt;math&amp;gt;\Delta_i&amp;lt;/math&amp;gt; can be safely neglected. This type of analysis can be used, for example, to understand which features of the neural code are important. In the subsequent presentation by Farhan, we will hear about using this trick to understand whether the high temporal precision of neural spikes is important.&lt;br /&gt;
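The input-pruning recipe above can be sketched numerically. Here is a minimal toy example (hypothetical, not from the lecture): two binary inputs, of which only the first drives the response through a noisy channel, so omitting the second should cost almost no information. Mutual informations are plug-in estimates, in bits.

```python
import numpy as np

def mi_bits(x, y):
    # Plug-in mutual information (bits) between two discrete integer arrays.
    xv, xi = np.unique(x, return_inverse=True)
    yv, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((len(xv), len(yv)))
    np.add.at(joint, (xi, yi), 1.0)          # joint histogram of (x, y)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (px @ py)[mask])).sum())

rng = np.random.default_rng(0)
n = 200_000
s1 = rng.integers(0, 2, n)                   # relevant input
s2 = rng.integers(0, 2, n)                   # irrelevant input
flip = rng.random(n) > 0.9                   # channel flips s1 with probability 0.1
r = np.where(flip, 1 - s1, s1)               # response depends on s1 only

I0 = mi_bits(2 * s1 + s2, r)                 # full input (s1, s2) encoded as one integer
I_no_s2 = mi_bits(s1, r)                     # omit s2
I_no_s1 = mi_bits(s2, r)                     # omit s1
print(1 - I_no_s2 / I0)                      # Delta for s2: near 0, so s2 can be dropped
print(1 - I_no_s1 / I0)                      # Delta for s1: near 1, so s1 is essential
```

By the data processing inequality, each omission can only lower the information, so every Delta is (up to estimation noise) nonnegative.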
#We have discussed the problem of lossless coding earlier in the class. What if one is willing to transmit the message with errors, but still wants to reconstruct it with only a small loss? This is the subject of Shannon's rate distortion (lossy coding) theory. A good place to look this up is the book by Cover and Thomas.&lt;br /&gt;
#*Suppose there's a loss function &amp;lt;math&amp;gt;d(x,x')\ge0&amp;lt;/math&amp;gt; for recovering a value of signal &amp;lt;math&amp;gt;x'&amp;lt;/math&amp;gt; when, in fact, it should have been &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;. &lt;br /&gt;
#*One can encode the signal as &amp;lt;math&amp;gt;P(x'|x)&amp;lt;/math&amp;gt;, and the amount of bits one would need to store this encoding would be &amp;lt;math&amp;gt;I[x;x']&amp;lt;/math&amp;gt;.&lt;br /&gt;
#*The average loss experienced would be &amp;lt;math&amp;gt;&amp;lt;d(x,x')&amp;gt;=\int dx\, dx' P(x)P(x'|x) d(x,x')&amp;lt;/math&amp;gt;.&lt;br /&gt;
#* We are interested in the shortest, most compressed encoding, &amp;lt;math&amp;gt;I[x;x']\to0&amp;lt;/math&amp;gt;, and yet we also want the smallest loss, &amp;lt;math&amp;gt;&amp;lt;d(x,x')&amp;gt;\to0&amp;lt;/math&amp;gt;. We can then choose to minimize &amp;lt;math&amp;gt;L=I[x;x']+\lambda &amp;lt;d(x,x')&amp;gt;&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is an arbitrary constant that controls how much we value compression over quality (think of the different bitrates in mp3 coding).&lt;br /&gt;
#*We minimize &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt; over all &amp;lt;math&amp;gt;P(x'|x)&amp;lt;/math&amp;gt;. This can only be done numerically, but there are efficient algorithms (namely, the Blahut-Arimoto algorithm described in the Cover and Thomas textbook) for doing so.&lt;br /&gt;
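A minimal sketch of the Blahut-Arimoto iteration for this rate-distortion problem (the binary source with Hamming distortion is an illustrative choice, not from the lecture): alternately update the encoder P(x'|x) and the reproduction marginal q(x') until they are mutually consistent.

```python
import numpy as np

def blahut_arimoto(p_x, d, beta, n_iter=300):
    # Minimize L = I[x;x'] + beta * avg distortion over all encoders P(x'|x).
    # p_x: source distribution, shape (nx,); d: distortion matrix d[x, x'].
    q = np.full(d.shape[1], 1.0 / d.shape[1])    # reproduction marginal q(x')
    for _ in range(n_iter):
        w = q * np.exp(-beta * d)                # unnormalized encoder
        enc = w / w.sum(axis=1, keepdims=True)   # P(x'|x)
        q = p_x @ enc                            # q(x') = sum_x p(x) P(x'|x)
    joint = p_x[:, None] * enc
    rate = np.sum(joint * np.log2(np.where(joint > 0, enc / q, 1.0)))  # I[x;x'] in bits
    distortion = np.sum(joint * d)               # average loss
    return rate, distortion, enc

# Binary source, Hamming distortion; larger beta buys fidelity with more bits.
p = np.array([0.5, 0.5])
hamming = 1.0 - np.eye(2)
for beta in (1.0, 4.0, 8.0):
    rate, dist, _ = blahut_arimoto(p, hamming, beta)
    print(beta, rate, dist)
```

For this symmetric case the converged points lie on the known curve R(D) = 1 - H(D), which is a convenient sanity check on the iteration.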
#In some cases, it is unclear how to define &amp;lt;math&amp;gt;d&amp;lt;/math&amp;gt;. But maybe instead there's the following setup. &lt;br /&gt;
#*We observe &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;, and compress it to &amp;lt;math&amp;gt;x'&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;P(x'|x)&amp;lt;/math&amp;gt;. However, we really care not about &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; but about some other ''relevant'' variable &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt;, given by &amp;lt;math&amp;gt;P(y|x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
#*With compression, &amp;lt;math&amp;gt;P(y|x')=\sum_x P(y|x)P(x|x')=\sum_x P(y|x)\frac{P(x,x')}{P(x')}=\sum_x P(y|x)\frac{P(x'|x)P(x)}{\sum_{x''} P(x'|x'')P(x'')}&amp;lt;/math&amp;gt;.&lt;br /&gt;
#*We can now maximize the information the compressed variable retains about the relevant variable, while simultaneously compressing as much as possible, i.e., keeping &amp;lt;math&amp;gt;I[x;x']&amp;lt;/math&amp;gt; small. That is, we want to maximize &amp;lt;math&amp;gt;L=I[x';y]-\lambda I[x;x']&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is again a control parameter.&lt;br /&gt;
#*This should also be done numerically, with the same Blahut-Arimoto algorithm. This approach is known as the Information Bottleneck method (see Tishby et al., 2000).&lt;/div&gt;</summary>
		<author><name>nemenman&gt;Ilya</name></author>
		
	</entry>
</feed>