Physics 380, 2011: Lecture 13
Latest revision as of 12:28, 4 July 2018
Back to the main Teaching page.
Back to Physics 380, 2011: Information Processing in Biology. This is the last lecture in the information theory block.
In-class presentation
Xiang Cheng presents the article by Pedraza and van Oudenaarden, 2005.
Warmup question
I am showing you a diagram of cellular information processing pathways of a certain kind (MAPK pathways -- more on them later), courtesy of Jim Faeder (U Pittsburgh). We know how to measure the amount of information that travels along these pathways -- we need to estimate the mutual information between the quantities of interest. But can information-theoretic ideas also help us understand which of these pathways contribute more to the information processing than the others and, therefore, deserve attention first?
Main lecture
- Information theory provides a measure for characterizing the quality of input-output relations. In addition, through the data processing inequality, it also provides a way of unambiguously reducing the dimensionality of the modeled biological system.
- Indeed, say we have a large-dimensional signal <math>\vec{s}</math> and response <math>\vec{r}</math>. There's a certain mutual information <math>I[\vec{s};\vec{r}]</math> between these. If we propose a reduction of the signal to <math>s'</math> and of the response to <math>r'</math>, then <math>I[s';r']\le I[\vec{s};\vec{r}]</math> by the data processing inequality.
- We can, for example, solve problems like: which inputs are informative about the outputs (and hence need to be accounted for in a model)? We omit different subsets of the inputs to form reduced signals <math>\vec{s}_{(i)}</math>, calculate <math>I[\vec{s}_{(i)};\vec{r}]</math>, and compute the error due to omitting this part of the signal, <math>\Delta I_i=I[\vec{s};\vec{r}]-I[\vec{s}_{(i)};\vec{r}]</math>. Those components that have a small <math>\Delta I_i</math> can be safely neglected. This type of analysis can be used, for example, to understand which features of the neural code are important. In the subsequent presentation by Farhan, we will hear about using this trick to understand whether the high precision of neural spikes is important or not.
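A minimal numerical sketch of this bookkeeping, on a made-up discrete joint distribution (the table, variable names, and dimensions are illustrative, not from the lecture): dropping one component of the signal merges rows of the joint table, and the resulting information loss is nonnegative by the data processing inequality.

```python
import numpy as np

def mutual_info(p_xy):
    """Mutual information I[X;Y] in bits from a joint probability table."""
    px = p_xy.sum(axis=1, keepdims=True)
    py = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (px @ py)[mask])))

# Joint distribution over a two-component signal (s1, s2) and a response r.
# Rows index the four signal states (s1, s2) = 00, 01, 10, 11; columns index r.
rng = np.random.default_rng(0)
p = rng.random((4, 3))
p /= p.sum()

I_full = mutual_info(p)

# Omit component s2: rows that share s1 get merged (a deterministic reduction).
p_reduced = np.vstack([p[0] + p[1], p[2] + p[3]])
I_reduced = mutual_info(p_reduced)

delta_I = I_full - I_reduced   # information lost by dropping s2
assert delta_I >= -1e-12       # data processing inequality
```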
- We have discussed the problem of lossless coding earlier in the class. What if one is willing to transmit the message with errors, but still wants to reconstruct it with a small loss? This is Shannon's rate-distortion (lossy coding) theorem. A good place to look this up is the book by Cover and Thomas.
- Suppose there's a loss function <math>d(x,x')</math> incurred for recovering a value of the signal as <math>x'</math> when, in fact, it should have been <math>x</math>.
- One can encode the signal as <math>P(x'|x)</math>, and the number of bits one would need to store this encoding would be <math>I[x;x']</math>.
- The average loss experienced would be <math>\langle d(x,x')\rangle=\int dx\, dx'\, P(x)P(x'|x)\, d(x,x')</math>.
- We are interested in the shortest, most compressed encoding, <math>I[x;x']\to0</math>, and yet we want to have the smallest loss as well, <math>\langle d(x,x')\rangle\to0</math>. We can then choose to minimize <math>L=I[x;x']+\lambda\langle d(x,x')\rangle</math>, where <math>\lambda</math> is an arbitrary constant that controls how much we value compression over quality (think of different bitrates in mp3 coding).
- We minimize <math>L</math> over all <math>P(x'|x)</math>. This can only be done numerically, but there are efficient algorithms (namely, the Blahut–Arimoto algorithm, described in the Cover and Thomas textbook) to do this.
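The Blahut–Arimoto iteration alternates between updating the marginal of the compressed variable and the encoder itself. A minimal sketch (the toy source, distortion matrix, and the value of the trade-off parameter are made up for illustration; here the parameter is called beta and multiplies the distortion term):

```python
import numpy as np

def blahut_arimoto(p_x, d, beta, n_iter=200):
    """Iteratively minimize L = I[x;x'] + beta * <d(x,x')> over encoders P(x'|x)."""
    n, m = d.shape
    q = np.full((n, m), 1.0 / m)               # encoder P(x'|x), start uniform
    for _ in range(n_iter):
        r = p_x @ q                             # marginal P(x')
        q = r[None, :] * np.exp(-beta * d)      # self-consistent update
        q /= q.sum(axis=1, keepdims=True)       # renormalize each row
    r = p_x @ q
    rate = np.sum(p_x[:, None] * q * np.log2(q / r[None, :]))   # I[x;x'] in bits
    distortion = np.sum(p_x[:, None] * q * d)                   # <d(x,x')>
    return q, rate, distortion

# Toy example: uniform binary source with Hamming distortion.
p_x = np.array([0.5, 0.5])
d = np.array([[0.0, 1.0],
              [1.0, 0.0]])
q, R, D = blahut_arimoto(p_x, d, beta=2.0)
```

For this symmetric source, the iteration lands on the analytic fixed point, with distortion <math>e^{-\beta}/(1+e^{-\beta})</math>; sweeping beta traces out the rate-distortion curve.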
- In some cases, it is unclear how to define <math>d</math>. But maybe instead there's the following setup.
- We observe <math>x</math> and compress it to <math>x'</math> by <math>P(x'|x)</math>. However, we really care not about <math>x</math> but about some other relevant variable <math>y</math>, given by <math>P(y|x)</math>.
- With compression, <math>P(y|x')=\sum_x P(y|x)P(x|x')=\sum_x P(y|x)\frac{P(x,x')}{P(x')}=\sum_x P(y|x)\frac{P(x'|x)P(x)}{\sum_x P(x'|x)P(x)}</math>.
- We can now maximize the information the compressed variable retains about the relevant variable, while maximizing the compression itself. That is, we want to maximize <math>L=I[x';y]-\lambda I[x;x']</math>, where <math>\lambda</math> is again a control parameter.
- This should also be done numerically, with a variant of the same Blahut–Arimoto algorithm. This approach is known as the Information Bottleneck method (see Tishby et al., 2000).
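A minimal sketch of the self-consistent Information Bottleneck iteration (an illustration, not the lecture's code; the toy distribution, cluster count, and beta value are made up, with beta playing the role of <math>1/\lambda</math> above):

```python
import numpy as np

def information_bottleneck(p_xy, n_clusters, beta, n_iter=300, seed=0):
    """Self-consistent iteration for the Information Bottleneck
    (Tishby et al., 2000); beta = 1/lambda in L = I[x';y] - lambda I[x;x']."""
    rng = np.random.default_rng(seed)
    n_x, n_y = p_xy.shape
    p_x = p_xy.sum(axis=1)
    p_y_x = p_xy / p_x[:, None]                    # P(y|x)
    q = rng.random((n_x, n_clusters))              # soft encoder P(x'|x)
    q /= q.sum(axis=1, keepdims=True)
    eps = 1e-12
    for _ in range(n_iter):
        p_t = p_x @ q                              # P(x')
        p_y_t = (q * p_x[:, None]).T @ p_y_x / p_t[:, None]   # P(y|x')
        # KL divergence D[P(y|x) || P(y|x')] for every (x, x') pair
        kl = np.sum(p_y_x[:, None, :] * (np.log(p_y_x[:, None, :] + eps)
                                         - np.log(p_y_t[None, :, :] + eps)), axis=2)
        q = p_t[None, :] * np.exp(-beta * kl)      # self-consistent update
        q /= q.sum(axis=1, keepdims=True)
    return q

# Toy problem: y depends only on the first "bit" of x, so two clusters suffice.
p_y_x = np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])
p_xy = 0.25 * p_y_x                                # uniform P(x)
q = information_bottleneck(p_xy, n_clusters=2, beta=10.0)
```

At this large beta the encoder becomes nearly deterministic: the two x-states with identical <math>P(y|x)</math> end up in the same cluster, which is exactly the kind of principled dimensionality reduction discussed above.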