Physics 380, 2011: Homework 7

From Ilya Nemenman: Theoretical Biophysics @ Emory
Revision as of 17:47, 14 November 2011 by nemenman>Ilya
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search
Emory Logo

Back to the main Teaching page.

Back to Physics 380, 2011: Information Processing in Biology.

  1. Let's explore the channel coding theorem (and also learn how to do optimization in Matlab and Octave). Suppose we have the following molecular information processing channel: for any number of molecules on the input Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x} , the channel output Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle y} is a Poisson variable with that mean (this is relevant to Ziv et al, 2007, which was discussed by Martin in one of the lectures). That is, . Write a code to estimate the mutual information over this channel for an arbitrary distribution of the (discrete) input signal. Use the input distribution as an input to the function you write. Explore different input distributions, assuming that the number of input molecules is between 0 and 64. What are the general features of the input distribution that achieve higher mutual information? Recall that Ziv et al. have shown that you should be able to send Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle 1/2\log_2 \bar{N}\approx 1/2 \log_2 64/2=2.5} bits through this channel. Can you find a distribution that allows you to send close to these Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \approx 2.5} bits? Submit plots of your "most informative" distributions. For Grad Students: Now try to maximize this mutual information by changing the input distribution comprehensively using minimization routines (in Matlab, you will need to use the fminsearch functions, and in octave you will use the sqp function; Matlab and octave are not compatible for optimization routines). How many bits can you send through at the maximum?
  2. When sketching the proof of Shannon's channel coding theorem, we introduced the idea of typicality. Let's play with it and related notions. Take Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x} as a binary variable with mean 1/3. Generate N=10 samples Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x_i} from this distribution and record logarithm of the probability of the sequence you got. Repeat a 100 times and histogram these log-probabilities. What is the mean of the log-probabilities? Now increase Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle N} and observe how all sequences become essentially equiprobable. This is the equipartition theorem in action.
  3. Extra credit: Consider now a binary variable Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x} with a mean 1/2. This is the input to the information channel. Also consider another binary variable Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle y} , such that is given by Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle P(0|0)=3/4,\,P(1|0)=1/4,\,P(0|1)=1/4,\,P(1|1)=3/4} . This is the channel output. What is Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle I[X,Y]} ? We should be able to transmit these many bits through this channel with no errors. Let's suppose the third binary variable Z has a distribution Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle P(z=0)=1/4,\,P(z=1)=3/4} . Can you think of a strategy how to encode a sequence of z's into a sequence of x's, so that the z's can be (almost) uniquely decoded from the resulting random sequence of y's? How close is your code to the channel capacity bound predicted by Shannon?
  4. In class, we have considered a population of bacteria that choose between two different behavioral responses to environmental states. Let's do a straightforward generalization of this problem to a case of more states. In response to environmental stresses, Bacillus subtilis can choose one of three phenotypic outcomes: it can try to divide in a normal fashion, it may engage in uptake of DNA fragments from the surrounding environment (in the hope of getting some new and useful genes; this is called competence), or it may start sporulating. Suppose that the environment also comes in three states: normal, medium-stress, and high-stress. In normal environments, the dividing cells divide the fastest, DNA-uptaking cells divide slower, and sporulating cells don't divide at all. In medium stress environments, the competent cells do better then the rest. Finally, in high-stress environments, spores persist, competent cells slowly decrease in numbers, and normal cells collapse. Set up the problem as a betting matrix like we did in class and calculate the fraction of bacteria that should take each of these phenotypes for the optimum growth. What is this optimal growth? Is the investment strategy still described by matching? Is the optimal growth still limited by what the bacteria don't know about the environment. Note: you don't need to derive the maximum growth rate in a compact expression. Once you show me how to do it, let's not spend too much time on the algebra to actually get it done. Open ended problem, not required: assume now that the transitions among the environments are not totally random. That is, high stress environments are more likely after medium-stress than after normal. And normal are also more likely after medium-stress than after high-stress. At the same time, it takes a while for cells to switch. That is, a normal cell can easily switch into a spore or into a competent state, but both the competent state and the spore must first switch back to normal before assuming any other form. Can you calculate the optimal phenotypic response strategies and the optimal growth rate in this case?
  5. We briefly touched on using mutual information to choose optimal reduced models for the data. Consider now three neurons: X, Y, Z. We need to understand if both X and Y project into Z and affect its spiking. Further, if both affect Z, we need to understand how big of an error we make by neglecting one of the projections. Download the following simulated neural spiking data. In this file, you will find a variable spikes, which is 3x1e4 in size. The rows are the activities of each of the neurons, and the columns are the time steps. Each entry in the matrix tells us how many times a particular neuron has fired in a particular time step. Estimate the mutual informations among the neurons and answer the two questions above.
  6. Graduate students: Try to understand the Information Bottleneck method (Tishby et al, 2000). For a given distribution of a signal Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x} and the relevance variable Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle y} , Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle P(x,y)} , this method tries to compress into Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle z} , while preserving as much information in Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle z} as possible about Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle y} . That is, the method maximizes Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle L=I[Z,Y]-\lambda I[X,Z]} over Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle P(z|x)} , and the parameter Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \lambda} controls the tradeoff between the strength of the compression and the amount of information retained about Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle y} . Suppose I vary Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \lambda} and plot the accuracy vs. compression curve, that is Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle I[Y,Z]/I[X,Y]} vs. Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle I[X,Z]/S[X]} as a function of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \lambda} . What is the upper bound on the slope of this curve at Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \lambda\to\infty} ? Can you derive the lower bound (not required)?