Physics 380, 2011: Homework 7

From Ilya Nemenman: Theoretical Biophysics @ Emory

Back to the main Teaching page.

Back to Physics 380, 2011: Information Processing in Biology.

  1. Let's explore the channel coding theorem (and also learn how to do optimization in Matlab and Octave). Suppose we have the following molecular information processing channel: for any number of molecules <math>x</math> at the input, the channel output <math>y</math> is a Poisson variable with that mean (this is relevant to Ziv et al., 2007, which was discussed by Martin in one of the lectures). That is, <math>P(y|x)=x^y e^{-x}/y!</math>. Write code to estimate the mutual information over this channel for an arbitrary distribution of the (discrete) input signal; use the input distribution as an argument of the function you write (a sketch appears after the problem list). Explore different input distributions, assuming that the number of input molecules is between 0 and 64. What are the general features of the input distributions that achieve higher mutual information? Recall that Ziv et al. have shown how many bits you should be able to send through this channel. Can you find a distribution that allows you to send close to this many bits? Submit plots of your "most informative" distributions. For Grad Students: Now try to maximize this mutual information by varying the entire input distribution with numerical minimization routines (in Matlab, you will need to use the fminsearch function, and in Octave you will use the sqp function; Matlab and Octave are not compatible for optimization routines). How many bits can you send through at the maximum?
  2. When sketching the proof of Shannon's channel coding theorem, we introduced the idea of typicality. Let's play with it and related notions. Take <math>x</math> as a binary variable with mean 1/3. Generate <math>N=10</math> samples from this distribution and record the logarithm of the probability of the sequence you got. Repeat this 100 times and histogram these log-probabilities. What is the mean of the log-probabilities? Now increase <math>N</math> and observe how all sequences become essentially equiprobable. This is the asymptotic equipartition property in action (a sketch of the experiment appears after the problem list).
  3. Extra credit: Consider now a binary variable <math>x</math> with a mean of 1/2. This is the input to the information channel. Also consider another binary variable <math>y</math>, whose conditional distribution <math>P(y|x)</math> is given. This is the channel output. What is <math>I(x;y)</math>? We should be able to transmit this many bits through this channel with no errors. Let's suppose a third binary variable <math>z</math> has a given distribution <math>P(z)</math>. Can you think of a strategy to encode a sequence of z's into a sequence of x's, so that the z's can be (almost) uniquely decoded from the resulting random sequence of y's? How close is your code to the channel capacity bound predicted by Shannon? (A capacity calculation for one choice of channel appears after the problem list.)
  4. In class, we considered a population of bacteria that choose between two different behavioral responses to environmental states. Let's do a straightforward generalization of this problem to a case with more states. In response to environmental stresses, Bacillus subtilis can choose one of three phenotypic outcomes: it can try to divide in a normal fashion, it may engage in uptake of DNA fragments from the surrounding environment (in the hope of getting some new and useful genes; this is called competence), or it may start sporulating. Suppose that the environment also comes in three states: normal, medium-stress, and high-stress. In normal environments, the dividing cells divide the fastest, DNA-uptaking cells divide slower, and sporulating cells don't divide at all. In medium-stress environments, the competent cells do better than the rest. Finally, in high-stress environments, spores persist, competent cells slowly decrease in numbers, and normal cells collapse. Set up the problem as a betting matrix like we did in class and calculate the fraction of bacteria that should take each of these phenotypes for optimum growth (a numerical sketch appears after the problem list). What is this optimal growth rate? Is the investment strategy still described by matching? Is the optimal growth still limited by what the bacteria don't know about the environment? Note: you don't need to derive the maximum growth rate in a compact expression. Once you show me how to do it, let's not spend too much time on the algebra to actually get it done. Open-ended problem, not required: assume now that the transitions among the environments are not totally random. That is, high-stress environments are more likely after medium-stress than after normal ones, and normal environments are also more likely after medium-stress than after high-stress ones. At the same time, it takes a while for cells to switch. That is, a normal cell can easily switch into a spore or into a competent state, but both the competent cell and the spore must first switch back to normal before assuming any other form. Can you calculate the optimal phenotypic response strategies and the optimal growth rate in this case?
  5. We briefly touched on using mutual information to choose optimal reduced models of the data. Consider now three neurons: X, Y, Z. We need to understand whether both X and Y project into Z and affect its spiking. Further, if both affect Z, we need to understand how big an error we make by neglecting one of the projections. Download the following simulated neural spiking data. In this file, you will find a variable spikes, which is 3x1e4 in size. The rows are the activities of each of the neurons, and the columns are the time steps. Each entry in the matrix tells us how many times a particular neuron fired in a particular time step. Estimate the mutual informations among the neurons and answer the two questions above (a sketch of a plug-in estimator appears after the problem list).
  6. Graduate students: Try to understand the Information Bottleneck method (Tishby et al., 2000). For a given joint distribution <math>P(x,y)</math> of a signal <math>x</math> and the relevance variable <math>y</math>, this method tries to compress <math>x</math> into a new variable <math>t</math>, while preserving as much information in <math>t</math> as possible about <math>y</math>. That is, the method maximizes <math>-I(x;t)+\beta I(t;y)</math> over <math>P(t|x)</math>, and the parameter <math>\beta</math> controls the tradeoff between the strength of the compression and the amount of information retained about <math>y</math>. Suppose I vary <math>\beta</math> and plot the accuracy vs. compression curve, that is, <math>I(t;y)</math> vs. <math>I(x;t)</math> as a function of <math>\beta</math>. What is the upper bound on the slope of this curve at the origin? Can you derive the lower bound (not required)? (A sketch of the upper-bound argument appears after the problem list.)
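
For problem 1, here is a minimal Octave/Matlab sketch of the mutual information computation; the function name mi_poisson (save it as mi_poisson.m) and the truncation of the output range are my own choices, a starting point rather than a reference implementation:

 function I = mi_poisson(px)
   % Mutual information (bits) across the channel P(y|x) = x^y exp(-x)/y!
   % px(i) = P(x = i-1), for x = 0, 1, ..., numel(px)-1
   px = px(:) / sum(px);                  % make sure P(x) is normalized
   x  = (0:numel(px)-1)';
   y  = 0:200;                            % truncate the Poisson output range
   % log P(y|x); max(x, realmin) turns x = 0 into a point mass at y = 0
   logPyx = bsxfun(@times, y, log(max(x, realmin))) ...
            - repmat(x, 1, numel(y)) - repmat(gammaln(y+1), numel(x), 1);
   Pyx = exp(logPyx);
   Py  = px' * Pyx;                       % output marginal P(y)
   T = Pyx .* (log2(Pyx) - repmat(log2(Py), numel(x), 1));
   T(~isfinite(T)) = 0;                   % 0 log 0 = 0; also guards rows with P(x)=0
   I = px' * sum(T, 2);                   % I(X;Y), in bits
 end

For instance, mi_poisson(ones(65,1)/65) gives the information for a uniform input over 0..64. For the graduate part, one can wrap this in fminsearch over a softmax-parametrized input (as in the problem 4 sketch below), so that the distribution stays normalized during the search.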
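For problem 2, a sketch of the sampling experiment; the number of histogram bins is an arbitrary choice:

 % AEP experiment with x ~ Bernoulli(1/3)
 p = 1/3; N = 10; trials = 100;
 logp = zeros(trials, 1);
 for t = 1:trials
   x = (rand(N,1) < p);                       % N binary samples with mean p
   k = sum(x);                                % number of ones in the sequence
   logp(t) = k*log2(p) + (N-k)*log2(1-p);     % log2 P(this sequence)
 end
 hist(logp, 20);                              % histogram of log-probabilities
 mean(logp)/N   % compare to p*log2(p) + (1-p)*log2(1-p), i.e., minus the entropy

As <math>N</math> grows, mean(logp)/N concentrates around minus the entropy, about -0.918 bits per symbol here.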
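For problem 3, assuming for illustration that the channel is binary symmetric with flip probability p (the value below is a placeholder, not the problem's number), the mutual information for the uniform input is one line:

 p  = 0.1;                                   % hypothetical flip probability
 H2 = @(q) -q.*log2(q) - (1-q).*log2(1-q);   % binary entropy, in bits
 C  = 1 - H2(p)                              % I(x;y) for the uniform input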
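For problem 4, one way to set up the numerics; the growth-rate matrix R and the environment probabilities pe below are made-up placeholders, and the softmax parametrization simply keeps the phenotype fractions on the simplex for fminsearch:

 R  = [2.0 1.2 0.0;     % rows: normal, medium-stress, high-stress environments
       1.0 1.5 1.0;     % columns: dividing, competent, sporulating phenotypes
       0.1 0.8 1.0];    % entries: per-step growth factors (placeholders)
 pe = [0.5 0.3 0.2];    % probabilities of the three environments (assumed)
 % Growth rate when a fraction f(j) of the population takes phenotype j:
 %   Lambda(f) = sum_e pe(e) * log2( sum_j R(e,j) f(j) )
 negL = @(a) -pe * log2(R * (exp(a(:)) / sum(exp(a(:)))));
 a      = fminsearch(negL, zeros(3,1));
 f      = exp(a) / sum(exp(a))      % optimal phenotype fractions
 Lambda = -negL(a)                  % optimal growth rate, doublings per step

When R is diagonal (each phenotype surviving in exactly one environment), the optimum reduces to the proportional-betting ("matching") solution from class, f = pe; with off-diagonal payoffs it generally does not.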
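For problem 5, a plug-in (histogram) estimator of the mutual information between two discrete sequences; the function name mi_plugin is my own (save it as mi_plugin.m), and with 10^4 samples you should at least ask whether the estimate is biased:

 function I = mi_plugin(a, b)
   % Plug-in mutual information (bits) between two discrete sequences
   [~, ~, ia] = unique(a(:));          % map observed counts to 1..na
   [~, ~, ib] = unique(b(:));
   J  = accumarray([ia ib], 1);        % joint histogram of (a,b) pairs
   J  = J / sum(J(:));                 % joint probability P(a,b)
   pa = sum(J, 2); pb = sum(J, 1);     % marginals
   T  = J .* log2(J ./ (pa * pb));     % pointwise contributions
   T(J == 0) = 0;                      % 0 log 0 = 0
   I  = sum(T(:));
 end

Then mi_plugin(spikes(1,:), spikes(3,:)) estimates I(X;Z), and similarly for I(Y;Z); to consider both projections at once, combine the first two rows into a single joint symbol, e.g. spikes(1,:) + (1+max(spikes(1,:)))*spikes(2,:), and compute its information with Z.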
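For problem 6, a minimal sketch of the upper-bound argument, written out in LaTeX; it uses only the fact that <math>t</math> is computed from <math>x</math> alone:

 % Since P(t|x,y) = P(t|x), the chain y -> x -> t is Markov, and the data
 % processing inequality gives
 \[ I(t;y) \le I(x;t). \]
 % Both informations vanish at the origin of the information curve, hence
 \[ \left.\frac{dI(t;y)}{dI(x;t)}\right|_{I(x;t)\to 0}
    = \lim_{I(x;t)\to 0} \frac{I(t;y)}{I(x;t)} \le 1. \]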