Physics 434, 2014: Search and first passage times
Back to the main Teaching page.
Back to Physics 434, 2014: Information Processing in Biology.
Here we focus on a particle trying to find a certain site using diffusion-like process. A very good discussion of this is in Introduction to Probability by CM Grinstead and JL Snell.
Random walk search in a different number of dimensions
Interestingly, even though a multidimensional random walk is just a combination of a few one-dimensional ones, the resulting properties of the multi-dimensional walk depend on the number of dimensions. This is particularly easy to see when random walks are used as a mechanism for a random search of a certain target. Many biological systems implement such diffusive search strategies because they don't require energy expenditures -- for example, a molecule searching for a binding site on the DNA undergoes a diffusive Brownian motion on its own, without having to implement it deliberately.
For a diffusive process, the radius of explored region goes as . The number of different sites in the explored region is . But the number of different visited sites is because the walker visits one site per time step. Hence each site is explored about times. Hence in 1-d each site is explored many times, in 2-d each site is (barely) explored (but it takes long since some sites are visited often), and in 3-d or higher dimensions very few sites are ever explored compared to the available sites. Actually, a statement that in 2-d each site is explored is not a trivial statement, and it doesn't immediately follow from our simple analysis. A better derivation of this is found in the Grinstead and Snell book.
Where would one use random walk search in a different number of dimensions? Consider the famous lac operon of E. coli, which I am sure you read about in your intro bio classes. You were told that the lac inhibitor protein lacI binds on the DNA, changes its shapes, and this prevents RNA polymerase to bind and hence to transcribe the operon, and eventually to produce the lacZ and lacY proteins involved in lactose metabolism and transport into the cell. But how easy is it for the transcription factor to find the necessary target site on the DNA? Let's study this mathematically.
At this point in class, I draw the picture of this regulatory circuit, and we also spend a substantial amount of time discussing the molecular mechanisms of the transcriptional control, and specifically some experiments done by our own Laura Finzi to understand DNA looping in this system. I am skipping this diversion in these lecture notes. The picture will be produced for one of the next incarnations of the class as well. If your own class notes are insufficient, you can read more about this system in (Dreisigmeyer et al., 2008).
For the lacI protein to shut of transcription, it first needs to find the necessary site on the DNA. This is a special case of the first passage time problem -- what is the distribution of times when a random walker gets to a certain point for the first time? A good review of this is in the Grinstead and Snell book.
First passage and first return
There are four different probability distributions that are worth considering when studying diffusive search:
- the distribution of being at a certain point at time having started at at ;
- the distribution of times to return to the initial point after the time ;
- the distribution of times of the first return to the initial point after the time ;
- the distribution of time to to go from to for the first time in time (first passage distribution).
These distributions are related. For example, In order to be somewhere at time , one must first get there for the first time at time , and then return to the same point in exactly the time . That is the distribution of being there is a convolution of the first passage and the return distributions. In other words, the moment generating function of being there is the product of the two other moment generating functions, . Similarly, to return to the origin in time , one must return to the origin for the first time at some , and then return to it again in . This gives .
Notice that two of these distributions are quite well known to us: the return and the being there distributions. For example, in a 1d DTDS unbiased random walk, the displacement is just the difference of the number of steps made to the right and to the left. So if one makes the total of steps, then of them must be to the right, and should be to the left (notice that one can have an odd/even only for odd/even . The probability of every such sequence of steps is , and there are such sequences which end at at time . As a result, the probability of being at at time is . Very similarly, to return to where it started from, the walker must make the same number of steps to the left and to the right, so that the probability of return at time in 1d: . Now having these two distributions, one can calculate their MGFs, divide the two, and get the MGF for the first passage probability distribution. Similarly, dividing MGFs of two return distributions with different times, gives the MGF of the first return distribution. We won't be doing these calculations explicitly here, but these is how they are done generally. Some of these calculations can be found in the Grinsted and Snell book.
For diffusion rather than walkers, the procedure is very similar, except that the being there and the return distributions are now given by the Gaussian (rather than combinatorial) expressions for the displacement of the diffusive particle.
The return/passage distributions are very nonintuitive, even in 1d. For example, plot the function (it is easier to calculate not factorials, but their logs, so that you don't operate with too large of numbers). Notice how slowly this function falls off! Only about as . That is, there is a probability that one will be returning to the origin even a very-very long time into the future. Another way to say this, in agreement with what we mentioned before, is that, in 1d, the particle will be coming back to zero many-many times, before it eventually leaves. You may want to derive the same result for diffusion, rather than DTDS walk -- you should quickly get the same dependence using the Gaussian form of the diffusive probability.
The analysis of this kind eventually helps us to get the properties of returns, first returns, and first passage, which we mentioned in the previous section, on a solid mathematical footing, namely:
- the mean return times diverge in all dimensions, there's a substantial chance of returning to zero even at very long times;
- probability of eventual return over the entire infinite time is 1 in 1-d and 2-d, and about 0.65 in 3-d.
- probabilities of the first passage in time are very small when ; they peak around this value, and fall off with long tails beyond this value.
Note that, for biased walks, situation changes. Look at Grinsted's book, an article by (Bel et al., 2010), or the problem of firing an action potential in one of the previous problems to see the difference.
We will test these predictions with simulations in a homework problem.
Berg-von Hippel search of the binding site by a transcription factor
Biological systems would like to use diffusive motion for a molecule, like a transcription factor (TF), to find its place (like the binding site on the DNA) -- this uses no energy. But is it effective? For the first time, such diffusive search by a transcription factor was analyzed, as far as I know, in (Berg and von Hippel, 1987; Berg 1981). They considered the following possibilities:
- Pure 1-d search, where the TF binds to the DNA and undertakes a 1-d random walk on the DNA. This is not efficient because very little time is spent on exploration -- the TF keeps coming back.
- Pure 3-d search, where the TF does 3-d diffusion in the cytoplasm until it finds the right binding site, and then binds. This is not efficient either because very few sites are ever explored in 3d, and the TF very likely will not come close to its needed target.
Their suggested solution was that the search proceeds in 1-d/3-d bouts. First the TF moves fast between distant parts of the cell using 3-d diffusion in the cytoplasm, and then it binds to the DNA, and explores a patch of the DNA using 1-d random walk very thoroughly, looking for the right site. If it find the cognate site, is binds to it strongly, if it doesn't, then it unbinds and repeats another cycle of 1-d/3-d search. Mathematical details of such 1-d/3-d search were analyzed by (Slutsky and Mirny, 2004), and we follow their analysis here.
The total search time is the sum of search times in both 1-d nd 3-d modes: , where is the number of search rounds until the target site is found. In the 1-d part of the search cycle, the protein explores sites before it unbinds. Hence the number of cycles before it finds the target site will be , where is the DNA length, provided that the 3-d part of the search moves the TF far away, so that it almost never comes back to the same patch of the DNA for the 1-d part of the search. Therefore, . However, we also know that , where is the 1-d diffusion constant. Therefore, . Is there an optimal time to spend on a 1-d search? Differentiating w.r.t. , we get . Thus the transcription factor should spend about the same amount of time in 1-d and 3-d search modes. Slutsky and Mirny (2004) review experimental confirmations of this.