# Physics 380, 2010: Basic Probability Theory

## Lectures 2 and 3

During these lectures, we will review some basic concepts of probability theory, such as probability distributions, conditionals, marginals, expectations, etc. We will discuss the central limit theorem and will derive some properties of random walks. Finally, we will study some specific useful probability distributions.

A very good introduction to probability theory can be found in *Introduction to Probability* by C. M. Grinstead and J. L. Snell.

• Random variables: motion of E. coli, time to neural action potential; diffusion and first passage
• Sample space, events, probabilities -- probability space
• Properties of distributions:
• nonnegativity: ${\displaystyle P_{i}\geq 0}$
• unit normalization: ${\displaystyle \sum _{i=1}^{N}P_{i}=1}$
• nesting: if ${\displaystyle A\subset B}$ then ${\displaystyle P(A)\leq P(B)}$
• additivity (for non-disjoint events): ${\displaystyle P(A\cup B)=P(A)+P(B)-P(A\cap B)}$
• complementarity ${\displaystyle P(not\,A)=1-P(A)}$
• Continuous and discrete events: probability distributions and densities ${\displaystyle P_{i}}$ or ${\displaystyle P(x)}$
• Cumulative distributions ${\displaystyle C(x)=\int _{-\infty }^{x}P(x')dx'}$
• Change of variables for discrete and continuous variates; in the continuous case ${\displaystyle P(x')=P(x)\left|{\frac {dx}{dx'}}\right|}$, and for multi-dimensional variables ${\displaystyle P({\vec {x}}')=P({\vec {x}})\left|\det \left({\frac {\partial x_{\alpha }}{\partial x'_{\beta }}}\right)\right|}$ (the Jacobian determinant)
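The change-of-variables rule can be checked numerically. A minimal Python sketch (NumPy assumed; the transformation y = x^2 of a uniform variate is just an illustrative choice, not an example from the lectures): a uniform x on (0, 1) mapped through y = x^2 should have density P(y) = 1/(2 sqrt(y)).

```python
import numpy as np

# Monte Carlo check of P(y) = P(x) |dx/dy| for x ~ Uniform(0, 1), y = x^2,
# which gives the analytic density P(y) = 1 / (2 sqrt(y)) on (0, 1].
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200_000)
y = x**2

# Compare the empirical histogram of y with the analytic transformed density.
counts, edges = np.histogram(y, bins=20, range=(0.0, 1.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
analytic = 1.0 / (2.0 * np.sqrt(centers))

# Agreement within a few percent, away from the integrable singularity at y = 0
# (the first two bins are skipped because the density varies strongly there).
rel_err = np.abs(counts[2:] - analytic[2:]) / analytic[2:]
print(rel_err.max())
```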
• Distributions:
• uniform: time of a spike, or of an E. coli tumble, known to occur somewhere in a window of length ${\displaystyle T}$: ${\displaystyle P(t)=1/T,\;0\leq t\leq T}$
• exponential: time to the next action potential at constant rate ${\displaystyle P(t)=re^{-rt}}$.
• Poisson: number of action potentials, or of E. coli tumbles, at rate ${\displaystyle r}$ in an interval of duration ${\displaystyle T}$: ${\displaystyle P(n)={\frac {(rT)^{n}}{n!}}e^{-rT}}$
• normal: diffusive motion ${\displaystyle P(x)={N}(\mu ,\sigma ^{2})={\frac {1}{{\sqrt {2\pi }}\sigma }}\exp {\left[-{\frac {(x-\mu )^{2}}{2\sigma ^{2}}}\right]}}$
• ${\displaystyle \delta }$-distribution: deterministic limit ${\displaystyle \delta (x-\mu )=\lim _{\sigma \to 0}{\frac {1}{{\sqrt {2\pi }}\sigma }}\exp {\left[-{\frac {(x-\mu )^{2}}{2\sigma ^{2}}}\right]}}$; ${\displaystyle \delta (0)\to \infty ,\;\delta (x\neq 0)=0}$.
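Two of these distributions are easy to check by sampling. A short Python sketch (NumPy assumed; the parameter values r = 2.0 and T = 3.0 are arbitrary illustrative choices): the exponential should have mean 1/r and variance 1/r^2, and the Poisson should have mean and variance both equal to rT.

```python
import numpy as np

# Numerical sanity check of the exponential and Poisson distributions above,
# with illustrative parameters r = 2.0 (rate) and T = 3.0 (window length).
rng = np.random.default_rng(1)
r, T, n = 2.0, 3.0, 500_000

t = rng.exponential(scale=1.0 / r, size=n)  # P(t) = r e^{-rt}
k = rng.poisson(lam=r * T, size=n)          # P(n) = (rT)^n e^{-rT} / n!

print(t.mean(), t.var())  # expect ~1/r = 0.5 and ~1/r^2 = 0.25
print(k.mean(), k.var())  # expect ~rT = 6 for both (Poisson mean = variance)
```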
• Conditional and joint probabilities, Bayes' theorem: ${\displaystyle P(A,B)=P(A|B)P(B)=P(B|A)P(A)}$
• independence: two variables are independent if and only if ${\displaystyle P(A,B)=P(A)P(B)}$, or, equivalently, ${\displaystyle P(A|B)=P(A)}$ or ${\displaystyle P(B|A)=P(B)}$.
• Distributions:
• multivariate normal: ${\displaystyle P({\vec {x}}|{\vec {\mu }},\Sigma )={\frac {1}{[2\pi ]^{d/2}\left|\Sigma \right|^{1/2}}}\exp \left[-{\frac {1}{2}}\left({\vec {x}}-{\vec {\mu }}\right)^{T}\Sigma ^{-1}\left({\vec {x}}-{\vec {\mu }}\right)\right]}$, where ${\displaystyle \Sigma }$ is the covariance matrix with entries ${\displaystyle \Sigma _{\alpha \beta }=\langle (x_{\alpha }-\mu _{\alpha })(x_{\beta }-\mu _{\beta })\rangle }$.
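The meaning of the covariance matrix can be illustrated by sampling. A Python sketch (NumPy assumed; the 2×2 matrix below is an arbitrary illustrative choice): the empirical covariance of samples drawn from the multivariate normal should reproduce the ${\displaystyle \Sigma }$ used to generate them.

```python
import numpy as np

# Sample a multivariate normal and confirm that the empirical covariance
# reproduces Sigma. The 2x2 mu and Sigma below are illustrative choices.
rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

x = rng.multivariate_normal(mu, Sigma, size=400_000)
Sigma_hat = np.cov(x, rowvar=False)  # entries <(x_a - mu_a)(x_b - mu_b)>
print(Sigma_hat)
```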
• Expected values: ${\displaystyle E(f(x))=\int _{-\infty }^{+\infty }f(x)P(x)dx}$. A few expectation values are especially common: the mean, ${\displaystyle \langle x\rangle =\mu =\int _{-\infty }^{+\infty }xP(x)dx}$, and the variance, ${\displaystyle \langle x^{2}\rangle -\langle x\rangle ^{2}=\sigma ^{2}=\int _{-\infty }^{+\infty }x^{2}P(x)dx-\mu ^{2}}$.
• addition and multiplication of random variables: in general, ${\displaystyle E(f(x)+g(x))=E(f(x))+E(g(x))}$ (linearity of expectation requires no independence), whereas ${\displaystyle E(f(x)g(y))=E(f(x))E(g(y))}$ holds provided ${\displaystyle x}$ and ${\displaystyle y}$ are independent, that is, ${\displaystyle P(x,y)=P(x)P(y)}$.
• Moments, central moments, and cumulants
• moments: ${\displaystyle \mu _{n}=E(x^{n})=\int x^{n}P(x)dx}$
• central moments: ${\displaystyle m_{n}=E((x-\mu )^{n})}$: distribution mean, width, asymmetry, flatness, etc...
• cumulants: ${\displaystyle c_{n}}$, ${\displaystyle c_{1}=\mu }$, ${\displaystyle c_{2}=\sigma ^{2}}$, and higher order cumulants measure the difference of the distribution from a Gaussian (all higher cumulants for a Gaussian are zero)
• Moment and cumulant generating functions: the Gaussian integral
• Moment generating function (MGF): ${\displaystyle M_{x}(t)=E(e^{tx})}$. The utility of the MGF comes from the following result: ${\displaystyle \mu _{n}=\left.{\frac {d^{n}M_{x}(t)}{dt^{n}}}\right|_{t=0}}$.
• Properties of MGF: ${\displaystyle M_{x+a}(t)=e^{at}M_{x}(t)}$. From this we can show that if ${\displaystyle P(y|x)=P_{1}(y-x)}$ and ${\displaystyle x\sim P_{2}}$, so that ${\displaystyle P_{3}(y)=\int P_{1}(y-x)P_{2}(x)\,dx}$ (a convolution), then ${\displaystyle M_{3}(t)=M_{1}(t)M_{2}(t)}$.
• Cumulant generating function (CGF): ${\displaystyle C_{x}(t)=\log M_{x}(t)}$. Then the cumulants are: ${\displaystyle c_{n}=\left.{\frac {d^{n}C_{x}(t)}{dt^{n}}}\right|_{t=0}}$
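The relation between the CGF and the cumulants can be verified numerically. A Python sketch (NumPy assumed), using the exponential distribution with rate r, whose MGF ${\displaystyle M(t)=r/(r-t)}$ for ${\displaystyle t<r}$ follows from a direct integral; finite differences of ${\displaystyle C(t)=\log M(t)}$ at ${\displaystyle t=0}$ should return the mean 1/r and variance 1/r^2.

```python
import numpy as np

# Check that derivatives of the CGF at t = 0 give the cumulants, for the
# exponential distribution with rate r = 2.0, where M(t) = r / (r - t).
r = 2.0
M = lambda t: r / (r - t)
C = lambda t: np.log(M(t))

h = 1e-4
c1 = (C(h) - C(-h)) / (2 * h)            # central difference for C'(0)
c2 = (C(h) - 2 * C(0.0) + C(-h)) / h**2  # central difference for C''(0)
print(c1, c2)  # expect the mean 1/r = 0.5 and the variance 1/r^2 = 0.25
```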
• Frequencies and probabilities: Law of large numbers. If ${\displaystyle S={\frac {1}{n}}\sum x_{i}}$, then ${\displaystyle \mu _{S}=\mu _{x}}$ and ${\displaystyle \sigma _{S}^{2}=\sigma _{x}^{2}/n}$. Thus the sample mean approaches the true mean of the distribution. See one of the homework problems for week 2.
• Central limit theorem: sum of i.i.d. random variables approaches a Gaussian distribution. See one of the homework problems for week 2.
• Random walk and diffusion:
• Unbiased random walk in 1-d: ${\displaystyle T}$ steps of ${\displaystyle \pm a}$ length each. For the total displacement, ${\displaystyle \mu =T\mu _{\rm {onestep}}=T\times 0=0}$ and ${\displaystyle \sigma ^{2}=T\sigma _{\rm {onestep}}^{2}=T\times a^{2}}$
• Conventionally, for a diffusive process: ${\displaystyle \mu =vT}$ and ${\displaystyle \sigma ^{2}=2DdT}$, where ${\displaystyle d}$ is the dimension. So, the random walk is an example of a diffusive process on long time scales; for this random walk, ${\displaystyle v=0}$ and ${\displaystyle D=a^{2}/2}$ (taking one step per unit time).
• A biased walk (unequal probabilities for the ${\displaystyle \pm a}$ steps) has ${\displaystyle v\neq 0}$.
• multivariate random walk: ${\displaystyle \langle {\vec {x}}\rangle =0}$, ${\displaystyle \sigma _{r}^{2}=2DdT}$, where ${\displaystyle d}$ is the dimension and ${\displaystyle r=|{\vec {x}}|}$. We derive this by noting that the random walk in each dimension is independent of the other dimensions.
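The 1-d random walk results above are easy to confirm by simulation. A minimal Python sketch (NumPy assumed; step length a = 1 and T = 100 steps are illustrative choices): averaging over many independent walkers, the total displacement should have mean 0 and variance ${\displaystyle Ta^{2}}$.

```python
import numpy as np

# Simulate the unbiased 1-d random walk: T steps of +/- a each, for many
# independent walkers; check mu = 0 and sigma^2 = T a^2 (so D = a^2 / 2
# with one step per unit time).
rng = np.random.default_rng(3)
a, T, walkers = 1.0, 100, 50_000

steps = rng.choice([-a, a], size=(walkers, T))
x = steps.sum(axis=1)  # total displacement of each walker

print(x.mean())  # expect ~0
print(x.var())   # expect ~T a^2 = 100
```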

## Homework (due Sep 10)

1. (Problem 1.2.1 in Grinstead and Snell). Let ${\displaystyle \Omega =\{a,b,c\}}$ be a sample space. Let ${\displaystyle p(a)=1/2,\,p(b)=1/3}$, and ${\displaystyle p(c)=1/6}$. Find the probabilities for all eight subsets of ${\displaystyle \Omega }$.
2. The exponential, ${\displaystyle P(t|a)=ae^{-at}}$, Poisson, ${\displaystyle P(n|n_{0})={\frac {n_{0}^{n}}{n!}}e^{-n_{0}}}$, and multivariate Gaussian, ${\displaystyle P({\vec {x}}|{\vec {\mu }},\Sigma )={\frac {1}{[2\pi ]^{d/2}\left|\Sigma \right|^{1/2}}}\exp \left[-{\frac {1}{2}}\left({\vec {x}}-{\vec {\mu }}\right)^{T}\Sigma ^{-1}\left({\vec {x}}-{\vec {\mu }}\right)\right]}$ (where ${\displaystyle d}$ is the dimensionality of ${\displaystyle {\vec {x}}}$), probability distributions are some of the most important distributions that we will see in this class. Calculate the means and the variances for these distributions. Note that for the Gaussian distribution, the easiest way to calculate the mean and the variance is to calculate the moment generating function first and then differentiate it. Undergraduates: work with 1-dimensional Gaussians, where ${\displaystyle \Sigma ^{-1}=1/\sigma ^{2}}$. Graduate students: calculate the covariance matrix for the multivariate normal distribution. Pay attention to how we do integrals over Gaussians -- we will use this over and over in this class. Also note that logarithms of moment generating functions are called cumulant generating functions, and they are often easier to work with. We will denote them as ${\displaystyle C_{x}(t)=\log M_{x}(t)}$. Note that ${\displaystyle \left.{\frac {d^{2}C_{x}(t)}{dt^{2}}}\right|_{t=0}=\sigma ^{2}}$.
3. An E. coli moving on a 2-dimensional surface is being tracked in an experiment. It chooses a direction at random and runs, then tumbles and reorients randomly, runs for the second time, tumbles yet again, and keeps running. What is the probability that the three directions it chooses all fall within an angle of ${\displaystyle \pi }$ of each other? That is, what is the probability that the bacterium moves in, roughly speaking, the same direction all three times? For graduate students: can you generalize this to ${\displaystyle n}$ tumbles, instead of three?
4. In class we discussed an approximation for the motion of E. coli, where the bacterium tumbles every ${\displaystyle \tau }$ seconds, moving with velocity ${\displaystyle v}$ between the tumbles. We concluded that the long-term displacement of the bacterium is well characterized by diffusion: the mean displacement is zero, and ${\displaystyle \sigma \propto {\sqrt {t}}}$.
• Calculate the coefficient of proportionality for this relation for the bacterium in one dimension. By convention, for a diffusion in ${\displaystyle d}$ dimensions, we write: ${\displaystyle \sigma ^{2}=2dDt}$, where ${\displaystyle D}$ is the diffusion coefficient. What is the diffusion coefficient for this model?
5. Let's now improve the model and say that E. coli tumbles at random times, and the distribution of intervals between two successive tumbles is the exponential distribution with the mean ${\displaystyle \tau }$.
• Derive the distribution of the number of times the E.coli will tumble over a time ${\displaystyle T}$.
• Remember that means and variances of independent random variables add and use this fact repeatedly to calculate the mean and the variance of the displacement of E. coli in this model (still in 1 dimension). Is it still described well by a diffusion model? What is the diffusion coefficient?
• For Grads: If we complicate the model even further, and say that the velocity for each run is sampled independently from ${\displaystyle N(v_{0},\sigma _{v}^{2})}$, does this change the diffusive behavior?
• What should we do to the distributions of run durations (and velocities) to violate the diffusive limit?
6. The law of large numbers states that when a random variable is independently sampled from a distribution many times, its sample mean approaches the mean of the distribution. We almost showed this in class, but stopped a bit short. Let's finish the work. Recall that, when independent random variables are summed, means add and variances add (if both exist). Use this to show that the mean of a sample of ${\displaystyle n}$ independent, identically distributed (denoted: i.i.d.) variables ${\displaystyle x_{i}}$ (with mean ${\displaystyle \mu }$ and variance ${\displaystyle \sigma ^{2}}$), namely ${\displaystyle S_{n}={\frac {1}{n}}\sum _{i=1}^{n}x_{i}}$, has mean equal to ${\displaystyle \mu }$ and variance equal to ${\displaystyle \sigma _{S_{n}}^{2}=\sigma ^{2}/n}$. Therefore, as ${\displaystyle n}$ grows, ${\displaystyle S_{n}}$ becomes closer and closer to ${\displaystyle \mu }$, proving the law.
7. The most remarkable law in probability theory is the Central Limit Theorem (CLT). Its colloquial formulation is as follows: a sum of many i.i.d. random variables is almost normally distributed. This is supposed to explain why experimental noises are often normally distributed as well. More precisely, suppose ${\displaystyle x_{i}}$ are i.i.d. random variables with mean ${\displaystyle \mu }$ and variance ${\displaystyle \sigma ^{2}}$. Then the CLT says that ${\displaystyle S_{n}={\frac {1}{\sqrt {n}}}\sum _{i=1}^{n}{\frac {x_{i}-\mu }{\sigma }}={\frac {1}{\sqrt {n}}}\sum _{i=1}^{n}\xi _{i}}$ is distributed according to ${\displaystyle N(0,1)}$ (called the standard normal distribution), provided ${\displaystyle n}$ is sufficiently large.
• Using Matlab, Excel, or any other package, generate a sequence of ${\displaystyle n=25}$ random variables uniformly distributed between 0 and 1. Calculate ${\displaystyle S_{25}}$ for them (for the uniform distribution on ${\displaystyle [0,1]}$, ${\displaystyle \mu =1/2}$ and ${\displaystyle \sigma ^{2}=1/12}$). Do this 100 times and histogram the resulting 100 values of ${\displaystyle S_{25}}$. Does the histogram look as if it comes from a standard normal?
• For graduate students. Let's prove the CLT.
• First show that if ${\displaystyle z=x+y}$, where ${\displaystyle x}$ and ${\displaystyle y}$ are independent random variables, then ${\displaystyle M_{z}(t)=M_{x}(t)M_{y}(t)}$, or, alternatively, ${\displaystyle C_{z}=C_{x}+C_{y}}$. In particular, this means that, for a sum ${\displaystyle z=\sum _{i=1}^{n}x_{i}}$ of i.i.d. variables, we have ${\displaystyle M_{z}=(M_{x})^{n}}$.
• Write ${\displaystyle M_{\xi }(t)}$ to the first few orders in its Taylor series in ${\displaystyle t}$. Use the identity ${\displaystyle \lim _{n\to \infty }\left(1+a/n\right)^{n}=e^{a}}$ to show that ${\displaystyle M_{S_{n}}}$ approaches the moment generating function of a standard normal as ${\displaystyle n\to \infty }$.
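The numerical recipe in problem 7 can be sketched in Python (standing in for the Matlab/Excel suggestion; NumPy assumed, and 10,000 repeats are used instead of the 100 asked for in the problem, just to make the check statistically sharp):

```python
import numpy as np

# CLT demonstration: standardized sums of n = 25 uniform variates should
# look like draws from the standard normal N(0, 1).
rng = np.random.default_rng(4)
n, repeats = 25, 10_000

x = rng.uniform(0.0, 1.0, size=(repeats, n))
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)  # mean and std of Uniform(0, 1)
S = ((x - mu) / sigma).sum(axis=1) / np.sqrt(n)

print(S.mean(), S.std())  # expect ~0 and ~1 for a standard normal
```

A histogram of `S` (e.g. with `np.histogram` or matplotlib) should closely match the standard normal bell curve.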