Physics 212, 2018: Lecture 19

From Ilya Nemenman: Theoretical Biophysics @ Emory

Back to the main Teaching page.

Back to Physics 212, 2018: Computational Modeling.

In today's lecture, we will study how to generate different types of random numbers using just a standard, uniformly distributed random number. If time permits, we will also study how random errors scale with the number of random trials.

Different types of random numbers

In the previous lecture, we discussed how a computer would generate pseudorandom floating point numbers, uniformly distributed between zero and one. How do we use these random numbers to generate other random number types?

Uniform continuous random numbers

The standard linear congruential generator allows us to generate a uniform pseudo-random floating point number between 0 and 1, called the standard uniform random number. How do we then generate a number between, say, <math>a</math> and <math>b</math>? For this, we only need to scale and shift the standard number. That is, if we set <math>x=a+(b-a)r</math>, where <math>r</math> is the standard uniform number, then <math>x</math> is the uniform random number between the two required limits.
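
For concreteness, here is a minimal sketch of this recipe in Python; the limits and the alias nprnd for numpy.random are illustrative choices, matching the naming used later on this page.

 import numpy.random as nprnd
 
 a, b = 2.0, 5.0      # the desired limits (illustrative values)
 r = nprnd.rand()     # standard uniform number in [0, 1)
 x = a + (b - a) * r  # scaled and shifted: uniform between a and b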

Multinomial random number

A die is an example of a multinomial random number with six equally probable outcomes. In general, a multinomial random variable is a random variable that can have a finite, discrete set of values, with a given probability of each of these values. We generate these random numbers by sub-dividing the interval between zero and one into sub-intervals. The number of the sub-intervals is equal to the number of possible outcomes of the multinomial variable, and the length of each sub-interval is equal to the probability of the corresponding outcome. We then generate a uniform floating point number between zero and one and check which of the sub-intervals it fell into. The index of the sub-interval is then the generated multinomial number. See the code in the script uploaded for this module.
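
One way to sketch this sub-interval idea (an illustration, not the uploaded script itself) is to store the right edges of the sub-intervals as cumulative probabilities and look up where the uniform sample falls; the die below is an illustrative example.

 import numpy as np
 import numpy.random as nprnd
 
 p = np.array([1/6] * 6)              # probabilities of the six die outcomes
 edges = np.cumsum(p)                 # right edges of the sub-intervals
 r = nprnd.rand()                     # standard uniform number
 outcome = np.searchsorted(edges, r)  # index of the sub-interval the sample hit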

Exponential random number

Real-valued variables that have an exponential distribution are also very common. For example, the time between two subsequent nuclear decays, or the time between two nearby spikes in the simplest models of neuronal dynamics, is exponentially distributed. More specifically, a random variable <math>t</math> is called exponentially distributed with the mean <math>\tau</math> if the probability density function for this variable obeys <math>P(t)=\frac{1}{\tau}e^{-t/\tau}</math>. The distribution with <math>\tau=1</math> is called the standard exponential distribution. Generation of exponential random numbers illustrates a few important properties of probability distributions. First of all, if we have access to a standard exponential pseudo-random number, then by just multiplying it by <math>\tau</math> we will get an exponentially distributed number with a mean of <math>\tau</math>. This means that we only need to figure out how to generate standard exponential numbers.

For this, let's go back to our definition of continuous random numbers as a limit of an ever-finer discretization of the real axis. Suppose we have a real-valued variable <math>x</math>, which is distributed according to <math>P_x(x)</math>. This means that the probability of having a sample land in the interval <math>[x,x+dx]</math> tends to <math>P_x(x)\,dx</math>. Suppose now we have defined a new variable <math>y=f(x)</math>, or, alternatively, <math>x=f^{-1}(y)</math>, where <math>f^{-1}</math> is the inverse function to <math>f</math> (note that we are now assuming that such an inverse exists!). Of course, <math>y</math> is also a random variable. How is it distributed? We note that the probability of a sample to land in our interval should not depend on whether we marked its end points as <math>[x,x+dx]</math>, or as <math>[y,y+dy]</math>. In other words, <math>|P_x(x)\,dx|=|P_y(y)\,dy|</math>, where we added the absolute value signs to ensure that, even if an interval is negative (read from right to left), the probability is still positive. This gives, in the limit of small <math>dx</math>, <math>P_y(y)=P_x(x)\left|\frac{dx}{dy}\right|</math>.

How does this help with generating exponential numbers? Suppose that <math>r</math> is a standard uniform number, so that <math>P_r(r)=1</math> for <math>0\le r\le 1</math>. Let's choose <math>t=-\ln r</math>, and then <math>r=e^{-t}</math>. Then <math>P_t(t)=P_r(r)\left|\frac{dr}{dt}\right|=e^{-t}</math>. In other words, the variable <math>t=-\ln r</math> is a pseudo-random standard exponential variable! So generating samples of such random variables is easy -- just generate standard uniform pseudo-random variables and take the negative log of them. The script that goes with this module shows that the built-in function for generation of exponential random samples (nprnd.exponential()) generates output that is indistinguishable from our approach of taking the negative log.
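
A minimal sketch of this comparison, assuming nprnd stands for numpy.random as elsewhere on this page:

 import numpy as np
 import numpy.random as nprnd
 
 tau = 2.0                      # desired mean (illustrative value)
 r = nprnd.rand(100000)         # standard uniform samples
 t_ours = -tau * np.log(1 - r)  # 1 - r avoids log(0); exponential with mean tau
 t_builtin = nprnd.exponential(tau, 100000)
 print(t_ours.mean(), t_builtin.mean())  # both should be close to tau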

Uniform random number in a square

I hope you see that this is easy -- just generate random <math>x</math> and <math>y</math> coordinates using the uniform pseudo-random number, and the job is done.
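
In code, assuming the square is <math>[-1,1]\times[-1,1]</math>, this is just the scaling recipe applied twice:

 import numpy.random as nprnd
 
 x = 2 * nprnd.rand() - 1  # uniform between -1 and 1
 y = 2 * nprnd.rand() - 1  # independent of x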

Uniform random number in a circle

Let's suppose we need a uniform random number in a circle around the origin with a radius of 1. Your work: of the following methods that might seem to generate such a random number in a circle, choose the one that you think will be the most accurate and the fastest and implement it. Or create your own method and implement it. Generate 1000 such random points and plot them. Do you see what you expected to see?

Random angle / random scale
Generate a random pair <math>(x,y)</math>, with each coordinate between -1 and 1. The pair defines a ray from zero to this point. Now generate a random distance between 0 and 1, and then scale the point to be at this random distance from zero on the ray.
Random radius / random angle
Generate a random angle between 0 and <math>2\pi</math> and a random radius between 0 and 1. The point will be at this angle and this radius.
Random radius-squared / random angle
Generate a random angle between 0 and <math>2\pi</math> and a random radius-squared between 0 and 1. Take a square root of the second number to get the random radius. The point will be at this angle and this radius.
Rejection method
Generate a random pair <math>(x,y)</math>, with each coordinate between -1 and 1. Check whether the pair landed within the circle: reject it if it's outside and accept it otherwise. Note that you will need to generate more than <math>N</math> such random pairs to end up with <math>N</math> points inside the circle. (A sketch of two of these methods appears right after this list.)
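
For reference, here is a minimal sketch of two of the listed methods -- rejection, and radius-squared followed by a square root; it is one possible implementation, not necessarily the uploaded script.

 import numpy as np
 import numpy.random as nprnd
 
 def circle_rejection(n):
     """Uniform points in the unit circle: keep only square samples that land inside."""
     pts = []
     while len(pts) < n:
         x, y = 2 * nprnd.rand(2) - 1  # uniform pair in the square
         if x**2 + y**2 <= 1:          # accept only points inside the circle
             pts.append((x, y))
     return np.array(pts)
 
 def circle_sqrt_radius(n):
     """Uniform points in the unit circle: random angle, square root of a uniform radius-squared."""
     phi = 2 * np.pi * nprnd.rand(n)
     r = np.sqrt(nprnd.rand(n))
     return np.column_stack((r * np.cos(phi), r * np.sin(phi)))
 
 pts = circle_rejection(1000)  # plot pts[:, 0] against pts[:, 1] to inspect the density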

Don't forget to submit your code. Some of these methods will generate too many points at the center of the circle compared to the edges. Those of you who know multivariate calculus should extend the calculation we did for generation of exponential random numbers to the multivariate case and relate the higher density of points near the center of the circle to the Jacobian of the transformation we make when we generate such two-dimensional random numbers from uniform standard numbers.

The scripts uploaded with this module demonstrate these methods. Which of the ones that work would be the fastest? Why? Verify your guess yourself and explain what you see.

Gaussian random variables

A Gaussian (or normal) random variable is a variable with the probability density function <math>P(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}</math>. <math>\mu</math> is called the mean, and <math>\sigma</math> is called the standard deviation. This is the familiar bell-shaped distribution centered around <math>\mu</math> with the width of <math>\sigma</math>. The standard normal distribution has <math>\mu=0</math> and <math>\sigma^2=1</math>. Clearly, if one can generate a standard normal random variable, one can get a more general normal variable from it by multiplying by <math>\sigma</math> and shifting by <math>\mu</math>. So we only need to figure out how to generate a standard normal variable.
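
In code, the scaling-and-shifting step is a single line (the values of mu and sigma are illustrative):

 import numpy.random as nprnd
 
 mu, sigma = 1.0, 0.5
 x = mu + sigma * nprnd.randn()  # normal with mean mu and standard deviation sigma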

While we will skip the details, the generation of a random variable on a circle hints at how to generate such standard random variables easily. First, imagine that I generate not one, but a pair of standard normal random variables <math>(x,y)</math>. Then <math>P(x,y)=\frac{1}{2\pi}e^{-\frac{x^2+y^2}{2}}</math>, and <math>x^2+y^2=r^2</math>. Thus the radius-squared is exponentially distributed (with a mean of 2). So one should generate a random angle <math>\phi</math> between 0 and <math>2\pi</math> and a random exponentially distributed variable, which one views as <math>r^2</math>. Then taking its square root and choosing <math>x=r\cos\phi</math> and <math>y=r\sin\phi</math> generates a pair of random standard normal samples. The code uploaded with this module compares this method against the built-in nprnd.randn() function provided by Python for generation of Gaussian numbers.
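
A minimal sketch of this angle-and-radius recipe (this is the polar form of the Box-Muller transform; the comparison mirrors, but is not, the uploaded code):

 import numpy as np
 import numpy.random as nprnd
 
 n = 100000
 phi = 2 * np.pi * nprnd.rand(n)          # random angle
 r = np.sqrt(nprnd.exponential(2.0, n))   # radius: square root of an exponential with mean 2
 x, y = r * np.cos(phi), r * np.sin(phi)  # two independent standard normal samples
 z = nprnd.randn(n)                       # built-in generator, for comparison
 print(x.mean(), x.std())                 # should be close to 0 and 1
 print(z.mean(), z.std())                 # same here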

Uses of random numbers

We use random numbers to perform computations that are fundamentally random, and we will focus on those in the upcoming weeks. However, there are computations that are fundamentally deterministic, but may be easier to perform with random numbers. Suppose we want to calculate the integral under a curve <math>f(x)</math>, with the limits <math>a\le x\le b</math>. We can, of course, do it analytically for some functions, or numerically by subdividing the interval into small segments of width <math>\Delta x</math> and adding up the areas of all small rectangles, <math>A\approx\sum_i f(x_i)\,\Delta x</math>. An alternative way is as follows. Suppose we know that <math>f</math> is positive and smaller than some <math>f_{\max}</math>. Then the area under the curve can be written down as <math>A=p\,(b-a)f_{\max}</math>, where <math>p</math> is the fraction of the rectangle <math>(b-a)</math> by <math>f_{\max}</math> that is actually under the curve. How do we estimate this fraction? Suppose I generate <math>N</math> random points (dart throws), uniformly distributed over the rectangle <math>(b-a)</math> by <math>f_{\max}</math>. The fraction of those that end up under the curve (and it's easy to check which ones do!) is an estimate of <math>p</math>. The provided script performs such a calculation for an arbitrary <math>f</math>. This is called Monte-Carlo integration. Note that our implementation is incomplete -- we don't check if the function ever dips below 0 or ever rises above <math>f_{\max}</math>, which a better implementation should do.
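
Here is a minimal dart-throwing sketch of the idea; the function name and its arguments are illustrative, and this is not the provided script.

 import numpy as np
 import numpy.random as nprnd
 
 def mc_integrate(f, a, b, fmax, N):
     """Monte-Carlo estimate of the area under f on [a, b], assuming 0 <= f <= fmax."""
     x = a + (b - a) * nprnd.rand(N)  # dart x-coordinates
     y = fmax * nprnd.rand(N)         # dart y-coordinates
     p = np.mean(y < f(x))            # fraction of darts under the curve
     return p * (b - a) * fmax        # rectangle area times the fraction
 
 # Example: the area under x^2 on [0, 1] is exactly 1/3.
 print(mc_integrate(lambda x: x**2, 0.0, 1.0, 1.0, 100000))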

A problem, of course, is that such a calculation is probabilistic -- every time we perform it, the results are different. But how different? We can verify, first of all, that our probabilistic estimate of the area converges to the true one, where the true one is known, e.g., when we integrate a function whose integral we can compute analytically. We can also show the distribution of the estimates of the area when we perform the area estimate with <math>N</math> dart throws. We see that the larger <math>N</math> is, the more concentrated the estimates are. To quantify this, we use the provided script to calculate the standard deviation <math>\sigma_A</math> of the estimates as a function of <math>N</math>. We find that <math>\sigma_A=c/\sqrt{N}</math>, where <math>c</math> is some constant. The claim is that the general law <math>\sigma_A\propto 1/\sqrt{N}</math> holds true for essentially any function being integrated. It's only the constant <math>c</math> that changes -- and we can verify this by integrating other functions.
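
A short, self-contained sketch of this scaling experiment; the integrand <math>x^2</math> and the number of repetitions are illustrative choices.

 import numpy as np
 import numpy.random as nprnd
 
 def area_estimate(N):
     # one Monte-Carlo estimate of the area under x^2 on [0, 1] (exact answer: 1/3)
     x, y = nprnd.rand(N), nprnd.rand(N)
     return np.mean(y < x**2)  # the bounding rectangle here has unit area
 
 for N in [100, 1000, 10000]:
     sigma = np.std([area_estimate(N) for _ in range(200)])
     print(N, sigma, sigma * np.sqrt(N))  # the last column stays roughly constant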