Physics 434, 2014: Introduction

From Ilya Nemenman: Theoretical Biophysics @ Emory
Revision as of 11:28, 4 July 2018 by Ilya (talk | contribs) (1 revision imported)

Back to the main Teaching page.

Back to Physics 434, 2014: Information Processing in Biology.

Welcome to the class! Let's have fun.

You should carefully read the course Syllabus for the class details. Briefly, the following are the key points.

    • This is a mathematical / computational course, akin to a typical upper-division physics course. While hard, the course will be quite doable for most of you, with sufficient effort. To help, we will schedule regular office hours and a weekly review session.
    • You should be working in groups, but the eventual write-ups of homeworks and project reports should be done individually.
    • The students in the class come from different backgrounds, including physics majors, biology majors, and some graduate students. We will try to use the language of computer simulations to bring all of you onto the same page. Probably none of you are experts in programming, but this is OK, as you will learn during the class.
    • Those students registered for the 731 version of the class will have additional/harder homework assignments, to justify the difference in the course number.
    • The course has a somewhat nontraditional grading policy, including homework problems, one in-class exam, and project work and reports. Please see details in the Syllabus.
    • Homework assignments will be distributed on or about Tuesdays, and they will be due by the end of the day on the following Mondays.
    • Starting mid-October, you will also be working on your projects, to be presented in class on Dec 4.

What is this class about?

This is a class about the physics of living systems. At first glance, using the words physics and life in one sentence seems contradictory. Indeed, physics is about simple mathematical models of the underlying processes, about general understanding. It is about theories that find unanticipated connections among seemingly unconnected parts of the natural world, such as a ferromagnet and a water-to-vapor transition, or the Higgs boson and properties of engineered materials. In contrast, modern biology is traditionally portrayed as not accepting theories (beyond maybe the evolutionary theory of Darwin). Every detail matters. A cell surface receptor molecule will behave very differently depending on which cell it is on. A cell with the same genome will behave very differently, and may differentiate into different fates, depending on where in the organism it is. The same organism may respond quite differently to the same stimulus, sometimes choosing to completely ignore it. Why, then, do we hope that there is physics in living systems?

All of you have taken a sufficient number of biology classes. So if I ask you to name some important biological discoveries of the 20th century (e.g., some physiology/medicine Nobel prizes), you can probably name a few. You will probably name the Watson-Crick discovery of the structure of DNA as one of the most important of these biological discoveries, at least in the context of molecular/cellular biology. If I ask you to dig deeper into your memory and nominate a few other discoveries, then very soon you will mention the Hodgkin-Huxley model of action potential generation in neurons as maybe the greatest discovery ever made in neurobiology. And then very soon someone will nominate the Luria-Delbrück experiment, which showed that natural selection acts on pre-existing variation in the population, rather than causing new variation, as the most important Nobel in the context of evolutionary and population biology.

What do all three of these discoveries have in common? They all were made by a pair of scientists, where the first was an experimental biologist, and the second was a theoretical physicist/applied mathematician. In fact, it seems that theory, and more specifically physics-style theory, is a lot more common in biology than we are led to believe. One can naively say that it is because all biological systems are made of molecules, which are physical objects that obey physical laws. This is true, but it is only part of the story. As the three examples above show, physics exists in biology at every level, from molecules, to cells, organisms, and populations. We will see a lot more examples of this as we dive deeper into the class.

What allows this to happen is that seemingly disparate biological phenomena are, in fact, much more closely related to each other than one may naively hope. The rules of chance ensure that an E. coli trying to figure out where food is, based on the activity of its surface receptors that bind various food source molecules, needs to solve the same problem that a fly's brain solves when it tries to estimate its own self-motion through the world based on the activity of its visual neurons, or the problem that we, as scientists, are solving when we are learning about the E. coli from our experiments. This is what this class is about -- we will identify some problems that biological systems need to solve in the course of their lives, understand their physical structure, and then see what limitations are imposed by this physical structure, independent of the details of the organism that is trying to solve the problem. In other words, we will try to focus on things that truly matter, and to build broad theories of phenomena, rather than their narrow, focused models.

We cannot embrace the unembraceable, and so we will only be building these theories in one context, namely that of information processing, leaving aside such other important parts of biology as structural biology, development, or ecology. The main questions to be asked in the whole course (not necessarily in this order) are:

    • How well, in objective terms, do various biological systems transduce available sensory information?
    • What stands in the way of their performance?
    • Which strategies can be used by the systems to improve the quality of processing?

Before we leave this introduction, we still need to answer one more question: why numbers? That is, why such a focus on mathematics? Here the answer is very simple -- to see if we understand the world, we need to make a prediction, and then test it experimentally. Testing involves comparing predictions to observations. However, the only things we know how to compare are numbers. Think about it carefully: the only things we can compare are numbers! Thus the language of any science, including biology, is mathematics. One can then ask if one wants to compare trends, which are basically binary numbers (e.g., up or down), or if one wants to predict, measure, and compare real-valued numbers, as we normally do in the physical sciences. A binary experiment gives us, at best, one bit of information (we will formalize this later in the class), and hence can rule out, at best, half of the theories. Real-valued predictions have more discriminatory power, and are thus a lot more valuable. We will stick to them in this class.

Introducing the model systems

Theories rarely emerge directly on a piece of paper, out of nothing. They come from studies of specific experimental systems. In this class, we will introduce a series of such model systems. While many will appear for just a few lectures, some of the model systems will persist through most sections of the class. In the context of cell biology, our main hero will be the E. coli cell, and specifically its chemotactic behavior. For evolutionary questions, we will again focus on E. coli, but this time on the response of their populations to stresses, such as antibiotics and bacterial viruses. For neuroscience, we will focus on visual signal transduction. And finally, for behavior, we will explore foraging in rats.

Introduction to E. coli chemotaxis

So let's introduce our heroes, starting with the smallest, the E. coli.

E. coli is a small cell, about $1\,\mu\text{m}$ in linear dimensions. It's a natural measuring stick for cellular biology. Read more about the organism in the Sizing up E. coli section in Physical Biology of the Cell. Like every bacterium, E. coli has no internal organelles, such as a nucleus, and it is basically a bag of molecules: DNA, RNA, proteins, and smaller metabolites, all packed rather densely and semi-regularly. E. coli spends at least a part of its life swimming around trying to find nutrients, then eating them, growing, and dividing (they can also grow as parts of biofilms or colonies, but we will not focus on this).

The swimming behavior of E. coli is rather intricate, as illustrated by now-classic movies from the Howard Berg lab, http://www.rowland.harvard.edu/labs/bacteria/movies/showmovie.php?mov=fluo_bundle. The bacterium is pushed forward by a bundle of flagella that rotate. Such smooth forward motion is called the run. Once in a while, the flagella bundle breaks apart, and the bacterium tumbles and reorients instead of going smoothly forward. We will spend a lot of time studying this system. In particular, you can look up the basic chemical diagram of the process at http://www.rowland.harvard.edu/labs/bacteria/projects/fret.php. However, now let's ask the question: why does E. coli do what it does? Why running and tumbling? Answering this question will be our first aha! moment -- we can understand a lot about biology by studying basic physical principles!

So the E. coli wants to go where life is greener. But how does it know where it's greener? To see where pizza is, you can look around, you can smell. The bacterium can do none of this. It can only count molecules of chemicals that it cares about, and go to where there are more of such molecules. Experiments tell us (Adler, 1975; Budrene and Berg, 1991, 1995) that E. coli can find maxima of nutrient concentrations even when concentrations are as small as 1 nM. What does this mean in more reasonable units?

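Thus $1\,\text{nM} = 10^{-9}\,\text{mol/L} \approx 6\times10^{14}$ molecules per liter, or about $0.6$ molecules per $\mu\text{m}^3$. On the other hand, E. coli is, roughly speaking, a cylinder with a diameter of about $1\,\mu\text{m}$ and a height of about $2\,\mu\text{m}$. Its volume is then $V \approx \pi (0.5\,\mu\text{m})^2 (2\,\mu\text{m}) \approx 1.6\,\mu\text{m}^3$. Thus at 1 nM concentration, there is only about $1$ molecule of the nutrient in the entire E. coli volume. How, then, can the bacterium know where the grass is greener, when it has a total of only 1 molecule to count? (This is a bit at the extreme of E. coli sensitivity, but it's important to be dramatic!)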
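As a rough numerical check of this estimate, the short Python sketch below converts a 1 nM concentration into molecules per cubic micron and multiplies by the volume of a cylinder. The dimensions used (diameter 1 micron, height 2 microns) are assumed textbook-scale values, and the function names are made up for this illustration.

```python
import math

AVOGADRO = 6.022e23               # molecules per mole
LITERS_PER_CUBIC_MICRON = 1e-15   # 1 um^3 = 1e-18 m^3 = 1e-15 L

def molecules_per_cubic_micron(conc_molar):
    """Convert a molar concentration into a number density in 1/um^3."""
    return conc_molar * AVOGADRO * LITERS_PER_CUBIC_MICRON

def cylinder_volume_um3(diameter_um, height_um):
    """Volume of a cylinder, in cubic microns."""
    return math.pi * (diameter_um / 2.0) ** 2 * height_um

# Assumed cell dimensions: diameter ~1 um, height ~2 um.
density = molecules_per_cubic_micron(1e-9)   # 1 nM
volume = cylinder_volume_um3(1.0, 2.0)
n_molecules = density * volume

print(f"density at 1 nM: {density:.2f} molecules per um^3")
print(f"cell volume:     {volume:.2f} um^3")
print(f"molecules in the cell volume: {n_molecules:.2f}")
```

With these assumed dimensions the count comes out just under one molecule per cell volume; changing the dimensions by a factor of two in either direction keeps the answer in the range of a few molecules at most.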

The worst part about numbers so small is that they are also random. Molecules diffuse around (we will talk about Brownian motion a lot more later in the class). Thus if one has one molecule of food around, one can also have two, or zero, or three. Or even (with quite a bit lower probability) seven. Comparing numbers that are so small, and with fluctuations so large, is impossible. The only solution is to get more molecules. And this requires capturing them from a larger volume. There are, roughly speaking, three ways of doing so. One possibility is to stir the environment (and we have talked briefly about this in class), the second is to stay in place for a long time and wait until more molecules diffuse one's way, and the third possibility is to run and hence to scoop molecules from a larger volume. We will talk more about properties of diffusion later -- but, roughly speaking, waiting is not very efficient -- it will take a very long time for new molecules to arrive in the vicinity of the cell by diffusion. This leaves stirring and running, and E. coli chooses to run.
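To get a feel for how large these fluctuations are, here is a minimal Python sketch. It assumes the molecule counts are Poisson distributed with a mean of one molecule (the same statistics used in the graduate-student aside), and the function name is illustrative only.

```python
import math

def poisson_pmf(k, mean):
    """Probability of observing exactly k events when the mean is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

mean_count = 1.0  # assumed: about one nutrient molecule per cell volume

for k in (0, 1, 2, 3, 7):
    print(f"P({k} molecules) = {poisson_pmf(k, mean_count):.5f}")

# For a Poisson variable the standard deviation is sqrt(mean),
# so the relative fluctuation is 1 / sqrt(mean).
relative_fluctuation = math.sqrt(mean_count) / mean_count
print(f"relative fluctuation: {relative_fluctuation:.0%}")
```

Under this assumption, zero molecules and one molecule are equally likely, and the typical fluctuation is as large as the mean itself, which is why a single snapshot of the count says so little about the concentration.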

Grad Students (later denoted as GS): Let's calculate how long one would need to wait to get a certain accuracy in determining the concentration around oneself. The accuracy of the concentration estimation is proportional to the accuracy in counting molecules, so that $\delta c / c = \delta N / N$. But molecules come and go randomly, forcing Poisson statistics of arrivals, so that $\delta N = \sqrt{N}$. At the same time, the average number of molecules in a volume of the bacterium is $N = cV$, where the volume is, roughly speaking, the cube of its linear dimension, $V \sim a^3$. Combining these, we get $\delta c / c \sim 1/\sqrt{c a^3}$. If the E. coli can wait for a long time $T$, then it can do $T/\tau$ observations, where $\tau$ is the duration of a single observation. This results in the usual square-root decrease in the relative error, so that $\delta c / c \sim \frac{1}{\sqrt{c a^3}} \sqrt{\tau / T}$.

Now, what is $\tau$? If we measured molecules once, they are going to stick around, and remeasuring them again soon is useless -- we are not making an independent measurement! One needs to wait while the old molecules diffuse away, and the new ones diffuse in. For a cell with the linear size of $a$, it will take about $\tau \sim a^2 / D$ for this to happen, where $D$ is the diffusion coefficient. Combining the two equations results in $\delta c / c \sim 1/\sqrt{D a c T}$. This is the celebrated Berg-Purcell limit in concentration sensing. Note a peculiar scaling, which is the result of the two-dimensional nature of diffusion (we will discuss what this means in later classes): the accuracy increases as a square root of the linear size of the cell. It also increases as a square root of time. Thus waiting in one spot, or growing bigger, is not a very effective strategy for improving concentration sensing. End GS.
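For graduate students who want numbers, the sketch below evaluates the Berg-Purcell scaling $\delta c / c \sim 1/\sqrt{D a c T}$ with order-one prefactors dropped. The diffusion coefficient of 1000 square microns per second is an assumed order-of-magnitude value for a small molecule in water, the concentration of 0.6 molecules per cubic micron corresponds to roughly 1 nM, and the function names are hypothetical.

```python
import math

def berg_purcell_relative_error(diff_coeff, size, conc, wait_time):
    """Relative error delta_c / c ~ 1 / sqrt(D * a * c * T), prefactors dropped."""
    return 1.0 / math.sqrt(diff_coeff * size * conc * wait_time)

def waiting_time_for_error(diff_coeff, size, conc, target_error):
    """Invert the scaling: T ~ 1 / (D * a * c * error^2)."""
    return 1.0 / (diff_coeff * size * conc * target_error ** 2)

D = 1000.0   # um^2/s, assumed order of magnitude for a small molecule
A = 1.0      # um, linear size of the cell
C = 0.6      # molecules per um^3, roughly 1 nM

for wait in (1.0, 4.0, 16.0):
    err = berg_purcell_relative_error(D, A, C, wait)
    print(f"T = {wait:5.1f} s -> delta_c/c ~ {err:.3f}")

t_needed = waiting_time_for_error(D, A, C, 0.01)
print(f"time for 1% accuracy: about {t_needed:.0f} s")
```

Each fourfold increase in waiting time only halves the error, which is the square-root scaling in action. The absolute numbers should be read as order-of-magnitude estimates, since they depend on the assumed $D$ and on the dropped prefactors.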

By running, a bacterium is able to sweep molecules in a larger volume, roughly proportional to its cross section, $a^2$, times the length of the run (which is, in its turn, a product of the velocity $v$ and the run time $t$), so that the number of captured molecules is $N \sim c a^2 v t$, where $c$ is the concentration. For the run to make sense, it must be long enough to not only produce $N \gg 1$, but also to ensure that more molecules are picked up by running than simply by waiting, so that, using the result from the previous paragraph, $v t > l \sim \sqrt{D t}$, where $l$ is the linear size of the volume from which the bacterium can pick up nutrients by simple diffusion and $D$ is the diffusion coefficient. This sets the smallest time, $t \sim D / v^2$, of a second or so that the bacterium must run, if it runs with the velocity of about 20 microns/sec. It needs to run long enough to outrun diffusion! On the other hand, the bacterium cannot run straight for long (for longer than about 10 s), because random hits by water molecules will make it change its direction by this time. Indeed, the bacterium in real life runs for about 5 seconds before turning.
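The "outrun diffusion" condition can be checked with a few lines of Python. This is a sketch under stated assumptions: the distance covered by running grows as $v t$, the distance explored by diffusion grows as $\sqrt{D t}$, and the two cross at $t \sim D / v^2$. The diffusion coefficient of 1000 square microns per second is an assumed order-of-magnitude value, and the function names are made up for the example.

```python
import math

def run_length_um(speed_um_s, time_s):
    """Distance covered by swimming straight at constant speed."""
    return speed_um_s * time_s

def diffusion_length_um(diff_coeff_um2_s, time_s):
    """Typical distance explored by diffusion in the same time."""
    return math.sqrt(diff_coeff_um2_s * time_s)

def min_run_time_s(diff_coeff_um2_s, speed_um_s):
    """Time at which running starts to beat diffusion: v*t = sqrt(D*t)."""
    return diff_coeff_um2_s / speed_um_s ** 2

D_ASSUMED = 1000.0   # um^2/s, assumed value for a small nutrient molecule
V_RUN = 20.0         # um/s, swimming speed quoted in the text

t_min = min_run_time_s(D_ASSUMED, V_RUN)
print(f"crossover time: {t_min:.1f} s")

for t in (0.1, 1.0, t_min, 10.0):
    run = run_length_um(V_RUN, t)
    diff = diffusion_length_um(D_ASSUMED, t)
    winner = "running" if run > diff else "diffusion (or tie)"
    print(f"t = {t:5.1f} s: run {run:6.1f} um, diffusion {diff:6.1f} um -> {winner}")
```

With the assumed $D$ the crossover lands at a couple of seconds, the same order of magnitude as the "second or so" in the text, and comfortably below the roughly 10 s after which the cell loses its heading.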

In conclusion, E. coli lives in a weird world: it must count individual molecules of chemicals to know where to go, and these numbers are small (it also lives in a world where masses are not important, as we will show in a homework problem). Its behavioral strategy is fully dictated by the physical structure of the world it lives in: small size that limits the total number of molecules in the cell volume, diffusion that takes molecules to and away from the cell and prevents the bacterium from moving in a straight line for long, and the need to go to greener pastures.

E. coli life is probabilistic, weird, but quite understandable!

Introduction to visual neural computation

We've just convinced ourselves that, for E. coli, the world is probabilistic, and every single molecule of a signal matters. But the bacterium is small, and we are big. It seems that the randomness should be a lot less important for us. Indeed, sometimes it is less important -- if I want to go to an office door, I can mostly make it without random jitter, at least on large scales (though a lot of work is being done nowadays on studying randomness of individual motor responses). But in the sensory domain, it turns out that small numbers, and the associated randomness, often matter even for animals as big as us. So let's consider this in the context of neural computation in vision.

For the purpose of this class, neurons will be rather simple devices (see Dayan and Abbott, 2005). They collect electrical currents produced by neurons that are connected to them through synaptic connections. They discharge those collected charges (and hence lower their voltage) through the membrane, just like an RC circuit you studied in intro physics does. And when the voltage finally goes above a certain threshold in this tug of war between synaptic inputs and membrane discharge, the neurons spike -- they produce an impulse, an action potential, that travels through the axon of the neuron and feeds into the other neurons through their synapses. In the vertebrate retina, the first set of cells, the photoreceptors (the rods and the cones), actually don't spike, but we will disregard this complication for now, until we study these cells in much more detail in the third quarter of the class.
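The picture above (charge collected from inputs, leak through the membrane as in an RC circuit, a spike when the voltage crosses a threshold) is commonly called a leaky integrate-and-fire neuron. The Python sketch below is one minimal way to simulate it with a constant input. All parameter values (a 20 ms membrane time constant, a threshold of 1 in arbitrary units, the drive strengths) are assumptions chosen for illustration, not numbers from this class.

```python
def simulate_lif(drive, tau_ms=20.0, threshold=1.0, reset=0.0,
                 dt_ms=0.1, duration_ms=100.0):
    """Euler integration of dV/dt = (drive - V) / tau with threshold and reset.

    `drive` is the voltage the membrane would settle at with no threshold
    (input current times membrane resistance), in the same arbitrary units
    as `threshold`. Returns the list of spike times in milliseconds.
    """
    n_steps = int(round(duration_ms / dt_ms))
    voltage = reset
    spike_times_ms = []
    for step in range(n_steps):
        # Tug of war: the input pulls V up toward `drive`, the leak pulls it down.
        voltage += dt_ms * (drive - voltage) / tau_ms
        if voltage >= threshold:
            spike_times_ms.append((step + 1) * dt_ms)
            voltage = reset  # discharge after the spike
    return spike_times_ms

strong = simulate_lif(drive=1.5)   # drive above threshold: regular spiking
weak = simulate_lif(drive=0.5)     # drive below threshold: the leak wins

print(f"strong drive: {len(strong)} spikes, first at {strong[0]:.1f} ms")
print(f"weak drive:   {len(weak)} spikes")
```

With a constant input the spikes come out perfectly regular. The randomness discussed next enters when the constant drive is replaced by many inputs arriving at random times.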

Because the number of input neurons is large, and they all fire at random times, the firing pattern of a single neuron is not deterministic either. Spikes occur at random time points, and a typical neuron may produce anywhere from ~100-200 spikes per second (for a visual neuron in an insect), to maybe 20-60 spikes per second in cells in the primary visual cortex of a monkey, to barely a spike per second or even less for neurons in other areas of our cortex. Some of the most interesting questions in today's neuroscience are about how important this randomness is for representing the information about the outside world, and we will study some of this later in the class (Rieke et al., 1999).

For now, just notice that we often can make decisions after being exposed to a visual image for ~200 ms, and insects, whose vision is faster than ours, can make decisions in ~20 ms or so. Let's take 100 ms as a typical decision time for our current arguments. Thus a cell in our retina, a pixel that is measuring the brightness at some point of the visual world, even if it fires at a rate of 100 spikes per second, will only produce about 10 spikes before the cell in the next level of our brain must make a decision about the level of brightness that is being seen. Our neural cells have about as many spikes to count as E. coli has molecules to count before choosing where to go! Yes, we have many neurons, and we should be able to use their collective activity to guide our decision processes, but this doesn't change the fact that our neural computation is fundamentally probabilistic, with small numbers of spikes. And the design of the brain must somehow make this all work.
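The arithmetic behind this estimate is simple enough to spell out in a short Python sketch. The mean count is the firing rate times the decision window. The relative fluctuation uses the additional assumption that spike counts are Poisson distributed, which is only a rough model of real neurons. The function names are illustrative.

```python
import math

def expected_spike_count(rate_hz, window_s):
    """Mean number of spikes in a time window at a given firing rate."""
    return rate_hz * window_s

def poisson_relative_fluctuation(mean_count):
    """Standard deviation divided by mean, assuming Poisson counts."""
    return 1.0 / math.sqrt(mean_count)

# Rates from the text: ~100 spikes/s for a fast cell, 20-60 spikes/s in
# monkey primary visual cortex, ~1 spike/s elsewhere in cortex.
window_s = 0.1  # the 100 ms decision time used in the text
for rate_hz in (100.0, 40.0, 1.0):
    mean = expected_spike_count(rate_hz, window_s)
    rel = poisson_relative_fluctuation(mean)
    print(f"{rate_hz:5.0f} spikes/s -> {mean:4.1f} spikes, "
          f"relative fluctuation ~ {rel:.0%}")
```

Even the fastest cell in this list delivers only about ten spikes, with fluctuations of roughly a third of the mean under the Poisson assumption.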

In fact, the problem of randomness in our eyes starts even earlier, all the way at photon capture. As one of the homework problems will show, even in bright light, a single photoreceptor in our eyes collects not much more than ~1000 photons during the typical reaction time of ~100 ms. When we move into the dim environment of a badly lit room, or the world half an hour after sunset, our photoreceptors may be capturing ~1 photon per photoreceptor, and we can detect dim light flashes as small as ~10 or fewer photons falling on our entire retina (Bialek, 2013). So, even leaving neural computation aside, sensing in organisms as large as us is also fundamentally probabilistic.

Main point to take away

In both of these examples, and in the others that we will discuss later, a major complication standing in front of biological organisms is chance. A molecule of a nutrient may be there or not, a photon may arrive or not, and spikes may be there or not. Biological signal processing is not deterministic -- randomness is important, and it must be dealt with, or leveraged, but it cannot be ignored. Hence randomness will be the thread that connects all of the components of this course, and we will spend the first half of the course introducing mathematical and computational tools to study randomness in biology, and building up the necessary intuition.