Physics 212, 2019: Lecture 22
Agent-based vs. Cellular Automata Simulations
The same systems (usually stochastic, but often also deterministic) can sometimes be simulated in multiple different ways. Consider, for example, one of the projects available for this module: the annihilation reaction, where two diffusing individuals of the same species annihilate each other when they meet. We can model this system by choosing random initial positions of the individuals on a lattice. We would then run a loop over time steps, and in each time step we would loop over the individuals. For each individual, we would first generate a step in a random direction, and then we would check whether the individual now sits on the same lattice site as some other individual. If so, both individuals would be removed from further consideration.
This is called an agent-based simulation. To specify such a simulation, one needs to define the number of agents, their state (the coordinates on the lattice in our case), and the rules by which the agents evolve in time. The agents are the primary units, the degrees of freedom of the system. The behavioral rules can include rules for motion, for creation and destruction, for interaction with other agents, and so on. The state of each agent may be much more complex than just a position; it can include attributes such as reactivity, age, or whether the agent is mobile or not (which is relevant to our second project, growth of the DLA cluster). The rules and states do not have to put the agents on a lattice; the agents can instead move through continuous space. The rules can be deterministic or probabilistic, or a mixture of both. All of these combinations are possible. Such agent-based simulations are used routinely in many different fields and are a staple of modern science. We can track the motion and interactions of individual molecules, including their creation and destruction in chemical reactions. We can track the positions of individual animals or humans in behavioral or sociological experiments.
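As a concrete illustration, here is a minimal agent-based sketch of the annihilation reaction on a small periodic lattice, written in Python (this is not the module's own code; the lattice size, number of agents, and number of time steps are illustrative assumptions):

```python
import numpy as np

# Minimal agent-based sketch of the annihilation reaction A + A -> 0 on a
# 2D periodic lattice. All parameter values below are illustrative.
rng = np.random.default_rng(0)
L = 50           # lattice is L x L sites
n_agents = 200   # initial number of diffusing individuals
n_steps = 1000   # number of time steps

pos = rng.integers(0, L, size=(n_agents, 2))   # state of each agent: (x, y)
alive = np.ones(n_agents, dtype=bool)
steps = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])

for t in range(n_steps):
    for i in range(n_agents):                  # inner loop over the agents
        if not alive[i]:
            continue
        pos[i] = (pos[i] + steps[rng.integers(4)]) % L   # random step
        for j in range(n_agents):              # did we land on another agent?
            if j != i and alive[j] and np.array_equal(pos[i], pos[j]):
                alive[i] = alive[j] = False    # annihilate the pair
                break

print("agents remaining:", alive.sum())
```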
An alternative way of simulating the same annihilation system is to take not the individual, but rather the space itself (always represented as a lattice on a computer) as the main variable. The lattice consists of many cells, and each cell contains a certain number of individuals. The simulation proceeds by, again, looping over time. However, now in each time step we loop over the lattice sites, the cells, and not over the individuals themselves. Each cell has rules for its change based on its own state, as well as on the state of the surrounding cells with which it interacts. In our annihilation problem, the rules could be the following. First, if there are at least two individuals in a cell, then their number is decreased by two (two are annihilated). Then a random number of individuals (distributed according to some probability law) is removed from each cell and added to one of the neighboring cells at random. It is pretty clear that one can match the rules by which we model the cells to the rules by which the agents are modeled, so that the results of the two simulations are the same.
This type of simulation is called a cellular automaton simulation (that is, there is an automaton that obeys certain rules in every cell). To specify such a simulation, we need to specify the number and the geometric arrangement of the cells, as well as the rules by which the cells evolve in time. As in agent-based approaches, the rules can be deterministic or probabilistic, and the state of each cell can be either very simple (e.g., occupied or not) or very complex (how many individuals, and of which type, are in the cell). Cells may be real physical cells, or they can be discretized bins of continuous space, for example exchanging molecules with each other. These types of approaches are also used routinely in applications. For example, we can predict forest fires by viewing a forest as consisting of lattice sites with states such as on fire, combustible, or barren, and with rules that define how fires start and how they spread to neighboring combustible cells. The spread of epidemics can also be modeled by cellular automata. And even diffusion of molecules through space can be viewed as cells with rules for exchanging molecules.
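A minimal cellular-automaton sketch of the same annihilation system might look as follows (again not the module's code; the binomial hop rule and all parameter values are assumptions chosen for illustration):

```python
import numpy as np

# Minimal cellular-automaton sketch of the annihilation system: each cell
# stores how many individuals it contains. Parameters are illustrative.
rng = np.random.default_rng(1)
L = 50            # lattice is L x L cells
n_steps = 1000
hop_prob = 0.25   # assumed probability that a given individual hops this step

counts = rng.poisson(0.1, size=(L, L))   # initial number of individuals per cell

for t in range(n_steps):
    # rule 1: in every cell with at least two individuals, annihilate one pair
    counts = np.where(counts >= 2, counts - 2, counts)

    # rule 2: a random (binomial) number of individuals leaves each cell...
    movers = rng.binomial(counts, hop_prob)
    counts -= movers

    # ...and is added to one neighboring cell chosen at random (periodic boundaries)
    direction = rng.integers(4, size=(L, L))
    for d, (dx, dy) in enumerate([(1, 0), (-1, 0), (0, 1), (0, -1)]):
        counts += np.roll(np.where(direction == d, movers, 0), (dx, dy), axis=(0, 1))

print("individuals remaining:", counts.sum())
```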
Which method to use?
If both methods can often be used to achieve the same results for the same systems, which one should be used? Both methods can be made equally accurate, and both are similarly easy (or hard) to code. So the choice is often left to the scientist, and either choice is reasonable. However, sometimes there is a big distinction: the execution time. And here common sense rules. In both methods, the main loop is over time. But then the first loops over the agents, and the second over the cells. It is clear that the number of operations per time step in an agent-based simulation scales as $N_a$, the number of agents. In contrast, for cellular automata, this internal cycle scales with the number of cells, $L^d$ for a lattice of linear size $L$ in a $d$-dimensional system. If the agents and the cells have similarly complex rules, then the best type of simulation is determined by whether $N_a$ is larger than $L^d$ or not -- the smaller of the two should be chosen. For example, if we have only a handful of individuals diffusing on a big lattice and trying to interact with each other, then agent-based approaches are the way to go. If, on the other hand, each cell, on average, contains many individuals, then we can move many of them from one cell to another in just a few computer commands, rather than a few commands per individual -- and cellular automata simulations will be faster in this case.
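As a toy illustration of this operation-count argument (all numbers made up for the example):

```python
# Rough per-time-step operation counts for the two representations.
N_a = 100          # a handful of agents...
L, d = 1000, 2     # ...on a large lattice of linear size L in d dimensions
n_cells = L ** d

print("agent-based ops per step  ~", N_a)       # ~ 1e2: loop over agents wins
print("cell-based  ops per step  ~", n_cells)   # ~ 1e6: most cells are empty
```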
In my experience reviewing papers for scientific journals, the choice between the cellular-automata and the agent-based representation is probably one of the most common and most costly mistakes a computational scientist can (and does) make. Choose wrong, and your code will spend idle cycles moving millions of individuals one at a time, or analyzing cells where nothing at all happens.
Tracking probabilities: Special cellular automata
Let's consider the following toy problem. Suppose particles of a certain type (a protein in a cell, a molecule in a simpler chemical reaction, a nucleus of a certain element, or even people with some attribute, such as being infected by a disease) are produced randomly at a certain constant rate. They similarly disappear at random, at some fixed rate per particle. This is known as the birth-death process. How do we model the number of individuals in the system, or the probabilities of this number? If we assume that molecules get created/diluted independently, one at a time, then the deterministic equation describing the mean number of particles in the system is $\frac{d\bar{n}}{dt} = k_+ - k_- \bar{n}$, where $k_+$ is the production rate and $k_-$ is the per-particle degradation rate. How do we generalize this to account for the randomness? There are many ways, and we will consider a few.
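Before turning to the stochastic descriptions, here is a minimal sketch of this deterministic equation integrated with the Euler method (in Python; the rate values $k_+ = 10$ and $k_- = 1$ are illustrative assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt

# Euler integration of the deterministic mean-field equation
#   dn/dt = k_plus - k_minus * n     (rate values are illustrative)
k_plus, k_minus = 10.0, 1.0
dt, t_max = 0.01, 10.0

t = np.arange(0.0, t_max, dt)
n = np.zeros_like(t)          # start with zero particles
for i in range(1, len(t)):
    n[i] = n[i - 1] + dt * (k_plus - k_minus * n[i - 1])

plt.plot(t, n)
plt.xlabel("time")
plt.ylabel("mean number of particles")
plt.show()                    # relaxes to the steady state k_plus / k_minus = 10
```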
Stochastic simulations approach
We can start by realizing that this is a stochastic system, and so every time we simulate it the result will be different. The system state is specified by a single number $n$, the number of particles, which changes with time. How does it change? There are two reaction events that can happen -- the creation and the destruction of a particle. The creation happens with the propensity $k_+$ per unit time, and the destruction with the propensity $k_- n$. The total propensity for any reaction to happen is $k_{\rm tot} = k_+ + k_- n$. If we are dealing with real chemical reactions in a well-mixed container, then it has been shown that the distribution of times between two successive reaction events is exponential, with the mean equal to the inverse of the total propensity (for non-chemical systems, the exponential distribution of times between events is often a surprisingly good approximation too). This immediately suggests a way of simulating the system. We start at time $t=0$ with $n = n_0$ particles. We generate an exponentially distributed random number $\tau$ with the mean $1/k_{\rm tot}$. This corresponds to the time at which the next reaction event will happen. We thus increment the time, $t \to t + \tau$. But which of the two reactions will happen at that time? Clearly, the reaction with the higher propensity will have a higher chance to happen. So we generate a Bernoulli random number with the probability of 0 given by $k_+/k_{\rm tot}$, and the probability of 1 given by $k_- n/k_{\rm tot}$. If the generated number is 0, it means that the creation event happened, and we change $n \to n+1$. If the number is 1, then a particle was destroyed, and so we change $n \to n-1$. We then continue by generating the next reaction event time and the next reaction event type, and so on.
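A minimal Python sketch of this algorithm for the birth-death process (this is not the module's script; the rates, the initial condition, and the function name ssa_trajectory are illustrative assumptions):

```python
import numpy as np

def ssa_trajectory(k_plus=10.0, k_minus=1.0, n0=0, t_max=10.0, rng=None):
    """One SSA (Gillespie) realization of the birth-death process.
    All default parameter values are illustrative assumptions."""
    rng = rng or np.random.default_rng()
    t, n = 0.0, n0
    times, numbers = [t], [n]
    while t < t_max:
        k_tot = k_plus + k_minus * n
        t += rng.exponential(1.0 / k_tot)   # waiting time: exponential, mean 1/k_tot
        if rng.random() < k_plus / k_tot:   # creation with probability k_plus/k_tot
            n += 1
        else:                               # otherwise, destruction
            n -= 1
        times.append(t)
        numbers.append(n)
    return np.array(times), np.array(numbers)

times, numbers = ssa_trajectory()           # one random trajectory n(t)
```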
This method of generating a single random trajectory of the state of the system over time is known as the Gillespie, or stochastic simulation, algorithm (SSA). One typically generates many equivalent random trajectories and then calculates averages over these many realizations to find means or variances of the number of particles at any time. The code that accompanies this module has an SSA simulation of this simple birth-death system, showing the diversity of random trajectories that can be realized. The algorithm is easy to extend to many different particle types -- the state of the system is then described by the set of numbers of particles of each species. It is also easy to account for many types of reactions: the total propensity is the sum of all propensities, and the identity of a reaction event is determined by drawing a multinomial random number, rather than a Bernoulli one, with probabilities proportional to the propensities.
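A sketch of such averaging over realizations, reusing the hypothetical ssa_trajectory function from the previous sketch (the number of runs and the time grid are again illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
t_grid = np.linspace(0.0, 10.0, 101)   # times at which we sample each trajectory
runs = []
for _ in range(500):                   # 500 independent SSA realizations
    times, numbers = ssa_trajectory(rng=rng)
    # the state at a grid time is the state right after the last earlier event
    idx = np.searchsorted(times, t_grid, side="right") - 1
    runs.append(numbers[idx])

runs = np.array(runs)
mean_n = runs.mean(axis=0)             # mean particle number vs time
var_n = runs.var(axis=0)               # variance vs time
```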
Importantly for this lecture, SSA is a type of agent-based simulation. We have a single simple agent -- the system -- and the state of the agent, $n$, is changed according to probabilistic rules. It is easy to extend such algorithms to more complicated scenarios, where the state is given not just by the number, but also by the physical positions of the individual particles.
Master equation
While this is a stochastic system, with different outcomes for every simulation run, there is a very special cellular automaton that allows us to simulate the system deterministically. Specifically, let each cell represent a different value of $n$, the number of particles in the system. The state of each cell will be denoted by $P_n(t)$, and it will represent the probability that the system at time $t$ has exactly $n$ particles. What are the rules by which this cellular automaton evolves?
These rules are defined by what is called the master equation. The probability $P_n$ of being in the state $n$ changes with time. There are four ways, four rules, that can change it. First, starting in the state $n$, a new particle may get created, resulting in $n+1$ particles. This has the propensity $k_+$, and leads to a decrease of $P_n$. Second, starting in the state $n$, a particle may be destroyed with the propensity $k_- n$, resulting in $n-1$ particles. This also decreases $P_n$, since the system leaves this state. Since both of these processes can only happen if the system has $n$ particles, the total rate of negative change of $P_n$ is $-(k_+ + k_- n) P_n$. Two other rules increase the probability $P_n$. Starting with $n-1$ particles, one can create one; this contributes a positive rate of change $k_+ P_{n-1}$ to the probability of having $n$ particles. Finally, starting with $n+1$ particles, one particle can get destroyed. Since this ends in the state $n$, it contributes $k_- (n+1) P_{n+1}$, which also has a positive effect on $P_n$.
Putting this all together, we get $\frac{dP_n}{dt} = k_+ P_{n-1} + k_- (n+1) P_{n+1} - (k_+ + k_- n) P_n$, and for $n=0$ this changes to $\frac{dP_0}{dt} = k_- P_1 - k_+ P_0$, because particles cannot be destroyed in the empty state (there is no $P_{-1}$ term). This is a set of coupled differential equations that describes the temporal evolution of the set of probabilities $P_n(t)$. It is collectively known as the master equation (even though these are many equations). If we discretize time and use the usual Euler method of solving these equations, we transform this system into a deterministic cellular automaton, which solves for the probabilities of having $n$ particles at discrete time steps.
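A minimal sketch of this deterministic cellular automaton, integrating the master equation with the Euler method (the rates, the truncation at n_max, and the time step are illustrative assumptions):

```python
import numpy as np

# Euler integration of the birth-death master equation, viewed as a
# deterministic cellular automaton over the cells n = 0, 1, ..., n_max.
k_plus, k_minus = 10.0, 1.0
n_max = 50                    # truncate the state space (P_n is negligible beyond)
dt, t_max = 0.001, 10.0

n = np.arange(n_max + 1)
P = np.zeros(n_max + 1)
P[0] = 1.0                    # start with zero particles with certainty

for step in range(int(t_max / dt)):
    gain_birth = np.zeros_like(P)
    gain_birth[1:] = k_plus * P[:-1]              # k_+ P_{n-1}
    gain_death = np.zeros_like(P)
    gain_death[:-1] = k_minus * n[1:] * P[1:]     # k_- (n+1) P_{n+1}
    loss = (k_plus + k_minus * n) * P             # (k_+ + k_- n) P_n
    P = P + dt * (gain_birth + gain_death - loss)

print("mean n at t_max:", (n * P).sum())          # approaches k_plus / k_minus = 10
```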
The script associated with this module solves the master equation for the evolution of probability in the birth-death system, and compares the solution to the individual trajectories generated by the SSA method. The two methods clearly give similar results -- most of the individual trajectories concentrate where the probability of $n$ is expected to be large.