Markov Chains vs Simulation: Flipping a Million Little Coins


I saw an interesting question on Reddit the other day. The problem was about estimating the amount of decaying radioactive isotopes in a sample after a set amount of time. I don’t think anyone in the thread brought up Markov chains, but that’s what I immediately thought of. Most people used simulations or solved differential equations, but I thought it sounded like a perfect example to illustrate some simple examples of how to do simulations and tackle the very simple basics of Markov chains.

The Problem

This problem comes through the /r/dailyprogrammer page on Reddit.

The exact problem can be found here. I’m leaving out the parsing input portion because it’s not what I want to talk about.

It states that you are given a large amount of some radioactive isotope, Compound A. It decays into a different isotope, Compound B. That, in turn, decays into a different compound, and so on. You are tasked with finding the proportion of each compound present in the sample after some number of seconds.

What You’re Given

p <- c(0.00007, # P(A -> B)
       0.0005,  # P(B -> C)
       0.00013, # P(C -> D)
       0.00022, # P(D -> S)
       0)       # S doesn't decay

atoms <- c(1000, # 1000 A
               0,   # 0 B
               0,   # 0 C
               0,   # 0 D
               0)   # 0 S

s <- 5000

# Set the seed for simulations

The simple approach - Simulation

The first and simplest approach is to simply simulate each atom at each time point.

We’ll start with an atom the decays 50% of the time. To see if it decays we can consider it to be a Bernoulli random variable (either 0 or 1).

# Does it decay?
rbinom(1, 1, 0.5) == 1
## [1] FALSE

Nope. So now we know if one atom decays or not. Let’s test them all.

rbinom(1, atoms[1], 0.5)
## [1] 512

Great, now we know 512 atoms decay from A to B in the first one second interval. If we repeat this again for the next interval we will have to simulate the first and second atom types.

Let’s assume we start with 56 A and 44 B atoms. We move one second forward by simulating these.

second.interval <- c(56, 44, 0, 0, 0)
a.decay <- rbinom(1, second.interval[1], 0.5)
b.decay <- rbinom(1, second.interval[2], 0.5)

(third.interval <- second.interval - c(a.decay, b.decay-a.decay, -b.decay, 0, 0))
## [1] 30 45 25  0  0

Now we have atoms of type C!

We can put all of this together to create a simple function that calculates this for our example.

sim.decay <- function(atoms, s, p) {
  n.atoms = length(atoms)

  # Create a matrix to store the results after each interval
  results <- matrix(0, nrow = s+1,
                       ncol = n.atoms)

  # The first interval is the starting point
  results[1,] <- atoms

  # It runs for 's' intervals after the first one
  for (i in 2:(s+1)) {
    # Determine the number of atoms that decay of each type
    decayed <- rbinom(rep(1, n.atoms), # 1 trial each
                      results[i-1,],      # of the number of current atoms
                      p)                   # each with a probability of decaying

    # We subtract the net change in each atom.
    # e.g. ome B turn to C but some A turn to B
    results[i,] <- results[i-1,] - (decayed - c(0, decayed[1:(n.atoms-1)]))


Now we can simulate the original trial.

original.sim <- sim.decay(atoms, s, p)

Great, that gives us a rough idea of what we should expect! It’s not exact, but it’s good start. Before we dive into another way of doing this, here’s a visualization of the process:

## Loading required package: methods
plot.decay.history <- function(data) {
  df <- data.frame(data, interval=1:nrow(data))
  names(df) <- c('A', 'B', 'C', 'D', 'S', 'interval')
  df <- melt(df, id.vars=c('interval'))
  p <- ggplot(data=df, aes(x=interval, y=value))
  p <- p + geom_area(aes(fill=variable))
  p <- p + labs(x='Seconds elapsed', y='Number of atoms')
  p <- p + guides(fill=guide_legend(title=NULL))
  p <- p + scale_y_continuous(expand=c(0,0))
  p <- p + scale_x_continuous(expand=c(0,0))



# And another example
second.sim <- sim.decay(c(900, 200, 1000, 50, 0),    # atoms
                            500,                         # intervals
                            c(0.01, 0.03, 0.09, 0.02, 0)) # P(decay)



The simpler approach - Markov Chains

A brief diversion

Markov chains are really interesting, and very useful, mathematical ideas. Formally, they are a sequence of random variables that depend only on the current state, but that means nothing to a lot of people. You’ve probably seen them in the form of random walks.

The easiest way I can explain them is with an analogy to baseball. Imagine a batter is up to the plate with two outs, and no one on base. There are only a few things that can happend next. He can strike out and end the inning. He can hit a single, a double, a triple, and be on base. Finally, he can hit a home run and increase his teams score by one. These make of a set of discrete results for this state. Each one has a different probability of happening, hence why we there are things like batting averages, slugging percentages, and other fancier statistics. These are the states a Markov chain can be in. Each at-bat is one of the random variables in the sequence.

Each at-bat is also independent of the others. The probability that Barry Bonds hits a home run doesn’t depend on whether or not the player before him struck out. All of the information that we need to know to calculate the probability of each result for this state is contained in this state (the player at bat, the number of players on base, and the pitcher). What happened previously doesn’t affect what’s happening now; this is what we call memoryless. That means every baseball game is essentially a Markov chain played out by people instead of in some mathmetician’s head.

Back to business

I hope you can see how this applies to our radioactive decay problem. Our state is just the number of atoms of each type we have. We don’t care if it decayed 10 mintues ago or 10 years ago. We only care what it is now. Each atom has it’s own little at-bat in our model. It’s batting average is the probability that it decays in this interval.

Luckily, many mathemeticians have tackled this problem time and time again. All we need to do is perform a little linear algebra.

Each state can be represented by its number of atoms:

The probability of transitioning to a decayed state for each atom can be represented by something called a transition matrix. This is its most general form:

Where is the probability A decays to B and so on.

To find the proportion of each atom after a given number of intervals, just multiply by that many times. It’s that simple. So, to find just find .

To do this we need to calculate the transition matrix.

trans.mat <- function(p) {
  n.p <- length(p)

  T <- matrix(0, nrow=n.p, ncol=n.p)

  for (i in 1:(n.p-1))
    T[i,] <- c(rep(0, i-1), # Pad with i-1 0's for the offset
               1-p[i], p[i], # P(stay) = 1 - P(decay)
               rep(0, n.p - (i+1)))

  T[n.p,] <- c(rep(0, n.p-1), 1)


Now we can this to find the decay from a Markov chain.

markov.decay <- function(atoms, s, p) {
  n.atoms <- length(atoms)

  T <- trans.mat(p)
  M <- matrix(0, ncol=n.atoms,

  M[1,] <- atoms

  for (i in 2:(s+1))
    M[i,] <- M[i-1,] %*% T


Let’s try out our previous examples using our new method and compare the results.

original.markov <- markov.decay(atoms, s, p)


second.markov <- markov.decay(c(900, 200, 1000, 50, 0),
                              c(0.01, 0.03, 0.09, 0.02, 0))


## [1] 708 105 147  29  11
## [1] 704.68 101.36 151.00  31.96  11.01
## [1]   10    3    1    6 2130
## [1]    5.913    2.957    1.109    9.923 2130.098

That’s pretty close.

I hope you enjoyed this, and that you learned something from this brief introduction to Markov chains and simulations.

If you have any questions, comments, or anything really, please comment below or send me an e-mail.