Direct Method

Once the state vector X has been initialized, Gillespie's direct method proceeds by repeatedly stepping forward in time until a termination condition is reached [Gillespie 1977]. At each step, one generates two uniform random deviates in the interval (0,1). The first deviate, together with the sum of the propensities, is used to generate an exponential deviate, which is the time until the next reaction fires. The second deviate is used to determine which reaction fires. Below is the algorithm for a single step; a sketch of an implementation follows it.

  1. for m in [1..M]: Compute a_m from X.
  2. α = Σ_{m=1}^{M} a_m(X)
  3. Generate two unit uniform random numbers r1 and r2.
  4. τ = -ln(r1) / α
  5. Set μ to the smallest index such that Σ_{m=1}^{μ} a_m > r2 α.
  6. t = t + τ
  7. X = X + V_μ
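
To make the algorithm concrete, here is a minimal sketch of a single step in C++. The class layout, the member names, and the computePropensities() stub are illustrative assumptions, not the interface of any particular library.

    #include <cmath>
    #include <cstddef>
    #include <random>
    #include <vector>

    struct DirectMethodState {
        std::vector<double> a;            // Propensities a_m.
        std::vector<std::vector<int>> V;  // Dense state change vectors.
        std::vector<int> X;               // Species populations.
        double t = 0;                     // Simulation time.
        std::mt19937 rng{42};             // Fixed seed for reproducibility.
        std::uniform_real_distribution<double> uniform{0.0, 1.0};

        // Problem-specific: recompute the propensities a from the state X.
        void computePropensities() { /* Evaluate the rate laws here. */ }

        void step() {
            computePropensities();                        // Step 1.
            double alpha = 0;                             // Step 2.
            for (double am : a) alpha += am;
            double r1 = uniform(rng), r2 = uniform(rng);  // Step 3.
            // Step 4: exponential deviate by inversion. Use 1 - r1 so the
            // argument of the logarithm cannot be zero.
            double tau = -std::log(1.0 - r1) / alpha;
            // Step 5: linear search for the reaction index mu.
            std::size_t mu = 0;
            double partialSum = a[0];
            while (partialSum <= r2 * alpha && mu + 1 < a.size()) {
                partialSum += a[++mu];
            }
            t += tau;                                     // Step 6.
            for (std::size_t n = 0; n != X.size(); ++n)   // Step 7.
                X[n] += V[mu][n];
        }
    };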

Consider the computational complexity of the direct method. We assume that the reactions are loosely coupled, so computing a single propensity a_m costs O(1) and computing all of the propensities costs O(M). Determining μ requires a linear search over the array of propensities, which also costs O(M). With the loosely coupled assumption, updating the state has unit cost. Therefore the computational complexity of a step of the direct method is O(M).

To improve the computational complexity of the direct method, we first write it in a more generic way. A time step consists of the following (an interface sketch follows the list):

  1. τ = exponentialDeviate() / α
  2. μ = discreteFiniteDeviate()
  3. t = t + τ
  4. X = X + Vμ
  5. Update the discrete deviate generator.
  6. Update the sum of the propensities α.
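
Here is a sketch of this generic formulation as a pair of interfaces. The method names follow the steps above (exponentialDeviate(), discreteFiniteDeviate()); everything else is an assumption about how one might organize the code, not the library's actual API.

    #include <cstddef>

    class ExponentialGenerator {
    public:
        virtual ~ExponentialGenerator() {}
        // Step 1: a unit-rate exponential deviate, to be divided by alpha.
        virtual double exponentialDeviate() = 0;
    };

    class DiscreteFiniteGenerator {
    public:
        virtual ~DiscreteFiniteGenerator() {}
        // Step 2: draw the index mu of the reaction that fires.
        virtual std::size_t discreteFiniteDeviate() = 0;
        // Step 5: inform the generator that propensity m has changed.
        virtual void setPmf(std::size_t m, double value) = 0;
        // Step 6: the current sum of the propensities, alpha.
        virtual double sum() const = 0;
    };

With this decomposition, improving the direct method reduces to choosing better implementations of the two generators.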

There are several ways of improving the performance of the direct method:

The original formulation of the direct method uses the inversion method to generate an exponential deviate. This is easy to program, but computationally expensive because of the logarithm evaluation. There are a couple of more recent algorithms (the ziggurat method and the acceptance-complement method) that have much better performance.
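
For reference, here is the inversion method as a stand-alone function, a minimal sketch in C++. The ziggurat and acceptance-complement algorithms avoid the logarithm on most calls, at the cost of a considerably more involved implementation.

    #include <cmath>
    #include <random>

    // Unit-rate exponential deviate by inversion: simple, but it pays
    // for a logarithm on every call.
    double exponentialDeviate(std::mt19937& rng) {
        std::uniform_real_distribution<double> u(0.0, 1.0);
        // u(rng) lies in [0, 1), so 1 - u(rng) is positive and the
        // logarithm is well defined.
        return -std::log(1.0 - u(rng));
    }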

There are many algorithms for generating discrete deviates. The static case (a fixed probability mass function) is well studied. The simplest approach is CDF inversion with a linear search, which one can implement as a build-up or chop-down search on the PMF. The method is easy to code and does not require storing the CDF, but it has linear complexity in the number of events, so it is quite slow. A better approach is CDF inversion with a binary search; one must store the CDF, but the search has logarithmic complexity. Better still is Walker's algorithm, which has constant complexity. Walker's algorithm is a binning approach in which each bin represents either one or two events.
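
The following is a sketch of CDF inversion with a binary search for a static PMF; the class name is an assumption for illustration. Building the CDF costs O(M) and each deviate costs O(log M).

    #include <algorithm>
    #include <cstddef>
    #include <numeric>
    #include <random>
    #include <vector>

    class CdfInversionBinarySearch {
    public:
        explicit CdfInversionBinarySearch(const std::vector<double>& pmf)
            : cdf_(pmf.size()) {
            // cdf_[m] holds the partial sum pmf[0] + ... + pmf[m].
            std::partial_sum(pmf.begin(), pmf.end(), cdf_.begin());
        }

        std::size_t operator()(std::mt19937& rng) {
            std::uniform_real_distribution<double> u(0.0, cdf_.back());
            // The first index whose partial sum exceeds the deviate.
            return std::upper_bound(cdf_.begin(), cdf_.end(), u(rng)) -
                   cdf_.begin();
        }

    private:
        std::vector<double> cdf_;
    };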

Generating discrete deviates with a dynamically changing PMF is significantly trickier than the static case. CDF inversion with a linear search adapts well to the dynamic case because it has no auxiliary data structures to maintain. The faster methods have significant preprocessing costs; in the dynamic case these costs are incurred each time the PMF is updated. The binary search and Walker's algorithm both have linear preprocessing costs, so all three of the algorithms considered have the same complexity for the combined task of generating a deviate and modifying the PMF. There are algorithms that can both generate deviates and modify the PMF efficiently; in fact, there is a method that has constant complexity. See the documentation of the source code for details.

The original formulation of the direct method uses CDF inversion with a linear search. Subsequent versions have stored the PMF in sorted order or used CDF inversion with a binary search. These modifications have yielded better performance, but have not changed the worst-case computational complexity of the algorithm. Using a more sophisticated discrete deviate generator will improve the performance of the direct method, particularly for large problems.

For representing the reactions and the state change vectors, one can use either dense or sparse arrays. Dense arrays are more efficient for small or tightly coupled problems; otherwise sparse arrays yield better performance. For loosely coupled problems, dense arrays offer a modest performance benefit (on the order of 10%) when the problem is small; beyond roughly 30 species, sparse arrays are the better choice.
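
A sparse state change vector stores only the species that a reaction modifies, as (species index, change) pairs. For loosely coupled problems each reaction touches only a few species, so firing a reaction costs O(1) regardless of the number of species. A sketch, with assumed names:

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Each element is (species index, change in population).
    using SparseStateChange = std::vector<std::pair<std::size_t, int>>;

    // Apply the state change vector v of the fired reaction to X.
    void fireReaction(std::vector<int>& X, const SparseStateChange& v) {
        for (const auto& term : v) {
            X[term.first] += term.second;
        }
    }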

For loosely coupled problems, it is better to update the sum of the propensities α continuously instead of recomputing it at each time step. Note that this requires some care: incremental updates accumulate round-off error, so one must periodically recompute the sum.
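
One way to organize this is sketched below. The class name and the refresh period are illustrative assumptions.

    #include <cstddef>
    #include <numeric>
    #include <utility>
    #include <vector>

    class PropensitySum {
    public:
        explicit PropensitySum(std::vector<double> a)
            : a_(std::move(a)),
              alpha_(std::accumulate(a_.begin(), a_.end(), 0.0)) {}

        void set(std::size_t m, double value) {
            alpha_ += value - a_[m];  // O(1) incremental update.
            a_[m] = value;
            if (++updates_ == RefreshPeriod) {  // Periodically recompute
                updates_ = 0;                   // to purge round-off error.
                alpha_ = std::accumulate(a_.begin(), a_.end(), 0.0);
            }
        }

        double sum() const { return alpha_; }

    private:
        static constexpr std::size_t RefreshPeriod = 1000;  // Assumed value.
        std::vector<double> a_;
        double alpha_;
        std::size_t updates_ = 0;
    };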

The following options are available with the direct method. Inversion with a 2-D search is the default; it is an efficient method for most problems. If performance is important (i.e. if generating the desired number of trajectories will take a long time), it may be worthwhile to try each method on a small number of trajectories and then select the best one for the problem at hand.
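
For orientation, here is one way inversion with a 2-D search can work: split the M propensities into about √M groups and maintain the group sums. A deviate is generated by a linear search over the group sums and then within the chosen group, so generation costs O(√M) while updating a single propensity costs O(1). The grouping scheme and names below are assumptions; consult the source code for the actual implementation.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <random>
    #include <vector>

    class TwoDSearch {
    public:
        explicit TwoDSearch(const std::vector<double>& pmf)
            : pmf_(pmf),
              groupSize_(std::max<std::size_t>(
                  1, static_cast<std::size_t>(std::sqrt(double(pmf.size()))))),
              groupSums_((pmf.size() + groupSize_ - 1) / groupSize_, 0.0) {
            for (std::size_t m = 0; m != pmf_.size(); ++m)
                groupSums_[m / groupSize_] += pmf_[m];
        }

        // O(1): only the affected group sum is adjusted.
        void set(std::size_t m, double value) {
            groupSums_[m / groupSize_] += value - pmf_[m];
            pmf_[m] = value;
        }

        std::size_t operator()(std::mt19937& rng) {
            // alpha from the group sums: O(sqrt(M)) terms.
            double alpha = 0;
            for (double s : groupSums_) alpha += s;
            std::uniform_real_distribution<double> u(0.0, alpha);
            double r = u(rng);
            // First search over the group sums...
            std::size_t g = 0;
            while (r >= groupSums_[g] && g + 1 < groupSums_.size())
                r -= groupSums_[g++];
            // ...then within the chosen group.
            std::size_t m = g * groupSize_;
            while (r >= pmf_[m] && m + 1 < pmf_.size())
                r -= pmf_[m++];
            return m;
        }

    private:
        std::vector<double> pmf_;
        std::size_t groupSize_;
        std::vector<double> groupSums_;
    };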