Growth Estimation

The general problem is how to estimate the number of things existing (starting from a certain number of them) after a certain period of time if those things replicate themselves in a certain way at fixed time intervals.

As a special case of this general problem we shall consider this question: Suppose a particular CD may be obtained from some single source, and that some (but not all) of the recipients of the CD make copies and send them to others, some (not all) of whom also make copies and send them to others, and so on. We wish to know, given certain initial parameter values, how many people have received CDs after a certain number of weeks have elapsed.

This problem can be reinterpreted in other ways, e.g., as the spread of infection. One person is initially infected, and infects others, some (but not all) of whom infect others, and so on; after a certain time period how many people are infected? Or someone invents a new way of doing something, and tells others, some (but not all) of whom tell others; so how long does it take (or what is the longest period of time it will take) for everyone to know of the new thing? Or someone starts a chain letter (by email) with instructions to send it on, and some (but not all) do, then how many people have received it after a certain amount of time?

Let our unit of time be the week, and suppose that the original source sends out s copies of a CD per week. Call a person who receives a CD a level-i recipient if the copy they receive has passed through i hands. E.g., a person receiving a copy from a person who received a copy from the original source is a level-2 recipient.

Not all recipients will send out further copies (or, in the other interpretations, not all infected people will infect others; not everyone who learns of the new idea will tell others). Suppose that the proportion of level-i recipients who make copies and send them to others is p (where 0 < p <= 1). Suppose further that a recipient who sends out copies sends out exaclty n copies each and every week (beginning with the week following the week in which he receives his copy).

The original source sends out s copies per week. Of the s level-1 recipients who receive these, s.p of them send out a further n copies per week, so these s.p level-1 recipients together produce s.p.n new level-2 recipients per week. Each week the number of level-1 recipients increases by a further s, since the original source continues to send out s copies per week.

So how many recipients of the CD will there be after m weeks? Denote this number by Nm. We can try calculating this manually for the first few weeks, as follows.

For ease of calculation we make the assumption that a recipient receives a copy of the CD from only one prior recipient, so that, e.g., no level-2 recipient receives copies from two different level-1 recipients. Under this assumption the total number of recipients at the end of week m is the sum of the numbers of recipients at each level at that time. This assumption is not unrealistic when recipients are mostly independent of each other and the number of possible recipients is large (as in this case, where everyone in the world is a possible recipient and each recipient can send a copy of the CD to anyone else in the world). It is less realistic when the recipients are related and their number is small (e.g., infection within a population of animals in a game reserve).

At the end of the 1st week, there are 1 + s recipients (counting the original source as the sole level-0 recipient), so N1 = 1 + s.

At the end of the 2nd week, there are three levels of recipient, level-0, level-1 and level-2:

so N2 = 1 + 2.s + s.p.n.

Now things start to get complicated. At the end of the 3rd week there are four levels of recipient, level-0 through level-3:

so N3 = 1 + 3.s + 3.s.p.n + s.(p.n)2.

If we do the same calculation for the next week we find that

N4 = 1 + 4.s + 6.s.p.n + 4.s.(p.n)2 + s.(p.n)3

Is a pattern emerging?

Further calculation shows that

N5 = 1 + 5.s + 10.s.p.n + 10.s.(p.n)2 + 5.s.(p.n)3 + s.(p.n)4

Aha! The coefficients are the numbers in Pascal's Triangle, in which each number (other than the 1's) is the sum of the two numbers above it:

                  1                    
                1   1                  
              1   2   1                
            1   3   3   1              
          1   4   6   4   1            
        1   5   10   10   5   1          
      1   6   15   20   15   6   1        
    1   7   21   35   35   21   7   1      
  1   8   28   56   70   56   28   8   1    
1   9   36   84   126   126   84   36   9   1  

The number at position r in row n (counting rows and positions starting with 0) of Pascal's Triangle is nCr = n!/(r!.(n-r)!), where for any number k >= 1, k! (pronounced "k-shriek") is the product of k, k-1, ..., 2, 1. For example, the number at position 4 of row 6 is 6!/(4!.2!) = 6.5/2 = 15, and the number at position 3 of row 9 is 9!/(3!6!) = 9.8.7/3.2.1 = 84.

Thus, assuming that the pattern discovered above continues for weeks beyond 5, we have that

Nm = 1 + mC1.s + mC2.s.p.n + mC3.s.(p.n)2 + ... + mCj.s.(p.n)j-1 + ... + s.(p.n)m-1

Or more succinctly, Nm = 1 + Sj=1m [mCj.s.(p.n)j-1]

But can we be sure that this formula holds for m > 5? To be sure of this we must prove it.

Let Lm,j denote the number of level-j recipients at the end of week m, where j >= 0 and m >= 0. There is only one level-0 recipient, namely, the source of the CDs, and the end of week 0 is the start of week 1.

We first need to prove the following proposition:

For all m >= 0 and for all j such that 0 <= j <= m, Lm,j = 1 if j = 0 and Lm,j = mCj.s.(p.n)j-1 otherwise.

The proof is by induction. For m = 0 only one j is possible, namely 0, and since the number of level-0 recipients at the end of week 0 is 1 (the source of the CDs), the proposition is true for m = 0.

Suppose that the proposition is true for m = k, then we shall show that it is true for m = k+1. Let 0 <= j <= k+1. The number of level-j recipients at the end of week k+1 is the number at the end of week k (the old level-j recipients) plus the new level-j recipients, namely, those who receive copies of the CD either from the source (if they are level-1 recipients) or from those level-(j-1) recipients who send out copies during week k+1. Thus

Lk+1,j = 1 if j = 0,
Lk+1,j = Lk,1 + s if j = 1 and
Lk+1,j = Lk,j + Lk,j-1.(p.n) otherwise

Clearly the proposition is true for m = k+1 and j = 0. The second line implies that Lk+1,1 = (k+1).s = k+1C1.s.(p.n)0 so the proposition is true for m = k+1 and j = 1.

Suppose j > 1. From the third line, by the inductive hypothesis, we have that Lk+1,j = kCj.s.(p.n)j-1 + kCj-1.s.(p.n)j-2.(p.n)

so Lk+1,j = s.(p.n)j-1(kCj + kCj-1). It can easily be shown that kCj + kCj-1 = k+1Cj, so Lk+1,j = k+1Cj.s.(p.n)j-1, so the proposition is true for m = k+1 and j >1, so it is true for m = k+1 and all j such that 0 <= j <= k+1. This completes the proof by induction.

The final step in the proof of the formula for Nm is simply to note that Nm = 1 + Sj=1m Lm,j, so Nm = 1 + Sj=1m [mCj.s.(p.n)j-1], which is what was to be proved, or in more archaic language, quod erat demonstrandum.

Since s is independent of j we may take it out of the terms of the sum to obtain Nm = 1 + s.Sj=1m [mCj.(p.n)j-1].

So how about some actual numbers? Go to the next page to use the online calculator which is based on the theory above.

Mathematical Software
Index Home Page