Array Resizing

When the maximum number of entries to be stored in an array is unknown, or when the variation between the average and the maximum number of entries which must be stored is large, it is often useful to resort to dynamically resizing arrays:

  • Start with an array of appropriate capacity,
  • Increase the array capacity when the array is full, and
  • Decrease the array capacity when the array is relatively empty.
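These three rules can be sketched as a minimal class. This is only an illustration (not the author's resizing.array.h); the growth factor of 2 and the quarter-full shrink rule are placeholders whose choice is discussed in the sections that follow:

```cpp
#include <algorithm>
#include <cstddef>

// A minimal dynamically resizing array of ints: grows when full,
// shrinks when sufficiently empty.
class resizing_array {
	std::size_t capacity_;
	std::size_t size_;
	int *data_;

  public:
	explicit resizing_array(std::size_t initial = 8)
		: capacity_(initial), size_(0), data_(new int[initial]) {}
	~resizing_array() { delete[] data_; }

	std::size_t size() const { return size_; }
	int operator[](std::size_t i) const { return data_[i]; }

	void push_back(int x) {
		if (size_ == capacity_) resize(2 * capacity_);  // grow when full
		data_[size_++] = x;
	}

	void pop_back() {
		--size_;
		// Shrink only when one-quarter full (see the discussion of
		// decreasing the capacity below).
		if (capacity_ > 8 && 4 * size_ <= capacity_) resize(capacity_ / 2);
	}

  private:
	void resize(std::size_t new_capacity) {
		int *new_data = new int[new_capacity];
		std::copy(data_, data_ + size_, new_data);
		delete[] data_;
		data_ = new_data;
		capacity_ = new_capacity;
	}
};
```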

Initial Appropriate Capacity

The initial appropriate capacity of an array depends on the situation. If statistics are available as to the expected number of entries in the array, then the average number plus one or two standard deviations may be appropriate. The choice depends on the requirements, which dictate which is more critical: time or memory.
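For example, if the expected number of entries has a known mean and standard deviation, the initial capacity might be chosen as follows (the default of two standard deviations is just one possible trade-off, not a fixed rule):

```cpp
#include <cmath>
#include <cstddef>

// Choose an initial capacity of mean + k standard deviations; a larger
// k favours time (fewer resizes) at the cost of memory.
std::size_t initial_capacity(double mean, double std_dev, double k = 2.0) {
	return static_cast<std::size_t>(std::ceil(mean + k * std_dev));
}
```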

Increasing the Capacity of the Array

Increasing the capacity of the array by a constant results in an amortized Θ(n) copies being made for each insertion. Increasing the array capacity by an arithmetically increasing sequence of entries (e.g., 1, 2, 3, 4, ...) results in an amortized Θ(√n) copies being made for each insertion. When the capacity of the array is increased geometrically, i.e., by a multiplicative factor r > 1, the number of copies is an amortized Θ(1).

The consequence, however, of increasing the array capacity by a multiplicative factor is an increase in memory usage:

  • When increasing the array capacity by a constant, there is a relative amortized Θ(1/n) additional memory being allocated which is not being used;
  • When increasing the array capacity arithmetically, there is a relative amortized Θ(1/√n) additional memory being allocated which is not being used; but
  • When increasing the array capacity geometrically, the relative additional memory will be Θ(1).

The last point indicates that at any one time, a certain percentage of the capacity of the array will be unfilled.
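These amortized counts can be checked empirically by counting how many element copies a sequence of n insertions triggers under each growth rule (a simulation only; no actual array is allocated):

```cpp
#include <cstddef>

// Count the total element copies made while inserting n entries,
// growing the capacity with the supplied rule whenever the array fills.
template <typename Grow>
std::size_t total_copies(std::size_t n, Grow grow) {
	std::size_t capacity = 8, size = 0, copies = 0;
	for (std::size_t i = 0; i < n; ++i) {
		if (size == capacity) {
			copies += size;            // every entry is copied on a resize
			capacity = grow(capacity);
		}
		++size;
	}
	return copies;
}
```

With n = 1 000 000, growing by a constant 8 makes on the order of n²/16 copies in total (amortized Θ(n) per insertion), while doubling makes fewer than 2n copies in total (amortized Θ(1) per insertion).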

Increasing the Array Capacity Geometrically

When increasing the capacity of an array geometrically by a multiplicative factor of r, the amortized number of copies required per insertion is 1/(r − 1), while the average proportion of empty entries in the array is ln(r)·r/(r − 1) − 1. As the ratio r ↓ 1, the number of copies per insertion goes to infinity, while the proportion of empty entries decreases linearly according to ½(r − 1). (All these results may be found by using Maple.)

The easiest and most convenient factor to use is r = 2 or doubling the size of the array. This results in only one copy (amortized) per insertion. Unfortunately, this also results in times when the array will be only half full and on average, the proportion of empty entries is 39%. If the initial capacity is a power of two, then all subsequent capacities will be a power of two, as well.

A commonly recommended multiplicative factor, r = 1.5, on the other hand, results in two copies per insertion but an average of only 22% empty entries.

One negative consequence of using r = 1.5 is that the array capacity will almost never be a power of two. There are many circumstances where a power of two allows simplified operations (e.g., the modulus may be calculated using a single bit-wise and operation). Consequently, I suggest two factors which bracket r = 1.5: 2^(1/2) ≈ 1.4142 and 2^(2/3) ≈ 1.5874. If the initial array capacity is a power of two, then every second and every third array capacity, respectively, will also be a power of two.
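The power-of-two trick mentioned above: when the capacity is 2^k, the expensive modulus i % capacity reduces to a single bit-wise and with capacity − 1, a mask of the low k bits:

```cpp
#include <cstdint>

// For a power-of-two capacity, i % capacity == i & (capacity - 1).
// Valid ONLY when capacity is a power of two.
std::uint32_t fast_mod(std::uint32_t i, std::uint32_t capacity) {
	return i & (capacity - 1);
}
```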

With r = 2^(1/2), the sequence of array capacities starting with eight entries is 8, 11, 16, 23, 32, 45, 64, ...; while with r = 2^(2/3), the sequence of array capacities is 8, 13, 20, 32, 51, 81, 128, ... . These are summarized in Table 1.
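Both sequences can be generated by rounding 8·2^(k/2) and 8·2^(2k/3) for k = 0, 1, 2, ... (a reconstruction of how the tables below may have been produced; the author's stated tool was Maple):

```cpp
#include <cmath>
#include <vector>

// Capacities round(initial * 2^(k*num/den)) for k = 0, 1, ..., count-1;
// num/den = 1/2 gives r = 2^(1/2), num/den = 2/3 gives r = 2^(2/3).
std::vector<long> capacities(long initial, int num, int den, int count) {
	std::vector<long> result;
	for (int k = 0; k < count; ++k) {
		result.push_back(std::lround(initial * std::pow(2.0, double(k) * num / den)));
	}
	return result;
}
```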

Table 1. Copies per insertion and average proportion of empty entries.

	Ratio r               Copies per Insertion     Average Percent of Empty Entries
	2^(1/2) ≈ 1.4142      2^(1/2) + 1 ≈ 2.4142     18.33%
	1.5                   2                        21.64%
	2^(2/3) ≈ 1.5874      1.7024                   24.88%
	2                     1                        38.63%

Demonstration

To demonstrate this, the source code resizing.array.h was compiled with the test t0.cpp and timed for each of the ratios. Five test runs were made for each ratio and the results averaged, yielding the plot in Figure 1. The overhead of the for loop (run without the function call) was also timed. The red line uses a figure of r = ∞; that is, the memory for the full array was allocated initially.


Figure 1. Fitting the curves.

When a best-fitting least-squares curve of the form a + b/(r − 1) was fit to this data, it produced 0.91250 + 0.65306/(r − 1) with a very tight match. What is critical in this plot is the actual time: the time for the copies is less than twice the time for the overhead of both the for loop and the function calls. Consequently, I would recommend using r = √2. The easiest way to accurately calculate the array sizes is to pre-calculate them and store them:

int second[56] = {
	8, 11, 16, 23, 32, 45, 64, 91, 128, 181, 256, 362, 512, 724,
	1024, 1448, 2048, 2896, 4096, 5793, 8192, 11585, 16384, 23170,
	32768, 46341, 65536, 92682, 131072, 185364, 262144, 370728,
	524288, 741455, 1048576, 1482910, 2097152, 2965821, 4194304,
	5931642, 8388608, 11863283, 16777216, 23726566, 33554432,
	47453133, 67108864, 94906266, 134217728, 189812531, 268435456,
	379625062, 536870912, 759250125, 1073741824, 1518500250
};

int third[42] = {
	8, 13, 20, 32, 51, 81, 128, 203, 323, 512, 813, 1290, 2048,
	3251, 5161, 8192, 13004, 20643, 32768, 52016, 82570, 131072,
	208064, 330281, 524288, 832255, 1321123, 2097152, 3329021,
	5284492, 8388608, 13316085, 21137968, 33554432, 53264341,
	84551870, 134217728, 213057363, 338207482, 536870912,
	852229450, 1352829926
};

Alternatively, a fast Θ(1) algorithm is given in the source code which uses bit shifting to calculate floor( pow( sqrt(2), n ) ).
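One way such a Θ(1) calculation can work (a sketch, not necessarily the algorithm in resizing.array.h): for even n, floor(2^(n/2)) is a single shift; for odd n, the extra factor of √2 is applied as a fixed-point multiplication. Because floor(floor(x)/2^m) = floor(x/2^m), using the constant floor(√2·2^31) gives the exact floor for all odd n ≤ 63:

```cpp
#include <cstdint>

// floor(2^(n/2)) without calling pow: a shift for even n, and a
// fixed-point multiplication by sqrt(2) for odd n.
// SQRT2_Q31 = floor(sqrt(2) * 2^31); exact for all odd n <= 63.
std::uint64_t floor_pow_sqrt2(unsigned n) {
	const std::uint64_t SQRT2_Q31 = 3037000499ULL;
	if (n % 2 == 0) return std::uint64_t(1) << (n / 2);
	return (SQRT2_Q31 << (n / 2)) >> 31;
}
```

Note that this produces floor values (e.g., 22 for n = 9), whereas the tables above round to the nearest integer.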

Prime Intermediate Values

In some cases, it may be preferable for those array sizes which are not powers of two to be prime numbers. The following two tables replace each non-power-of-two entry with the nearest prime number:

int second_prime[56] = {
	8, 11, 16, 23, 32, 47, 64, 89, 128, 181, 256, 359, 512, 727,
	1024, 1447, 2048, 2897, 4096, 5791, 8192, 11587, 16384, 23173,
	32768, 46337, 65536, 92683, 131072, 185363, 262144, 370723,
	524288, 741457, 1048576, 1482907, 2097152, 2965819, 4194304,
	5931641, 8388608, 11863279, 16777216, 23726569, 33554432,
	47453149, 67108864, 94906249, 134217728, 189812533, 268435456,
	379625047, 536870912, 759250133, 1073741824, 1518500279
};

int third_prime[42] = {
	8, 13, 19, 32, 53, 83, 128, 199, 317, 512, 811, 1291, 2048,
	3251, 5167, 8192, 13003, 20641, 32768, 52021, 82571, 131072,
	208067, 330287, 524288, 832253, 1321109, 2097152, 3329023,
	5284493, 8388608, 13316089, 21137959, 33554432, 53264339,
	84551867, 134217728, 213057371, 338207473, 536870912,
	852229451, 1352829937
};
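The prime replacements can be generated by searching outward from each non-power-of-two entry (a sketch using trial division, which is adequate for 32-bit table entries):

```cpp
#include <cstdint>

// Trial-division primality test; adequate for 32-bit table entries.
bool is_prime(std::uint64_t n) {
	if (n < 2) return false;
	for (std::uint64_t d = 2; d * d <= n; ++d)
		if (n % d == 0) return false;
	return true;
}

// Nearest prime to n, preferring the larger candidate on a tie (this
// matches the tables above, e.g. 45 -> 47 rather than 43).
std::uint64_t nearest_prime(std::uint64_t n) {
	for (std::uint64_t offset = 0; offset < n; ++offset) {
		if (is_prime(n + offset)) return n + offset;
		if (is_prime(n - offset)) return n - offset;
	}
	return 2;  // unreachable for n >= 2
}
```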

Decreasing the Capacity of the Array

Decreasing the size of an array is potentially beneficial after the array size has been increased to deal with, for example, a peak load. As with the case of increasing the capacity of the array, the array should be decreased geometrically, as well. The only significant issue is when to do so: if the array is full with N entries and another entry is inserted, resulting in an increase in the array capacity, do not immediately reduce the capacity if the size drops back down to N. That is equivalent to reducing the array capacity when the array is 1/r full; in that case, assuming the array is initially full, a sequence of alternating insertions and removals would result in Θ(n) time per operation. It is better, instead, to only decrease the size of the array when the array is 1/r² full, and then to reduce the capacity by only a factor of 1/r, i.e., one step.

To demonstrate, if r = 2, halve the capacity of the array when the array is one-quarter full; and if r = √2, decrease the capacity of the array by a factor of √2 when the array is half full.
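The thrashing can be made concrete by counting copies when insertions and removals alternate at the boundary (a simulation with r = 2 and a hypothetical starting capacity of 1024):

```cpp
#include <cstddef>

// Count copies made by n alternating insert/remove pairs starting from
// a full array, halving the capacity when size <= capacity/divisor.
// divisor == 2 shrinks at half full (the bad policy for r = 2);
// divisor == 4 shrinks at one-quarter full (the recommended policy).
std::size_t thrash_copies(std::size_t n, std::size_t divisor) {
	std::size_t capacity = 1024, size = 1024, copies = 0;
	for (std::size_t i = 0; i < n; ++i) {
		// insert: grow by a factor of 2 when full
		if (size == capacity) { copies += size; capacity *= 2; }
		++size;
		// remove: halve when at or below capacity/divisor
		--size;
		if (capacity > 8 && size <= capacity / divisor) {
			copies += size;
			capacity /= 2;
		}
	}
	return copies;
}
```

With the half-full policy every pair of operations triggers both a grow and a shrink (Θ(n) copies per operation); with the quarter-full policy only the first insertion causes a resize.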

An Alternative to Storing Array Sizes

Instead of storing explicit arrays, it is also possible to take appropriate divisions by powers of two of either s = 1518500250 (s/2^27 = 11, s/2^26 = 22 (≈ 23), etc.) for the r = 2^(1/2) sequence; or s1 = 852229450 and s2 = 1352829926 (s1/2^26 = 12 (≈ 13), s2/2^26 = 20, s1/2^24 = 50 (≈ 51), s2/2^24 = 80 (≈ 81), etc.) for the r = 2^(2/3) sequence.
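A sketch of this alternative for r = 2^(1/2), using the single seed s = 1518500250 (the last entry of second[] above). The power-of-two capacities are plain shifts of 8, and the intermediate capacities are right shifts of s; as noted, the shifted values only approximate the tabulated ones (e.g., 22 versus 23):

```cpp
#include <cstdint>

// k-th capacity of the approximate r = 2^(1/2) sequence 8, 11, 16, 22, ...
// Valid for k <= 54, where the shift counts remain non-negative.
std::uint32_t sqrt2_capacity(unsigned k) {
	const std::uint32_t s = 1518500250U;                 // ~ 2^30.5
	if (k % 2 == 0) return std::uint32_t(8) << (k / 2);  // 8, 16, 32, ...
	return s >> (27 - k / 2);                            // 11, 22, 45, ...
}
```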

Using the Cube Root of 2

Using the cube root of 2 (≈ 1.2599), the number of copies per insertion jumps to 3.8473, but the average proportion of empty entries drops to 12.00%, and every third capacity is a power of two. The array capacities are listed here:

{
	8, 10, 13, 16, 20, 25, 32, 40, 51, 64, 81, 102, 128, 161, 203, 256, 323, 406,
	512, 645, 813, 1024, 1290, 1625, 2048, 2580, 3251, 4096, 5161, 6502,
	8192, 10321, 13004, 16384, 20643, 26008, 32768, 41285, 52016,
	65536, 82570, 104032, 131072, 165140, 208064, 262144, 330281, 416128,
	524288, 660561, 832255, 1048576, 1321123, 1664511, 2097152, 2642246, 3329021,
	4194304, 5284492, 6658043, 8388608, 10568984, 13316085, 16777216, 21137968, 26632170,
	33554432, 42275935, 53264341, 67108864, 84551870, 106528681,
	134217728, 169103741, 213057363, 268435456, 338207482, 426114725,
	536870912, 676414963, 852229450, 1073741824, 1352829926, 1704458901
};

Alternatively, appropriate divisions by powers of two of the two values 1352829926 and 1704458901 will produce these values.

More Optimal Pointer-based Array Resizing

An implementation using pointer arithmetic is provided for the interest of the reader.
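That implementation is not reproduced here, but the usual idea (as in many std::vector implementations) is to keep three pointers rather than integer size and capacity fields, so an insertion is just a comparison and a pointer increment; a sketch:

```cpp
#include <algorithm>
#include <cstddef>

// A sketch of pointer-based resizing (not the provided implementation):
// three pointers replace the size and capacity integers.
class ptr_array {
	int *begin_;         // first entry
	int *end_;           // one past the last used entry
	int *capacity_end_;  // one past the last allocated entry

  public:
	explicit ptr_array(std::size_t initial = 8)
		: begin_(new int[initial]), end_(begin_),
		  capacity_end_(begin_ + initial) {}
	~ptr_array() { delete[] begin_; }

	std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }
	int operator[](std::size_t i) const { return begin_[i]; }

	void push_back(int x) {
		if (end_ == capacity_end_) {         // full: double the capacity
			std::size_t old_size = size();
			int *new_begin = new int[2 * old_size];
			std::copy(begin_, end_, new_begin);
			delete[] begin_;
			begin_ = new_begin;
			end_ = new_begin + old_size;
			capacity_end_ = new_begin + 2 * old_size;
		}
		*end_++ = x;                         // no index arithmetic needed
	}
};
```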