When the maximum number of entries to be stored in an array is unknown, or
when the variation between the average and the maximum number of entries
which must be stored is large, it is often useful to resort to dynamically
resizing arrays:
- Start with an array of an appropriate initial capacity,
- Increase the array capacity when the array is full, and
- Decrease the array capacity when the array is relatively empty.
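These three rules can be sketched in C++ (a minimal illustration, not the resizing.array.h implementation referenced later; the element type int, the initial capacity of 8, and the growth factor r = 2 are arbitrary choices):

```cpp
#include <algorithm>
#include <cstddef>

// A minimal sketch of the three rules above with r = 2:
// grow when full, shrink when one-quarter full.
class resizing_array {
  public:
    explicit resizing_array( std::size_t initial_capacity = 8 ):
    capacity_( initial_capacity ),
    size_( 0 ),
    data_( new int[capacity_] ) {
    }

    ~resizing_array() { delete[] data_; }

    // Copying omitted for brevity.
    resizing_array( const resizing_array& ) = delete;
    resizing_array &operator=( const resizing_array& ) = delete;

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    int operator[]( std::size_t k ) const { return data_[k]; }

    void push_back( int value ) {
        if ( size_ == capacity_ ) {      // Rule 2: grow when full
            resize( 2*capacity_ );
        }
        data_[size_++] = value;
    }

    void pop_back() {                    // Precondition: size() > 0
        --size_;
        // Rule 3: shrink when at most one-quarter full (see the
        // discussion of decreasing capacity later in this document).
        if ( 4*size_ <= capacity_ && capacity_ > 8 ) {
            resize( capacity_/2 );
        }
    }

  private:
    void resize( std::size_t new_capacity ) {
        int *new_data = new int[new_capacity];
        std::copy( data_, data_ + size_, new_data );
        delete[] data_;
        data_ = new_data;
        capacity_ = new_capacity;
    }

    std::size_t capacity_;
    std::size_t size_;
    int *data_;
};
```

Shrinking at one-quarter full rather than one-half full avoids repeated copying when insertions and removals alternate at a capacity boundary.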
Initial Appropriate Capacity
The initial appropriate capacity of an array depends on the situation. If
statistics are available as to the expected number of entries in the array,
then the average number plus one or two standard deviations may be appropriate.
This will depend on the requirements, which dictate which is more critical:
time or memory.
Increasing the Capacity of the Array
Increasing the capacity of the array by a constant results in an amortized Θ(n)
copies being made for each insertion. Increasing the array capacity by
an arithmetically increasing sequence of entries
(e.g., 1, 2, 3, 4, ...) results in an amortized
Θ(√n) copies being made for each insertion.
When the capacity of the array is increased geometrically, i.e., by a multiplicative
factor r > 1, the number of copies is an amortized Θ(1).
The consequence, however, of increasing the array capacity by a multiplicative
factor is an increase in memory usage:
- When increasing the array capacity by a constant, there is a relative
amortized Θ(1/n) additional memory being allocated which is not
being used;
- When increasing the array capacity arithmetically, there is a relative
amortized Θ(1/√n) additional memory being allocated which is not
being used; but
- When increasing the array capacity geometrically, the relative additional
memory will be Θ(1).
The last point indicates that at any one time, a certain percentage of the capacity of
the array will be unfilled.
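The three growth strategies can be compared empirically by counting the total number of entry copies performed while filling an array (a sketch; the initial capacities and the constant increment of 8 are arbitrary illustrative choices):

```cpp
#include <cstddef>

// Total copies made while inserting n entries, growing by a constant (+8).
std::size_t copies_constant( std::size_t n ) {
    std::size_t capacity = 8, copies = 0;
    for ( std::size_t size = 0; size < n; ++size ) {
        if ( size == capacity ) { copies += size; capacity += 8; }
    }
    return copies;
}

// Growing by an arithmetically increasing sequence (+1, +2, +3, +4, ...).
std::size_t copies_arithmetic( std::size_t n ) {
    std::size_t capacity = 1, step = 1, copies = 0;
    for ( std::size_t size = 0; size < n; ++size ) {
        if ( size == capacity ) { copies += size; capacity += step; ++step; }
    }
    return copies;
}

// Growing geometrically by a factor of r = 2.
std::size_t copies_geometric( std::size_t n ) {
    std::size_t capacity = 8, copies = 0;
    for ( std::size_t size = 0; size < n; ++size ) {
        if ( size == capacity ) { copies += size; capacity *= 2; }
    }
    return copies;
}
```

For large n, the per-insertion averages of these three counts scale as Θ(n), Θ(√n), and Θ(1), respectively.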
Increasing the Array Capacity Geometrically
When increasing the capacity of an array geometrically by a multiplicative factor of r,
the number of copies required per insertion is an amortized 1/(r − 1), while the average
proportion of empty entries in the array is r ln(r)/(r − 1) − 1.
As the ratio r ↓ 1, the number of copies per insertion goes to infinity while the proportion
of empty entries decreases linearly according to ½(r − 1). (All these
results may be found by using Maple.)
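The first of these results can also be derived by hand (a sketch consistent with the quoted values): if the capacity grows from n₀ through n₀r, n₀r², ..., up to n = n₀rᵐ, each resize copies all entries currently stored, so the total number of copies is

```latex
\sum_{k=0}^{m-1} n_0 r^k \;=\; n_0\,\frac{r^m - 1}{r - 1} \;<\; \frac{n}{r - 1},
```

and dividing by the n insertions gives the amortized 1/(r − 1) copies per insertion.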
The easiest and most convenient factor to use is r = 2 or doubling the size of the
array. This results in only one copy (amortized) per insertion.
Unfortunately, this also results in times when the array will be only half full and,
on average, the proportion of empty entries is 39%. If the initial capacity is a power of two, then
all subsequent capacities will be powers of two, as well.
A multiplicative factor of r = 1.5, on the other hand, results in two copies (amortized)
per insertion but only an average of 22% empty entries.
One negative consequence of using r = 1.5 is that the array capacity will almost never
be a power of two. There are many circumstances where a power of two will allow simplified
operations (e.g., modulus may be calculated using a single bit-wise and operation).
Consequently, I will suggest two factors which bracket r = 1.5: √2 ≈ 1.4142
and 2^(2/3) ≈ 1.5874. If the initial array capacity is a power of
two, then every second and third array capacity, respectively, will also be a power of two.
With r = √2, the sequence of array capacities starting with eight
entries is 8, 11, 16, 23, 32, 45, 64, ...; while with r = 2^(2/3), the
sequence of array capacities is 8, 13, 20, 32, 51, 81, 128, ... .
These are summarized in Table 1.
Ratio r          | Copies per Insertion | Average Percentage of Empty Entries |
√2 ≈ 1.4142      | √2 + 1 ≈ 2.4142      | 18.33% |
1.5              | 2                    | 21.64% |
2^(2/3) ≈ 1.5874 | 1.7024               | 24.88% |
2                | 1                    | 38.63% |
Demonstration
To demonstrate this, the source code resizing.array.h was compiled with the test t0.cpp and timed for each of the ratios. The average of five
test runs for each ratio yields the plot in Figure 1, which also shows the overhead of the for loop alone (run without the function call).
The red line corresponds to r = ∞; that is, the memory for the full array was allocated up front.
Figure 1. Fitting the curves.
When a best-fitting least-squares curve of the form a + b/(r − 1) was fit to
this data, it produced 0.91250 + 0.65306/(r − 1) with a very tight match. What is critical
in this plot is the actual time: the time for the copies is less than twice the time for the overhead of the for loop and
function calls combined. Consequently, I would recommend using r = √2. The easiest way to calculate the array sizes accurately is
to pre-compute them and store them:
int second[56] = {
8, 11, 16, 23, 32, 45, 64, 91, 128, 181, 256, 362, 512, 724,
1024, 1448, 2048, 2896, 4096, 5793, 8192, 11585, 16384, 23170,
32768, 46341, 65536, 92682, 131072, 185364, 262144, 370728,
524288, 741455, 1048576, 1482910, 2097152, 2965821, 4194304,
5931642, 8388608, 11863283, 16777216, 23726566, 33554432,
47453133, 67108864, 94906266, 134217728, 189812531, 268435456,
379625062, 536870912, 759250125, 1073741824, 1518500250
};
int third[42] = {
8, 13, 20, 32, 51, 81, 128, 203, 323, 512, 813, 1290, 2048,
3251, 5161, 8192, 13004, 20643, 32768, 52016, 82570, 131072,
208064, 330281, 524288, 832255, 1321123, 2097152, 3329021,
5284492, 8388608, 13316085, 21137968, 33554432, 53264341,
84551870, 134217728, 213057363, 338207482, 536870912,
852229450, 1352829926
};
Alternatively, a fast Θ(1) algorithm is given in the source
code which uses bit shifting to calculate floor( pow( sqrt(2), n ) ).
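The algorithm in the source code is not reproduced here, but one possible fixed-point sketch is as follows (the scaling constant 92682 = round(√2 · 2^16) and the round-to-nearest adjustment are my choices, chosen to match the rounding used in the table above):

```cpp
#include <cstdint>

// One possible fixed-point sketch (not necessarily the algorithm in
// resizing.array.h) of computing sqrt(2)^n rounded to an integer:
// for even n the result is the exact power 2^(n/2); for odd n,
// multiply by 92682 = round( sqrt(2) * 2^16 ) and shift back down,
// adding 2^15 first so the result is rounded to the nearest integer.
std::uint64_t sqrt2_to_the( int n ) {
    if ( n % 2 == 0 ) {
        return std::uint64_t( 1 ) << ( n/2 );
    } else {
        std::uint64_t p = std::uint64_t( 1 ) << ( (n - 1)/2 );
        return ( p*92682 + 32768 ) >> 16;
    }
}
```

For even n the result is exact; for odd n the intermediate product fits comfortably in 64 bits for any capacity representable in an int.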
Prime Intermediate Values
In some cases, it may be preferable for those array sizes which are not powers of two to be prime
numbers. The following two tables replace each non-power-of-two entry with the nearest
prime number:
int second_prime[56] = {
8, 11, 16, 23, 32, 47, 64, 89, 128, 181, 256, 359, 512, 727,
1024, 1447, 2048, 2897, 4096, 5791, 8192, 11587, 16384, 23173,
32768, 46337, 65536, 92683, 131072, 185363, 262144, 370723,
524288, 741457, 1048576, 1482907, 2097152, 2965819, 4194304,
5931641, 8388608, 11863279, 16777216, 23726569, 33554432,
47453149, 67108864, 94906249, 134217728, 189812533, 268435456,
379625047, 536870912, 759250133, 1073741824, 1518500279
};
int third_prime[42] = {
8, 13, 19, 32, 53, 83, 128, 199, 317, 512, 811, 1291, 2048,
3251, 5167, 8192, 13003, 20641, 32768, 52021, 82571, 131072,
208067, 330287, 524288, 832253, 1321109, 2097152, 3329023,
5284493, 8388608, 13316089, 21137959, 33554432, 53264339,
84551867, 134217728, 213057371, 338207473, 536870912,
852229451, 1352829937
};
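Such tables need not be computed by hand. A sketch of how they could be generated (trial division suffices at these magnitudes; ties such as 45, which is equidistant from 43 and 47, are broken upward to match the table above):

```cpp
#include <cstdint>

// Trial-division primality test; adequate for table generation.
bool is_prime( std::uint64_t n ) {
    if ( n < 2 ) return false;
    for ( std::uint64_t d = 2; d*d <= n; ++d ) {
        if ( n % d == 0 ) return false;
    }
    return true;
}

// Search outward from n, checking n + delta before n - delta so that
// ties are broken toward the larger candidate.
std::uint64_t nearest_prime( std::uint64_t n ) {
    for ( std::uint64_t delta = 0; ; ++delta ) {
        if ( is_prime( n + delta ) ) return n + delta;
        if ( delta < n && is_prime( n - delta ) ) return n - delta;
    }
}
```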
Decreasing the Capacity of the Array
Decreasing the size of an array is potentially beneficial after the array size
has been increased to deal with, for example, a peak load. As with the case of
increasing the capacity of the array, the array should be decreased geometrically,
as well. The only significant issue is deciding when to shrink: if the array is full with N entries
and another entry is inserted, resulting in an increase in the array capacity, do not
reduce the capacity as soon as the size drops back down to N. That policy is equivalent
to reducing the array capacity whenever the array is 1/r full, in which case,
assuming the array is initially full, a sequence of alternating insertions and
removals would result in Θ(n) time per operation. It is
better, instead, to decrease the size of the array only when the array is 1/r²
full, and then to reduce the capacity by only a factor of 1/r, i.e., one step.
For example, if r = 2, halve the capacity of the array when the array is
one-quarter full; and if r = √2, decrease the capacity of the array by a factor of
√2 when the array is half full.
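The two shrink policies can be compared by counting copies over a worst-case sequence of alternating insertions and removals at the boundary N (a sketch with r = 2; the function and its parameters are illustrative):

```cpp
#include <cstddef>

// Count copies over 'ops' alternating insert/remove pairs starting from a
// full array of capacity N, with r = 2.
//   eager = true : shrink as soon as the array is 1/r   full (half full)
//   eager = false: shrink only when the array is 1/r^2 full (quarter full)
std::size_t copies_alternating( std::size_t N, std::size_t ops, bool eager ) {
    std::size_t capacity = N, size = N, copies = 0;
    for ( std::size_t k = 0; k < ops; ++k ) {
        // Insert one entry, growing (and copying) if the array is full.
        if ( size == capacity ) { copies += size; capacity *= 2; }
        ++size;
        // Remove one entry, shrinking (and copying) at the policy threshold.
        --size;
        std::size_t threshold = eager ? capacity/2 : capacity/4;
        if ( size <= threshold ) { copies += size; capacity /= 2; }
    }
    return copies;
}
```

The eager policy pays 2N copies for every insert/remove pair, while the lazy policy pays for at most one resize no matter how long the alternation continues.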
An Alternative to Storing Array Sizes
Instead of storing explicit arrays, it is also possible to take appropriate
divisions by powers of two, implemented as bit shifts (>>), of either s = 1518500250 (s/2^27 = 11, s/2^26 = 22 (≈ 23), etc.);
or s1 = 852229450 and s2 = 1352829926
(s1/2^26 = 12 (≈ 13), s2/2^26 = 20,
s1/2^24 = 50 (≈ 51), s2/2^24 = 80 (≈ 81), etc.).
Using the Cube Root of 2
Using the cube root of 2 (≈ 1.2599), the number of copies per insertion jumps to 3.8473 but the
average ratio of empty entries drops to 12.00%, and every third entry is a power of two. The array
capacities are listed here:
{
8, 10, 13, 16, 20, 25, 32, 40, 51, 64, 81, 102, 128, 161, 203, 256, 323, 406,
512, 645, 813, 1024, 1290, 1625, 2048, 2580, 3251, 4096, 5161, 6502,
8192, 10321, 13004, 16384, 20643, 26008, 32768, 41285, 52016,
65536, 82570, 104032, 131072, 165140, 208064, 262144, 330281, 416128,
524288, 660561, 832255, 1048576, 1321123, 1664511, 2097152, 2642246, 3329021,
4194304, 5284492, 6658043, 8388608, 10568984, 13316085, 16777216, 21137968, 26632170,
33554432, 42275935, 53264341, 67108864, 84551870, 106528681,
134217728, 169103741, 213057363, 268435456, 338207482, 426114725,
536870912, 676414963, 852229450, 1073741824, 1352829926, 1704458901
};
Alternatively, appropriate divisions by powers of two of the two values
1352829926 and 1704458901 will produce these values.
More Optimal Pointer-based Array Resizing
An implementation using pointer arithmetic is provided for the interest of the reader.