When creating an ArrayList you have the option to specify the initial capacity. Most of the time however you don’t really know how big the list should be. If you would know the exact size of the list then it is often better to go for a regular array as this is much more memory efficient. For this reason I often just don’t specify an initial capacity, even though I know that this will lead to costly copying of the array when the list becomes too small. But what do you have to do when you don’t know the size? I guess you can always estimate it, and make some kind of guess. To get a better understanding of how the initial capacity affects throughput, I did some tests with it.
An ArrayList is used when you need to have a list, but you don’t know the size up front. Therefor an ArrayList will grow when it reaches out of space, this however involves a memcopy. The size of an array is always grown by 50% of the current size. This means that specifying the exact size does not only prevent an expensive memcopy, it also prevents allocating memory that isn’t used.
Since we are explicitly using an ArrayList, I assume we don’t care too much about memory usage and are aiming more for simplicity. To see how bad it is not to specify an initial capacity I ran a test where I filled different arrays with elements. I let the amount of elements in the array double each time, and for each size I took the average duration over 1000 iterations. The different arrays where using different initial capacity:
- No initial capacity specified
- Initial capacity = # elements
- Initial capacity = # elements + 1
- Initial capacity = # elements – 1
The results of this tests are shown in the figures below.
The second image clearly shows that underestimating the size is expensive, especially in this case (size – 1), as we have to do the most expensive copy just to fit in the last element. You also clearly see that where the array with size and size + 1 are about linear, the one without size is clearly not. This is due to the fact that the size of the array is only increased by 50% each time. Meaning that if you double the elements to add, you actually have to grow the list more than one time extra.
The first image however shows that if your array remains relatively small (< 100,000) you won’t gain much in absolute numbers. It might even be neglect-able up until you have a very large array (> 500,000), but much will depend on what your application is all about and how fast you need to be able to respond.
An obvious result is that overestimating your array does not increase, nor decrease the performance of the array. This can be used to our advantage since we know that the ArrayList also tends to be too large if it has to grown on it’s own. So you are able to make an overestimation of how large the array will be, you can use this to prevent expensive memcopies.