Using ArrayList Capacity to Improve Performance

This article can be read in about 24 minutes.

Introduction

When creating a list in Java, you often see instances created like this: new ArrayList<>().

List<String> list = new ArrayList<>();

However, you can also specify a numerical value inside the parentheses, like this: new ArrayList<>(500).

List<String> list = new ArrayList<>(500);

The number 500 in this example is called the initialCapacity argument. By using initialCapacity, you can improve the performance of list operations.
In this article, we’ll explain what initialCapacity is and how to use it effectively.

Summary

This article has gotten a bit lengthy, so I’ll start with the conclusions. Detailed explanations will follow in the subsequent sections.

  • What is initialCapacity?
    • An ArrayList instance is internally based on an array.
    • By specifying initialCapacity, you can create an ArrayList instance that internally uses an array of the specified size.
    • Specifying initialCapacity does not affect operations for retrieving the number of elements, such as the size() method.
  • How to Improve the Performance of ArrayList
    • If you can reasonably predict the number of elements to be stored, such as between 10,000 and 50,000, it’s recommended to specify initialCapacity. This helps avoid the overhead associated with adding elements.
    • The value you set for initialCapacity should align with your priorities. For example, if you expect to store between 10,000 and 50,000 elements, setting the maximum value of 50,000 may lead to some memory waste, but it will eliminate the overhead.
    • The trimToSize() method can be used to reduce the size, but it’s not always advisable due to the potential overhead involved.

What is initialCapacity?

Let’s take a look at how initialCapacity works by examining the internal processing of an ArrayList. The internal processing of the constructor when initialCapacity is specified is as follows:

public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable
{
    /**
     * Shared empty array instance used for empty instances.
     */
    private static final Object[] EMPTY_ELEMENTDATA = {};

    /**
     * The array buffer into which the elements of the ArrayList are stored.
     * The capacity of the ArrayList is the length of this array buffer. Any
     * empty ArrayList with elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA
     * will be expanded to DEFAULT_CAPACITY when the first element is added.
     */
    transient Object[] elementData; // non-private to simplify nested class access

    // ...Omitted...

    /**
     * Constructs an empty list with the specified initial capacity.
     *
     * @param  initialCapacity  the initial capacity of the list
     * @throws IllegalArgumentException if the specified initial capacity
     *         is negative
     */
    public ArrayList(int initialCapacity) {
        if (initialCapacity > 0) {
            this.elementData = new Object[initialCapacity];
        } else if (initialCapacity == 0) {
            this.elementData = EMPTY_ELEMENTDATA;
        } else {
            throw new IllegalArgumentException("Illegal Capacity: "+
                                                initialCapacity);
        }
    }

    // ...Omitted...
}

When initialCapacity is set to 1 or greater, an array with the specified size (number of elements) is created. This array is then assigned to the member variable elementData. If initialCapacity is set to 0, an empty array is assigned to elementData.
When you add or retrieve elements in the list, they are stored in or retrieved from this elementData array. In other words, internally, an ArrayList is a data structure that uses an array.
Thus, writing new ArrayList<>(500) creates an ArrayList instance backed by an array with a size of 500.

Next, let’s look at what happens when you don’t specify initialCapacity. The internal processing is as follows:

public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable
{
    /**
     * Shared empty array instance used for default sized empty instances. We
     * distinguish this from EMPTY_ELEMENTDATA to know how much to inflate when
     * first element is added.
     */
    private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

    // ...Omitted...

    /**
     * Constructs an empty list with an initial capacity of ten.
     */
    public ArrayList() {
        this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
    }

    // ...Omitted...
}

Similar to the case where initialCapacity is 0, an empty array is assigned to elementData.
This means that writing new ArrayList<>() creates an ArrayList instance with an array that has 0 elements.

You might be concerned that setting initialCapacity could affect operations that retrieve the number of elements (such as calling the size() method), but there is no issue here.
The size() and isEmpty() methods do not simply return the length of the array; instead, they use the value of the member variable size. This size variable is incremented or decremented as elements are added or removed, so the number of elements is managed by this variable internally in ArrayList.

public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable
{
    /**
     * The size of the ArrayList (the number of elements it contains).
     *
     * @serial
     */
    private int size;

    // ...Omitted...

    /**
     * This helper method split out from add(E) to keep method
     * bytecode size under 35 (the -XX:MaxInlineSize default value),
     * which helps when add(E) is called in a C1-compiled loop.
     */
    private void add(E e, Object[] elementData, int s) {
        if (s == elementData.length)
            elementData = grow();
        elementData[s] = e;
        size = s + 1;
    }

    /**
     * Appends the specified element to the end of this list.
     *
     * @param e element to be appended to this list
     * @return {@code true} (as specified by {@link Collection#add})
     */
    public boolean add(E e) {
        modCount++;
        add(e, elementData, size);
        return true;
    }

    /**
     * Returns the number of elements in this list.
     *
     * @return the number of elements in this list
     */
    public int size() {
        return size;
    }

    // ...Omitted...
}

You can see that when the public add(e) method is executed, the value of the size variable is incremented within the private add method.

Therefore, even if initialCapacity is set to 500 and only 200 elements are actually stored, the size() method will correctly return the number of elements as 200.

Behavior of ArrayList When Elements Are Added

An ArrayList instance created by new ArrayList<>() has an empty array, but adding elements multiple times does not raise an exception.
This is because ArrayList internally processes the following: “When an element is added and there is not enough free space, a new, larger array is created, and the elements from the old array are copied into it.”

Specifically, when you call add(e) on an ArrayList instance created by new ArrayList<>(), an array with a size of 10 is initially created inside the ArrayList instance.
If add(e) is performed 10 more times, an array with a size of 15 is created. (Each time the maximum number of elements is exceeded, a new array that is 1.5 times larger is created.)
In this way, ArrayList has a mechanism that automatically expands the size of its array.

public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable
{
    /**
     * Increases the capacity to ensure that it can hold at least the
     * number of elements specified by the minimum capacity argument.
     *
     * @param minCapacity the desired minimum capacity
     * @throws OutOfMemoryError if minCapacity is less than zero
     */
    private Object[] grow(int minCapacity) {
        return elementData = Arrays.copyOf(elementData,
                                           newCapacity(minCapacity));
    }

    private Object[] grow() {
        return grow(size + 1);
    }

    /**
     * This helper method split out from add(E) to keep method
     * bytecode size under 35 (the -XX:MaxInlineSize default value),
     * which helps when add(E) is called in a C1-compiled loop.
     */
    private void add(E e, Object[] elementData, int s) {
        if (s == elementData.length)
            elementData = grow();
        elementData[s] = e;
        size = s + 1;
    }

    /**
     * Appends the specified element to the end of this list.
     *
     * @param e element to be appended to this list
     * @return {@code true} (as specified by {@link Collection#add})
     */
    public boolean add(E e) {
        modCount++;
        add(e, elementData, size);
        return true;
    }

    // ...Omitted...
}

The above code is for Java 8. In other versions of Java, the internal processing may differ, but the commonality is that `Arrays.copyOf()` is called within the `grow()` method. Within the `grow()` method, you can see that the process involves creating a new, larger array and copying the element data from the old array.

Performance of ArrayList When Elements Are Added

The issue is that when elements are added repeatedly, there is an overhead due to memory reallocation. This cost increases in proportion to the size of the list.

For example, if you perform add(e) 500 times on an ArrayList instance created with new ArrayList<>(), the size of the internal array will grow incrementally: 10, 15, 22, 33, 49… until it finally reaches a size of 549. During this process, the array will have been expanded 11 times, each time incurring overhead. This overhead reduces processing speed.
On the other hand, if you specify new ArrayList<>(500), an array of size 500 is created initially, and no array expansion process is needed.

Comparing the execution speed with and without initialCapacity, the time it took to perform 5,000,000 add(e) operations in my environment was 258ms without initialCapacity and 144ms when initialCapacity was specified.
Specifying initialCapacity results in faster execution speed because it eliminates the overhead of resizing the array.

How to Improve the Performance of ArrayList

Specify initialCapacity

If you can anticipate the number of elements to be stored, it is recommended to specify an initialCapacity. This helps avoid the overhead associated with adding elements.
As shown in the previous example, if you perform add(e) 500 times on an ArrayList instance created with new ArrayList<>(), the array expansion process (overhead) will occur 11 times internally. However, if you use new ArrayList<>(500), even after 500 add(e) operations, the array expansion process will not be triggered. This results in faster execution speed.
Additionally, this approach prevents unnecessary memory usage. When you perform add(e) 500 times on an ArrayList created with new ArrayList<>(), you end up with an array of size 549, but only 500 elements are stored, leaving 49 unused slots.

As in the previous example, there are few cases where the number of elements can be clearly estimated. However, if you can predict the number of elements within a certain range, it is better to specify initialCapacity.
For example, if you know that the number of elements will be between 10,000 and 50,000, you may set initialCapacity to 10,000. Instead of using new ArrayList<>(), it’s better to use new ArrayList<>(10000), as this reduces overhead by avoiding the expansion process until the array exceeds 10,000 elements.
Alternatively, you could set initialCapacity to 50,000. While this would create 29,000 empty slots if only 21,000 elements are stored, the benefit is that no overhead is incurred because the array expansion process is avoided.

Use trimToSize with caution

ArrayList has a trimToSize() method that trims the size of the internal array (elementData) to match the number of elements actually stored. However, it’s not always advisable to use the trimToSize() method.

In the previous example, with 29,000 unused slots, it might seem like a good idea to call trimToSize() and reduce the array size to 21,000. While this does help reduce memory waste, it introduces overhead because the trimToSize() method creates a new array and copies the data from the old array. If processing speed is your priority, it’s better to avoid using trimToSize().

Additionally, there’s a drawback: adding elements after calling trimToSize() can negatively impact performance.
Since trimToSize() sets the array size to the current number of elements, any subsequent additions will cause overhead due to the need for array expansion. To avoid this problem, it’s best not to add elements after calling trimToSize(). However, depending on future modifications or feature updates, this might not always be feasible. If there’s a chance that elements may be added after trimToSize() is called, it’s wise to refrain from using the method lightly.

Conclusion

Summary is provided at the top of the article.

I wrote this article because I couldn’t find many resources with detailed information on the initialCapacity of ArrayList.
In the process of researching for this article, I realized that initialCapacity deserves more attention. As mentioned in the summary, there are clear benefits to using initialCapacity when the range of the number of elements to be stored can be reasonably predicted. When used appropriately in such cases, it’s possible to implement efficient processes with minimal drawbacks.

Comments