Basic Usage of the split Method
In Java, you can split a string using the split method.
The following example demonstrates how to split a string by commas (,) and receive the result as an array.
String target = "apple,melon,banana,grape";
String[] array = target.split(",");
for (String element : array) {
    System.out.println(element);
}
// ["apple", "melon", "banana", "grape"]
The split method returns an array (String[]). However, an array has limited functionality and is not always convenient to work with. In many cases, you may prefer to receive the split data as a list rather than an array.
To get the result as a list, pass the array returned by the split method to the Arrays.asList method.
import java.util.Arrays;
import java.util.List;
// ...Omitted...
String target = "apple,melon,banana,grape";
List<String> list = Arrays.asList(target.split(","));
System.out.println(list);
// ["apple", "melon", "banana", "grape"]
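One caveat with this approach: the list returned by Arrays.asList is a fixed-size view of the array, so calling add or remove on it throws UnsupportedOperationException. If the list needs to grow or shrink, it can be copied into an ArrayList. The following is a minimal sketch; the class name MutableSplit and the method toMutableList are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of any library.
final class MutableSplit {
    private MutableSplit() {}

    // Splits by the given regex and copies the result into a resizable ArrayList.
    static List<String> toMutableList(String target, String regex) {
        return new ArrayList<>(Arrays.asList(target.split(regex)));
    }

    public static void main(String[] args) {
        List<String> fixed = Arrays.asList("apple,melon".split(","));
        try {
            fixed.add("grape"); // fixed-size list: this call is rejected
        } catch (UnsupportedOperationException e) {
            System.out.println("The Arrays.asList result cannot grow");
        }

        List<String> mutable = toMutableList("apple,melon", ",");
        mutable.add("grape"); // works on the ArrayList copy
        System.out.println(mutable); // [apple, melon, grape]
    }
}
```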
Cautions for the split Method
While the split method may seem straightforward, not fully understanding its specifications can lead to unexpected pitfalls.
In the following sections, I will explain some key points to be aware of when using the split method.
Delimiters are treated as regular expressions
The delimiter passed as an argument to the split method is treated as a regular expression internally. A simplified view of the split method's implementation is shown below for reference.
public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence {
    public String[] split(String regex, int limit) {
        // ...Omitted...
        return Pattern.compile(regex).split(this, limit);
    }
}
Therefore, be cautious when using symbols as delimiters.
For example, if you try to split a string using a dot (.) without realizing that it is treated as a regular expression, the string will not be split as expected.
String target = "192.168.1.0";
String[] result = target.split(".");
// []
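In the above example, splitting the string produces an empty array.
This is because, in regular expressions, a dot (.) represents “any single character,” so every character in the string is treated as a delimiter. Every piece between the delimiters is then an empty string, and because trailing empty elements are removed by default, nothing is left.
If you want to split a string on a literal dot (.), escape it with a backslash (\). In a Java string literal the backslash itself must be escaped, so the argument is written as "\\.".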
String target = "192.168.1.0";
String[] result = target.split("\\.");
// ["192", "168", "1", "0"]
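When the delimiter comes from a variable or from user input, escaping it by hand is easy to get wrong. One option is Pattern.quote from java.util.regex, which wraps a string so that the regex engine treats it literally. The sketch below assumes a hypothetical helper named splitLiteral; it is an illustration rather than a recommended API.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class LiteralSplit {
    private LiteralSplit() {}

    // Treats the delimiter as plain text by quoting it before it reaches split.
    static String[] splitLiteral(String target, String delimiter) {
        return target.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("192.168.1.0", ".")));
        // [192, 168, 1, 0]
        System.out.println(Arrays.toString(splitLiteral("a|b|c", "|")));
        // [a, b, c]
    }
}
```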
By leveraging the fact that the split method processes regular expressions, you can implement the following logic.
String target = "apple,melon, banana, grape";
String[] result = target.split(",\\s*");
// ["apple", "melon", "banana", "grape"]
The \s is a regular expression that matches a whitespace character (a space, a tab, and so on), and the * means that the preceding element can occur zero or more times.
In other words, this splits the text by “comma + whitespace (if present).”
With this pattern, the elements of the resulting array do not begin with whitespace, regardless of whether the comma is followed by a space.
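Since the split method hands the delimiter to Pattern.compile, as the excerpt above shows, code that applies the same regular expression to many lines can compile the pattern once and reuse it. The following sketch does that and also allows whitespace before the comma; the class name CommaSplitter is a hypothetical name for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class CommaSplitter {
    // A comma with optional whitespace on either side, compiled once and reused.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    private CommaSplitter() {}

    static String[] split(String line) {
        // trim() removes whitespace at the very start and end of the line,
        // which the delimiter pattern alone would not touch.
        return COMMA.split(line.trim());
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split(" apple ,melon,  banana , grape ")));
        // [apple, melon, banana, grape]
    }
}
```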
By default, all trailing empty elements in the array are removed
If the second argument of the split method is not specified, or if 0 is specified, the trailing empty elements are removed from the resulting array.
String target = "apple,melon,,banana,,";
// When the second argument is not specified
String[] result = target.split(",");
// ["apple", "melon", "", "banana"]
For example, if the above string is split using a comma as the delimiter, the two elements after “banana” will be removed because they are empty strings.
This behavior requires special attention when implementing processes like reading row data from a CSV file that assumes a fixed length.
You can avoid this issue by specifying a non-zero value for the second argument of the split method.
If a negative value is specified for the second argument, empty elements will not be removed, as shown below.
String target = "apple,melon,,banana,,";
String[] resultMinus1 = target.split(",", -1);
// ["apple", "melon", "", "banana", "", ""]
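To make the fixed-length case concrete, here is a minimal sketch of a row parser that rejects rows with the wrong number of columns. The class name FixedColumnRow and the column count of 6 are assumptions for this example, and the code deliberately ignores CSV features such as quoted fields.

```java
// Hypothetical helper for illustration; not part of any library.
final class FixedColumnRow {
    private static final int EXPECTED_COLUMNS = 6; // assumed column count for this sketch

    private FixedColumnRow() {}

    // Splits one row and verifies that it has the expected number of columns.
    static String[] parse(String row) {
        String[] columns = row.split(",", -1); // -1 keeps trailing empty columns
        if (columns.length != EXPECTED_COLUMNS) {
            throw new IllegalArgumentException(
                    "Expected " + EXPECTED_COLUMNS + " columns but found " + columns.length);
        }
        return columns;
    }

    public static void main(String[] args) {
        String row = "apple,melon,,banana,,";
        System.out.println(row.split(",").length); // 4
        System.out.println(parse(row).length);     // 6
    }
}
```

Without the -1 argument, the same row would yield only four columns and fail the length check.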
If a positive value is specified as the second argument of the split method, the resulting array will contain at most that many elements; in other words, the delimiter is applied at most (value - 1) times.
If the array has a fixed length, it’s best to specify that value.
However, be aware that the last element will contain all of the remaining input, including any delimiters that were not processed.
String target = "apple,melon,,banana,,";
String[] result4 = target.split(",", 4);
// ["apple", "melon", "", "banana,,"]
String[] result5 = target.split(",", 5);
// ["apple", "melon", "", "banana", ","]
String[] result6 = target.split(",", 6);
// ["apple", "melon", "", "banana", "", ""]
String[] result7 = target.split(",", 7);
// ["apple", "melon", "", "banana", "", ""]
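A positive limit is also handy when only the first delimiter should count. As a small sketch, the hypothetical helper below splits a key=value entry with a limit of 2 so that any further "=" characters stay inside the value.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of any library.
final class KeyValueSplit {
    private KeyValueSplit() {}

    // Splits at the first '=' only; anything after it stays in the value.
    static String[] parse(String entry) {
        return entry.split("=", 2);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("url=https://example.com/?q=java")));
        // [url, https://example.com/?q=java]
        System.out.println(Arrays.toString(parse("empty=")));
        // [empty, ]
    }
}
```

Because the limit is positive, the empty value in the second call is kept rather than removed.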
In my opinion, when using the split method to split a string, it’s generally better to specify a negative value (-1) for the second argument. There aren’t many situations where you would want to remove empty string elements.
Splitting an empty string does not return an empty array
String empty = "";
String[] array = empty.split(",");
// [""]
When you split an empty string, the result is not an empty array, but an array with a single element: an empty string at index 0.
You might expect that passing an empty string to the split method would return an empty array, but that’s not the case.
By the way, this behavior is not unique to Java; the split methods in JavaScript and Python behave the same way when a separator is passed.
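The result can look inconsistent next to a string that contains only the delimiter. When the delimiter does not match anywhere, split returns the original string as the single element; when it does match, the usual removal of trailing empty elements applies. The class name EmptyInputCheck below is a hypothetical name for this small demonstration.

```java
// Hypothetical demonstration class; not part of any library.
final class EmptyInputCheck {
    private EmptyInputCheck() {}

    // Returns how many elements split produces for the given input.
    static int count(String target) {
        return target.split(",").length;
    }

    public static void main(String[] args) {
        System.out.println(count(""));   // 1: no match, so the input itself ("") is returned
        System.out.println(count(","));  // 0: both elements are empty and are removed
        System.out.println(count("a,")); // 1: only the trailing empty element is removed
    }
}
```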
Splitting null throws an exception
The split method cannot be used on null; an exception will be thrown if you try to use it on a null value.
String target = null;
String[] array = target.split(",");
// NullPointerException
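If null inputs are possible, one way to handle them is to check before calling split. The sketch below returns an empty array for null; whether null should mean "no elements" or should be reported as an error depends on the application, so treat that choice, and the name splitOrEmpty, as assumptions of this example.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of any library.
final class NullSafeSplit {
    private NullSafeSplit() {}

    // Returns an empty array for null input instead of throwing.
    // Treating null as "no elements" is an assumption of this sketch.
    static String[] splitOrEmpty(String target, String regex) {
        if (target == null) {
            return new String[0];
        }
        return target.split(regex, -1);
    }

    public static void main(String[] args) {
        System.out.println(splitOrEmpty(null, ",").length);                  // 0
        System.out.println(Arrays.toString(splitOrEmpty("apple,melon", ","))); // [apple, melon]
    }
}
```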
Considering a Generic Method for Splitting
Based on the points above, I have created a general-purpose method for splitting strings.
I hope it will be useful in your own implementations.
This method has the following features:
- The return value is not an array, but a more convenient list type.
- If the string to be split is an empty string, an empty list is returned.
- Empty elements are not deleted in the splitting process. (A negative value is set to the second argument of split by default.)
import java.util.Arrays;
import java.util.List;
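public final class CollectionsUtil {

    private CollectionsUtil() {}

    // Splits without a limit, keeping trailing empty elements.
    public static List<String> toList(String target, String delimiter) {
        return toList(target, delimiter, -1);
    }

    // The delimiter is still interpreted as a regular expression.
    public static List<String> toList(String target, String delimiter, int limit) {
        // An empty string yields an empty list rather than [""].
        if (target.isEmpty()) {
            return Arrays.asList();
        }
        return Arrays.asList(target.split(delimiter, limit));
    }
}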
Result of executing the above method
List<String> list1 = CollectionsUtil.toList("apple,melon,banana,grape", ",");
// ["apple", "melon", "banana", "grape"]
List<String> list2 = CollectionsUtil.toList("apple,melon,,banana,,", ",");
// ["apple", "melon", "", "banana", "", ""]
List<String> list3 = CollectionsUtil.toList("apple,melon,,banana,,", ",", 4);
// ["apple", "melon", "", "banana,,"]
List<String> list4 = CollectionsUtil.toList("", ",");
// []
Notes
If you are using Google Guava, one of the most popular third-party libraries, you can opt to use Splitter from Google Guava instead of the split method provided by the Java standard library.
However, Splitter differs significantly from the split method, as shown below, so it’s important to read the documentation to understand how to use it properly.
(Which is easier to learn: mastering the split method or mastering Splitter…?)
import com.google.common.base.Splitter;
// ...Omitted...
String target = "apple,melon, ,banana,,";
List<String> list = Splitter.on(',')
        .trimResults()
        .omitEmptyStrings()
        .splitToList(target);
// ["apple", "melon", "banana"]
Apache Commons also provides functionality for splitting strings.
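However, be aware that the split method in Apache Commons (StringUtils.split) behaves differently from the one provided by the Java standard library: it treats the separator as a set of characters rather than a regular expression, and it treats adjacent separators as a single separator, so the empty elements are dropped.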
import org.apache.commons.lang3.StringUtils;
// ...Omitted...
String target = "apple,melon,,banana,,";
// split method in Java standard library
List<String> list1 = Arrays.asList(target.split(","));
// ["apple", "melon", "", "banana"]
// split method in Apache Commons
List<String> list2 = Arrays.asList(StringUtils.split(target, ","));
// ["apple", "melon", "banana"]