JEP 103: Parallel Array Sorting
Summary
Add additional utility methods to java.util.Arrays
that use the JSR 166 Fork/Join parallelism common pool to provide sorting of arrays in parallel.
Non-Goals
There are many algorithms for parallel array sorting with different trade-offs for time and space. The goal here is to provide a generally useful utility operation, not to provide a framework of different algorithms from which the programmer can select.
Success Metrics
A minimum speedup of at least 1.3 over sequential sort on a two core system, for a suitably sized data set.
Motivation
Java 7 added the Fork/Join framework for lightweight data parallelism, but users must presently implement their own algorithms for simple/common tasks. This proposal addresses one common task by providing parallel sorting of arrays. By converting to/from arrays this can also be used to sort arbitrary collections (for those that have a defined traversal order).
Description
Current sorting implementations provided by the Java Collections Framework (Collections.sort and Arrays.sort) all perform the sorting operation sequentially in the calling thread. This enhancement will offer the same set of sorting operations currently provided by the Arrays class, but with a parallel implementation that utilizes the Fork/Join framework. These new API's are still synchronous with regard to the calling thread as it will not proceed past the sorting operation until the parallel sort is complete.
The actual sorting API this proposal adds will leverage the ForkJoinPool commonPool (default Fork/Join pool being defined by JEP 155).
public class Arrays {
...
public static void parallelSort(byte[] a) { ... }
public static void parallelSort(byte[] a, int fromIndex, int toIndex)
public static void parallelSort(short[] a) { ... }
public static void parallelSort(short[] a, int fromIndex, int toIndex)
{...}
...
}
Sort methods are defined for all primitive array types except boolean, plus Comparable object types, plus arbitrary Object types using a supplied Comparator. The sorting algorithm is that used in Doug Lea's ParallelArray implementation and requires a working space the same size as the array to be sorted (this is the whole array, not just the portion to be sorted).
Open issues:
-
Will some users require the ability to specify which pool to use?
-
Will users want a choice of algorithms to allow space/time trade-offs?
Alternatives
No general alternatives have been considered. The parallel sorting implementation comes from Doug Lea's ParallelArray framework. Some variations in the API's, particularly in regard to which pool to use, have been considered, but are presently deemed more complex than needed.
Testing
-
Includes unit tests adapted from the existing tests for Arrays.sort
-
Simple benchmarking tests showing speedup over serial sort also included.
-
Requires larger (8+ core) system for performance regression testing.
Risks and Assumptions
It is assumed that the choice of the Fork/Join system-wide common pool, and the tying of this API to that pool will not be contentious (or at least not to the point where it prevents the progress of this proposal). The ability to expand the API to allow further control over the pool could be added at a later stage.
It is also assumed that the simple algorithm choice will be sufficient for the general use case.
Impact
-
Compatibility: Forward compatible only
-
Security: Offers additional DoS attacks base upon shared resource (system fork/join pool)
-
Documentation: Javadoc only
-
Internationalization: Unchanged from existing serial sorting.