JEP 107: Bulk Data Operations for Collections
Summary
Add functionality to the Java Collections Framework for bulk operations upon data. This is commonly referred to as "filter/map/reduce for Java." The bulk data operations include both serial (on the calling thread) and parallel (using many threads) versions of the operations. Operations upon data are generally expressed as lambda functions.
Goals
Provide new features for bulk data processing, expressed with lambda functions and including parallel operations.
Non-Goals
Convert existing usages to parallel operation.
Motivation
FlumeJava, used internally by Google, and PLINQ, offered by Microsoft, are the most directly comparable offerings. LINQ and PLINQ in particular are seen as extremely valuable by .NET developers and are a subject of much envy among Java developers.
The primary benefits are for developers currently building single-threaded business process applications. The ability to take advantage of concurrency with minimal changes to their applications is expected to be a huge benefit.
Description
The serial implementation provides a bridge from existing collections bulk-data operations to parallel operation that does not change the threading model of the application.
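As an illustration of the serial "filter/map/reduce" style, here is a minimal sketch. It assumes the java.util.stream API that shipped in Java 8; this JEP text does not itself name an API, and the class and method names below are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; assumes the Java 8 java.util.stream API.
class SerialBulkExample {

    // Sums the squares of the even values, entirely on the calling thread.
    static int sumOfEvenSquares(List<Integer> values) {
        return values.stream()
                .filter(v -> v % 2 == 0)  // filter: keep even values
                .mapToInt(v -> v * v)     // map: square each value
                .sum();                   // reduce: add them up
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(sumOfEvenSquares(values)); // 4 + 16 + 36 = 56
    }
}
```

No threads are created here, so the threading model of the calling application is unchanged.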
The parallel implementation is the central element of this feature. The parallel operation provides the opportunity to accelerate operations upon large amounts of data by dividing the task between multiple threads (processors). The parallel implementation builds upon the java.util.concurrent Fork/Join implementation introduced in Java 7.
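The following sketch shows the same kind of bulk operation run serially and in parallel. It again assumes the Java 8 java.util.stream API, in which parallel streams execute on the common Fork/Join pool; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical example class; assumes the Java 8 java.util.stream API.
class ParallelBulkExample {

    // Builds the list 1..n as test data.
    static List<Integer> oneTo(int n) {
        return IntStream.rangeClosed(1, n).boxed().collect(Collectors.toList());
    }

    // Sums the squares of all values, optionally splitting the work
    // across the threads of the common Fork/Join pool.
    static long sumOfSquares(List<Integer> values, boolean parallel) {
        return (parallel ? values.parallelStream() : values.stream())
                .mapToLong(v -> (long) v * v)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> values = oneTo(1000);
        System.out.println(sumOfSquares(values, false)); // serial
        System.out.println(sumOfSquares(values, true));  // parallel, same result
    }
}
```

The only difference between the two modes is which stream is requested. The operation is stateless and the sum is associative, so both calls return the same value. Whether the parallel version is actually faster depends on data size and hardware, which is why the benchmarking described under Testing matters.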
For both the serial and parallel implementations, an "eager" mode and a "lazy" mode are possible. In eager mode, operations are performed directly upon the data at the time the operation function is invoked. In lazy mode, the operations upon the data are deferred until the final result is requested. Lazy mode gives the implementation more optimization opportunities, because it can reorganize the data and the operations to be performed before any work is done.
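The difference between the two modes can be observed by counting how often a lambda is invoked. This is a sketch under assumptions: it uses the Java 8 List.replaceAll method as a stand-in for an eager operation and java.util.stream for the lazy one; neither is named by this JEP, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example class; assumes Java 8 library APIs.
class EagerLazyExample {

    // Eager: the lambda has run for every element when replaceAll returns.
    static int eagerCalls(List<String> words) {
        AtomicInteger calls = new AtomicInteger();
        List<String> copy = new ArrayList<>(words);
        copy.replaceAll(w -> { calls.incrementAndGet(); return w.toUpperCase(); });
        return calls.get();
    }

    // Lazy: returns the lambda call count before and after the result is requested.
    static int[] lazyCalls(List<String> words) {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> lengths = words.stream()
                .map(w -> { calls.incrementAndGet(); return w.length(); });
        int before = calls.get();                // nothing has run yet
        lengths.collect(Collectors.toList());    // requesting the result runs the pipeline
        int after = calls.get();
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("bulk", "data", "operations");
        System.out.println(eagerCalls(words));                 // 3
        System.out.println(Arrays.toString(lazyCalls(words))); // [0, 3]
    }
}
```

In the lazy case the whole pipeline is known before any element is touched, which is what leaves room for the reorganization mentioned above.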
Testing
Benchmarking and performance regression testing are going to be critical to delivering a quality final product.
This work will require significant hardware resources to test fully, i.e., dedicated systems with 8 or more cores for all of the primary supported platforms.
Dependences
- Lambda language changes
- Core libraries changes described in JEP 109
- Involvement from JSR 335 EG and JSR 166 EG and Doug Lea in particular
Impact
- Compatibility: Forward compatibility only
- Security: Standard
- Performance/scalability: Significant testing and benchmarking required
- User experience: None
- I18n/L10n: None
- Portability: 100% Java implementation. No native code planned.
- Packaging/installation: Delivered as part of JRE install
- Documentation: Standard
- TCK: No special requirements. New TCK tests will be required.
- Internationalization: Same as JCF
- Localization: None