JEP 280: Indify String Concatenation
Summary
Change the static String
-concatenation bytecode sequence generated by javac
to use invokedynamic
calls to JDK library functions. This will enable future optimizations of String
concatenation without requiring further changes to the bytecode emitted by javac
.
Goals
Lay the groundwork for building optimized String
concatenation handlers, implementable without the need to change the Java-to-bytecode compiler. Multiple translation strategies should be tried as the motivational examples for this work. Producing (and possibly switching to) the optimized translation strategies is a stretch goal for this JEP.
Non-Goals
It is not a goal to: Introduce any new String
and/or StringBuilder
APIs that might help to build better translation strategies, revisit the JIT compiler's support for optimizing String
concatenation, support advanced String
interpolation use cases, or explore other changes to the Java programming language.
Success Metrics
-
A stable, future-proof bytecode sequence is generated by
javac
. The intent is to change the bytecode shape at most once in any major Java release. -
String
concatenation performance does not regress. -
Startup time and time to performance do not regress beyond reasonable levels.
Motivation
Currently javac
translates String
concatenation into StringBuilder::append
chains. Sometimes this translation is not optimal, and sometimes we need to presize the StringBuilder
appropriately. The -XX:+OptimizeStringConcat
option enables aggressive optimizations in the JIT compiler, which recognize StringBuilder
append chains and do the presizing and in-place copying. Those optimizations, while fruitful, are fragile and difficult to extend and maintain; see, e.g., JDK-8043677, JDK-8076758, and JDK-8136469. It is tempting to change the translation in javac
itself; see, e.g., the recent proposal on the compiler-dev list.
When we consider changing anything in javac
, it seems inconvenient to change the compiler and the bytecode shape every time we want to improve performance. Users generally expect the same bytecode to run faster on newer JVMs. Asking users to recompile their Java programs for performance is not friendly, and it also explodes the testing matrix, since JVMs should recognize all the variants of the generated bytecode. Therefore, we might want to employ some trick to declare the intent of concatenating strings in bytecode, and then expand that intent in runtime.
In other words, we are introducing something similar to a new bytecode, "string concat". As in the Lambda work, this closes the gap between the JLS allowing a language feature (string concatentation, in this case) and the JVMS not having the appropriate facilities for it, forcing javac
to translate it into low-level code, and further forcing VM implementations to recognize all low-level translated forms. invokedynamic
allows us to overcome this impedance mismatch with one leap by providing a guaranteed interface to string concatenation at the level of the core libraries. The actual implementation of the interface can even be implemented using (potentially unsafe) private JVM APIs for concatenation, such as fixed-length string builders, tied to a particular use case. Making use of these APIs directly in javac
would mean exposing them to the public.
Description
We will use the power of invokedynamic
: It offers the facilities for a lazy linkage, by providing the means to bootstrap the call target once, during the initial invocation. This approach is nothing new, and we borrow extensively from current code that translates lambda expressions.
The idea is to replace the entire StringBuilder
append dance with a simple invokedynamic
call to java.lang.invoke.StringConcatFactory
, that will accept the values in the need of concatenation. For example,
String m(String a, int b) {
return a + "(" + b + ")";
}
is currently compiled to:
java.lang.String m(java.lang.String, int);
0: new #2 // class java/lang/StringBuilder
3: dup
4: invokespecial #3 // Method java/lang/StringBuilder."<init>":()V
7: aload_1
8: invokevirtual #4 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
11: ldc #5 // String (
13: invokevirtual #4 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
16: iload_2
17: invokevirtual #6 // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
20: ldc #7 // String )
22: invokevirtual #4 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
25: invokevirtual #8 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
28: areturn
But even with the naive indy translation, available in the proposed implementation via -XDstringConcat=indy
, it can be significantly simpler:
java.lang.String m(java.lang.String, int);
0: aload_1
1: ldc #2 // String (
3: iload_2
4: ldc #3 // String )
6: invokedynamic #4, 0 // InvokeDynamic #0:makeConcat:(Ljava/lang/String;Ljava/lang/String;ILjava/lang/String;)Ljava/lang/String;
11: areturn
BootstrapMethods:
0: #19 invokestatic java/lang/invoke/StringConcatFactory.makeConcat:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
Notice how we pass an int
argument without boxing. At runtime, the bootstrap method (BSM) runs and links in the actual code doing the concatenation. It rewrites the invokedynamic
call with an appropriate invokestatic
call. This loads the constant string from the constant pool, but we can leverage the BSM static args to pass these and other constants straight to the BSM call. This is what a proposed -XDstringConcat=indyWithConstants
flavor does:
java.lang.String m(java.lang.String, int);
0: aload_1
1: iload_2
2: invokedynamic #2, 0 // InvokeDynamic #0:makeConcat:(Ljava/lang/String;I)Ljava/lang/String;
7: areturn
BootstrapMethods:
0: #15 invokestatic java/lang/invoke/StringConcatFactory.makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
Method arguments:
#16 \u0001(\u0001)
Notice how we only pass the dynamic arguments ("a"
and "b"
) to the BSM. The static constants would be already handled during linkage. The BSM method is provided with a recipe that tells in what order to concatenate the dynamic and static arguments, and what the static arguments are. This strategy also treats nulls, primitives, and empty strings separately.
It is an open question which bytecode flavor should be the default. More details on bytecode shapes, concat flavors, and performance data can be found in the experimental notes. A proposed bootstrap method API can be found here. A complete implementation can be found in the sandbox branch:
$ hg clone http://hg.openjdk.java.net/jdk9/sandbox sandbox
$ cd sandbox/
$ sh ./common/bin/hgforest.sh up -r JDK-8085796-indyConcat
$ sh ./configure
$ make images
One can see the difference between the baseline and patched runtime by using:
$ hg diff -r default:JDK-8085796-indyConcat
$ cd langtools/
$ hg diff -r default:JDK-8085796-indyConcat
$ cd jdk/
$ hg diff -r default:JDK-8085796-indyConcat
The benchmarks can be found here: http://cr.openjdk.java.net/~shade/8085796/
The proposed implementation successfully builds the JDK, runs the regression tests (including new ones that test String
concat), and passes the smoke tests on all platforms. Multiple strategies can be used for the actual concatenation. Our proposed implementation shows there are no throughput hits when we move the bytecode sequence generated by javac
into the BSM that injects the same bytecode, which validates the approach. Optimized strategies are shown to perform as well or better than the baseline, especially when default StringBuilder
length is not enough and/or the VM's optimizations fail.
See the experimental notes for more data.
Alternatives
There are three alternatives to this proposal.
First, obvious alternative: Change the bytecode sequence in javac
itself, by transplanting one of the bytecode-generation strategies from the proposed implementation. While simple in the short-term, it breaks away from the unspoken contract with the Java community at large: JVMs are smart enough to recognize and optimize the existing bytecode. We cannot easily coerce users to recompile their Java programs every time we want to tune the String
-concatenation translation. Every time a bytecode shape changes the JIT compilers are forced to recognize a new form, and that explodes the test matrix considerably, as well as sets up unsuspecting users for strange performance (mis)behaviors. If we are to change the bytecode shape for performance reasons anyway, we better make it once in a major release, and in a future-proof way. Using private APIs for optimized string contatentation is out of the question as well: javac
-generated code should use only the public APIs.
Second alternative: Introduce a vararg StringConcat.concat(Object... args)
method, and use it to concatenate the arguments. The major downside is the boxing of primitives, and then boxing all the arguments into the Object[]
. While advanced JITs may recognize these idioms, we are putting ourselves at the mercy of new and untested optimizations in already-complex optimizing compilers. invokedynamic
does roughly the same, but it supports various target method signatures out of the box.
Third alternative: Continue doing as we do at present, and extend -XX:+OptimizeStringConcat
to cover more cases. The downside is that optimizing in the JIT compiler requires spelling out the toString
conversions and storage management with a compiler IR, which makes it hard to extend and requires rare expertise to get right. Also, the minute differences and oddities in the bytecode shape break -XX:+OptimizeStringConcat
, and we have to be extra careful in javac
to avoid this. See e.g. JDK-8043677, JDK-8076758, JDK-8136469.
All three alternatives basically put us at the mercy of fragile compiler optimizations.
Testing
String concatenation is exercised routinely by most Java code. In addition to regular functional and performance tests, we plan to investigate the performance model of this change with microbenchmarks. New regression tests for different strategies may need to be developed.