JEP 280: Indify String Concatenation

Summary

Change the static String-concatenation bytecode sequence generated by javac to use invokedynamic calls to JDK library functions. This will enable future optimizations of String concatenation without requiring further changes to the bytecode emitted by javac.

Goals

Lay the groundwork for building optimized String concatenation handlers, implementable without the need to change the Java-to-bytecode compiler. Multiple translation strategies should be tried as the motivational examples for this work. Producing (and possibly switching to) the optimized translation strategies is a stretch goal for this JEP.

Non-Goals

It is not a goal to: Introduce any new String and/or StringBuilder APIs that might help to build better translation strategies, revisit the JIT compiler's support for optimizing String concatenation, support advanced String interpolation use cases, or explore other changes to the Java programming language.

Success Metrics

A stable, future-proof bytecode sequence is generated by javac. The intent is to change the bytecode shape at most once in any major Java release.
String concatenation performance does not regress.
Startup time and time to performance do not regress beyond reasonable levels.

Motivation

Currently javac translates String concatenation into StringBuilder::append chains. Sometimes this translation is not optimal, and sometimes we need to presize the StringBuilder appropriately. The -XX:+OptimizeStringConcat option enables aggressive optimizations in the JIT compiler, which recognize StringBuilder append chains and do the presizing and in-place copying. Those optimizations, while fruitful, are fragile and difficult to extend and maintain; see, e.g., JDK-8043677, JDK-8076758, and JDK-8136469. It is tempting to change the translation in javac itself; see, e.g., the recent proposal on the compiler-dev list.

When we consider changing anything in javac, it seems inconvenient to change the compiler and the bytecode shape every time we want to improve performance. Users generally expect the same bytecode to run faster on newer JVMs. Asking users to recompile their Java programs for performance is not friendly, and it also explodes the testing matrix, since JVMs should recognize all the variants of the generated bytecode. Therefore, we might want to employ some trick to declare the intent of concatenating strings in bytecode, and then expand that intent in runtime.

In other words, we are introducing something similar to a new bytecode, "string concat". As in the Lambda work, this closes the gap between the JLS allowing a language feature (string concatentation, in this case) and the JVMS not having the appropriate facilities for it, forcing javac to translate it into low-level code, and further forcing VM implementations to recognize all low-level translated forms. invokedynamic allows us to overcome this impedance mismatch with one leap by providing a guaranteed interface to string concatenation at the level of the core libraries. The actual implementation of the interface can even be implemented using (potentially unsafe) private JVM APIs for concatenation, such as fixed-length string builders, tied to a particular use case. Making use of these APIs directly in javac would mean exposing them to the public.

Description

We will use the power of invokedynamic: It offers the facilities for a lazy linkage, by providing the means to bootstrap the call target once, during the initial invocation. This approach is nothing new, and we borrow extensively from current code that translates lambda expressions.

The idea is to replace the entire StringBuilder append dance with a simple invokedynamic call to java.lang.invoke.StringConcatFactory, that will accept the values in the need of concatenation. For example,

String m(String a, int b) {
  return a + "(" + b + ")";
}

is currently compiled to:

java.lang.String m(java.lang.String, int);
       0: new           #2                  // class java/lang/StringBuilder
       3: dup
       4: invokespecial #3                  // Method java/lang/StringBuilder."<init>":()V
       7: aload_1
       8: invokevirtual #4                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      11: ldc           #5                  // String (
      13: invokevirtual #4                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      16: iload_2
      17: invokevirtual #6                  // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
      20: ldc           #7                  // String )
      22: invokevirtual #4                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      25: invokevirtual #8                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      28: areturn

But even with the naive indy translation, available in the proposed implementation via -XDstringConcat=indy, it can be significantly simpler:

java.lang.String m(java.lang.String, int);
       0: aload_1
       1: ldc           #2                  // String (
       3: iload_2
       4: ldc           #3                  // String )
       6: invokedynamic #4,  0              // InvokeDynamic #0:makeConcat:(Ljava/lang/String;Ljava/lang/String;ILjava/lang/String;)Ljava/lang/String;
      11: areturn

BootstrapMethods:
  0: #19 invokestatic java/lang/invoke/StringConcatFactory.makeConcat:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;

Notice how we pass an int argument without boxing. At runtime, the bootstrap method (BSM) runs and links in the actual code doing the concatenation. It rewrites the invokedynamic call with an appropriate invokestatic call. This loads the constant string from the constant pool, but we can leverage the BSM static args to pass these and other constants straight to the BSM call. This is what a proposed -XDstringConcat=indyWithConstants flavor does:

java.lang.String m(java.lang.String, int);
       0: aload_1
       1: iload_2
       2: invokedynamic #2,  0              // InvokeDynamic #0:makeConcat:(Ljava/lang/String;I)Ljava/lang/String;
       7: areturn

BootstrapMethods:
  0: #15 invokestatic java/lang/invoke/StringConcatFactory.makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
    Method arguments:
      #16 \u0001(\u0001)

Notice how we only pass the dynamic arguments ("a" and "b") to the BSM. The static constants would be already handled during linkage. The BSM method is provided with a recipe that tells in what order to concatenate the dynamic and static arguments, and what the static arguments are. This strategy also treats nulls, primitives, and empty strings separately.

It is an open question which bytecode flavor should be the default. More details on bytecode shapes, concat flavors, and performance data can be found in the experimental notes. A proposed bootstrap method API can be found here. A complete implementation can be found in the sandbox branch:

$ hg clone http://hg.openjdk.java.net/jdk9/sandbox sandbox 
$ cd sandbox/ 
$ sh ./common/bin/hgforest.sh up -r JDK-8085796-indyConcat
$ sh ./configure 
$ make images

One can see the difference between the baseline and patched runtime by using:

$ hg diff -r default:JDK-8085796-indyConcat
$ cd langtools/
$ hg diff -r default:JDK-8085796-indyConcat
$ cd jdk/
$ hg diff -r default:JDK-8085796-indyConcat

The benchmarks can be found here: http://cr.openjdk.java.net/~shade/8085796/

The proposed implementation successfully builds the JDK, runs the regression tests (including new ones that test String concat), and passes the smoke tests on all platforms. Multiple strategies can be used for the actual concatenation. Our proposed implementation shows there are no throughput hits when we move the bytecode sequence generated by javac into the BSM that injects the same bytecode, which validates the approach. Optimized strategies are shown to perform as well or better than the baseline, especially when default StringBuilder length is not enough and/or the VM's optimizations fail.

See the experimental notes for more data.

Alternatives

There are three alternatives to this proposal.

First, obvious alternative: Change the bytecode sequence in javac itself, by transplanting one of the bytecode-generation strategies from the proposed implementation. While simple in the short-term, it breaks away from the unspoken contract with the Java community at large: JVMs are smart enough to recognize and optimize the existing bytecode. We cannot easily coerce users to recompile their Java programs every time we want to tune the String-concatenation translation. Every time a bytecode shape changes the JIT compilers are forced to recognize a new form, and that explodes the test matrix considerably, as well as sets up unsuspecting users for strange performance (mis)behaviors. If we are to change the bytecode shape for performance reasons anyway, we better make it once in a major release, and in a future-proof way. Using private APIs for optimized string contatentation is out of the question as well: javac-generated code should use only the public APIs.

Second alternative: Introduce a vararg StringConcat.concat(Object... args) method, and use it to concatenate the arguments. The major downside is the boxing of primitives, and then boxing all the arguments into the Object[]. While advanced JITs may recognize these idioms, we are putting ourselves at the mercy of new and untested optimizations in already-complex optimizing compilers. invokedynamic does roughly the same, but it supports various target method signatures out of the box.

Third alternative: Continue doing as we do at present, and extend -XX:+OptimizeStringConcat to cover more cases. The downside is that optimizing in the JIT compiler requires spelling out the toString conversions and storage management with a compiler IR, which makes it hard to extend and requires rare expertise to get right. Also, the minute differences and oddities in the bytecode shape break -XX:+OptimizeStringConcat, and we have to be extra careful in javac to avoid this. See e.g. JDK-8043677, JDK-8076758, JDK-8136469.

All three alternatives basically put us at the mercy of fragile compiler optimizations.

Testing

String concatenation is exercised routinely by most Java code. In addition to regular functional and performance tests, we plan to investigate the performance model of this change with microbenchmarks. New regression tests for different strategies may need to be developed.

Risks and Assumptions

Other String-related changes. This change has the potential to collide with other String-related changes, notably with Compact Strings (JEP 254). The mitigation plan is to generate exactly the same bytecode sequence as the current JDK does, then a "Compact Strings"-enabled JVM would exercise the same code path, oblivious of any String-concat translation changes. Indeed, our preliminary experiments show that our proposal plays nicely with Compact Strings.

java.base exemptions. Since this feature uses a core library feature (java.lang.invoke) to implement a core language feature (String concatenation), we have to exempt the java.base module from using the indified String concat. Otherwise a circularity happens when the java.lang.invoke.* machinery requires String concat to work, which in turn requires the java.lang.invoke.* machinery. This exemption may limit the performance improvements observed from this feature, since many java.base classes will not be able to use it. We consider this an acceptable downside, which should be covered by the VM's optimizing compilers.

Maintenance cost. The legacy String-concat translation is not going away, and therefore we have to maintain both -XX:+OptimizeStringConcat and the optimizations for newer translation strategies in the VM. javac will still need to provide a fallback for the cases where indified String concat cannot be used, notably some system classes. This does not pose a significant problem, since our performance story includes involving -XX:+OptimizeStringConcat in most strategies.

Compatibility: Platforms that don't spell invokedynamic. What should they do once they see the indified String concat? There is experience with Lambdas already, where the non-indy-spelling platforms have to desugar indys away, and re-generate statically that very same code that indy would emit on the fly. While the code sequence generated for naive String concat is simpler than LambdaFactory, it still creates more complexity in the VM code. Since javac still retains the code that emits the legacy String concat sequence anyway, the aforementioned platforms can use this fallback.

Compatibility: Other compilers. Should other (non-javac) compilers emit the same bytecode? Do we propose to change all compiler implementations to use the new translation strategy? Since the legacy String concat does not go away, other compilers are free to emit the same bytecode sequence they emit today. They can incrementally change some of the translations to the proposed indy-based translation scheme, as they see fit.

Compatibility: Bytecode manipulators/weavers. Tools would need to handle new indys in places they are not used to even in absence of source code changes, just by the virtue of recompilation. We don't see this as the major problem, since tools are expected to handle Lambdas already, and so they need to recognize invokedynamic. Corrections would be needed for the tools that used to instrument the StringBuilder::append chains generated by javac.

Static footprint. Classfile size will probably be an issue in String-concat-heavy code. There are machine-generated files out there which use up nearly all the entries in the CP and do a huge number of string concatenations. It seems that the method bytecode footprint is much lower for indified String concat, and there is additional static overhead in the constant pool for the java.lang.invoke machinery, plus a bit of overhead per each concat shape. See the notes above for more data.

Startup overhead. There is a risk that startup overhead would be prohibitive for this work to move forward. The implementation of Lambdas introduced these kinds of translation strategies, but you only pay the price if you use Lambdas. With String concat, once invokedynamic is compiled into the bytecode, there is no escape for uses of + in existing code (unless you change the code to avoid + and invoke StringBuilder explicitly). This risk is associated with the initialization of the invokedynamic machinery (see JDK-8086045), and therefore accepting the startup regression here would mean the absence of startup regressions in other changes that require invokedynamic to be initialized, notably the module system (JEP 261). Conversely, if the module system lands before this change then the startup costs will be amortized.

JEP 280: Indify String Concatenation

Summary​

Goals​

Non-Goals​

Success Metrics​

Motivation​

Description​

Alternatives​

Testing​

Risks and Assumptions​

Dependencies​