java.lang.String is probably one of the most used classes in Java. Naturally, it holds its string data internally. Do you know how the data is actually stored in String, and what happens when you instantiate a String from a byte array? In this post, we'll explore the internal structure of java.lang.String and discuss ways to improve instantiation performance.
Internal structure of java.lang.String in Java 8 or earlier
In Java 8, java.lang.String contains its string data as a 16-bit char array.
public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];
    // ...
}
When instantiating a String from a byte array, StringCoding.decode() is called.
public String(byte bytes[], int offset, int length, Charset charset) {
    if (charset == null)
        throw new NullPointerException("charset");
    checkBounds(bytes, offset, length);
    this.value = StringCoding.decode(charset, bytes, offset, length);
}
In the case of US_ASCII, sun.nio.cs.US_ASCII.Decoder.decode() is eventually called, which copies the bytes of the source byte array into a char array one by one.
public int decode(byte[] src, int sp, int len, char[] dst) {
    int dp = 0;
    len = Math.min(len, dst.length);
    while (dp < len) {
        byte b = src[sp++];
        if (b >= 0)
            dst[dp++] = (char)b;
        else
            dst[dp++] = repl;
    }
    return dp;
}
The newly created char array becomes the new String instance's value.
As you can see, even if the source byte array contains only single-byte characters, every byte still goes through the byte-to-char copy loop.
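The difference between this per-byte widening and the bulk copy that Java 9+ can use (described in the next section) can be sketched in plain Java. This is a simplified model, not JDK code: the class and method names are hypothetical, bounds checks are omitted, and copyBytes() assumes the caller has already verified that the input contains no negative (non-ASCII) bytes.

```java
import java.util.Arrays;

// Hypothetical model of the two strategies; not the JDK's implementation.
final class AsciiDecodeSketch {
    // Java 8 style: widen every byte into a 16-bit char, one at a time.
    static char[] widenToChars(byte[] src, int off, int len, char repl) {
        char[] dst = new char[len];
        for (int i = 0; i < len; i++) {
            byte b = src[off + i];
            // Non-ASCII bytes (negative values) become the replacement char.
            dst[i] = (b >= 0) ? (char) b : repl;
        }
        return dst;
    }

    // Java 9+ style for ASCII-only input: the bytes are already valid
    // LATIN1 storage, so a single bulk copy is enough.
    static byte[] copyBytes(byte[] src, int off, int len) {
        return Arrays.copyOfRange(src, off, off + len);
    }
}
```

Both produce the same text for ASCII input; the second simply skips the per-element conversion and lets Arrays.copyOfRange() move the data in bulk.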
Internal structure of java.lang.String in Java 9 or later
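In Java 9 or later, java.lang.String stores its string data as an 8-bit byte array, together with a coder field that records whether those bytes are LATIN1 (one byte per char) or UTF16 (two bytes per char).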
public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /**
     * The value is used for character storage.
     *
     * @implNote This field is trusted by the VM, and is a subject to
     * constant folding if String instance is constant. Overwriting this
     * field after construction will cause problems.
     *
     * Additionally, it is marked with {@link Stable} to trust the contents
     * of the array. No other facility in JDK provides this functionality (yet).
     * {@link Stable} is safe here, because value is never null.
     */
    @Stable
    private final byte[] value;
    // ...
}
When instantiating a String from a byte array, StringCoding.decode() is also called.
public String(byte bytes[], int offset, int length, Charset charset) {
    if (charset == null)
        throw new NullPointerException("charset");
    checkBoundsOffCount(offset, length, bytes.length);
    StringCoding.Result ret =
        StringCoding.decode(charset, bytes, offset, length);
    this.value = ret.value;
    this.coder = ret.coder;
}
In the case of US_ASCII, StringCoding.decodeASCII() is called. If the input contains no negative bytes (that is, it is pure ASCII), it copies the source byte array with Arrays.copyOfRange(), since both the source and the destination are byte arrays. Arrays.copyOfRange() internally uses System.arraycopy(), which is a native method and very fast.
private static Result decodeASCII(byte[] ba, int off, int len) {
    Result result = resultCached.get();
    if (COMPACT_STRINGS && !hasNegatives(ba, off, len)) {
        return result.with(Arrays.copyOfRange(ba, off, off + len),
                           LATIN1);
    }
    byte[] dst = new byte[len<<1];
    int dp = 0;
    while (dp < len) {
        int b = ba[off++];
        putChar(dst, dp++, (b >= 0) ? (char)b : repl);
    }
    return result.with(dst, UTF16);
}
You may have noticed the COMPACT_STRINGS constant. This improvement, introduced in Java 9, is called Compact Strings. The feature is enabled by default, but it can be disabled with the -XX:-CompactStrings JVM option. See https://docs.oracle.com/en/java/javase/17/vm/java-hotspot-virtual-machine-performance-enhancements.html#GUID-D2E3DC58-D18B-4A6C-8167-4A1DFB4888E4 for details.
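A rough model of the choice Compact Strings makes: if every character fits in one byte, the string is stored as LATIN1; otherwise it is stored as UTF16. The two constants mirror the JDK's internal coder values (0 and 1), but CompactStringModel and its methods are hypothetical helpers for illustration, not JDK API.

```java
// Hypothetical model of Compact Strings' encoding choice; not JDK code.
final class CompactStringModel {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // LATIN1 if every char fits in one byte (<= 0xFF), otherwise UTF16.
    static byte coderFor(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    // Size of the backing byte[]: length << coder
    // (1 byte per char for LATIN1, 2 bytes per char for UTF16).
    // The JDK goes the other way round: String.length() is value.length >> coder.
    static int backingArrayLength(CharSequence s) {
        return s.length() << coderFor(s);
    }
}
```

With this model, the 512 "x" characters used in the benchmark need a 512-byte array, and the 256 "あ" characters also need 512 bytes, because each of them takes two bytes in UTF16.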
The performance of new String(byte[]) in Java 8, 11, 17 and 21
I wrote this simple JMH benchmark to evaluate the performance of new String(byte[], Charset):
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(1)
@Measurement(time = 3, iterations = 4)
@Warmup(iterations = 2)
public class StringInstantiationBenchmark {
    private static final int STR_LEN = 512;
    // 512 bytes of ASCII ("x" x 512) and 768 bytes of UTF-8 ("あ" x 256, 3 bytes each)
    private static final byte[] SINGLE_BYTE_STR_SOURCE_BYTES;
    private static final byte[] MULTI_BYTE_STR_SOURCE_BYTES;

    static {
        {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < STR_LEN; i++) {
                sb.append("x");
            }
            SINGLE_BYTE_STR_SOURCE_BYTES = sb.toString().getBytes(StandardCharsets.UTF_8);
        }
        {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < STR_LEN / 2; i++) {
                sb.append("あ");
            }
            MULTI_BYTE_STR_SOURCE_BYTES = sb.toString().getBytes(StandardCharsets.UTF_8);
        }
    }

    @Benchmark
    public void newStrFromSingleByteStrBytes() {
        new String(SINGLE_BYTE_STR_SOURCE_BYTES, StandardCharsets.UTF_8);
    }

    @Benchmark
    public void newStrFromMultiByteStrBytes() {
        new String(MULTI_BYTE_STR_SOURCE_BYTES, StandardCharsets.UTF_8);
    }
}
The benchmark results are as follows:
- Java 8
Benchmark                     Mode   Cnt      Score       Error  Units
newStrFromMultiByteStrBytes   thrpt    4   1672.397  ±   11.338  ops/ms
newStrFromSingleByteStrBytes  thrpt    4   4789.745  ±  553.865  ops/ms
- Java 11
Benchmark                     Mode   Cnt      Score       Error  Units
newStrFromMultiByteStrBytes   thrpt    4   1507.754  ±   17.931  ops/ms
newStrFromSingleByteStrBytes  thrpt    4  15117.040  ± 1240.981  ops/ms
- Java 17
Benchmark                     Mode   Cnt      Score       Error  Units
newStrFromMultiByteStrBytes   thrpt    4   1529.215  ±  168.064  ops/ms
newStrFromSingleByteStrBytes  thrpt    4  17753.086  ±  251.676  ops/ms
- Java 21
Benchmark                     Mode   Cnt      Score       Error  Units
newStrFromMultiByteStrBytes   thrpt    4   1543.525  ±   69.061  ops/ms
newStrFromSingleByteStrBytes  thrpt    4  17711.972  ± 1178.212  ops/ms
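The throughput of newStrFromSingleByteStrBytes() improved drastically from Java 8 to Java 11 (from about 4,790 to 15,117 ops/ms, roughly 3.2 times), while newStrFromMultiByteStrBytes() saw no such gain. This is most likely due to the switch from a char array to a byte array in the String class: for ASCII-only input, the bytes can now be copied in bulk instead of being widened to chars one by one.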
Further performance improvement with zero copy
Okay, Compact Strings is a great performance improvement. But is there no room left to improve the performance of String instantiation from a byte array? String(byte bytes[], int offset, int length, Charset charset) in Java 9 or later still copies the byte array. Even though it uses System.arraycopy(), which is a native method and fast, the copy still takes time.
While reading the source code of Apache Fury, "a blazingly-fast multi-language serialization framework powered by JIT (just-in-time compilation) and zero-copy", I found that its StringSerializer achieves zero-copy String instantiation. Let's look into the implementation.
The usage of the StringSerializer is as follows:
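import java.nio.charset.StandardCharsets;
import org.apache.fury.serializer.StringSerializer;
...
final byte LATIN1 = 0; // String's internal coder value for Latin-1 data
byte[] bytes = "Hello".getBytes(StandardCharsets.ISO_8859_1);
String s = StringSerializer.newBytesStringZeroCopy(LATIN1, bytes);
System.out.println(s); // >>> Hello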
What StringSerializer.newBytesStringZeroCopy() ultimately does is call the non-public String constructor new String(byte[], byte coder), which assigns the source byte array directly to String.value without copying any bytes.
/*
 * Package private constructor which shares value array for speed.
 */
String(byte[] value, byte coder) {
    this.value = value;
    this.coder = coder;
}
When StringSerializer.newBytesStringZeroCopy() is called, it delegates to either the BYTES_STRING_ZERO_COPY_CTR BiFunction or the LATIN_BYTES_STRING_ZERO_COPY_CTR Function, depending on the coder and on which of them is available.
public static String newBytesStringZeroCopy(byte coder, byte[] data) {
    if (coder == LATIN1) {
        // 700% faster than unsafe put field in java11, only 10% slower than `new String(str)` for
        // string length 230.
        // 50% faster than unsafe put field in java11 for string length 10.
        if (LATIN_BYTES_STRING_ZERO_COPY_CTR != null) {
            return LATIN_BYTES_STRING_ZERO_COPY_CTR.apply(data);
        } else {
            // JDK17 removed newStringLatin1
            return BYTES_STRING_ZERO_COPY_CTR.apply(data, LATIN1_BOXED);
        }
    } else if (coder == UTF16) {
        // avoid byte box cost.
        return BYTES_STRING_ZERO_COPY_CTR.apply(data, UTF16_BOXED);
    } else {
        // 700% faster than unsafe put field in java11, only 10% slower than `new String(str)` for
        // string length 230.
        // 50% faster than unsafe put field in java11 for string length 10.
        // `invokeExact` must pass exact params with exact types:
        // `(Object) data, coder` will throw WrongMethodTypeException
        return BYTES_STRING_ZERO_COPY_CTR.apply(data, coder);
    }
}
BYTES_STRING_ZERO_COPY_CTR is initialized to the BiFunction returned from getBytesStringZeroCopyCtr():
private static BiFunction<byte[], Byte, String> getBytesStringZeroCopyCtr() {
    if (!STRING_VALUE_FIELD_IS_BYTES) {
        return null;
    }
    MethodHandle handle = getJavaStringZeroCopyCtrHandle();
    if (handle == null) {
        return null;
    }
    // Faster than handle.invokeExact(data, byte)
    try {
        MethodType instantiatedMethodType =
            MethodType.methodType(handle.type().returnType(), new Class[] {byte[].class, Byte.class});
        CallSite callSite =
            LambdaMetafactory.metafactory(
                STRING_LOOK_UP,
                "apply",
                MethodType.methodType(BiFunction.class),
                handle.type().generic(),
                handle,
                instantiatedMethodType);
        return (BiFunction) callSite.getTarget().invokeExact();
    } catch (Throwable e) {
        return null;
    }
}
The method returns a BiFunction that takes a byte[] value and a (boxed) Byte coder as arguments. The function invokes a MethodHandle for the String constructor new String(byte[] value, byte coder) through a CallSite created with LambdaMetafactory.metafactory(). According to the comment in the code, this is faster than calling MethodHandle.invokeExact() directly. I guess that's because the CallSite is created once and reused, so the bootstrap process is paid for only once.
For more on how invokedynamic and CallSite work under the hood, see https://cr.openjdk.org/~ntv/talks/eclipseSummit16/indyunderTheHood.pdf
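The same LambdaMetafactory pattern can be tried without touching JDK internals. The sketch below is a hypothetical, self-contained example: it wraps an ordinary static method with the same (byte[], Byte) shape as the zero-copy constructor, using MethodHandles.lookup() of its own class as the caller. Targeting String's package-private constructor additionally requires a Lookup with private access to java.lang.String, which is presumably what STRING_LOOK_UP provides in the Fury code; that part is not reproduced here.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.nio.charset.StandardCharsets;
import java.util.function.BiFunction;

final class MetafactorySketch {
    // Hypothetical stand-in target with the same (byte[], Byte) shape
    // as the JDK-internal String(byte[], byte) constructor.
    static String describe(byte[] data, Byte coder) {
        return coder + ":" + new String(data, StandardCharsets.ISO_8859_1);
    }

    @SuppressWarnings({"unchecked", "rawtypes"})
    static BiFunction<byte[], Byte, String> makeBiFunction() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle handle = lookup.findStatic(
                MetafactorySketch.class, "describe",
                MethodType.methodType(String.class, byte[].class, Byte.class));
            CallSite site = LambdaMetafactory.metafactory(
                lookup,
                "apply",                                 // BiFunction's abstract method
                MethodType.methodType(BiFunction.class), // factory type: () -> BiFunction
                handle.type().generic(),                 // erased: (Object,Object)Object
                handle,                                  // method the lambda delegates to
                handle.type());                          // specialized: (byte[],Byte)String
            // The factory takes no arguments; calling it once yields a reusable BiFunction.
            return (BiFunction) site.getTarget().invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException("could not create BiFunction", t);
        }
    }
}
```

The returned BiFunction is meant to be created once (for example in a static final field, as Fury does with BYTES_STRING_ZERO_COPY_CTR) and then called like any other lambda.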
LATIN_BYTES_STRING_ZERO_COPY_CTR is initialized to the Function returned from getLatinBytesStringZeroCopyCtr():
private static Function<byte[], String> getLatinBytesStringZeroCopyCtr() {
    if (!STRING_VALUE_FIELD_IS_BYTES) {
        return null;
    }
    if (STRING_LOOK_UP == null) {
        return null;
    }
    try {
        Class<?> clazz = Class.forName("java.lang.StringCoding");
        MethodHandles.Lookup caller = STRING_LOOK_UP.in(clazz);
        // JDK17 removed this method.
        MethodHandle handle =
            caller.findStatic(
                clazz, "newStringLatin1", MethodType.methodType(String.class, byte[].class));
        // Faster than handle.invokeExact(data, byte)
        return _JDKAccess.makeFunction(caller, handle, Function.class);
    } catch (Throwable e) {
        return null;
    }
}
The method returns a Function that takes only a byte[] argument (a coder isn't needed, since this path is only used for LATIN1), unlike the BiFunction from getBytesStringZeroCopyCtr(). This Function invokes a MethodHandle for StringCoding.newStringLatin1(byte[] src) instead of the String constructor new String(byte[] value, byte coder). _JDKAccess.makeFunction() wraps the MethodHandle invocation with LambdaMetafactory.metafactory(), just as getBytesStringZeroCopyCtr() does.
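StringCoding.newStringLatin1() was removed in Java 17. So for LATIN1 data, the LATIN_BYTES_STRING_ZERO_COPY_CTR function is used on earlier versions where that method exists, while the BYTES_STRING_ZERO_COPY_CTR function is used on Java 17 or later. UTF16 data always goes through BYTES_STRING_ZERO_COPY_CTR.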
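Both factory methods start by checking STRING_VALUE_FIELD_IS_BYTES, and return null when it is false. One way such a flag could be computed is plain reflection on String's declared field type. The helper below is a hypothetical sketch, not Fury's actual code, which may determine the flag differently.

```java
import java.lang.reflect.Field;

// Hypothetical helper in the spirit of Fury's STRING_VALUE_FIELD_IS_BYTES flag.
final class StringLayout {
    static boolean valueFieldIsBytes() {
        try {
            // getDeclaredField() only reads metadata; there is no setAccessible()
            // call, so no --add-opens flag is required.
            Field value = String.class.getDeclaredField("value");
            return value.getType() == byte[].class; // char[] on Java 8, byte[] on Java 9+
        } catch (NoSuchFieldException e) {
            return false; // unknown layout: skip the zero-copy path
        }
    }
}
```

A caller would compute this once at class-initialization time and fall back to the regular new String(bytes, charset) when it is false.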
The key points are:
- Call the non-public StringCoding.newStringLatin1() or new String(byte[] value, byte coder) to avoid copying the byte array
- Minimize the cost of MethodHandle invocation by wrapping the handle in a function generated via LambdaMetafactory and a CallSite
It’s time for the benchmark. I updated the JMH benchmark code as follows:
- build.gradle.kts
dependencies {
    implementation("org.apache.fury:fury-core:0.9.0")
    ...
}
- org/komamitsu/stringinstantiationbench/StringInstantiationBenchmark.java
package org.komamitsu.stringinstantiationbench;

import org.apache.fury.serializer.StringSerializer;
import org.openjdk.jmh.annotations.*;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(1)
@Measurement(time = 3, iterations = 4)
@Warmup(iterations = 2)
public class StringInstantiationBenchmark {
    private static final int STR_LEN = 512;
    private static final byte[] SINGLE_BYTE_STR_SOURCE_BYTES;
    private static final byte[] MULTI_BYTE_STR_SOURCE_BYTES;

    static {
        {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < STR_LEN; i++) {
                sb.append("x");
            }
            SINGLE_BYTE_STR_SOURCE_BYTES = sb.toString().getBytes(StandardCharsets.UTF_8);
        }
        {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < STR_LEN / 2; i++) {
                sb.append("あ");
            }
            MULTI_BYTE_STR_SOURCE_BYTES = sb.toString().getBytes(StandardCharsets.UTF_8);
        }
    }

    @Benchmark
    public void newStrFromSingleByteStrBytes() {
        new String(SINGLE_BYTE_STR_SOURCE_BYTES, StandardCharsets.UTF_8);
    }

    @Benchmark
    public void newStrFromMultiByteStrBytes() {
        new String(MULTI_BYTE_STR_SOURCE_BYTES, StandardCharsets.UTF_8);
    }

    // Copied from org.apache.fury.serializer.StringSerializer.
    private static final byte LATIN1 = 0;
    private static final Byte LATIN1_BOXED = LATIN1;
    private static final byte UTF16 = 1;
    private static final Byte UTF16_BOXED = UTF16;
    // Fury's own UTF-8 marker; java.lang.String itself only knows LATIN1 (0) and UTF16 (1).
    private static final byte UTF8 = 2;

    @Benchmark
    public void newStrFromSingleByteStrBytesWithZeroCopy() {
        StringSerializer.newBytesStringZeroCopy(LATIN1, SINGLE_BYTE_STR_SOURCE_BYTES);
    }

    @Benchmark
    public void newStrFromMultiByteStrBytesWithZeroCopy() {
        // Note: the UTF-8 bytes are handed over as-is, without any decoding.
        StringSerializer.newBytesStringZeroCopy(UTF8, MULTI_BYTE_STR_SOURCE_BYTES);
    }
}
The results are as follows:
- Java 11
Benchmark                                 Mode   Cnt        Score         Error  Units
newStrFromMultiByteStrBytes               thrpt    4     1505.580  ±     13.191  ops/ms
newStrFromMultiByteStrBytesWithZeroCopy   thrpt    4  2284141.488  ±   5509.077  ops/ms
newStrFromSingleByteStrBytes              thrpt    4    15246.342  ±    258.381  ops/ms
newStrFromSingleByteStrBytesWithZeroCopy  thrpt    4  2281817.367  ±   8054.568  ops/ms
- Java 17
Benchmark                                 Mode   Cnt        Score         Error  Units
newStrFromMultiByteStrBytes               thrpt    4     1545.503  ±     15.283  ops/ms
newStrFromMultiByteStrBytesWithZeroCopy   thrpt    4  2273566.173  ±  10212.794  ops/ms
newStrFromSingleByteStrBytes              thrpt    4    17598.209  ±    253.282  ops/ms
newStrFromSingleByteStrBytesWithZeroCopy  thrpt    4  2277213.103  ±  13380.823  ops/ms
- Java 21
Benchmark                                 Mode   Cnt        Score         Error  Units
newStrFromMultiByteStrBytes               thrpt    4     1556.272  ±     16.482  ops/ms
newStrFromMultiByteStrBytesWithZeroCopy   thrpt    4  3698101.264  ± 429945.546  ops/ms
newStrFromSingleByteStrBytes              thrpt    4    17803.149  ±    204.987  ops/ms
newStrFromSingleByteStrBytesWithZeroCopy  thrpt    4  3817357.204  ±  89376.224  ops/ms
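The benchmark failed on Java 8 with a NullPointerException. This is most likely not a usage mistake: on Java 8, String.value is a char[], so getBytesStringZeroCopyCtr() and getLatinBytesStringZeroCopyCtr() both return null because of their STRING_VALUE_FIELD_IS_BYTES check, and newBytesStringZeroCopy() then calls apply() on a null function.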
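For single-byte input, StringSerializer.newBytesStringZeroCopy() was more than 100 times faster than the regular new String(byte[] bytes, Charset charset) in Java 17, and more than 200 times faster in Java 21. This may be one of the secrets of why Fury is so fast.
Keep in mind that the zero-copy path does no decoding at all: the byte array becomes String.value as-is, so it must already be in String's internal layout (LATIN1, or UTF16 in the JVM's native byte order). The UTF-8 bytes passed with Fury's UTF8 coder in newStrFromMultiByteStrBytesWithZeroCopy() don't meet that requirement, so that benchmark measures object creation rather than a valid conversion.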
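The ratios quoted above are plain arithmetic on the JMH scores. A small helper makes the conversion explicit: a throughput in ops/ms turns into a per-call cost via 1 ms = 1,000,000 ns. ScoreMath is a hypothetical name used only for this illustration.

```java
// Hypothetical helper: converts JMH throughput scores (ops/ms) into
// per-call latency and speedup ratios.
final class ScoreMath {
    // 1 ms = 1,000,000 ns, so ns/op = 1,000,000 / (ops/ms).
    static double nsPerOp(double opsPerMs) {
        return 1_000_000.0 / opsPerMs;
    }

    // How many times more operations per ms the faster variant completes.
    static double speedup(double fasterOpsPerMs, double slowerOpsPerMs) {
        return fasterOpsPerMs / slowerOpsPerMs;
    }
}
```

For the single-byte case, speedup(2277213.103, 17598.209) is about 129 (Java 17) and speedup(3817357.204, 17803.149) is about 214 (Java 21). The same helper gives about 0.44 ns per zero-copy call on Java 17, which is only a few CPU cycles. Because the @Benchmark methods discard the String they create, the JIT may have optimized part of the work away; returning the String from each @Benchmark method lets JMH consume it and makes the comparison more robust.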
A possible concern with such a zero-copy strategy and implementation is that the byte array passed to new String(byte[] value, byte coder) can end up shared: both the new String object and any other object holding a reference to the byte array can see and modify it.
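final byte LATIN1 = 0; // String's internal coder value for Latin-1 data
byte[] bytes = "Hello".getBytes(StandardCharsets.ISO_8859_1);
String s = StringSerializer.newBytesStringZeroCopy(LATIN1, bytes);
System.out.println(s); // >>> Hello
bytes[4] = '!';
System.out.println(s); // >>> Hell!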
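This mutability can cause a string's content to change unexpectedly, which breaks String's immutability guarantee and anything relying on it, such as a hash code that String has already cached or a HashMap that uses the string as a key.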
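The trade-off between safety and speed can be seen with a tiny text holder that offers both a copying and a sharing factory. Latin1Text is a hypothetical class for illustration only, not part of the JDK or Fury: copyOf() behaves like the public String constructors, while wrap() behaves like the package-private String(byte[] value, byte coder) constructor.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical holder contrasting a defensive copy with zero-copy sharing.
final class Latin1Text {
    private final byte[] value;

    private Latin1Text(byte[] value) {
        this.value = value;
    }

    // Safe: keeps a private copy, so later changes to `bytes` are invisible.
    static Latin1Text copyOf(byte[] bytes) {
        return new Latin1Text(Arrays.copyOf(bytes, bytes.length));
    }

    // Fast but aliased: keeps the caller's array itself.
    // The caller must never modify `bytes` afterwards.
    static Latin1Text wrap(byte[] bytes) {
        return new Latin1Text(bytes);
    }

    @Override
    public String toString() {
        return new String(value, StandardCharsets.ISO_8859_1);
    }
}
```

After bytes[4] = '!', a Latin1Text created with copyOf() still prints "Hello", while one created with wrap() prints "Hell!". Zero copy is therefore only safe when ownership of the array is effectively handed over, for example a buffer that a deserializer has just filled and will never touch again.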
Conclusion
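- If you're still on Java 8, move to Java 9 or later where possible: String instantiation from a byte array is significantly faster there, especially for single-byte data.
- There is a technique to instantiate a String from a byte array with zero copy. It's blazingly fast, but the byte array must never be modified afterwards.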
Original article: Inside `java.lang.String`: Understanding and Optimizing Instantiation Performance