Inside `java.lang.String`: Understanding and Optimizing Instantiation Performance

java.lang.String is probably one of the most used classes in Java. Naturally, it contains its string data internally.

Do you know how the data is actually stored in String, and what happens when instantiating a String from a byte array? In this post, we’ll explore the internal structure of java.lang.String and discuss ways to improve instantiation performance.

Internal structure of java.lang.String in Java 8 or earlier

In Java 8, java.lang.String contains its string data as a 16-bit char array.

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];


When instantiating a String from a byte array, StringCoding.decode() is called.

    public String(byte bytes[], int offset, int length, Charset charset) {
        if (charset == null)
            throw new NullPointerException("charset");
        checkBounds(bytes, offset, length);
        this.value =  StringCoding.decode(charset, bytes, offset, length);
    }


In the case of US_ASCII, sun.nio.cs.US_ASCII.Decoder.decode() is eventually called. It copies the bytes of the source byte array into a char array one by one.

        public int decode(byte[] src, int sp, int len, char[] dst) {
            int dp = 0;
            len = Math.min(len, dst.length);
            while (dp < len) {
                byte b = src[sp++];
                if (b >= 0)
                    dst[dp++] = (char)b;
                else
                    dst[dp++] = repl;
            }
            return dp;
        }


The newly created char array is then assigned to the new String instance's value field.

As you may notice, every byte is still copied and widened to a char one at a time, even when the source byte array contains only single-byte characters.
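To make the Java 8 cost concrete, here is a simplified, standalone model of that decode loop. It is not the JDK source. `Java8AsciiDecodeModel` is a hypothetical name, and U+FFFD is assumed as the replacement character in place of the decoder's `repl` field.

```java
import java.nio.charset.StandardCharsets;

// Simplified model of the Java 8 US-ASCII decode path (not the actual JDK code).
final class Java8AsciiDecodeModel {
    // Replacement for bytes outside the ASCII range; U+FFFD is assumed here.
    static final char REPLACEMENT = '\uFFFD';

    static char[] decode(byte[] src, int offset, int length) {
        char[] dst = new char[length];
        for (int i = 0; i < length; i++) {
            byte b = src[offset + i];
            // Each byte is widened to a 16-bit char individually,
            // even when every byte is plain ASCII.
            dst[i] = (b >= 0) ? (char) b : REPLACEMENT;
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] bytes = "Hello".getBytes(StandardCharsets.US_ASCII);
        char[] chars = decode(bytes, 0, bytes.length);
        // 5 bytes in, 5 chars (10 bytes of payload) out.
        System.out.println(new String(chars) + " " + chars.length); // Hello 5
    }
}
```

The output array takes twice as much memory as the input, and the loop runs once per byte regardless of content.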

Internal structure of java.lang.String in Java 9 or later

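In Java 9 or later, java.lang.String stores its string data in an 8-bit byte array. A coder field records whether those bytes are Latin-1 (one byte per character) or UTF-16 (two bytes per character).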

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {

    /**
     * The value is used for character storage.
     *
     * @implNote This field is trusted by the VM, and is a subject to
     * constant folding if String instance is constant. Overwriting this
     * field after construction will cause problems.
     *
     * Additionally, it is marked with {@link Stable} to trust the contents
     * of the array. No other facility in JDK provides this functionality (yet).
     * {@link Stable} is safe here, because value is never null.
     */
    @Stable
    private final byte[] value;


When instantiating a String from a byte array, StringCoding.decode() is also called.

    public String(byte bytes[], int offset, int length, Charset charset) {
        if (charset == null)
            throw new NullPointerException("charset");
        checkBoundsOffCount(offset, length, bytes.length);
        StringCoding.Result ret =
            StringCoding.decode(charset, bytes, offset, length);
        this.value = ret.value;
        this.coder = ret.coder;
    }


In the case of US_ASCII, StringCoding.decodeASCII() is called. If the input contains no negative bytes (that is, it is pure ASCII), the method copies the source byte array with Arrays.copyOfRange(), because both the source and the destination are byte arrays. Arrays.copyOfRange() internally uses System.arraycopy(), which is a native method and very fast.

    private static Result decodeASCII(byte[] ba, int off, int len) {
        Result result = resultCached.get();
        if (COMPACT_STRINGS && !hasNegatives(ba, off, len)) {
            return result.with(Arrays.copyOfRange(ba, off, off + len),
                               LATIN1);
        }
        byte[] dst = new byte[len<<1];
        int dp = 0;
        while (dp < len) {
            int b = ba[off++];
            putChar(dst, dp++, (b >= 0) ? (char)b : repl);
        }
        return result.with(dst, UTF16);
    }
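
The two branches of decodeASCII() can be modeled in a few lines. This is a simplified sketch, not the JDK source. The `Result` class, the loop-based `hasNegatives()`, the U+FFFD replacement and the big-endian byte order of the UTF-16 branch are all assumptions made for illustration. The LATIN1 and UTF16 values mirror String's coder constants.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Simplified model of the Java 9+ US-ASCII decode path with Compact Strings
// (not the actual JDK code).
final class CompactAsciiDecodeModel {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;
    static final char REPLACEMENT = '\uFFFD'; // assumed replacement character

    static final class Result {
        final byte[] value;
        final byte coder;

        Result(byte[] value, byte coder) {
            this.value = value;
            this.coder = coder;
        }
    }

    static boolean hasNegatives(byte[] ba, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (ba[i] < 0) {
                return true;
            }
        }
        return false;
    }

    static Result decode(byte[] ba, int off, int len) {
        if (!hasNegatives(ba, off, len)) {
            // Fast path: pure ASCII is valid Latin-1, so one bulk copy is enough.
            return new Result(Arrays.copyOfRange(ba, off, off + len), LATIN1);
        }
        // Slow path: inflate to UTF-16, two bytes per char (big-endian in this sketch).
        byte[] dst = new byte[len << 1];
        for (int i = 0; i < len; i++) {
            int b = ba[off + i];
            char c = (b >= 0) ? (char) b : REPLACEMENT;
            dst[2 * i] = (byte) (c >> 8);
            dst[2 * i + 1] = (byte) c;
        }
        return new Result(dst, UTF16);
    }

    public static void main(String[] args) {
        Result ascii = decode("Hello".getBytes(StandardCharsets.US_ASCII), 0, 5);
        Result nonAscii = decode(new byte[] {72, (byte) 0xE9}, 0, 2);
        System.out.println(ascii.coder + " " + ascii.value.length);       // 0 5
        System.out.println(nonAscii.coder + " " + nonAscii.value.length); // 1 4
    }
}
```

With Compact Strings, the ASCII case needs only one bulk copy. The non-ASCII case still pays for a per-byte loop and twice the memory.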


You may have noticed the COMPACT_STRINGS constant in decodeASCII(). Java 9 introduced this improvement as Compact Strings (JEP 254). The feature is enabled by default, but you can disable it with the -XX:-CompactStrings JVM option. See https://docs.oracle.com/en/java/javase/17/vm/java-hotspot-virtual-machine-performance-enhancements.html#GUID-D2E3DC58-D18B-4A6C-8167-4A1DFB4888E4 for details.
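One way to check whether Compact Strings is active in a running JVM is to query the HotSpot flag at runtime. This minimal sketch assumes a HotSpot-based JDK 9 or later, where the `CompactStrings` option exists. Other JVMs may not expose it. `CompactStringsCheck` is a hypothetical class name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Reports whether Compact Strings is enabled in the running JVM.
// Assumes a HotSpot-based JDK 9+; getVMOption() throws IllegalArgumentException
// if the option does not exist.
final class CompactStringsCheck {
    static boolean compactStringsEnabled() {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return Boolean.parseBoolean(bean.getVMOption("CompactStrings").getValue());
    }

    public static void main(String[] args) {
        System.out.println("CompactStrings = " + compactStringsEnabled());
    }
}
```

If the same program is started with `java -XX:-CompactStrings`, it should print `CompactStrings = false`.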

The performance of new String(byte[]) in Java 8, 11, 17 and 21

I wrote this simple JMH benchmark code to evaluate the performance of new String(byte[]):

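import org.openjdk.jmh.annotations.*;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)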
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(1)
@Measurement(time = 3, iterations = 4)
@Warmup(iterations = 2)
public class StringInstantiationBenchmark {
  private static final int STR_LEN = 512;
  private static final byte[] SINGLE_BYTE_STR_SOURCE_BYTES;
  private static final byte[] MULTI_BYTE_STR_SOURCE_BYTES;
  static {
    {
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < STR_LEN; i++) {
        sb.append("x");
      }
      SINGLE_BYTE_STR_SOURCE_BYTES = sb.toString().getBytes(StandardCharsets.UTF_8);
    }
    {
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < STR_LEN / 2; i++) {
        sb.append("あ");
      }
      MULTI_BYTE_STR_SOURCE_BYTES = sb.toString().getBytes(StandardCharsets.UTF_8);
    }
  }

  @Benchmark
  public void newStrFromSingleByteStrBytes() {
    new String(SINGLE_BYTE_STR_SOURCE_BYTES, StandardCharsets.UTF_8);
  }

  @Benchmark
  public void newStrFromMultiByteStrBytes() {
    new String(MULTI_BYTE_STR_SOURCE_BYTES, StandardCharsets.UTF_8);
  }
}


The benchmark results are as follows:

  • Java 8
Benchmark                        Mode  Cnt     Score     Error   Units
newStrFromMultiByteStrBytes     thrpt    4  1672.397 ±  11.338  ops/ms
newStrFromSingleByteStrBytes    thrpt    4  4789.745 ± 553.865  ops/ms


  • Java 11
Benchmark                        Mode  Cnt      Score      Error   Units
newStrFromMultiByteStrBytes     thrpt    4   1507.754 ±   17.931  ops/ms
newStrFromSingleByteStrBytes    thrpt    4  15117.040 ± 1240.981  ops/ms


  • Java 17
Benchmark                        Mode  Cnt      Score     Error   Units
newStrFromMultiByteStrBytes     thrpt    4   1529.215 ± 168.064  ops/ms
newStrFromSingleByteStrBytes    thrpt    4  17753.086 ± 251.676  ops/ms


  • Java 21
Benchmark                        Mode  Cnt      Score      Error   Units
newStrFromMultiByteStrBytes     thrpt    4   1543.525 ±   69.061  ops/ms
newStrFromSingleByteStrBytes    thrpt    4  17711.972 ± 1178.212  ops/ms


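The throughput of newStrFromSingleByteStrBytes() improved by about 3 times from Java 8 to Java 11, going from about 4,790 to about 15,117 ops/ms. This is likely because String switched from a char array to a byte array. That change lets an all-ASCII input be copied in bulk instead of being widened byte by byte. The multi-byte case, on the other hand, got slightly slower, going from about 1,672 to about 1,508 ops/ms.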

Further performance improvement with zero copy

Okay, Compact Strings is a great performance improvement. But is there still room to improve the performance of String instantiation from a byte array? In Java 9 or later, String(byte bytes[], int offset, int length, Charset charset) still copies the byte array. The copy uses System.arraycopy(), a fast native method, but it still takes time proportional to the length of the array.

While reading the source code of Apache Fury, “a blazingly-fast multi-language serialization framework powered by JIT (just-in-time compilation) and zero-copy”, I found that its StringSerializer achieves zero-copy String instantiation. Let’s look into the implementation.

The usage of the StringSerializer is as follows:

import org.apache.fury.serializer.StringSerializer;

...

    final byte LATIN1 = 0;    // coder value for Latin-1, as in java.lang.String
    byte[] bytes = "Hello".getBytes();
    String s = StringSerializer.newBytesStringZeroCopy(LATIN1, bytes);
    System.out.println(s);    // >>> Hello


Ultimately, StringSerializer.newBytesStringZeroCopy() calls the package-private String constructor String(byte[] value, byte coder). That constructor assigns the source byte array directly to String.value without copying any bytes.

    /*
     * Package private constructor which shares value array for speed.
     */
    String(byte[] value, byte coder) {
        this.value = value;
        this.coder = coder;
    }


StringSerializer.newBytesStringZeroCopy() delegates to one of two functions, depending on the coder: the BYTES_STRING_ZERO_COPY_CTR BiFunction or the LATIN_BYTES_STRING_ZERO_COPY_CTR Function.

  public static String newBytesStringZeroCopy(byte coder, byte[] data) {
    if (coder == LATIN1) {
      // 700% faster than unsafe put field in java11, only 10% slower than `new String(str)` for
      // string length 230.
      // 50% faster than unsafe put field in java11 for string length 10.
      if (LATIN_BYTES_STRING_ZERO_COPY_CTR != null) {
        return LATIN_BYTES_STRING_ZERO_COPY_CTR.apply(data);
      } else {
        // JDK17 removed newStringLatin1
        return BYTES_STRING_ZERO_COPY_CTR.apply(data, LATIN1_BOXED);
      }
    } else if (coder == UTF16) {
      // avoid byte box cost.
      return BYTES_STRING_ZERO_COPY_CTR.apply(data, UTF16_BOXED);
    } else {
      // 700% faster than unsafe put field in java11, only 10% slower than `new String(str)` for
      // string length 230.
      // 50% faster than unsafe put field in java11 for string length 10.
      // `invokeExact` must pass exact params with exact types:
      // `(Object) data, coder` will throw WrongMethodTypeException
      return BYTES_STRING_ZERO_COPY_CTR.apply(data, coder);
    }
  }


BYTES_STRING_ZERO_COPY_CTR is initialized to a BiFunction returned from getBytesStringZeroCopyCtr():

  private static BiFunction<byte[], Byte, String> getBytesStringZeroCopyCtr() {
    if (!STRING_VALUE_FIELD_IS_BYTES) {
      return null;
    }
    MethodHandle handle = getJavaStringZeroCopyCtrHandle();
    if (handle == null) {
      return null;
    }
    // Faster than handle.invokeExact(data, byte)
    try {
      MethodType instantiatedMethodType =
          MethodType.methodType(handle.type().returnType(), new Class[] {byte[].class, Byte.class});
      CallSite callSite =
          LambdaMetafactory.metafactory(
              STRING_LOOK_UP,
              "apply",
              MethodType.methodType(BiFunction.class),
              handle.type().generic(),
              handle,
              instantiatedMethodType);
      return (BiFunction) callSite.getTarget().invokeExact();
    } catch (Throwable e) {
      return null;
    }
  }


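The method returns a BiFunction that takes a byte[] value and a boxed Byte coder. It does not call the MethodHandle for the String constructor String(byte[] value, byte coder) directly. Instead, it passes the handle to LambdaMetafactory.metafactory(), which returns a CallSite whose target creates a BiFunction implementation that invokes the constructor. According to the comment in the code, this is faster than calling handle.invokeExact() directly. My guess is that the generated class calls the constructor like an ordinary method. Once the BiFunction exists, the JIT compiler can then inline the call like any other interface call.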
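Here is a minimal sketch of the same technique. It uses a hypothetical class `Label` of our own instead of `String`. Calling `String`'s package-private constructor this way requires a privileged `Lookup` into `java.lang` (Fury's `STRING_LOOK_UP`), and this sketch does not try to obtain one. The arguments to `metafactory()` mirror the ones in getBytesStringZeroCopyCtr().

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.BiFunction;

// Sketch: turn a constructor MethodHandle into a BiFunction with LambdaMetafactory.
// Label is a hypothetical stand-in for String's package-private constructor.
final class LambdaCtorSketch {
    static final class Label {
        final byte[] value;
        final byte coder;

        Label(byte[] value, byte coder) {
            this.value = value; // keeps the caller's array: no copy
            this.coder = coder;
        }
    }

    @SuppressWarnings("unchecked")
    static BiFunction<byte[], Byte, Label> makeCtor() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            // Direct handle to Label(byte[], byte); its type is (byte[], byte)Label.
            MethodHandle ctor = lookup.findConstructor(
                Label.class, MethodType.methodType(void.class, byte[].class, byte.class));
            CallSite site = LambdaMetafactory.metafactory(
                lookup,
                "apply",                                 // BiFunction's abstract method
                MethodType.methodType(BiFunction.class), // factory type: () -> BiFunction
                ctor.type().generic(),                   // erased signature: (Object, Object)Object
                ctor,                                    // implementation to call
                // Runtime signature; the Byte argument is unboxed to byte.
                MethodType.methodType(Label.class, byte[].class, Byte.class));
            return (BiFunction<byte[], Byte, Label>) site.getTarget().invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException("cannot create constructor function", t);
        }
    }

    public static void main(String[] args) {
        BiFunction<byte[], Byte, Label> newLabel = makeCtor();
        byte[] bytes = {72, 105};
        Label label = newLabel.apply(bytes, (byte) 0);
        System.out.println(label.value == bytes); // true: the array is shared, not copied
    }
}
```

In practice, the function would be created once and kept in a constant, as Fury does with BYTES_STRING_ZERO_COPY_CTR. Creating it on every call would throw away the benefit.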

(Figures 1 and 2: slides from https://cr.openjdk.org/~ntv/talks/eclipseSummit16/indyunderTheHood.pdf)

LATIN_BYTES_STRING_ZERO_COPY_CTR is initialized to a Function returned from getLatinBytesStringZeroCopyCtr():

  private static Function<byte[], String> getLatinBytesStringZeroCopyCtr() {
    if (!STRING_VALUE_FIELD_IS_BYTES) {
      return null;
    }
    if (STRING_LOOK_UP == null) {
      return null;
    }
    try {
      Class<?> clazz = Class.forName("java.lang.StringCoding");
      MethodHandles.Lookup caller = STRING_LOOK_UP.in(clazz);
      // JDK17 removed this method.
      MethodHandle handle =
          caller.findStatic(
              clazz, "newStringLatin1", MethodType.methodType(String.class, byte[].class));
      // Faster than handle.invokeExact(data, byte)
      return _JDKAccess.makeFunction(caller, handle, Function.class);
    } catch (Throwable e) {
      return null;
    }
  }


Like getBytesStringZeroCopyCtr(), this method returns a functional object. This one is a Function that takes only a byte[]; no coder is needed because it handles LATIN1 only. The Function invokes a MethodHandle for StringCoding.newStringLatin1(byte[] src) instead of the String constructor String(byte[] value, byte coder). _JDKAccess.makeFunction() wraps the MethodHandle with LambdaMetafactory.metafactory(), just as getBytesStringZeroCopyCtr() does.

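StringCoding.newStringLatin1() was removed in Java 17. For LATIN1 strings, the LATIN_BYTES_STRING_ZERO_COPY_CTR function is therefore used on earlier versions, and the BYTES_STRING_ZERO_COPY_CTR function is used on Java 17 or later. UTF16 strings always go through BYTES_STRING_ZERO_COPY_CTR.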
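The null returns in both methods follow a probe-and-fall-back pattern. The code looks up a method that may not exist in the running JDK and returns null if it is missing, so the caller can take another path. The sketch below shows that pattern on its own with public JDK methods. `OptionalHandleProbe` and the made-up method name `noSuchFactory` are hypothetical. The sketch also uses `MethodHandles.publicLookup()` rather than the privileged lookup Fury needs for JDK internals.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Sketch of "probe, then fall back": look up an optional static method
// and return null if the running JDK does not provide it.
final class OptionalHandleProbe {
    static MethodHandle findOptionalStatic(Class<?> owner, String name, MethodType type) {
        try {
            return MethodHandles.publicLookup().findStatic(owner, name, type);
        } catch (NoSuchMethodException | IllegalAccessException e) {
            return null; // the caller must use a fallback path
        }
    }

    public static void main(String[] args) throws Throwable {
        MethodType charsToString = MethodType.methodType(String.class, char[].class);
        MethodHandle present = findOptionalStatic(String.class, "valueOf", charsToString);
        // "noSuchFactory" is a made-up name that no JDK defines.
        MethodHandle missing = findOptionalStatic(String.class, "noSuchFactory", charsToString);
        System.out.println(present != null); // true
        System.out.println(missing == null); // true
        String s = (String) present.invokeExact(new char[] {'h', 'i'});
        System.out.println(s); // hi
    }
}
```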

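The points are:

  • The package-private constructor String(byte[] value, byte coder) shares the given byte array instead of copying it.
  • Fury calls it (or StringCoding.newStringLatin1() on older JDKs) through a function generated by LambdaMetafactory.metafactory() rather than through MethodHandle.invokeExact().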

It’s time for the benchmark. I updated the JMH benchmark code as follows:

  • build.gradle.kts
dependencies {
    implementation("org.apache.fury:fury-core:0.9.0")
    ...


  • org/komamitsu/stringinstantiationbench/StringInstantiationBenchmark.java
package org.komamitsu.stringinstantiationbench;

import org.apache.fury.serializer.StringSerializer;
import org.openjdk.jmh.annotations.*;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(1)
@Measurement(time = 3, iterations = 4)
@Warmup(iterations = 2)
public class StringInstantiationBenchmark {
  private static final int STR_LEN = 512;
  private static final byte[] SINGLE_BYTE_STR_SOURCE_BYTES;
  private static final byte[] MULTI_BYTE_STR_SOURCE_BYTES;
  static {
    {
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < STR_LEN; i++) {
        sb.append("x");
      }
      SINGLE_BYTE_STR_SOURCE_BYTES = sb.toString().getBytes(StandardCharsets.UTF_8);
    }
    {
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < STR_LEN / 2; i++) {
        sb.append("あ");
      }
      MULTI_BYTE_STR_SOURCE_BYTES = sb.toString().getBytes(StandardCharsets.UTF_8);
    }
  }

  @Benchmark
  public void newStrFromSingleByteStrBytes() {
    new String(SINGLE_BYTE_STR_SOURCE_BYTES, StandardCharsets.UTF_8);
  }

  @Benchmark
  public void newStrFromMultiByteStrBytes() {
    new String(MULTI_BYTE_STR_SOURCE_BYTES, StandardCharsets.UTF_8);
  }

  // Copied from org.apache.fury.serializer.StringSerializer.
  private static final byte LATIN1 = 0;
  private static final Byte LATIN1_BOXED = LATIN1;
  private static final byte UTF16 = 1;
  private static final Byte UTF16_BOXED = UTF16;
  private static final byte UTF8 = 2;

  @Benchmark
  public void newStrFromSingleByteStrBytesWithZeroCopy() {
    StringSerializer.newBytesStringZeroCopy(LATIN1, SINGLE_BYTE_STR_SOURCE_BYTES);
  }

  @Benchmark
  public void newStrFromMultiByteStrBytesWithZeroCopy() {
    StringSerializer.newBytesStringZeroCopy(UTF8, MULTI_BYTE_STR_SOURCE_BYTES);
  }
}


And the result is as follows:

  • Java 11
Benchmark                                  Mode  Cnt        Score      Error   Units
newStrFromMultiByteStrBytes               thrpt    4     1505.580 ±   13.191  ops/ms
newStrFromMultiByteStrBytesWithZeroCopy   thrpt    4  2284141.488 ± 5509.077  ops/ms
newStrFromSingleByteStrBytes              thrpt    4    15246.342 ±  258.381  ops/ms
newStrFromSingleByteStrBytesWithZeroCopy  thrpt    4  2281817.367 ± 8054.568  ops/ms


  • Java 17
Benchmark                                  Mode  Cnt        Score       Error   Units
newStrFromMultiByteStrBytes               thrpt    4     1545.503 ±    15.283  ops/ms
newStrFromMultiByteStrBytesWithZeroCopy   thrpt    4  2273566.173 ± 10212.794  ops/ms
newStrFromSingleByteStrBytes              thrpt    4    17598.209 ±   253.282  ops/ms
newStrFromSingleByteStrBytesWithZeroCopy  thrpt    4  2277213.103 ± 13380.823  ops/ms


  • Java 21
Benchmark                                  Mode  Cnt        Score        Error   Units
newStrFromMultiByteStrBytes               thrpt    4     1556.272 ±     16.482  ops/ms
newStrFromMultiByteStrBytesWithZeroCopy   thrpt    4  3698101.264 ± 429945.546  ops/ms
newStrFromSingleByteStrBytes              thrpt    4    17803.149 ±    204.987  ops/ms
newStrFromSingleByteStrBytesWithZeroCopy  thrpt    4  3817357.204 ±  89376.224  ops/ms


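The benchmark code failed with Java 8 due to an NPE. This is most likely expected rather than a usage error. In Java 8, String.value is a char array, so STRING_VALUE_FIELD_IS_BYTES is false and both getBytesStringZeroCopyCtr() and getLatinBytesStringZeroCopyCtr() return null. newBytesStringZeroCopy() then ends up calling apply() on a null function.

For single-byte strings, StringSerializer.newBytesStringZeroCopy() was roughly 150 times faster in Java 11, 130 times faster in Java 17, and 210 times faster in Java 21 than the normal new String(byte[] bytes, Charset charset). Maybe this is one of the secrets of why Fury is blazing-fast.

These numbers should be read with care, though. The benchmark methods discard the created String. JMH recommends returning the result (or consuming it with a Blackhole) so that the JIT compiler cannot eliminate the work as dead code. A throughput of over 2.2 million ops/ms means less than 0.5 ns per call, which suggests that much of the zero-copy work may have been optimized away.

Also, newStrFromMultiByteStrBytesWithZeroCopy() passes UTF-8 bytes with Fury's UTF8 coder (2). java.lang.String only knows the LATIN1 (0) and UTF16 (1) coders, so the resulting String does not contain the original text. Zero copy only yields a correct String when the bytes are already in String's internal Latin-1 or UTF-16 layout.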
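To put the throughput figures in perspective, here is a small helper. It converts JMH's ops/ms into nanoseconds per call and computes the speed-up, using the Java 17 single-byte results from the table above. `ThroughputMath` is a hypothetical name. The arithmetic is just 1 ms = 1,000,000 ns.

```java
// Converts JMH throughput (ops/ms) into time per call and computes a speed-up.
// The inputs are the Java 17 single-byte results reported above.
final class ThroughputMath {
    static double nanosPerOp(double opsPerMs) {
        return 1_000_000.0 / opsPerMs; // 1 ms = 1,000,000 ns
    }

    public static void main(String[] args) {
        double regular = 17598.209;    // newStrFromSingleByteStrBytes
        double zeroCopy = 2277213.103; // newStrFromSingleByteStrBytesWithZeroCopy
        System.out.printf("regular:   %.2f ns/op%n", nanosPerOp(regular));
        System.out.printf("zero copy: %.3f ns/op%n", nanosPerOp(zeroCopy));
        System.out.printf("speed-up:  %.0fx%n", zeroCopy / regular);
    }
}
```

This gives about 56.8 ns per call for the regular constructor and about 0.44 ns per call for the zero-copy path, a ratio of about 129. A fraction of a nanosecond is only a few CPU cycles, which is very little time to allocate and initialize an object.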

A possible concern with this zero-copy strategy is that the byte array passed to new String(byte[] value, byte coder) is shared. Both the new String and any other code that still holds a reference to the array can see and change it.

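    final byte LATIN1 = 0;    // coder value for Latin-1, as in java.lang.String
    byte[] bytes = "Hello".getBytes();
    String s = StringSerializer.newBytesStringZeroCopy(LATIN1, bytes);
    System.out.println(s);    // >>> Hello
    bytes[4] = '!';           // mutates the array now shared with s
    System.out.println(s);    // >>> Hell!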


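This mutability breaks the usual guarantee that a String never changes, so its content can change unexpectedly. String also caches its hash code. If the array is mutated after the String has been used as a HashMap key, lookups with an equal string may no longer find the entry.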
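The aliasing problem is not specific to String. The minimal sketch below uses two hypothetical wrapper classes. `SharedText` keeps the caller's array, like the zero-copy constructor. `CopiedText` makes a defensive copy, like the public String constructors. The sketch shows what each one sees when the source array is modified afterwards.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Contrasts a zero-copy wrapper with a defensive copy.
// SharedText and CopiedText are hypothetical classes for illustration.
final class AliasingDemo {
    static final class SharedText {
        private final byte[] value;

        SharedText(byte[] value) {
            this.value = value; // zero copy: keeps a reference to the caller's array
        }

        @Override
        public String toString() {
            return new String(value, StandardCharsets.ISO_8859_1);
        }
    }

    static final class CopiedText {
        private final byte[] value;

        CopiedText(byte[] value) {
            this.value = Arrays.copyOf(value, value.length); // defensive copy
        }

        @Override
        public String toString() {
            return new String(value, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = "Hello".getBytes(StandardCharsets.ISO_8859_1);
        SharedText shared = new SharedText(bytes);
        CopiedText copied = new CopiedText(bytes);
        String regular = new String(bytes, StandardCharsets.ISO_8859_1); // public constructor copies

        bytes[4] = '!';

        System.out.println(shared);  // Hell!
        System.out.println(copied);  // Hello
        System.out.println(regular); // Hello
    }
}
```

The copy costs time proportional to the array length, but it keeps the object's state independent of whoever created the array.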

Conclusion

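  • If you're still on Java 8, moving to Java 9 or later pays off for String instantiation alone. Creating single-byte strings was about 3 times faster on Java 11 in the benchmark above.
  • There is a technique to instantiate a String from a byte array with zero copy. It's blazing-fast, but it relies on JDK internals and gives up String immutability, so the source array must not be modified afterwards.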

Original article: Inside `java.lang.String`: Understanding and Optimizing Instantiation Performance
