TL;DR: Today we will dive into the startup of Elasticsearch: how it parses the configurable JVM options, and how it can ergonomically switch between JVM options on startup.
Elasticsearch is a distributed search & analytics engine. Elasticsearch’s full text search capabilities are based on Apache Lucene. It’s the heart of the Elastic Stack and powers its Enterprise Search, Observability and Security solutions, as well as many well-known websites like Wikipedia, GitHub or Stack Overflow.
Elasticsearch tries to be a good citizen of the JVM ecosystem and ships with a recent JDK: Elasticsearch 7.9.3 ships with an OpenJDK 15 distribution. One of the core principles of Elasticsearch is to make getting up and running as simple as possible. This is why Elasticsearch bundles a JDK, so that users do not have the trouble of installing one. Not everyone is a Java expert, after all! At some point, however, you need to become at least a bit of an expert, because you need to configure some JVM options, such as the heap size.
In order to configure JVM options for Elasticsearch before startup, these options need to be parsed and evaluated. When the user runs ./bin/elasticsearch or ./bin/elasticsearch.bat, a few other Java programs are started before the actual Elasticsearch process is fired up. First, a program that creates a temporary directory is launched; it behaves differently on Windows than on other operating systems. Second, the JvmOptionsParser class determines the JVM options, and only after this is done is the output of the parser used to start the main Elasticsearch process. This also allows the helper programs to run with the JDK defaults and a small heap, so that they are fast.
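To make the two-stage startup more concrete, here is a minimal Java sketch of the final step, assuming the parser prints all JVM options as a single line of space-separated tokens. The class LaunchCommandSketch and its buildCommand method are hypothetical, and the main class name org.elasticsearch.bootstrap.Elasticsearch is used as an assumption; in the real distribution this step is handled by the bin/elasticsearch scripts rather than by Java code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the last startup step: splice the parser's output
// into the command line that starts the main Elasticsearch JVM.
class LaunchCommandSketch {

    static List<String> buildCommand(String javaBinary, String parserOutput,
                                     String classpath, String mainClass, List<String> args) {
        List<String> command = new ArrayList<>();
        command.add(javaBinary);
        String options = parserOutput.trim();
        if (!options.isEmpty()) {
            // assumption: one JVM option per whitespace-separated token
            command.addAll(Arrays.asList(options.split("\\s+")));
        }
        command.add("-cp");
        command.add(classpath);
        command.add(mainClass);
        command.addAll(args);
        return command;
    }

    public static void main(String[] args) {
        String parserOutput = "-Xms2g -Xmx2g -XX:+UseG1GC"; // what the options parser might print
        List<String> command = buildCommand("java", parserOutput, "lib/*",
                "org.elasticsearch.bootstrap.Elasticsearch", List.of());
        System.out.println(String.join(" ", command));
    }
}
```

Splitting on whitespace keeps the sketch short, but it would break an option whose value itself contains spaces.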
Let’s dive into the mechanism to configure JVM options.
Configuring JVM options with Elasticsearch
The most commonly used JVM option that requires configuration before the Elasticsearch Java process is started is the heap size. To configure it, Elasticsearch uses a mechanism that not only reads the config/jvm.options file but also reads the config/jvm.options.d directory, and appends the contents of all files in it to create one big list of JVM options. You could create a file like config/jvm.options.d/heap.options like this:
# make sure we configure 2gb of heap
-Xms2g
-Xmx2g
This would configure the heap on startup. However, the configuration and parsing mechanism is more powerful: not only can you configure options, you can also configure different options for different JDK major versions.
Side note: In case you are wondering why there is a jvm.options.d directory and not just a single file, it caters for upgrades of RPM or Debian packages: the original jvm.options file can be replaced on upgrade and never needs to be edited.
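Reading jvm.options together with the jvm.options.d directory could look roughly like the following sketch. It is not the Elasticsearch implementation: the class name is hypothetical, only files ending in .options are picked up, and the files are read in name order, which is an assumption made here to get a deterministic result.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: build one big list of option lines from
// config/jvm.options followed by every *.options file in config/jvm.options.d.
class JvmOptionsFiles {

    static List<String> readAllOptionLines(Path configDir) throws IOException {
        List<String> lines = new ArrayList<>(Files.readAllLines(configDir.resolve("jvm.options")));
        Path optionsDir = configDir.resolve("jvm.options.d");
        if (Files.isDirectory(optionsDir)) {
            // assumption: sort by file name so the result does not depend on directory order
            TreeSet<Path> files = new TreeSet<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(optionsDir, "*.options")) {
                for (Path file : stream) {
                    files.add(file);
                }
            }
            for (Path file : files) {
                lines.addAll(Files.readAllLines(file));
            }
        }
        return lines;
    }
}
```

Because the packaged jvm.options is only ever read, a package upgrade can replace it while user overrides in jvm.options.d stay untouched.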
So, why is this useful, you might ask? Well, sometimes a new Java release deprecates features, and sometimes features get removed. One of those features was the CMS garbage collector, which was deprecated in Java 9 and finally removed more than two years later in Java 14. Elasticsearch had been a happy user of CMS for years, but with its removal there had to be a mechanism to start with another garbage collector from Java 14 onwards. To support this, the JVM options parser can also apply certain options only to certain Java versions, like this:
## GC configuration
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly
## G1GC Configuration
# NOTE: G1 GC is only supported on JDK version 10 or later
# to use G1GC, uncomment the next two lines and update the version on the
# following three lines to your version of the JDK
# 10-13:-XX:-UseConcMarkSweepGC
# 10-13:-XX:-UseCMSInitiatingOccupancyOnly
14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30
The same mechanism handles GC logging, whose options changed between Java 8 and Java 9:
## JDK 8 GC logging
8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:logs/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m
# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m
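The version prefixes in these examples follow a small grammar: no prefix applies to every version, N: only to version N, N-: to N and later, and N-M: to the range from N to M inclusive. The sketch below, a hypothetical VersionedOptions class rather than the actual JvmOptionsParser code, filters option lines for a given Java major version, skips comments and blank lines, and fails on lines it cannot interpret.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of version-prefixed option lines (not the real JvmOptionsParser).
class VersionedOptions {

    // optional "lower[-[upper]]:" prefix, followed by the option itself, which starts with '-'
    private static final Pattern LINE = Pattern.compile("^(?:(\\d+)(-)?(\\d+)?:)?(-.*)$");

    static List<String> applicableOptions(List<String> lines, int javaMajorVersion) {
        List<String> options = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // comments and blank lines carry no options
            }
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                // fail fast on anything we cannot interpret
                throw new IllegalArgumentException("invalid JVM option line: " + raw);
            }
            if (m.group(1) == null) {
                options.add(m.group(4)); // no prefix: applies to all versions
                continue;
            }
            int lower = Integer.parseInt(m.group(1));
            int upper;
            if (m.group(2) == null) {
                upper = lower;                        // "8:"    exactly one version
            } else if (m.group(3) == null) {
                upper = Integer.MAX_VALUE;            // "14-:"  open-ended range
            } else {
                upper = Integer.parseInt(m.group(3)); // "8-13:" closed range
            }
            if (javaMajorVersion >= lower && javaMajorVersion <= upper) {
                options.add(m.group(4));
            }
        }
        return options;
    }
}
```

With the GC configuration above, Java 11 would get the CMS options and Java 15 the G1 options, without anyone editing the file.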
You can read more about setting JVM options in the official Elastic docs.
There is another safeguard: after all configured and dynamically created JVM flags have been combined, a JVM is started with them to check that the options are valid and compatible before Elasticsearch itself is started, so that startup fails fast.
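One way to implement such a check is to start a throwaway JVM with the final options plus -XX:+PrintFlagsFinal -version: if the JVM rejects an option, it exits with a non-zero code; otherwise it prints the final value of every flag, including the heap size it settled on. The class and method names below are hypothetical, and the regular expression assumes the usual PrintFlagsFinal layout (type, name, = or :=, value); flags whose value is an empty string would be misreported.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: validate JVM options by starting a throwaway JVM with them.
class FinalFlagsCheck {

    // Matches PrintFlagsFinal lines such as
    //   "   size_t MaxHeapSize   = 2147483648   {product} {ergonomic}"
    // Java 8 prints ":=" instead of "=" for values that differ from the default.
    private static final Pattern FLAG_LINE =
            Pattern.compile("^\\s*\\S+\\s+(\\S+)\\s+:?=\\s+(\\S+)");

    static Map<String, String> parseFlags(List<String> outputLines) {
        Map<String, String> flags = new HashMap<>();
        for (String line : outputLines) {
            Matcher m = FLAG_LINE.matcher(line);
            if (m.find()) {
                flags.put(m.group(1), m.group(2));
            }
        }
        return flags;
    }

    static Map<String, String> finalFlags(List<String> jvmOptions)
            throws IOException, InterruptedException {
        // use the same Java installation that runs this code
        Path java = Path.of(System.getProperty("java.home"), "bin", "java");
        List<String> command = new ArrayList<>();
        command.add(java.toString());
        command.addAll(jvmOptions);
        command.add("-XX:+PrintFlagsFinal");
        command.add("-version");

        // merge stderr into stdout so JVM error messages end up in the exception below
        Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
        List<String> output = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.add(line);
            }
        }
        if (process.waitFor() != 0) {
            // fail fast: the JVM refused to start with these options
            throw new IllegalStateException(
                    "invalid JVM options " + jvmOptions + ":\n" + String.join("\n", output));
        }
        return parseFlags(output);
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> flags = finalFlags(List.of("-Xms64m", "-Xmx64m"));
        System.out.println("MaxHeapSize = " + flags.get("MaxHeapSize"));
    }
}
```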
Elasticsearch also logs all JVM options on startup, so you can easily compare them with what you assumed was configured. These options are not only logged, but can also be retrieved using the nodes info API.
Ergonomic defaults
So, with infrastructure like that in place, can we do fancier things than just parsing JVM options? Of course we can! Ideas, anyone?
One of the advantages is the ability to supply some useful standard JVM options when starting Elasticsearch. There is a SystemJvmOptions class that lists a couple of interesting options, like setting the default encoding to UTF-8 or configuring DNS TTL caching. The latter is important because Elasticsearch always enables the Java Security Manager, and with a security manager installed, the JVM caches successful DNS lookups forever by default.
We can also enable some options only when a certain JDK version is in use. For example, the following method enables helpful NullPointerException messages, which name the exact reference that was null, on Java 14 and above:
private static String maybeShowCodeDetailsInExceptionMessages() {
    if (JavaVersion.majorVersion(JavaVersion.CURRENT) >= 14) {
        return "-XX:+ShowCodeDetailsInExceptionMessages";
    } else {
        return "";
    }
}
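Outside of Elasticsearch, the same pattern can be sketched in plain Java, with Runtime.version().feature() (available since Java 10) standing in for the internal JavaVersion helper. The option list is illustrative only: the -Des.networkaddress.cache.* names and the TTL values are assumptions, not necessarily what SystemJvmOptions contains.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch in the spirit of SystemJvmOptions: fixed options plus
// options that depend on the running Java version.
class SystemJvmOptionsSketch {

    static List<String> systemJvmOptions(int javaMajorVersion) {
        return Stream.of(
                // use UTF-8 regardless of the platform default encoding
                "-Dfile.encoding=UTF-8",
                // assumed names and values: finite TTLs for successful and failed DNS lookups
                "-Des.networkaddress.cache.ttl=60",
                "-Des.networkaddress.cache.negative.ttl=10",
                maybeShowCodeDetailsInExceptionMessages(javaMajorVersion))
            // version-dependent helpers return "" when they do not apply; drop those
            .filter(option -> !option.isEmpty())
            .collect(Collectors.toList());
    }

    static String maybeShowCodeDetailsInExceptionMessages(int javaMajorVersion) {
        // helpful NullPointerException messages are available as of Java 14
        return javaMajorVersion >= 14 ? "-XX:+ShowCodeDetailsInExceptionMessages" : "";
    }

    public static void main(String[] args) {
        // Runtime.version().feature() is the major version of the running JVM
        System.out.println(systemJvmOptions(Runtime.version().feature()));
    }
}
```

Returning an empty string and filtering it out keeps each version check self-contained, so adding another conditional option is a one-line change.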
But this infrastructure can go even further and become smarter over time. How about providing different JVM options depending on configuration settings like the heap size?
This is exactly what has been worked on in a recent addition to Elasticsearch.
If a small heap is configured in combination with the G1 garbage collector, some additional options are set:
final boolean tuneG1GCForSmallHeap = tuneG1GCForSmallHeap(heapSize);
final boolean tuneG1GCHeapRegion =
    tuneG1GCHeapRegion(finalJvmOptions, tuneG1GCForSmallHeap);
final boolean tuneG1GCInitiatingHeapOccupancyPercent =
    tuneG1GCInitiatingHeapOccupancyPercent(finalJvmOptions);
final int tuneG1GCReservePercent =
    tuneG1GCReservePercent(finalJvmOptions, tuneG1GCForSmallHeap);
So, what happens here, and why? If less than 8GB of heap is configured, three additional options are set. This is more common than you might think: many users run smaller instances of Elasticsearch, and there is an ongoing effort to use less heap and offload memory usage to other parts of the system. Of course, all of these options can be overridden manually.
First, the size of a G1 heap region is set to 4 MB, using -XX:G1HeapRegionSize=4m.
Second, the heap occupancy threshold that triggers a marking cycle is set to -XX:InitiatingHeapOccupancyPercent=30, somewhat earlier than the default of 45.
Third, the G1ReservePercent option is set to 15 percent in the small heap case instead of 25 percent otherwise; in both cases this deviates from the default of 10 percent.
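The decision logic can be sketched as follows. This is a simplified, hypothetical version, not Elasticsearch's JvmErgonomics: the final JVM flags are reduced to the heap size, whether G1 is active, and the set of flag names the user set explicitly. Mirroring the snippet above, where tuneG1GCInitiatingHeapOccupancyPercent takes no small-heap argument, the sketch applies the occupancy threshold regardless of heap size, while the region size only changes for small heaps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified take on heap-dependent G1 tuning (not Elasticsearch's JvmErgonomics).
class G1ErgonomicsSketch {

    // heaps below 8 GB count as "small"
    static final long SMALL_HEAP_THRESHOLD_BYTES = 8L << 30;

    static boolean tuneG1GCForSmallHeap(long heapSizeBytes) {
        return heapSizeBytes < SMALL_HEAP_THRESHOLD_BYTES;
    }

    // userSetFlags: flag names configured explicitly by the user; those are never overridden
    static List<String> ergonomicChoices(long heapSizeBytes, boolean useG1, Set<String> userSetFlags) {
        List<String> choices = new ArrayList<>();
        if (!useG1) {
            return choices; // this tuning only applies to G1
        }
        boolean smallHeap = tuneG1GCForSmallHeap(heapSizeBytes);
        if (smallHeap && !userSetFlags.contains("G1HeapRegionSize")) {
            choices.add("-XX:G1HeapRegionSize=4m");
        }
        if (!userSetFlags.contains("InitiatingHeapOccupancyPercent")) {
            choices.add("-XX:InitiatingHeapOccupancyPercent=30"); // JVM default: 45
        }
        if (!userSetFlags.contains("G1ReservePercent")) {
            choices.add("-XX:G1ReservePercent=" + (smallHeap ? 15 : 25)); // JVM default: 10
        }
        return choices;
    }

    public static void main(String[] args) {
        System.out.println(ergonomicChoices(2L << 30, true, Set.of()));
    }
}
```

Checking the user-set flags first is what keeps "everything can be manually overridden" true: ergonomics only fill in values the user did not choose.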
It took months of benchmarking and testing to arrive at these numbers; if you are interested in the discussion, there is a lengthy GitHub issue. In case you are wondering how these kinds of issues surface while testing Elasticsearch: Elasticsearch runs nightly benchmarks on bare metal hardware to easily spot and investigate regressions. You can check out those benchmarks here. The tool used for this is called Rally, a macrobenchmarking framework for Elasticsearch. One of Rally's great features is that you can benchmark with your own data and queries, so running your own nightly benchmarks is possible.
So, why were those options picked, you may ask? The benchmark infrastructure made testing easy, but it was not the reason for testing in the first place: after switching from CMS to G1, a few benchmark results got worse and required investigation. One of the approaches was to use ParallelGC instead of G1 for really small heaps, but this was abandoned.
We even managed to find a bug in our G1 configuration options. To understand the issue, let’s first explain some Elasticsearch functionality. Elasticsearch uses circuit breakers to prevent overloading a single node by accounting for memory usage, for example when creating an aggregation response or receiving requests over the network. Once a certain limit is reached, the circuit breaker trips and returns an exception. The idea is to prevent the famous OutOfMemoryError, tell the user that the request cannot be processed, and also indicate whether this is a temporary or a permanent issue. Elasticsearch 7.0 added a real memory circuit breaker, which takes the actual heap usage into account instead of only the currently accounted data, and is therefore more exact.
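The idea behind a real memory circuit breaker can be sketched like this: before memory is reserved, the heap actually in use, as reported by the JVM, plus the new reservation is compared against a limit derived from the maximum heap. The class and exception names are hypothetical and the limit ratio is just a parameter; Elasticsearch's actual breaker hierarchy is considerably more involved.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.function.LongSupplier;

// Hypothetical sketch of a real memory circuit breaker (not Elasticsearch's implementation).
class RealMemoryBreakerSketch {

    static class BreakerTrippedException extends RuntimeException {
        BreakerTrippedException(String message) {
            super(message);
        }
    }

    private final LongSupplier heapUsedBytes; // injectable so the logic can be tested
    private final long limitBytes;

    RealMemoryBreakerSketch(LongSupplier heapUsedBytes, long maxHeapBytes, double limitRatio) {
        this.heapUsedBytes = heapUsedBytes;
        this.limitBytes = (long) (maxHeapBytes * limitRatio);
    }

    // uses the JVM's own view of heap usage; getMax() can be -1 if the maximum is undefined
    static RealMemoryBreakerSketch forCurrentJvm(double limitRatio) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return new RealMemoryBreakerSketch(
                () -> memory.getHeapMemoryUsage().getUsed(),
                memory.getHeapMemoryUsage().getMax(),
                limitRatio);
    }

    // throws instead of letting the request run into an OutOfMemoryError later
    void checkBeforeReserving(long newBytes) {
        long wouldUse = heapUsedBytes.getAsLong() + newBytes;
        if (wouldUse > limitBytes) {
            throw new BreakerTrippedException("data too large: would use " + wouldUse
                    + " bytes, limit is " + limitBytes + " bytes");
        }
    }

    public static void main(String[] args) {
        forCurrentJvm(0.95).checkBeforeReserving(1024);
        System.out.println("1 KB reservation accepted");
    }
}
```

Because the used heap includes garbage that has not been collected yet, a breaker like this only works well if the garbage collector starts collecting before the limit is reached.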
However, this circuit breaker did not work in combination with the shipped G1 settings: the configured settings effectively assumed a heap larger than 100% of what was configured, so the circuit breaker tripped before the garbage collector had started collecting according to the supplied configuration. In addition, the real memory circuit breaker was enhanced with some G1-specific code to nudge G1 into doing a young GC at some point.
Summary
As you can see, properly parsing JVM options and choosing good defaults, like switching from one garbage collector to another, involves quite a few steps: infrastructure, testing, running in production and verification. The same probably applies to your own applications as well.
The same applies to the new generation of garbage collectors like ZGC and Shenandoah. Those will require extensive testing, proper CI integration and maybe even a few changes in the code. Although these GCs promise huge improvements, make sure you test properly with your own workloads before jumping on them.
Also, never forget that a small portion of your users will want to set their own options, and cater for that properly, including during upgrades.