Overriding default Kryo serialization for linked-data objects

We found an interesting issue when using Kryo the other day.

One of our clients had a large set of data and they were sending each item to another process, a bit like so:

// some large set full of elements we want to send off
set.forEach {
    send(it)  // <-- (de)Serialization happens here
}

This was all well and good until the set grew to a couple of thousand items. Then they started hitting StackOverflowErrors (!).

exception="java.lang.StackOverflowError
at com.esotericsoftware.kryo.io.Output.flush(Output.java:185)
at com.esotericsoftware.kryo.io.OutputChunked.flush(OutputChunked.java:59)
at com.esotericsoftware.kryo.io.Output.require(Output.java:164)
at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:251)
at com.esotericsoftware.kryo.io.Output.write(Output.java:219)
at com.esotericsoftware.kryo.io.Output.flush(Output.java:185)
at com.esotericsoftware.kryo.io.OutputChunked.flush(OutputChunked.java:59)
at com.esotericsoftware.kryo.io.Output.require(Output.java:164)
at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:251)
at com.esotericsoftware.kryo.io.Output.write(Output.java:219)
at com.esotericsoftware.kryo.io.Output.flush(Output.java:185)
at com.esotericsoftware.kryo.io.OutputChunked.flush(OutputChunked.java:59)

And when they increased the stack size, they ran out of memory instead (!!).

exception="java.lang.OutOfMemoryError
at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at com.esotericsoftware.kryo.io.Output.flush(Output.java:185)
at com.esotericsoftware.kryo.io.Output.close(Output.java:196)

We had a look at what was happening during serialization, and the culprit was that our client was using (at our suggestion) a LinkedHashSet. A LinkedHashSet is effectively a doubly-linked list under the hood (yes, it’s more than that, but let’s keep it simple). So when Kryo went to serialize the current entry in the set, its default field-by-field serialization would also serialize the previous and next entries. Then for those entries it would carry on, and on, going one nested call deeper for every link it followed and effectively serializing every element in the set each time!

This, interestingly, was incidental to the serialization of the set itself, which works perfectly fine. It was serializing the set’s iterator, and the entry it was pointing at, that led to this problem.
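To make the runaway recursion concrete, here is a toy model in plain Kotlin, with no Kryo involved. `Node`, `chain`, `naiveWrite` and `entryOnlyWrite` are hypothetical names for illustration, not Kryo internals. `naiveWrite` assumes reference tracking is on, so each node is written only once, yet it still recurses one level deeper per link, so starting from a single entry the depth grows with the size of the whole set. `entryOnlyWrite` mirrors the fix: write the key and value and ignore the links.

```kotlin
/** Toy stand-in for LinkedHashMap$Entry: a key/value pair plus links to its neighbours. */
class Node(val key: Int, val value: String) {
    var before: Node? = null
    var after: Node? = null
}

/** Builds a doubly-linked chain of [n] nodes and returns the first one. */
fun chain(n: Int): Node {
    val head = Node(0, "v0")
    var tail = head
    for (i in 1 until n) {
        val next = Node(i, "v$i")
        tail.after = next
        next.before = tail
        tail = next
    }
    return head
}

/**
 * Naive "serialize every reference field" walk. [written] plays the role of reference
 * tracking, so no node is written twice. Returns the deepest recursion level reached.
 */
fun naiveWrite(node: Node?, written: MutableSet<Node>, depth: Int = 1): Int {
    if (node == null || !written.add(node)) return depth - 1
    val viaBefore = naiveWrite(node.before, written, depth + 1)
    val viaAfter = naiveWrite(node.after, written, depth + 1)
    return maxOf(depth, viaBefore, viaAfter)
}

/** What the custom serializer does instead: write only the key and value. */
fun entryOnlyWrite(node: Node): List<Any> = listOf(node.key, node.value)

fun main() {
    val head = chain(1_000)
    val written = mutableSetOf<Node>()
    println(naiveWrite(head, written)) // 1000: recursion depth equals the chain length
    println(written.size)              // 1000: every node gets written, not just head
    println(entryOnlyWrite(head))      // [0, v0]
}
```

Grow the chain far enough and the naive walk overflows the stack, which is the same shape as the Kryo trace above.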

To solve it, we decided to override the default serialization of the entry and skip the previous/next references.

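import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.Serializer
import com.esotericsoftware.kryo.io.Input
import com.esotericsoftware.kryo.io.Output
import java.lang.reflect.Constructor

object LinkedHashMapEntrySerializer : Serializer<Map.Entry<*, *>>() {
    // Create a dummy map so that we can get the LinkedHashMap$Entry from it
    // The element type of the map doesn't matter. The entry is all we want
    private val DUMMY_MAP = linkedMapOf(1L to 1)
    fun getEntry(): Any = DUMMY_MAP.entries.first()

    // LinkedHashMap$Entry has a single package-private constructor (hash, key, value, next).
    // On JDK 16+ isAccessible = true throws unless java.util is opened to this code,
    // e.g. with --add-opens java.base/java.util=ALL-UNNAMED
    private val constr: Constructor<*> = getEntry()::class.java.declaredConstructors.single().apply { isAccessible = true }

    /**
     * Kryo would end up serialising "this" entry, then serialise "this.after" recursively,
     * leading to a very large stack. We skip that and just write out the key/value.
     */
    override fun write(kryo: Kryo, output: Output, obj: Map.Entry<*, *>) {
        kryo.writeClassAndObject(output, obj.key)
        kryo.writeClassAndObject(output, obj.value)
    }

    // Kryo 4.x signature; in Kryo 5.x the last parameter is Class<out Map.Entry<*, *>>
    override fun read(kryo: Kryo, input: Input, type: Class<Map.Entry<*, *>>): Map.Entry<*, *> {
        val key = kryo.readClassAndObject(input)
        val value = kryo.readClassAndObject(input)
        // hash = 0 and next = null: the rebuilt entry is not linked into any map
        return constr.newInstance(0, key, value, null) as Map.Entry<*, *>
    }
}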

A couple of things to note about this line:

object LinkedHashMapEntrySerializer : Serializer<Map.Entry<*, *>>() {

You might be thinking, “Wasn’t this post about a set?”

And that’s correct: it is. But it turns out that a LinkedHashSet is implemented on top of a LinkedHashMap, with every value pointing to a single shared dummy object (called PRESENT). A plain HashSet is backed by a HashMap in the same way. So by handling the Map case, we get the Set for free.
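If you want to check the backing-map claim on your own JDK, the sketch below inspects OpenJDK’s `HashSet` through reflection metadata only, so it never reads a private value and needs no `--add-opens`. It assumes OpenJDK’s internal field names (`map`, `PRESENT`) and the package-private `(int, float, boolean)` constructor that LinkedHashSet calls to install a LinkedHashMap; these are implementation details, not public API. `describeHashSet` and `hasLinkedBackingConstructor` are hypothetical helper names.

```kotlin
import java.lang.reflect.Modifier

/** Name -> type of HashSet's backing-map field and its PRESENT dummy value (metadata only). */
fun describeHashSet(): Map<String, String> =
    java.util.HashSet::class.java.declaredFields
        .filter { it.name == "map" || it.name == "PRESENT" }
        .associate { f ->
            val prefix = if (Modifier.isStatic(f.modifiers)) "static " else ""
            f.name to prefix + f.type.name
        }

/** True if HashSet declares the (int, float, boolean) constructor LinkedHashSet delegates to. */
fun hasLinkedBackingConstructor(): Boolean =
    java.util.HashSet::class.java.declaredConstructors.any { ctor ->
        ctor.parameterTypes.map { it.name } == listOf("int", "float", "boolean")
    }

fun main() {
    val fields = describeHashSet()
    println(fields["map"])                 // java.util.HashMap
    println(fields["PRESENT"])             // static java.lang.Object
    println(hasLinkedBackingConstructor()) // true
}
```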

Also note that the overridden Serializer is for the type Map.Entry<*, *>. What we actually want to capture, though, is LinkedHashMap.Entry, which isn’t public (it’s package-private to java.util), so we can’t name it directly in our code. In order to get hold of it, we had to get a bit dirty and play with reflection.

    // Create a dummy map so that we can get the LinkedHashMap$Entry from it
    // The element type of the map doesn't matter. The entry is all we want
    private val DUMMY_MAP = linkedMapOf(1L to 1)
    fun getEntry(): Any = DUMMY_MAP.entries.first()
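As a quick sanity check of the dummy-map trick, the sketch below compares the class it yields with a lookup by binary name. `Class.forName` can load a package-private class; it just can’t be referenced in source code. The name string `java.util.LinkedHashMap$Entry` is an assumption about OpenJDK internals, and both helper names are hypothetical.

```kotlin
/** The entry class, obtained the way the post does: from a throwaway map's first entry. */
fun entryClassViaDummyMap(): Class<*> = linkedMapOf(1L to 1).entries.first().javaClass

/** Alternative lookup by binary name; assumes OpenJDK's nested class LinkedHashMap$Entry. */
fun entryClassByName(): Class<*> = Class.forName("java.util.LinkedHashMap\$Entry")

fun main() {
    println(entryClassViaDummyMap().name)                  // java.util.LinkedHashMap$Entry
    println(entryClassViaDummyMap() == entryClassByName()) // true
}
```

The dummy-map approach has the advantage of not hard-coding a class name; it only relies on a LinkedHashMap’s entry set handing out its internal nodes directly.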

And once more to get hold of the LinkedHashMap.Entry constructor, which takes (hash, key, value, next):

    private val constr: Constructor<*> = getEntry()::class.java.declaredConstructors.single().apply { isAccessible = true }
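The four arguments passed to `constr.newInstance(0, key, value, null)` in `read` follow from that constructor’s parameter list. The sketch below prints it using reflection metadata only, so unlike the `isAccessible = true` call it needs no access to `java.util`. It assumes OpenJDK’s layout, where `LinkedHashMap.Entry` extends `HashMap.Node`; `entryConstructorParams` is a hypothetical helper name.

```kotlin
/** Erased parameter types of LinkedHashMap$Entry's single declared constructor. */
fun entryConstructorParams(): List<String> =
    linkedMapOf(1L to 1).entries.first().javaClass
        .declaredConstructors.single()
        .parameterTypes.map { it.name }

fun main() {
    // [int, java.lang.Object, java.lang.Object, java.util.HashMap$Node]
    println(entryConstructorParams())
}
```

The generic key and value erase to `Object`, and the last parameter is the bucket-chain link, which is why passing `null` for it is fine for an entry that is never placed in a map.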

Finally, we had to register our new Serializer with Kryo:

    kryo.register(LinkedHashMapEntrySerializer.getEntry()::class.java, LinkedHashMapEntrySerializer)

Et voilà! We’re finished…

Well, not quite. We also had to override the serialization for the iterator. This was simply a matter of recording, on serialization, the key of the element the iterator was pointing at and, on deserialization, walking through the map from the start until we were back at the correct place. I’ll leave that implementation as homework. Or, possibly, another post.
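The homework is left to the reader, so what follows is only one possible reading of “record the key, then walk the map back to it”, not the authors’ implementation. It leaves Kryo out and shows just the repositioning step. `resumeAfter` is a hypothetical name. It assumes the map hasn’t changed between serialization and deserialization, and that the iterator had already returned at least one element (a fresh iterator would need a separate marker).

```kotlin
/**
 * Hypothetical helper: given the key of the element an iterator had last returned,
 * build a fresh iterator over [map] and advance it just past that element.
 */
fun <K, V> resumeAfter(map: Map<K, V>, lastKey: K): Iterator<Map.Entry<K, V>> {
    val iter = map.entries.iterator()
    while (iter.hasNext()) {
        if (iter.next().key == lastKey) return iter
    }
    throw NoSuchElementException("key $lastKey is no longer in the map")
}

fun main() {
    val map = linkedMapOf("a" to 1, "b" to 2, "c" to 3)
    val resumed = resumeAfter(map, "a")
    println(resumed.next().key) // b
    println(resumed.next().key) // c
    println(resumed.hasNext())  // false
}
```

Restoring this way costs a linear walk per iterator, but the serialized form is just one key instead of a chain of linked entries.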

The end result is that we can now (de)serialize many thousands of objects without worrying about StackOverflowErrors.

P.S. The quick and dirty workaround was to call .toList() before the .forEach. This problem doesn’t exist with non-linked lists, whose iterators track a position rather than a chain of linked entries.
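Why `.toList()` helps can be seen by asking which fields of each iterator point at a `LinkedHashMap$Entry`, since those are what a field-by-field serializer would follow into the linked chain. The sketch reads reflection metadata only and assumes OpenJDK internals (the `LinkedHashMap$Entry` class name and the `next`/`current` fields of its iterators); `linkedEntryFields` is a hypothetical helper name.

```kotlin
/** Sorted names of fields, anywhere in obj's class hierarchy, typed as LinkedHashMap$Entry. */
fun linkedEntryFields(obj: Any): List<String> {
    val entryClass = Class.forName("java.util.LinkedHashMap\$Entry")
    val names = mutableListOf<String>()
    var cls: Class<*>? = obj.javaClass
    while (cls != null) {
        // Metadata only: no field values are read, so no --add-opens is needed
        cls.declaredFields.filter { it.type == entryClass }.forEach { names += it.name }
        cls = cls.superclass
    }
    return names.sorted()
}

fun main() {
    val linked = linkedSetOf(1, 2, 3)
    println(linkedEntryFields(linked.iterator()))          // [current, next]
    println(linkedEntryFields(linked.toList().iterator())) // []
}
```

The LinkedHashSet iterator holds direct references into the entry chain, while the iterator of the list produced by `toList()` (an ArrayList here) only keeps integer positions plus a reference to the list itself, which Kryo can write element by element.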

Original article: Overriding default Kryo serialization for linked-data objects
