When parallelStream is not a big deal in Java

To understand when to use or avoid parallelStream, it is important to understand the concept of stateful and stateless operations in Java streams.

Stateful

Whenever an operation in a stream needs to keep state to finish its work, it is considered a stateful operation. Built-in examples include:

distinct()
sorted()
limit()
skip()

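Lambdas that you write yourself can also make a pipeline stateful if they read or modify shared state. When I’m not sure whether a built-in operation is stateful, the quickest check is the documentation: the Javadoc for each Stream method says whether it is a stateful intermediate operation.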

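Be careful with stateful operations in a parallel stream: they may need to buffer a large part of the input or make several passes over the data, which can hurt performance badly.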
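As a small illustration of the built-in stateful operations listed above, here is a minimal sketch that chains distinct(), sorted() and limit() on a parallel stream. The class and method names are made up for this example. The built-in operations still return a correct, repeatable result in parallel; the concern with them is the extra buffering and coordination, not correctness.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical class name, used only for this illustration.
class StatefulBuiltInsExample {

    // distinct(), sorted() and limit() are stateful: each one has to
    // remember something about elements it has already seen.
    static int[] threeSmallestDistinct() {
        return IntStream.of(3, 4, 1, 2, 1, 2, 3, 4, 4, 5)
                .parallel()
                .distinct()   // must track which values were already emitted
                .sorted()     // must see every element before emitting any
                .limit(3)     // must count how many elements have passed
                .toArray();
    }

    public static void main(String[] args) {
        // Prints [1, 2, 3] on every run.
        System.out.println(Arrays.toString(threeSmallestDistinct()));
    }
}
```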

Stateless

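Stateless operations, such as filter() and map() with side-effect-free lambdas, keep no state from previously seen elements during pipeline execution. Because of this, each element can be processed independently, which makes these operations much better suited to parallel execution.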
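A minimal sketch of a stateless pipeline, assuming the same input values used elsewhere in this post. The class and method names are hypothetical. Neither lambda touches anything outside its own argument, so the sequential and parallel runs should always agree.

```java
import java.util.stream.IntStream;

// Hypothetical class name, used only for this illustration.
class StatelessPipelineExample {

    // Keeps the even values and multiplies each by 10.
    // Both lambdas depend only on the element they receive.
    static int sumOfScaledEvens(boolean parallel) {
        IntStream stream = IntStream.of(3, 4, 1, 2, 1, 2, 3, 4, 4, 5);
        if (parallel) {
            stream = stream.parallel();
        }
        return stream
                .filter(value -> value % 2 == 0)
                .map(value -> value * 10)
                .sum();
    }

    public static void main(String[] args) {
        // Both lines print 160.
        System.out.println(sumOfScaledEvens(false));
        System.out.println(sumOfScaledEvens(true));
    }
}
```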

Nondeterministic values

Because of a stateful behavioral parameter (a lambda that depends on shared mutable state), it is not possible to predict the result when the pipeline runs in parallel. Let’s see an example in code:

for (int i = 0; i < 5; i++) {
    Set<Integer> alreadySeen = new HashSet<>();
    IntStream stream = IntStream.of(3, 4, 1, 2, 1, 2, 3, 4, 4, 5);
    int sum = stream.parallel().map(
            // Here we add a stateful behavioral parameter.
            value -> alreadySeen.add(value) ? value : 0).sum();
    System.out.println(sum);
}

And here is the output:

16
15
15
19
15

This result is nondeterministic: you may get a different result on each execution. HashSet is not thread-safe, and several threads call add() on it at the same time. To make the result correct, we can change the code by removing parallel():

for (int i = 0; i < 5; i++) {
    Set<Integer> alreadySeen = new HashSet<>();
    IntStream stream = IntStream.of(3, 4, 1, 2, 1, 2, 3, 4, 4, 5);
    int sum = stream.map(
            // Here we add a stateful behavioral parameter.
            value -> alreadySeen.add(value) ? value : 0).sum();
    System.out.println(sum);
}

Now we get the correct result, which is expected to be the same on every execution:

15
15
15
15
15

This is because, when an action executes in parallel, there is no guarantee about which elements other threads have already processed, so a check against previous state cannot be trusted.

Workarounds

There are some workarounds that help with this, but these fixes can undermine the benefits of parallelism.
One is to use a synchronizedSet:

Set<Integer> alreadySeen = Collections.synchronizedSet(new HashSet<>());
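Put together with the earlier loop, the synchronized set could be used roughly as follows. This is a sketch with hypothetical class and method names. Because add() on the synchronized wrapper is atomic, each distinct value is accepted exactly once, so the sum is stable even though which thread sees a value first is not. The lock is also a point of contention between threads, which is where the parallel benefit gets eroded.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.IntStream;

// Hypothetical class name, used only for this illustration.
class SynchronizedSetExample {

    static int sumOfDistinctValues() {
        // Every call to add() now takes a lock, so concurrent calls are safe.
        Set<Integer> alreadySeen = Collections.synchronizedSet(new HashSet<>());
        return IntStream.of(3, 4, 1, 2, 1, 2, 3, 4, 4, 5)
                .parallel()
                // Still a stateful lambda, but the shared state is thread-safe.
                .map(value -> alreadySeen.add(value) ? value : 0)
                .sum();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            // Prints 15 on every iteration.
            System.out.println(sumOfDistinctValues());
        }
    }
}
```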

Another is to use distinct(), which keeps the pipeline stateful but is safe and deterministic:

int sum = stream.parallel().distinct().sum();
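As a complete version of the same idea, the sketch below drops the shared set entirely and lets the stream library handle deduplication. The class and method names are hypothetical.

```java
import java.util.stream.IntStream;

// Hypothetical class name, used only for this illustration.
class DistinctSumExample {

    static int sumOfDistinctValues() {
        return IntStream.of(3, 4, 1, 2, 1, 2, 3, 4, 4, 5)
                .parallel()
                .distinct() // the library tracks seen values; no shared state in our code
                .sum();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            // Prints 15 on every iteration.
            System.out.println(sumOfDistinctValues());
        }
    }
}
```

Compared with the synchronized set, this version has no lambda that touches state outside the pipeline, so there is nothing for the caller to get wrong.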

Using parallel can speed up processing, but it can also cause problems in the application, so it’s important to know whether we’re in a stateful or stateless flow in order to handle it correctly.

