If you work with large datasets stored in a Java collection, say a java.util.List, sooner or later you will run into a situation where the allocated memory is not enough to hold all your data. The collection may even contain a small number of elements, with many of them taking up a lot of memory. Sometimes a process must accumulate a large dataset and there is no way to process the data in chunks that fit into the allocated memory. There are multiple ways to resolve the issue, from storing the data in a file to temporarily storing it in Redis.
We ran into a similar issue in a system that has been in development for many years, where it is not simple to change how data is accumulated in memory and used. A Java collection is full of large objects, and multiple threads in the same app perform that specific activity. Of course, there are a few instances of the service that processes the data. Sometimes a dataset is small and sometimes it is large, which is really frustrating because it makes choosing a scaling model hard. In such cases I usually say that it is simpler, faster, and more reliable to rewrite than to “fine-tune”, but this time we do the fine-tuning 🙂 Let’s leave behind the curtain why a system operates on data that does not fit into the allocated memory 🤠 it happens. Sometimes we can change an implementation to make it more robust, but sometimes we need to find a compromise with the existing solution.
We implemented a small library that stores a Java collection’s data in the file system instead of RAM and provides a convenient, Java Stream-like interface for operating on the data.
Okay, the library: FStream. Its central class is FCollection, which is similar to java.util.List but with a reduced number of methods. With FCollection you can add new items, sort them with a comparator, iterate over the elements, and create an instance of FStream, which is likewise a reduced version of Java Stream. With FStream you can apply sequential operations to the elements of a collection.
Example
In the following code snippet, all data is stored in a file located in a temporary directory. Data is written to the file system immediately as it is added, but it is still possible to operate on the items through FStream.
FCollection<SomeClassName> collection = FCollection.create();

// add elements to a collection
collection.add(instance);

// iterate elements of a collection
Iterator<SomeClassName> i = collection.iterator();
while (i.hasNext()) {
    consumer.accept(i.next());
}

// also iterates over all elements in a collection
collection.forEach(this::consumer);

// create a new collection
FCollection<AnotherClassName> collection2 = collection.stream()
    .filter(o -> o.isActive())
    .map(this::convert)
    .sort((o1, o2) -> o1.compareTo(o2))
    .collect();

// destroy collections' data in a file storage
collection.close();
collection2.close();
How it works
Create a collection
When a collection is created, for instance with the create method, a new file is created in the /tmp directory, or in a custom directory if one is specified.
FCollection<SomeClassName> collection = FCollection.create();
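To make the idea concrete, here is a minimal sketch of how a backing file could be created with plain JDK calls. The class name BackingFileSketch, the method createBackingFile, and the file name prefix are made up for illustration; this is not FStream's actual code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BackingFileSketch {

    // Creates an empty backing file in the given directory, or in the
    // JVM's default temporary directory (often /tmp on Linux) when dir is null.
    // Hypothetical helper, not part of FStream.
    static Path createBackingFile(Path dir) {
        try {
            if (dir == null) {
                return Files.createTempFile("collection-", ".bin");
            }
            Files.createDirectories(dir); // make sure the custom location exists
            return Files.createTempFile(dir, "collection-", ".bin");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = createBackingFile(null);
        System.out.println("Backing file: " + file);
        Files.deleteIfExists(file); // clean up, similar in spirit to close()
    }
}
```

Files.createTempFile generates a unique file name on every call, so several collections created this way would not overwrite each other's data.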
Add items in a collection
Adding a new item to a collection consists of serializing the item and writing it to the collection’s file in the file storage. Serialization is done by default with FJdkSerializer, but it is possible to use a custom serializer. Customization is described below.
// add elements to a collection
collection.add(instance);
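The serialize-then-write step can be sketched with standard JDK serialization. The example below assumes a simple length-prefixed record layout and invents the names AppendRecordSketch, append, and forEach; FStream's real file format and internals may look different.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;

public class AppendRecordSketch {

    // Serializes one item with JDK serialization and appends it to the file
    // as a record: [4-byte payload length][payload bytes]. Assumed layout.
    static void append(Path file, Serializable item) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(item);
            }
            byte[] payload = buffer.toByteArray();
            try (DataOutputStream data = new DataOutputStream(new BufferedOutputStream(
                    Files.newOutputStream(file, StandardOpenOption.CREATE, StandardOpenOption.APPEND)))) {
                data.writeInt(payload.length);
                data.write(payload);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the records back one at a time, so only a single item
    // needs to be held in memory while iterating.
    static void forEach(Path file, Consumer<Object> action) {
        try (DataInputStream data = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            while (true) {
                int length;
                try {
                    length = data.readInt();
                } catch (EOFException endOfFile) {
                    return; // no more records
                }
                byte[] payload = new byte[length];
                data.readFully(payload);
                try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
                    action.accept(in.readObject());
                }
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not read " + file, e);
        }
    }
}
```

Reopening the file on every append keeps the sketch short; a real implementation would more likely keep one output stream open for the lifetime of the collection.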
Apply operations on a collection
The approach here is the same as with Java Stream: a developer specifies, in a functional style, the operations to apply to each element of a collection. As a result, a new collection is created and stored in the file storage.
FCollection<AnotherClassName> collection2 = collection.stream()
    .filter(o -> o.isActive())
    .map(this::convert)
    .sort((o1, o2) -> o1.compareTo(o2))
    .collect();
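The file-in, file-out idea behind such a pipeline can be illustrated with the JDK alone. This is only a sketch under simplifying assumptions: records are plain text lines instead of serialized objects, and FileToFileSketch and filterAndMap are hypothetical names. Sorting is left out because it cannot be done one element at a time.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Stream;

public class FileToFileSketch {

    // Streams the source file line by line, applies filter and map,
    // and writes every surviving element straight to a new temporary file.
    static Path filterAndMap(Path source, Predicate<String> keep, Function<String, String> convert) {
        try {
            Path target = Files.createTempFile("derived-", ".txt");
            try (Stream<String> lines = Files.lines(source, StandardCharsets.UTF_8);
                 BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
                lines.filter(keep).map(convert).forEach(line -> {
                    try {
                        writer.write(line);
                        writer.newLine();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("source-", ".txt");
        Files.write(source, Arrays.asList("active:a", "inactive:b", "active:c"));
        Path derived = filterAndMap(source,
                s -> s.startsWith("active:"),
                s -> s.substring("active:".length()).toUpperCase());
        System.out.println(Files.readAllLines(derived));
    }
}
```

Files.lines reads the source lazily, so neither the input nor the result has to fit into the heap at once.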
Customization
So far it is possible to specify where to store a collection’s temporary data and to assign a custom serializer to a collection. A serializer must implement the FSerializer interface. After that, a collection can be created with a builder.
FCollection<String> c =
    FCollection.builder()
        .serializer(new CustomSerializer())
        .storageLocation("/your/location")
        .build();
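To give a feeling for what a custom serializer might do, here is a sketch built on a hypothetical ItemSerializer interface. It is a stand-in invented for this example: the methods of the real FSerializer interface are not shown in this post and may differ, so check the repository before implementing one.

```java
import java.nio.charset.StandardCharsets;

public class SerializerSketch {

    // Hypothetical stand-in interface, NOT the library's FSerializer.
    interface ItemSerializer<T> {
        byte[] serialize(T item);
        T deserialize(byte[] bytes);
    }

    // Stores a String as its raw UTF-8 bytes, without the stream header
    // and type information that JDK serialization adds to each object.
    static final class Utf8StringSerializer implements ItemSerializer<String> {
        @Override
        public byte[] serialize(String item) {
            return item.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String deserialize(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        ItemSerializer<String> serializer = new Utf8StringSerializer();
        byte[] bytes = serializer.serialize("hello");
        System.out.println(bytes.length + " bytes, restored: " + serializer.deserialize(bytes));
    }
}
```

A type-specific serializer like this can produce smaller files than generic JDK serialization, which matters when a collection holds many items.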
Want to try it out?
Visit the project’s GitHub repository: https://github.com/alex-53-8/fstream