Sometimes we need to process a big JSON file or stream without holding its entire contents in memory.
For example, to count the items in a big array, we only need to load one item, increment the count, discard the item, and repeat until the whole array has been counted.
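The counting idea can be sketched independently of JSON. The snippet below is only a minimal illustration, assuming the items arrive through a plain `java.util.Iterator`; the names `StreamingCount` and `countItems` are made up for this example.

```java
import java.util.Iterator;
import java.util.stream.IntStream;

class StreamingCount {
    // Counts items one at a time; no item is stored after it has been counted.
    static <T> long countItems(Iterator<T> items) {
        long count = 0;
        while (items.hasNext()) {
            items.next(); // load one item and immediately drop the reference
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // A lazily generated sequence stands in for a large input source.
        Iterator<Integer> lazyItems = IntStream.range(0, 1_000_000).boxed().iterator();
        System.out.println(countItems(lazyItems));
    }
}
```

Memory use stays flat here because only the running count survives each loop iteration.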
I found a big JSON file (~190MB) in this Git repository: https://github.com/zemirco/sf-city-lots-json
The file looks like this, and I want to count the number of features.
{ "type": "FeatureCollection", "features": [ /* lots of feature objects */ ] }
This is what a feature object looks like, if you are interested.
{ "type": "Feature", "properties": { "MAPBLKLOT": "0001001", "BLKLOT": "0001001", "BLOCK_NUM": "0001", "LOT_NUM": "001", "FROM_ST": "0", "TO_ST": "0", "STREET": "UNKNOWN", "ST_TYPE": null, "ODD_EVEN": "E" }, "geometry": { "type": "Polygon", "coordinates": [ [ [ -122.422003528252475, 37.808480096967251, 0.0 ], [ -122.422076013325281, 37.808835019815085, 0.0 ], [ -122.421102174348633, 37.808803534992904, 0.0 ], [ -122.421062569067274, 37.808601056818148, 0.0 ], [ -122.422003528252475, 37.808480096967251, 0.0 ] ] ] } }
Let’s say my application can only allocate 50MB and I try to load the whole file into memory.
Path filePath = Path.of("/src/sf-city-lots-json/citylots.json");
String content = Files.readString(filePath);
Obviously, the file does not fit in memory:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
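To reproduce the limit, the JVM's maximum heap can be capped with the standard `-Xmx50m` option. The sketch below is a rough pre-check and not part of the original program: `HeapCheck` and `definitelyTooBig` are hypothetical names, and the comparison is only a lower bound. A file at least as large as the heap certainly cannot be read in one piece, but a smaller file may still fail.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class HeapCheck {
    // Lower-bound check: true means the file cannot possibly fit in the heap.
    // false does NOT guarantee that reading the whole file will succeed.
    static boolean definitelyTooBig(long fileSizeBytes, long maxHeapBytes) {
        return fileSizeBytes >= maxHeapBytes;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java -Xmx50m HeapCheck.java <file>");
            return;
        }
        long fileSize = Files.size(Path.of(args[0]));
        long maxHeap = Runtime.getRuntime().maxMemory(); // reflects -Xmx
        System.out.printf("file: %d bytes, max heap: %d bytes%n", fileSize, maxHeap);
        if (definitelyTooBig(fileSize, maxHeap)) {
            System.out.println("Too big to load at once; stream it instead.");
        }
    }
}
```

Running it as `java -Xmx50m HeapCheck.java /src/sf-city-lots-json/citylots.json` (Java 11 or later) should report that the ~190MB file is larger than the 50MB heap.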
Gson provides JsonReader, which reads JSON as a stream of tokens instead of building the whole document in memory.
import com.google.gson.stream.JsonReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public int getFeatureCount(Path filePath) throws IOException {
    int count = 0;
    try (JsonReader reader = new JsonReader(Files.newBufferedReader(filePath))) {
        reader.beginObject(); // enter the top-level object
        while (reader.hasNext()) {
            String name = reader.nextName();
            if ("features".equals(name)) {
                count = getFeatureCountFromArray(reader);
            } else {
                reader.skipValue(); // ignore every other property, e.g. "type"
            }
        }
        reader.endObject();
    }
    return count;
}

private int getFeatureCountFromArray(JsonReader reader) throws IOException {
    int count = 0;
    reader.beginArray();
    while (reader.hasNext()) {
        count++;
        // Consume the feature object token by token; nothing is retained.
        reader.beginObject();
        while (reader.hasNext()) {
            reader.skipValue();
        }
        reader.endObject();
    }
    reader.endArray();
    return count;
}
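To try the approach without downloading the 190MB file, here is a small self-contained variation. It is a sketch under a few assumptions: Gson is on the classpath, the class name `FeatureCounter` and the inline sample JSON are invented for the demo, the method takes a `Reader` so it can run against an in-memory string, and each feature is skipped with a single `skipValue()` call, which skips a whole object including its nested values.

```java
import com.google.gson.stream.JsonReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class FeatureCounter {
    // Counts the elements of the top-level "features" array in a streaming way.
    static int countFeatures(Reader source) throws IOException {
        int count = 0;
        try (JsonReader reader = new JsonReader(source)) {
            reader.beginObject();
            while (reader.hasNext()) {
                if ("features".equals(reader.nextName())) {
                    reader.beginArray();
                    while (reader.hasNext()) {
                        reader.skipValue(); // skips one whole feature object
                        count++;
                    }
                    reader.endArray();
                } else {
                    reader.skipValue(); // not the property we care about
                }
            }
            reader.endObject();
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Tiny stand-in for citylots.json with three features.
        String sample = "{\"type\":\"FeatureCollection\",\"features\":["
                + "{\"type\":\"Feature\",\"geometry\":{\"coordinates\":[[1,2],[3,4]]}},"
                + "{\"type\":\"Feature\"},"
                + "{\"type\":\"Feature\"}]}";
        System.out.println(countFeatures(new StringReader(sample)));
    }
}
```

For the real file, the same method could be called with `Files.newBufferedReader(filePath)` as the source.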
Greater power comes with greater responsibility. Unlike Gson.fromJson, JsonReader requires us to call begin*, end* and skipValue at the right time (according to the structure of the JSON object) so that the data is processed correctly; otherwise it will throw an exception. So it should be used only when you have restrictions on memory footprint or performance.
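A quick way to see the failure mode is to call the wrong begin* method on purpose. The sketch below assumes Gson is on the classpath and uses made-up names (`WrongOrderDemo`, `beginArrayFails`, `firstToken`). As far as I know, JsonReader reports a mismatched token with an `IllegalStateException`, and `peek()` lets you inspect the next token before committing to a call.

```java
import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonToken;
import java.io.IOException;
import java.io.StringReader;

class WrongOrderDemo {
    // Returns true if beginArray() is rejected for the given JSON text.
    static boolean beginArrayFails(String json) throws IOException {
        try (JsonReader reader = new JsonReader(new StringReader(json))) {
            reader.beginArray(); // only valid when the next token is '['
            return false;
        } catch (IllegalStateException e) {
            return true; // the token did not match the call
        }
    }

    // Looks at the first token without consuming it.
    static JsonToken firstToken(String json) throws IOException {
        try (JsonReader reader = new JsonReader(new StringReader(json))) {
            return reader.peek();
        }
    }

    public static void main(String[] args) throws IOException {
        String object = "{\"features\": []}";
        System.out.println(beginArrayFails(object)); // document starts with '{'
        System.out.println(firstToken(object));
    }
}
```

Checking `peek()` first costs a little extra code, but it turns a surprising exception into an explicit branch when the input structure is not fully trusted.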