Java deserialization speed
I am writing a Java application that, among other things, has to read a text dictionary file (each line is one word) and store it in a HashSet. Every time I start the application, the same file is read again (6MB Unicode file).
That seemed expensive, so I decided to serialize the resulting HashSet and store it in a binary file. I expected my application to start faster after this. Instead, it got slower: loading went from ~2.5 seconds to ~5 seconds with the serialized set.
Is this the expected result? I thought that in cases like this, serialization should increase speed.
The problem is not the particular serialization mechanism; it is the data structure you are serializing.
You have a very efficient, natural representation of these words: a simple list in a text file. It's quick to read.
You then built a different data structure to hold them: a hash table. A hash table needs more memory than a simple list, but in exchange word lookups are much faster.
That trade-off also affects serialization: a naive serialization of the hash table writes more data than the plain word list, so the file is larger and therefore slower to read back.
I think you should stick to simple text file reading.
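For reference, a minimal sketch of the plain text approach. The class name `DictionaryLoader` is made up for illustration, and UTF-8 is an assumption, since the question only says the file is "Unicode":

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not taken from the question's code.
public class DictionaryLoader {

    // Reads one word per line into a HashSet, skipping empty lines.
    // The UTF-8 charset is an assumption; use the file's real encoding.
    public static Set<String> load(Path file) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    words.add(line);
                }
            }
        }
        return words;
    }
}
```

If the approximate word count is known in advance, passing an initial capacity to the `HashSet` constructor may avoid some rehashing during the load, though whether that matters here would have to be measured.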
The other answer is correct: Java serialization/deserialization has significant overhead. If you need to speed up the loading of the dictionary (or ...), consider the following approaches:
- Using the java.nio.* classes to read the file may speed up the process.
- If your application does not need the dictionary immediately at startup, consider loading it asynchronously in a separate thread. The load itself is no faster, but the application's GUI (for example) can come up without waiting for it.
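A rough sketch of the second suggestion, assuming Java 8 or later. The class name `AsyncDictionary` is hypothetical and UTF-8 is again an assumption. It reads the file through `java.nio.file.Files` on a background thread and blocks only when a lookup arrives before loading has finished:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Stream;

// Hypothetical wrapper, not taken from the question's code.
public class AsyncDictionary {

    private final CompletableFuture<Set<String>> words;

    public AsyncDictionary(Path file) {
        // supplyAsync runs the load on the common ForkJoinPool,
        // so the constructor returns immediately.
        this.words = CompletableFuture.supplyAsync(() -> readWords(file));
    }

    // Reads one word per line; the UTF-8 charset is an assumption.
    private static Set<String> readWords(Path file) {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            Set<String> set = new HashSet<>();
            lines.filter(line -> !line.isEmpty()).forEach(set::add);
            return set;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Waits for the background load only if it has not completed yet.
    public boolean contains(String word) {
        return words.join().contains(word);
    }
}
```

The application would create the `AsyncDictionary` early during startup and call `contains` later, for example from a spell-check action. If the load fails, `join()` rethrows the failure wrapped in a `CompletionException`.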