lucene-dev mailing list archives

From "John Wang (Created) (JIRA)" <>
Subject [jira] [Created] (LUCENE-3516) Add serialization support for RAMDirectory
Date Thu, 13 Oct 2011 17:59:14 GMT
Add serialization support for RAMDirectory

                 Key: LUCENE-3516
             Project: Lucene - Java
          Issue Type: Improvement
          Components: core/store
    Affects Versions: 3.4
            Reporter: John Wang

We are building Lucene indexes via Hadoop and using the byte[] form of a RAMDirectory as intermediate
storage (we are using Hadoop's indexing contrib package).
Currently Java serialization is used, which seems wasteful and is not portable across languages.
Since a RAMDirectory is essentially a collection of byte[], writing a serializer seemed easy.
Attached please find a utility class that does this. It includes a light performance test
comparing it against Java serialization:

Results (input: number of files and avg file size; output: size, serialization time, and
deserialization time, each given as the ratio our ser / java ser, in percent):
test 1:(3, 1k)
size: 66.93%
ser time: 1.89%
deser time: 4.48%

test 2: (100,100)
size: 95.16%
ser time: 4.01%
deser time: 13.36%

test 3: (3,50k)
size: 98.42%
ser time: 3.09%
deser time: 8.10%

test 4: (1,1k)
size: 41.70%
ser time: 1.85%
deser time: 3.85%

The implementation is very elementary, yet it is still much better than Java serialization,
especially on time (avg 50x faster).
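
To illustrate the observation that a RAMDirectory is essentially a collection of named byte[]
files, here is a minimal sketch of a length-prefixed serializer. It is not the attached utility
class: the class name ByteFileSerializer, the Map<String, byte[]> stand-in for the directory
contents, and the on-wire layout are all assumptions made for illustration, and only java.io
classes are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not the class attached to this issue.
final class ByteFileSerializer {
    private ByteFileSerializer() {}

    // Assumed layout: int fileCount, then per file: UTF name, int length, raw bytes.
    static byte[] serialize(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(files.size());
            for (Map.Entry<String, byte[]> entry : files.entrySet()) {
                out.writeUTF(entry.getKey());          // file name
                out.writeInt(entry.getValue().length); // content length
                out.write(entry.getValue());           // content bytes
            }
        }
        return buffer.toByteArray();
    }

    // Reads the layout written by serialize(), preserving file order.
    static Map<String, byte[]> deserialize(byte[] data) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                byte[] content = new byte[in.readInt()];
                in.readFully(content);
                files.put(name, content);
            }
        }
        return files;
    }
}
```

Because the format is only a file count followed by name, length, and bytes for each file, it
carries none of the class metadata that Java serialization writes, and a reader in another
language would only need to understand big-endian ints and the modified UTF-8 used by writeUTF.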

This message is automatically generated by JIRA.