lucene-java-user mailing list archives

From Karel Tejnora
Subject Re: Storing whole documents in the index
Date Mon, 19 Mar 2007 10:10:31 GMT
Storing documents (especially large ones) outside the index is better than
storing them in it. Every segment merge or optimize copies the stored data
again. Storing them in the index is possible, but it requires 1-4x more space
and, depending on the read/write speed of the filesystem, makes merge and
optimize take longer.
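
As a rough illustration of keeping the bytes out of the index, here is a
minimal sketch that uses only the Java standard library. The class name
ExternalDocStore and its put/get methods are hypothetical, not part of
Lucene. The idea is that only the short key returned by put() would be
added to the index as a stored field, so merges never have to copy the
document bytes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: keeps document bytes on disk, returns a short key for the index. */
public class ExternalDocStore {
    private final Path root;

    public ExternalDocStore(Path root) throws IOException {
        // Create the storage directory if it does not exist yet.
        this.root = Files.createDirectories(root);
    }

    /** Writes the bytes to a file named after their SHA-256 digest and returns that name. */
    public String put(byte[] content) throws IOException {
        String key = sha256Hex(content);
        Path target = root.resolve(key);
        if (!Files.exists(target)) {
            Files.write(target, content);
        }
        return key; // store this string in the index instead of the bytes
    }

    /** Reads the bytes back using the key retrieved from a search hit. */
    public byte[] get(String key) throws IOException {
        return Files.readAllBytes(root.resolve(key));
    }

    private static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

Naming files by content digest is only one possible choice; a plain file
path or a database id would serve equally well as the stored key.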


On Sun, 2007-03-18 at 23:11 +0330, jafarim wrote:
> Hello
> I have been using Lucene for a while and, as most people seemingly do, I
> used to save only some important fields of a document in the index. But
> recently I thought: why not store the whole document's bytes as an
> untokenized field in the index, to ease the retrieval process? For example,
> serialize the PDF file into a byte[] and then save the bytes as a field in
> the index (some gzip and base64 encoding may be needed as glue logic). Then
> I can delete the original file from the system. Is there any reason against
> this idea? Can Lucene bear this large volume of input streamed data?
