lucene-dev mailing list archives

From Grant Ingersoll
Subject Re: Storing whole document in the index
Date Sat, 17 Mar 2007 15:22:27 GMT
Please ask this type of question on the user mailing list; you will
get much better responses there.  The dev list is for developers of Lucene.

To answer your question: yes, you can do this.  You may also find the
FieldSelector API additions and Lazy Field Loading helpful for
performance.
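
For the gzip and base64 "glue logic" mentioned in the quoted question, here is a
minimal sketch using only the JDK.  It is not Lucene code: the class name
StoredBytesCodec and its methods are hypothetical, and java.util.Base64 assumes
Java 8 or later.  It only shows one way to turn a document's bytes into a String
that could be kept in a stored, untokenized field, and to recover the bytes again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Lucene.
public class StoredBytesCodec {

    // Compress the raw bytes with gzip, then Base64-encode the result so it
    // can be handled as an ordinary String value.
    public static String encode(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    // Reverse of encode: Base64-decode, then gunzip back to the original bytes.
    public static byte[] decode(String stored) {
        byte[] compressed = Base64.getDecoder().decode(stored);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for a real PDF file.
        byte[] original =
            "%PDF-1.4 placeholder bytes".getBytes(StandardCharsets.US_ASCII);
        String stored = encode(original);
        byte[] restored = decode(stored);
        System.out.println(Arrays.equals(original, restored));
    }
}
```

Note that Base64 makes the stored value roughly a third larger than the
compressed bytes.  If the Lucene version you use accepts a byte[] value for a
stored field, the Base64 step may not be needed at all.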


On Mar 17, 2007, at 8:36 AM, jafarim wrote:

> Hello,
> I have been using Lucene for a while and, as most people seemingly
> do, I used to save only some important fields of a document in the
> index. But recently I thought: why not store the whole document's
> bytes as an untokenized field in the index, in order to ease the
> retrieval process? For example, serialize the PDF file into a byte[]
> and then save the bytes as a field in the index (some gzip and base64
> encoding may be needed as glue logic). Then I can delete the original
> file from the system. Is there any reason against this idea? Can
> Lucene bear this large volume of input streamed data?

Grant Ingersoll
Center for Natural Language Processing
