lucene-java-user mailing list archives

From Glen Newton
Subject Field with reader limitation arbitrary
Date Mon, 14 Sep 2009 20:03:11 GMT

In 2.4.1, Field has 2 constructors that involve a Reader:
public Field(String name,
                  Reader reader)
public Field(String name,
                  Reader reader,
                  Field.TermVector termVector)

The Reader references a text file on the filesystem. These
constructors do the following:
"Create a tokenized and indexed field that is not stored, optionally
with storing term vectors. The Reader is read only when the Document
is added to the index, i.e. you may not close the Reader until
IndexWriter.addDocument(Document)  has been called."

Someone has made the decision that we will not be interested in
storing files read using a Reader (at least not with these
constructors). This is rather arbitrary.
I have massively parallelized my indexing AND sometimes might want to
also store files in the index. For me, having a queue of 1000
Documents with 1000 Readers to files is vastly preferable to having
1000 Documents with 1000 (perhaps very large) Strings with all the
contents of the files. While this is not the best for all cases (number
of open file handles, etc.), this is a use case which would benefit from
being able to do this (i.e. reduced memory footprint, especially for
large files or large queues).

Suggestion: replace or add a constructor with:
public Field(String name,
             Reader reader,
             Field.Store store,
             Field.Index index,
             Field.TermVector termVector)
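
For comparison, here is a rough sketch of the workaround that seems to be
needed today in order to store the contents: drain the Reader into a String
first, then use the existing String-based constructor. The class and method
names (ReaderToStoredField, readFully) and the field name "contents" are made
up for illustration, and the Lucene call is shown only in a comment so that
the sketch compiles with the JDK alone.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReaderToStoredField {

    /** Drains the Reader into a String; the whole content ends up on the heap. */
    public static String readFully(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        int n;
        // read() returns -1 at end of stream
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a FileReader on a (perhaps very large) file.
        Reader reader = new StringReader("contents of a text file");
        try {
            String contents = readFully(reader);
            // With Lucene 2.4.1 on the classpath, the stored field would then be:
            //   new Field("contents", contents, Field.Store.YES,
            //             Field.Index.ANALYZED, Field.TermVector.NO);
            System.out.println(contents.length());
        } finally {
            reader.close();
        }
    }
}
```

Every queued Document built this way holds the full file contents in memory
until it is added to the index, which is exactly the footprint problem
described above.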


Glen Newton

