lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: Almost parallel indexes
Date Sat, 29 Sep 2007 01:07:49 GMT

: I can't really use ParallelReader to keep the indexes the same; it 
: requires me to add documents to both indexes which means I have to 
: retokenize the large fields anyway. I would want to do a "join" on an 
: external id, and as far as I can tell, Lucene doesn't support that.

correction: it requires that the documents be added/deleted in the same 
order.  you can have a static index containing your large tokenized fields 
which you don't change, and just rebuild your second index containing the 
small fields from scratch -- as long as you add those documents in the 
same order that they appear in the first index.
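
a rough sketch of that ordering rule, in plain Java with no Lucene calls. 
the class name, the method name and the id-to-value map are all 
hypothetical; the only assumption is that you can list the external ids in 
the order the documents sit in the static index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java model of the ordering rule only; nothing here touches Lucene.
class ParallelOrderSketch {
    /**
     * staticOrder: external ids in the order the documents appear in the
     * static (large-field) index.
     * smallFields: hypothetical lookup from external id to the small,
     * frequently changing field value.
     * Returns the values in the order they must be added to the rebuilt index.
     */
    static List<String> rebuildOrder(List<String> staticOrder,
                                     Map<String, String> smallFields) {
        List<String> out = new ArrayList<String>();
        for (String id : staticOrder) {
            String value = smallFields.get(id);
            if (value == null) {
                // A skipped document would shift every later one and
                // break the position-by-position alignment.
                throw new IllegalStateException("no small fields for id " + id);
            }
            out.add(value);
        }
        return out;
    }
}
```

the point is that position i in the rebuilt index has to describe the same 
document as position i in the static one, so a missing entry is treated as 
an error rather than skipped.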

if you search for past discussions of ParallelReader, someone (i forget 
who) uses it a lot and goes into this in greater detail.

: Alternatively, what I'd like is a way to either store a pre-tokenized 
: version of the large fields, or to be able to add fields to a document 
: that come from an existing document in the index.

i believe that's semi-possible with the trunk code.  you can construct a 
Field instance using a TokenStream, so if you store a 
representation of your TokenStream (instead of the original raw text) you 
can reindex it without doing a full analysis phase.

(probably only worthwhile if your Analyzer does a lot of complex stuff 
... if it just splits on whitespace and lowercases, reading back whatever 
serialization technique you use for the TokenStream probably won't be 
much faster than re-analyzing)
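
one possible serialization, as a minimal sketch in plain Java. TokenCodec 
and Tok are hypothetical names, not Lucene classes, and the choice of what 
to store per token (term text, start/end offsets, position increment) is 
an assumption about what your analyzer produces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical codec for already-analyzed tokens; no Lucene calls.
class TokenCodec {
    /** Hypothetical stand-in for one analyzed token. */
    static final class Tok {
        final String term;
        final int start;
        final int end;
        final int posInc;

        Tok(String term, int start, int end, int posInc) {
            this.term = term;
            this.start = start;
            this.end = end;
            this.posInc = posInc;
        }
    }

    /** Writes a token count followed by each token's fields. */
    static byte[] write(List<Tok> toks) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(toks.size());
        for (Tok t : toks) {
            out.writeUTF(t.term); // writeUTF caps each string at 65535 encoded bytes
            out.writeInt(t.start);
            out.writeInt(t.end);
            out.writeInt(t.posInc);
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Reads tokens back in the order they were written. */
    static List<Tok> read(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int n = in.readInt();
        List<Tok> toks = new ArrayList<Tok>(n);
        for (int i = 0; i < n; i++) {
            String term = in.readUTF();
            int start = in.readInt();
            int end = in.readInt();
            int posInc = in.readInt();
            toks.add(new Tok(term, start, end, posInc));
        }
        return toks;
    }
}
```

to reindex, you would wrap the decoded list in your own TokenStream 
subclass and hand that to the Field; the exact methods to override depend 
on the Lucene version, so that part is left out here.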

