lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tim Sturge <tstu...@metaweb.com>
Subject Almost parallel indexes
Date Thu, 27 Sep 2007 23:14:01 GMT
Hi,

I have an index which contains two very distinct types of fields:

- Some fields are large (many term documents) and change fairly slowly.
- Some fields are small (mostly titles, names, anchor text) and change fairly rapidly.

Right now I keep around the large fields in raw form and when the small fields change, I retokenize
the large and the small fields together. The problem is that this retokenization is sucking
up most of my CPU time, making the indexing process too slow (this index needs to track changes
in almost real time; I'm using one of the reopen() patches from LUCENE-743 in JIRA to achieve
this).

I can't really use ParallelReader to keep the indexes the same; it requires me to add documents
to both indexes which means I have to retokenize the large fields anyway. I would want to
do a "join" on an external id, and as far as I can tell, Lucene doesn't support that.

Alternatively, what I'd like is a way to either store a pre-tokenized version of the large
fields, or to be able to add fields to a document that come from an existing document in the
index. 

I suspect there is more to this question than meets the eye, but I'd be interested in any
strategies that people have used in the past.

Thanks,

Tim

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message