lucene-java-user mailing list archives

From "Nixon" <>
Subject Re: Almost parallel indexes
Date Fri, 28 Sep 2007 13:01:41 GMT

I ran into a similar situation, where two indexes seemed to be called for,
in a forum application. It had topics and their posts, which were stored
in two different tables in a relational database.

But I decided to use a single index, with fields like this:

topicId, topicName, timestamp, postId, postedcomment, ISTOPICFLAG

ISTOPICFLAG is set to 1 for topics and 0 for posts. Each post document
duplicates the topic fields, which was needed for UI display.

To search only topics I append AND ISTOPICFLAG = 1 to the query, and
AND ISTOPICFLAG = 0 to search only posts.

The application works fine, but frequent updates to the index slow it down.
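
To make the layout above concrete, here is a minimal sketch in plain Java.
It is not Lucene code: ForumIndexSketch, Doc, addTopic, addPost and search
are hypothetical names, the "index" is just an in-memory list, the timestamp
field is left out for brevity, and matching is a simple substring test. It
only illustrates how a type flag lets one index serve both kinds of query.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for the single combined index; not Lucene code. */
final class ForumIndexSketch {

    /** One indexed document; field names follow the field list in the message. */
    static final class Doc {
        final String topicId;
        final String topicName;     // duplicated on every post document
        final String postId;        // null for topic documents
        final String postedComment; // null for topic documents
        final boolean isTopic;      // ISTOPICFLAG: true = 1, false = 0

        Doc(String topicId, String topicName, String postId,
            String postedComment, boolean isTopic) {
            this.topicId = topicId;
            this.topicName = topicName;
            this.postId = postId;
            this.postedComment = postedComment;
            this.isTopic = isTopic;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    void addTopic(String topicId, String topicName) {
        docs.add(new Doc(topicId, topicName, null, null, true));
    }

    void addPost(String topicId, String topicName, String postId, String comment) {
        docs.add(new Doc(topicId, topicName, postId, comment, false));
    }

    /** Rough equivalent of "<word> AND ISTOPICFLAG = 1" (topicsOnly) or "= 0" (posts). */
    List<Doc> search(String word, boolean topicsOnly) {
        List<Doc> hits = new ArrayList<>();
        String needle = word.toLowerCase();
        for (Doc d : docs) {
            if (d.isTopic != topicsOnly) {
                continue; // the flag clause filters out the other document type
            }
            // Assumption: topics are matched on their name, posts on their comment.
            String text = d.isTopic ? d.topicName : d.postedComment;
            if (text.toLowerCase().contains(needle)) {
                hits.add(d);
            }
        }
        return hits;
    }
}
```

Because every post carries a copy of topicName, renaming a topic means
rewriting all of its post documents, which is one plausible reason frequent
updates get expensive with this layout.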



----- Original Message ----- 
From: "Erick Erickson" <>
To: <>
Sent: Friday, September 28, 2007 5:43 AM
Subject: Re: Almost parallel indexes

> OK, this isn't well thought out, more the first thing that
> pops to mind...
> You're right, Lucene doesn't do joins. But would it serve
> to keep two indexes? One the slow-changing stuff
> and one the fast-changing stuff. They are related by
> some *external* (as in "not the Lucene doc id")
> field.
> You'd have to custom roll something that searched
> across both indexes. Depending on your search
> semantics this may be hard or easy. It would be
> easy if your search were simple, something like
> (stuff in the fast-changing index) OR/AND/NOT
> (stuff in the slow-changing index). Handling
> (some in the fast) AND/OR/NOT (some in the
> slow) would be much harder......
> The idea is to collect a set of your external doc
> IDs, then use TermEnum/TermDocs in each
> index to get at the underlying document parts.
> Lots depends upon how many hits you expect. 100 is
> one thing, 1,000,000 is another.
> This, of course, adds a layer of complexity that makes
> things harder, but it might be worth a shot.
> Disclaimer: I haven't personally done something like this,
> so take it for what it's worth.
> Best
> Erick
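
A minimal sketch of the combining step in the two-index idea quoted above,
in plain Java with no Lucene calls. It assumes each index has already been
searched and the external IDs of its hits collected into a set. How those
IDs are read out of Lucene is deliberately left out. ExternalIdJoin and its
method names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

/** Plain-Java sketch of combining hits from two indexes by external ID. */
final class ExternalIdJoin {

    /** IDs matching (fast-index clause) AND (slow-index clause). */
    static Set<String> and(Set<String> fastHits, Set<String> slowHits) {
        Set<String> out = new TreeSet<>(fastHits); // copy so the inputs stay untouched
        out.retainAll(slowHits);                   // set intersection
        return out;
    }

    /** IDs matching (fast-index clause) OR (slow-index clause). */
    static Set<String> or(Set<String> fastHits, Set<String> slowHits) {
        Set<String> out = new TreeSet<>(fastHits);
        out.addAll(slowHits);                      // set union
        return out;
    }

    /** IDs matching (fast-index clause) NOT (slow-index clause). */
    static Set<String> andNot(Set<String> fastHits, Set<String> slowHits) {
        Set<String> out = new TreeSet<>(fastHits);
        out.removeAll(slowHits);                   // set difference
        return out;
    }
}
```

Every hit from both indexes has to be materialized as an ID before the sets
can be combined, so the cost grows with the number of hits. That is why 100
hits and 1,000,000 hits are very different cases.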
> On 9/27/07, Tim Sturge <> wrote:
>> Hi,
>> I have an index which contains two very distinct types of fields:
>> - Some fields are large (many term documents) and change fairly slowly.
>> - Some fields are small (mostly titles, names, anchor text) and change
>> fairly rapidly.
>> Right now I keep around the large fields in raw form and when the small
>> fields change, I retokenize the large and the small fields together. The
>> problem is that this retokenization is sucking up most of my CPU time,
>> making the indexing process too slow (this index needs to track changes in
>> almost real time; I'm using one of the reopen() patches from LUCENE-743 in
>> JIRA to achieve this).
>> I can't really use ParallelReader to keep the indexes the same; it
>> requires me to add documents to both indexes which means I have to
>> retokenize the large fields anyway. I would want to do a "join" on an
>> external id, and as far as I can tell, Lucene doesn't support that.
>> Alternatively, what I'd like is a way to either store a pre-tokenized
>> version of the large fields, or to be able to add fields to a document that
>> come from an existing document in the index.
>> I suspect there is more to this question than meets the eye, but I'd be
>> interested in any strategies that people have used in the past.
>> Thanks,
>> Tim
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail: