lucene-java-user mailing list archives

From Antony Bowesman <...@teamware.com>
Subject Re: Using Lucene partly as DB and 'joining' search results.
Date Tue, 15 Apr 2008 03:59:04 GMT
Thanks all for the suggestions - there was also another thread "Lucene index on 
relational data" which had crossover here.

That's an interesting idea about using ParallelReader for the changeable index.
I had thought to just index a triplet 'owner:mailId:label' in each Doc
and have multiple Documents for the same mailId, e.g. if each recipient adds
labels for the same mail, or if multiple labels are added by one recipient.  I
would then have to make a join using mailId against the core.  However, if I
want to use PR, I could have a single Document with multiple fields and, using
stored fields, 'modify' that Document.  But what happens to the DocId
when the delete+add occurs, and how do I ensure it stays the same?
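
The triplet-and-join option above can be sketched roughly as follows. This is
a toy model in plain Java with no Lucene dependency: a list of triplets stands
in for the label index and a list of mailIds stands in for the hits of a core
search. The names LabelJoinSketch, LabelEntry, mailIdsFor and join are
hypothetical and are not part of any Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model: plain collections stand in for the label index and the core mail index.
class LabelJoinSketch {

    // One entry per owner:mailId:label triplet, i.e. one Document in the label index.
    static final class LabelEntry {
        final String owner;
        final String mailId;
        final String label;

        LabelEntry(String owner, String mailId, String label) {
            this.owner = owner;
            this.mailId = mailId;
            this.label = label;
        }
    }

    // Step 1: "search" the label index for one owner's label and collect the mailIds.
    static Set<String> mailIdsFor(List<LabelEntry> labelIndex, String owner, String label) {
        Set<String> ids = new LinkedHashSet<String>();
        for (LabelEntry e : labelIndex) {
            if (e.owner.equals(owner) && e.label.equals(label)) {
                ids.add(e.mailId);
            }
        }
        return ids;
    }

    // Step 2: keep only the core hits whose mailId was found in step 1 (the join).
    static List<String> join(List<String> coreHitMailIds, Set<String> labelled) {
        List<String> joined = new ArrayList<String>();
        for (String mailId : coreHitMailIds) {
            if (labelled.contains(mailId)) {
                joined.add(mailId);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<LabelEntry> labelIndex = Arrays.asList(
            new LabelEntry("alice", "m1", "urgent"),
            new LabelEntry("alice", "m1", "project-x"), // second label, same mail, same owner
            new LabelEntry("bob", "m1", "urgent"),      // another recipient labels the same mail
            new LabelEntry("alice", "m3", "urgent"));
        // Pretend these mailIds came back from a search against the core index.
        List<String> coreHits = Arrays.asList("m1", "m2", "m3");
        System.out.println(join(coreHits, mailIdsFor(labelIndex, "alice", "urgent")));
    }
}
```

With the sample data in main, the join keeps m1 and m3 for alice's "urgent"
label. In a real setup the set from step 1 would more likely be turned into a
Filter over the core index than applied to hits after the fact.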

I'm on 2.3.1.  I seem to recall a discussion on this in another thread, but 
cannot find it.

Antony



Chris Hostetter wrote:
> : The archive is read only apart from bulk deletes, but one of the requirements
> : is for users to be able to label their own mail.  Given that a Lucene Document
> : cannot be updated, I have thought about having a separate Lucene index that
> : has just the 3 terms (or some combination of) userId + mailId + label.
> : 
> : That of course would mean joining searches from the main mail data index and
> : the label index.
> 
> tangential to the existing followups about ways to use Filters efficiently 
> to get some of the behavior, take a look at ParallelReader ... your use 
> case sounds like it might be perfect for it: one really large main dataset 
> that changes fairly infrequently, and what changes do occur are mainly 
> about adding new records; plus a small "parallel" set of fields about 
> each record in the main set which do change fairly frequently.
> 
> you build up an index for the main data, and then you periodically build up 
> a second index with the docs in the exact same order as the main index.
> 
> additions to the main index don't need to block on rebuilding the secondary 
> index.  deletes do (since you need to delete from both indexes in parallel 
> to keep the ids in sync) ... but that's ok since you said you only need 
> occasional bulk deletes (you could process them as an initial step of your 
> recurring rebuild of the smaller index).
> 
> 
> 
> -Hoss
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
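
The rebuild cycle described in the quoted reply can be modelled without
Lucene. This is only a toy sketch: two lists stand in for the main index and
the secondary label index, and a list position stands in for the doc id. The
names ParallelIndexSketch, addMail, rebuildLabels, deleteInBoth and view are
hypothetical and are not ParallelReader's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of two indexes read in parallel: a document is identified only by its position.
class ParallelIndexSketch {
    final List<String> mainDocs = new ArrayList<String>();  // large, rarely changing mail data
    final List<String> labelDocs = new ArrayList<String>(); // small, frequently rebuilt label fields

    // Additions go to the main index only; the label index catches up at the next rebuild.
    void addMail(String mailId) {
        mainDocs.add(mailId);
    }

    // Rebuild the label index with one entry per main doc, in exactly the same order.
    void rebuildLabels(Map<String, String> labelsByMailId) {
        labelDocs.clear();
        for (String mailId : mainDocs) {
            String labels = labelsByMailId.get(mailId);
            labelDocs.add(labels == null ? "" : labels);
        }
    }

    // Deleting the same position from both lists keeps the remaining positions in step.
    // Simplification: this toy model renumbers immediately on delete.
    void deleteInBoth(int docId) {
        mainDocs.remove(docId);
        labelDocs.remove(docId);
    }

    // Combined view of one doc id across both lists, as a parallel reader would present it.
    String view(int docId) {
        return mainDocs.get(docId) + ":" + labelDocs.get(docId);
    }

    public static void main(String[] args) {
        Map<String, String> labels = new HashMap<String, String>();
        labels.put("m1", "urgent");
        labels.put("m3", "todo");

        ParallelIndexSketch idx = new ParallelIndexSketch();
        idx.addMail("m1");
        idx.addMail("m2");
        idx.addMail("m3");
        idx.rebuildLabels(labels);
        System.out.println(idx.view(2)); // m3 paired with its own labels

        idx.deleteInBoth(1);             // bulk delete applied to both sides
        System.out.println(idx.view(1)); // m3 moved to position 1 in both lists
    }
}
```

Removing an entry from only one of the two lists would pair every later mail
with another mail's labels, which is the reason deletes have to be applied to
both indexes together.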



