lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: ParallelReader
Date Mon, 10 Oct 2005 20:22:34 GMT
The use case is when there is some data that changes frequently, but
some data is static, _and_ the volatile index can be rebuilt in the
same order that the static one was built.  The indexes must be
"parallel" in terms of document number order.  If you delete, you
should delete from both indexes, and likewise with adds.
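
Roughly, and untested, the search side with ParallelReader would look
something like this (index paths and field names are only placeholders,
and the exact API may differ depending on which version you have):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.ParallelReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class ParallelSearchSketch {
    public static void main(String[] args) throws Exception {
        // Open the static and the volatile index side by side.  Both must
        // contain the same documents, in the same document-number order.
        ParallelReader reader = new ParallelReader();
        reader.add(IndexReader.open("/index/static"));   // content + documentID
        reader.add(IndexReader.open("/index/dynamic"));  // frequently changing fields

        // The combined reader exposes the union of the fields, so one
        // query can mix fields that live in different physical indexes.
        IndexSearcher searcher = new IndexSearcher(reader);
        Query query = new QueryParser("content", new StandardAnalyzer())
            .parse("content:\"hello world\" AND folder:Folder1 AND lastReadBy:jane");
        Hits hits = searcher.search(query);
        System.out.println(hits.length() + " hits");
        searcher.close();
    }
}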

     Erik

On Oct 10, 2005, at 4:14 PM, John Smith wrote:

> Sorry to bug people on this again and again.
> I might be missing something or be totally confused, but what is the
> use case for a ParallelReader if it does not address the situation
> where we have one index changing frequently (meaning deletes and
> reindexing) and another index not changing, but with the same number
> of docs? Wouldn't people want to stick with just one index in any
> case?
>
> Any comments or response appreciated.
>
> JZ
>
> John Smith <john_smith9910@yahoo.com> wrote:
> A while ago I asked what would be a good solution for the situation
> mentioned below, and I was pointed in the direction of ParallelReader.
> Looks like that will not work.
> Thank you for alerting me to this.
>
> So other than deleting and reindexing the document in a single
> index, there is no way of addressing the situation.
>
> JZ
>
>
> Eyal wrote:
> Run a search on "Lucene ParallelReader" in Google - you'll find
> something Doug Cutting wrote that I believe is what you're looking for.
>
> Eyal
>
>
>
>> -----Original Message-----
>> From: John Smith [mailto:john_smith9910@yahoo.com]
>> Sent: Thursday, August 11, 2005 21:12
>> To: java-user@lucene.apache.org
>> Subject: Updating existing documents in index: Solutions
>>
>>
>> Hi all
>>
>>
>>
>> This is a slightly long email. Pardon me.
>>
>>
>>
>> As Lucene does not allow updating an existing document in the
>> index, the only option is to delete and reindex the document. When
>> there are many updates, this gets cumbersome. In our case the
>> actual content of the document being indexed does not change, but
>> the fields around the content, such as "LastReadBy" or the Folder
>> associated with it, do change. These are all fields that were
>> indexed as part of the original document in the index.
>>
>>
>>
>> I have been contemplating putting these "commonly changing
>> fields" into one index, allowing delete and reindex on that index
>> alone, and keeping the static data in another index. DocumentID
>> will be a stored field in both the static and the dynamic index,
>> as a way of identifying the document.
>>
>>
>>
>> Static index: contains the content of the document (indexed) and
>> the documentID (stored).
>>
>> Dynamic index: contains all the frequently changing fields of the
>> document (indexed) and the documentID (stored).
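>>
>> Very roughly, the indexing side of this would look something like
>> the following (untested sketch; paths, field names, and values are
>> only placeholders, using the old Field.Keyword/Field.UnStored helpers):
>>
>> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>> import org.apache.lucene.document.Document;
>> import org.apache.lucene.document.Field;
>> import org.apache.lucene.index.IndexWriter;
>>
>> public class TwoIndexSketch {
>>     public static void main(String[] args) throws Exception {
>>         IndexWriter staticWriter =
>>             new IndexWriter("/index/static", new StandardAnalyzer(), true);
>>         IndexWriter dynamicWriter =
>>             new IndexWriter("/index/dynamic", new StandardAnalyzer(), true);
>>
>>         // For each message, the rarely changing data goes into the
>>         // static index and the frequently changing data into the
>>         // dynamic index; both documents carry the same stored documentID.
>>         String documentID = "msg-0001";
>>
>>         Document staticDoc = new Document();
>>         staticDoc.add(Field.Keyword("documentID", documentID));      // stored, untokenized
>>         staticDoc.add(Field.UnStored("content", "hello world ..."));  // indexed only
>>
>>         Document dynamicDoc = new Document();
>>         dynamicDoc.add(Field.Keyword("documentID", documentID));
>>         dynamicDoc.add(Field.Keyword("folder", "Folder1"));
>>         dynamicDoc.add(Field.Keyword("lastReadBy", "jane"));
>>
>>         staticWriter.addDocument(staticDoc);
>>         dynamicWriter.addDocument(dynamicDoc);
>>
>>         staticWriter.close();
>>         dynamicWriter.close();
>>     }
>> }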
>>
>>
>>
>>
>>
>> Questions
>>
>>
>>
>> 1. First of all, is there a better solution to the problem of these
>> frequently changing fields having to be reindexed?
>>
>>
>>
>> 2. Let's say I go with the two-index approach:
>>
>>
>>
>> Example query: Content: "Hello world" AND Folder:Folder1 AND
>> LastReadBy:jane. If we execute this query against either the static
>> or the dynamic index alone, it will obviously fail to get hits,
>> because neither index contains all of the fields.
>>
>>
>>
>>
>>
>> Let's say I have a way of splitting my queries such that all
>> content queries go only to the static (content) index and queries
>> on the other fields go to the dynamic index; in other words, every
>> query reduces to an AND between the dynamic-index result set and
>> the static-index result set. On the result sets, I would then have
>> to retrieve the documentID and make sure the same documentID
>> appears in both result sets for a document to count as a match.
>>
>> When the result sets from both queries are really huge, then even
>> to get the number of hits I would have to retrieve each and every
>> document from the results in order to get the documentID for
>> comparison. Queries can get really slow.
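>>
>> Just to make the problem concrete, the merging I would be forced to
>> do looks roughly like this (untested sketch; the per-hit documentID
>> lookup is exactly the part that worries me performance-wise):
>>
>> import java.util.ArrayList;
>> import java.util.HashSet;
>> import java.util.List;
>> import java.util.Set;
>> import org.apache.lucene.search.Hits;
>> import org.apache.lucene.search.Query;
>> import org.apache.lucene.search.Searcher;
>>
>> public class ManualJoinSketch {
>>     // Intersect the two result sets on the stored documentID field.
>>     public static List matchingIds(Searcher staticSearcher, Query contentQuery,
>>                                    Searcher dynamicSearcher, Query fieldsQuery)
>>             throws Exception {
>>         Hits staticHits = staticSearcher.search(contentQuery);
>>         Hits dynamicHits = dynamicSearcher.search(fieldsQuery);
>>
>>         // Every hit has to be materialized just to read its documentID --
>>         // the expensive part when both result sets are large.
>>         Set dynamicIds = new HashSet();
>>         for (int i = 0; i < dynamicHits.length(); i++) {
>>             dynamicIds.add(dynamicHits.doc(i).get("documentID"));
>>         }
>>
>>         List matches = new ArrayList();
>>         for (int i = 0; i < staticHits.length(); i++) {
>>             String id = staticHits.doc(i).get("documentID");
>>             if (dynamicIds.contains(id)) {
>>                 matches.add(id); // satisfies both halves of the query
>>             }
>>         }
>>         return matches;
>>     }
>> }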
>>
>>
>>
>> Has anyone faced similar problems? If so, what was your solution?
>>
>> Any comments/thoughts will be appreciated.
>>
>>
>>
>> Thank you
>>
>> JS
>>
>>
>>
>>
>
>
> Daniel Naber wrote:
> On Monday, 10 October 2005 20:24, John Smith wrote:
>
>
>> My understanding is ParallelReader works for situations where you  
>> have a
>> static index and a dynamic index.
>>
>
> That's not correct. Quoting the documentation:
>
> It is up to you to make sure all indexes
> are created and modified the same way. For example, if you add
> documents to one index, you need to add the same documents in the
> same order to the other indexes. Failure to do so will result in
> undefined behavior.
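>
> In practice that means the dynamic index has to be rebuilt from
> scratch, walking the static index in document-number order. A rough,
> untested sketch of that discipline (field names and the lookup
> helpers are placeholders, and it assumes no deletions in the static
> index):
>
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.IndexWriter;
>
> public class RebuildDynamicIndex {
>     public static void main(String[] args) throws Exception {
>         IndexReader staticReader = IndexReader.open("/index/static");
>         // create=true: throw the old dynamic index away and rebuild it.
>         IndexWriter dynamicWriter =
>             new IndexWriter("/index/dynamic", new StandardAnalyzer(), true);
>
>         // Walk the static index in document-number order so the rebuilt
>         // dynamic index stays parallel to it.
>         for (int docNum = 0; docNum < staticReader.maxDoc(); docNum++) {
>             String id = staticReader.document(docNum).get("documentID");
>
>             Document dynamicDoc = new Document();
>             dynamicDoc.add(Field.Keyword("documentID", id));
>             dynamicDoc.add(Field.Keyword("folder", lookupFolder(id)));
>             dynamicDoc.add(Field.Keyword("lastReadBy", lookupLastReadBy(id)));
>             dynamicWriter.addDocument(dynamicDoc);
>         }
>
>         dynamicWriter.close();
>         staticReader.close();
>     }
>
>     // Placeholders for wherever the current metadata really lives.
>     private static String lookupFolder(String id)     { return "Folder1"; }
>     private static String lookupLastReadBy(String id) { return "jane"; }
> }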
>
> Regards
> Daniel
>
> -- 
> http://www.danielnaber.de
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

