lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Smith <john_smith9...@yahoo.com>
Subject RE: Updating existing documents in index: Solutions
Date Thu, 11 Aug 2005 20:43:44 GMT
Thank you . That does look like what I want
 
JS

Eyal <eyal.junk@gmail.com> wrote:
Run a search on "Lucene ParallelReader" in google - You'll find something
Doug Cutting wrote that I believe is what you're looking for.

Eyal 


> -----Original Message-----
> From: John Smith [mailto:john_smith9910@yahoo.com] 
> Sent: Thursday, August 11, 2005 21:12 PM
> To: java-user@lucene.apache.org
> Subject: Updating existing documents in index: Solutions 
> 
> 
> Hi all
> 
> 
> 
> This is a slightly long email. Pardon me.
> 
> 
> 
> As Lucene does not allow for updating an existing document in 
> the index, the only option is to delete and reindex the 
> message.When you have too many updates, this gets a little 
> cumbersome. In our case, as such the actual content of the 
> document being indexed does 
> 
> not change, but the fields around the content, like say 
> "LastReadby" or something like Folder associated with it etc 
> change. These are all fields that have been indexed as a part 
> of the original document in the index.
> 
> 
> 
> I have been contemplating putting these "commonly changing 
> fields" into one index and allow for delete and reindex on 
> this index alone and keep the static data in another index. 
> DocumentID will be a stored field and will be stored in both 
> the static and dynamic index, as a way of identifying the document.
> 
> 
> 
> Static index: Contains content of document indexed and 
> documentID stored.
> 
> Dynamic index: Contains all fields about the document which 
> change frequently indexed and documentID stored.
> 
> 
> 
> 
> 
> Questions
> 
> 
> 
> 1. First of all, is there a better solution to this 
> frequently changing fields having to be reindexed ?
> 
> 
> 
> 2. Let's say I go with the 2 index approach, 
> 
> 
> 
> Example query: Content: "Hello world" AND Folder:Folder1 AND 
> LastReadBy: jane. If we execute these queries on our static 
> and dynamic indexes, they will obviously fail to get hits.
> 
> 
> 
> 
> 
> Let's say I have a way of splitting my queries such that 
> all content queries go to static (content) index only and 
> queries on other fields go to the dynamic index, basically 
> allow for queries to come in such a way that it is always a 
> AND between the dynamic index result set and static index 
> result set. So on the results set, I would have to retrieve 
> the document ID and make sure we have the same documentID in 
> both the result sets, in order for it to be a match.
> 
> In cases where the result sets are really huge from 
> both the queries, then even to get the number of hits, I will 
> have to retrieve each and every document from the results, in 
> order to get the documentID for comparison. Queries can get 
> really slow.
> 
> 
> 
> Has anyone faced similar problems, If so what was your solution?
> 
> Any comments/thoughts will be appreciated.
> 
> 
> 
> Thank you
> 
> JS
> 
> 
> 
> 
> ---------------------------------
> Start your day with Yahoo! - make it your home page 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 
Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message