lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Eyal <eyal.j...@gmail.com>
Subject RE: Updating existing documents in index: Solutions
Date Thu, 11 Aug 2005 18:44:39 GMT
Run a search on "Lucene ParallelReader" in google - You'll find something
Doug Cutting wrote that I believe is what you're looking for.

Eyal 
 

> -----Original Message-----
> From: John Smith [mailto:john_smith9910@yahoo.com] 
> Sent: Thursday, August 11, 2005 21:12 PM
> To: java-user@lucene.apache.org
> Subject: Updating existing documents in index: Solutions 
> 
> 
> Hi all
> 
>  
> 
> This is a slightly long email. Pardon me.
> 
>  
> 
> As Lucene does not allow for updating an existing document in 
> the index, the only option is to delete and reindex the 
> message.When you have too many updates, this gets a little 
> cumbersome. In our case, as such the actual content of the 
> document being indexed does 
> 
> not change, but the fields around the content, like say 
> "LastReadby" or something like Folder associated with it etc 
> change. These are all fields that have been indexed as a part 
> of the original  document in the index.
> 
>  
> 
> I have been contemplating putting these "commonly changing 
> fields" into one  index and allow for delete and reindex on 
> this  index alone and keep the static data in another index. 
> DocumentID will be a stored field and will be stored in both 
> the static and dynamic index, as a way of identifying the document.
> 
>  
> 
> Static index: Contains content of document indexed and 
> documentID stored.
> 
> Dynamic index: Contains all fields about the document which 
> change frequently indexed  and documentID stored.
> 
>  
> 
>  
> 
> Questions
> 
>  
> 
> 1. First of all, is there a better solution to this 
> frequently changing fields having to be reindexed ?
> 
>  
> 
> 2. Let's say I go  with the 2 index approach, 
> 
>  
> 
> Example query:  Content: "Hello world" AND Folder:Folder1 AND 
> LastReadBy: jane. If we execute these queries on our static 
> and dynamic indexes, they will obviously fail to get hits.
> 
> 
> 
>  
> 
>      Let's say I have a way of splitting my queries such that 
>  all content queries go to static (content) index only and 
> queries on other fields go to the dynamic index, basically 
> allow for queries to come in such a way that it is always a 
> AND between the dynamic index result set and static index 
> result set. So on the results set, I would have to retrieve 
> the document ID and make sure we have the same documentID in 
> both the result sets, in order for it to be a match.
> 
>       In  cases where the result sets are really huge from  
> both the queries, then even to get the number of hits, I will 
> have to retrieve each and every document from the results, in 
> order to get the documentID for comparison. Queries can get 
> really slow.
> 
>  
> 
> Has anyone faced similar problems, If so what was your solution?
> 
> Any comments/thoughts will be appreciated.
> 
>  
> 
> Thank you
> 
> JS
> 
> 
> 
> 		
> ---------------------------------
>  Start your day with Yahoo! - make it your home page 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message