lucene-java-user mailing list archives

From John Smith <>
Subject Updating existing documents in index: Solutions
Date Thu, 11 Aug 2005 18:12:27 GMT

Hi all


This is a slightly long email. Pardon me.


Since Lucene does not allow an existing document in the index to be updated in place, the only
option is to delete the document and reindex it. With many updates, this gets cumbersome.
In our case, the actual content of the indexed document does not change, but the fields around
the content, such as "LastReadBy" or the Folder associated with it, do change. These are all
fields that were indexed as part of the original document.


I have been contemplating putting these "commonly changing fields" into one index, allowing
delete and reindex on that index alone, and keeping the static data in another index. DocumentID
will be a stored field in both the static and dynamic indexes, as a way of identifying the
document.


Static index: contains the indexed content of the document, with documentID stored.

Dynamic index: contains all frequently changing fields of the document, indexed, with
documentID stored.
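
To make the split concrete, here is a toy in-memory model of what I have in mind. It is plain Java, not Lucene code: the class name SplitIndexSketch, its methods, and the use of maps in place of real indexes are all made up for illustration. The point it shows is that an update deletes and re-adds only the dynamic entry, while the static content is written once.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the static/dynamic split; hypothetical names, not the Lucene API. */
final class SplitIndexSketch {
    // documentID -> content; written once when the document is first added
    private final Map<String, String> staticIndex = new HashMap<>();
    // documentID -> frequently changing fields (Folder, LastReadBy, ...)
    private final Map<String, Map<String, String>> dynamicIndex = new HashMap<>();
    // counts how often the (expensive) content had to be indexed
    private int staticWrites = 0;

    void add(String documentId, String content, Map<String, String> fields) {
        staticIndex.put(documentId, content);
        staticWrites++;
        dynamicIndex.put(documentId, new HashMap<>(fields));
    }

    /** Delete and re-add only the dynamic entry, with one field changed. */
    void updateField(String documentId, String field, String value) {
        Map<String, String> old = dynamicIndex.remove(documentId); // "delete"
        if (old == null) {
            throw new IllegalArgumentException("unknown documentID: " + documentId);
        }
        Map<String, String> fresh = new HashMap<>(old);
        fresh.put(field, value);
        dynamicIndex.put(documentId, fresh); // "reindex"
    }

    String content(String documentId) {
        return staticIndex.get(documentId);
    }

    String field(String documentId, String name) {
        Map<String, String> fields = dynamicIndex.get(documentId);
        return fields == null ? null : fields.get(name);
    }

    int staticWrites() {
        return staticWrites;
    }
}
```

After add("doc-1", "Hello world", fields) and updateField("doc-1", "LastReadBy", "jane"), the content entry has still been written only once.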





1. First of all, is there a better solution than having to reindex these frequently changing
fields?


2. Let's say I go with the two-index approach.


Example query: Content:"Hello world" AND Folder:Folder1 AND LastReadBy:jane. If we execute
this query as-is against either the static or the dynamic index, it will obviously fail to get
hits, because neither index contains all three fields.


     Let's say I have a way of splitting my queries so that all content queries go to the static
(content) index only and queries on other fields go to the dynamic index; that is, queries
are only allowed in a form where the final result is always an AND between the dynamic index
result set and the static index result set. For each result set, I would then have to retrieve
the documentIDs and check that the same documentID appears in both result sets for it to be
a match.
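
The matching step itself would be roughly the following. This is only a sketch in plain Java: it assumes the documentIDs of each result set have already been pulled out into lists of strings, and the class and method names are hypothetical, not anything from Lucene.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of ANDing two result sets by documentID; hypothetical names. */
final class ResultIntersectionSketch {

    /**
     * Returns the documentIDs present in both result sets,
     * in the order they appear in the static result set.
     */
    static List<String> intersect(List<String> staticHits, List<String> dynamicHits) {
        // Hash the dynamic IDs once so each lookup is constant time on average.
        Set<String> dynamicIds = new HashSet<>(dynamicHits);
        List<String> matches = new ArrayList<>();
        for (String id : staticHits) {
            if (dynamicIds.contains(id)) {
                matches.add(id);
            }
        }
        return matches;
    }
}
```

The comparison is cheap once the IDs are in memory; the expensive part is getting the documentID out of every hit in the first place.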

      In cases where the result sets from both queries are really huge, even to get the
number of hits I would have to retrieve each and every document from the results in order to
get its documentID for comparison. Queries could get really slow.


Has anyone faced similar problems? If so, what was your solution?

Any comments/thoughts will be appreciated.


Thank you

