lucene-dev mailing list archives

From Grant Ingersoll
Subject Re: Various Ideas from ApacheCon
Date Tue, 08 May 2007 11:10:24 GMT
I agree with your characterization.  We love the speed and
performance of Lucene, but the updating process just doesn't feel
right in that context.

I think the common use case that comes to mind is tagging a
document.  Every time a doc gets tagged, you have to rebuild it, or
manage multiple indices, etc.  The ParallelReader is supposed to help
with this scenario to some extent, but it seems difficult to keep
doc ids in sync.

I know the problem is hard and I don't know if it is solvable.  I was
just thinking that perhaps something like the Layers facility in
photo editing software might be a good model to start from, whereby
we could "mask" the document somehow with the updated information.  I
haven't dug into the code to see how it would work.
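
To make the layering idea a bit more concrete, here is a rough sketch in
plain Java.  It is not Lucene code: LayeredDocument, mask() and flatten()
are hypothetical names invented for illustration, and a document is
modelled as a simple map of field name to value.  It only shows the read
path, where an overlay of updated fields hides the originally indexed
values without rewriting them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names exist in Lucene.
final class LayeredDocument {
    // Base layer: the fields as they were originally indexed.
    private final Map<String, String> base;
    // Top layer: fields updated since then (for example, new tags).
    private final Map<String, String> overlay = new HashMap<>();

    LayeredDocument(Map<String, String> base) {
        // Copy so that later masking never touches the caller's map.
        this.base = new HashMap<>(base);
    }

    // Record an update in the overlay; the base layer is left as-is.
    void mask(String field, String value) {
        overlay.put(field, value);
    }

    // Read through the layers: the overlay wins, otherwise use the base.
    String get(String field) {
        return overlay.containsKey(field) ? overlay.get(field) : base.get(field);
    }

    // Collapse both layers into one view, like flattening an image.
    Map<String, String> flatten() {
        Map<String, String> merged = new HashMap<>(base);
        merged.putAll(overlay);
        return merged;
    }
}
```

Tagging a doc would then be a mask("tags", ...) call rather than a full
rebuild.  How such an overlay could be made searchable is exactly the
part this sketch does not address.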

I was also thinking about something like an
"AsynchronousParallelReader" that takes, on construction, the name of
the field that contains the OID and manages where each document lives
in each index.  That would let us drop the ParallelReader requirement
that doc ids stay in sync, at the expense of some extra work.  Again,
this is a hypothetical for optimizing high-update environments, and
I am not sure how fast it would be.
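
As a rough illustration of the OID bookkeeping such a reader would need,
here is a sketch in plain Java.  It does not use any Lucene API:
OidAlignedReader, add() and document() are hypothetical names, each
"index" is modelled as a list of field maps, and a document's position in
its list stands in for its doc id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not a Lucene class.
final class OidAlignedReader {
    private final String oidField;
    private final List<List<Map<String, String>>> indexes = new ArrayList<>();
    // One lookup table per index: OID -> doc id within that index.
    private final List<Map<String, Integer>> oidToDocId = new ArrayList<>();

    // The OID field name is fixed at construction time.
    OidAlignedReader(String oidField) {
        this.oidField = oidField;
    }

    // Register an index and record where each OID lives in it.
    void add(List<Map<String, String>> index) {
        Map<String, Integer> positions = new HashMap<>();
        for (int docId = 0; docId < index.size(); docId++) {
            String oid = index.get(docId).get(oidField);
            if (oid != null) {
                positions.put(oid, docId);
            }
        }
        indexes.add(index);
        oidToDocId.add(positions);
    }

    // Assemble the logical document for an OID from every index holding it,
    // regardless of the doc id it has in each one.
    Map<String, String> document(String oid) {
        Map<String, String> merged = new HashMap<>();
        for (int i = 0; i < indexes.size(); i++) {
            Integer docId = oidToDocId.get(i).get(oid);
            if (docId != null) {
                merged.putAll(indexes.get(i).get(docId));
            }
        }
        return merged;
    }
}
```

The per-index lookup tables are the "extra work": they are built once in
add() here, so a real version would have to rebuild or patch them
whenever one of the indexes changes.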

On May 8, 2007, at 2:24 AM, Chris Hostetter wrote:

> : I am not sure I agree with that.
> i don't think i understand what part you don't agree with :)
> : Document management systems are quite common these days, and people
> : are used to "checking out" a document, making changes, and checking
> : the entire document back in.
> :
> : In many ways Lucene can be viewed as a self-contained document mngt
> : system if you store every field.
> agreed.
> : If the user is savvy enough to 'rebuild' their documents from an
> : external source, then the fields do not need to be stored (just the
> : OID field for convenience).
> it's this rebuilding that people tend to dislike about the delete/re-add
> process that's currently necessary to "update" a document in Lucene ..
> people don't want to have to be savvy enough to rebuild their documents
> from an external source, they want to throw a bunch of docs in, do some
> searches, pull a doc out, modify one field and throw it back in again.
> at least: that's how i would characterize most questions about "updating"
> docs.
> if the issue was just one of supporting an updateDoc(Document) method
> where the client is expected to "rebuild" the entire doc before calling the
> method, then we've already got that ... it's
> IndexWriter.updateDocument(Term,Document).
> -Hoss
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Grant Ingersoll
