lucene-dev mailing list archives

From mark harwood <>
Subject Re: Incremental Field Updates
Date Mon, 29 Mar 2010 14:11:47 GMT
>Of course, but what about the Lucene doc id doesn't provide that?

The question is how you determine the correct doc id to use in the first place (especially
when they are known to be volatile). The current answer is to use a stable identifier term
which your app holds in the index, AKA a primary key.
To support single-doc updates, app developers currently have to:
a) allocate keys uniquely
b) ensure they do not store >1 document with the same key.

My suggestion was that, these being fundamental requirements for supporting updates, Lucene could, as a
convenience, provide some support for this in its API - in the same way a database typically

Earwin has perhaps extended your (and my) original thinking to incorporate set-based updates
(a single set of values applied to many documents which match a query).
His proposal (correct me if I'm wrong, Earwin) is that single and set-based changes could
both be supported by a single IndexWriter.updateDocuments(query, changedFields) type method.
The benefit of this scheme is that we are providing a simple method, re-using established
concepts (Queries for document selection), but this does not change the fact that many users
will still need to use primary keys for single-doc updates, and they have to assume responsibility
for a) and b) above.
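
To illustrate how one method could cover both cases, here is a toy in-memory sketch. It is not Lucene code: updateDocuments(query, changedFields) is only the proposal discussed above, and the SetUpdateSketch class, the Predicate standing in for a Query, and the term() helper are all hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model of the proposed query-driven update. Hypothetical, NOT the Lucene API. */
class SetUpdateSketch {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void add(Map<String, String> doc) {
        docs.add(new HashMap<>(doc)); // defensive, mutable copy
    }

    /** Applies changedFields to every doc matching the query; returns how many were touched. */
    int updateDocuments(Predicate<Map<String, String>> query, Map<String, String> changedFields) {
        int touched = 0;
        for (Map<String, String> doc : docs) {
            if (query.test(doc)) {
                doc.putAll(changedFields);
                touched++;
            }
        }
        return touched;
    }

    /** Counts the docs matching a query. */
    long count(Predicate<Map<String, String>> query) {
        return docs.stream().filter(query).count();
    }

    /** Stand-in for a term query: matches docs whose field equals value. */
    static Predicate<Map<String, String>> term(String field, String value) {
        return doc -> value.equals(doc.get(field));
    }

    public static void main(String[] args) {
        SetUpdateSketch idx = new SetUpdateSketch();
        idx.add(Map.of("id", "1", "status", "draft"));
        idx.add(Map.of("id", "2", "status", "draft"));
        idx.add(Map.of("id", "3", "status", "live"));

        // Set-based update: one set of values applied to every match.
        int many = idx.updateDocuments(term("status", "draft"), Map.of("status", "live"));
        // Single-doc update: the same call, with a primary-key query.
        int one = idx.updateDocuments(term("id", "3"), Map.of("title", "renamed"));
        System.out.println(many + " " + one); // prints "2 1"
    }
}
```

The single-doc case only touches one document because the app kept the "id" values unique, which is exactly responsibilities a) and b).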

On reflection, I guess these responsibilities are not too tough.
a) is catered for by the fact that Lucene is not typically the master data store (yet!) and
the filesystem/webserver/database datasources from which document content is sourced usually have
the responsibility to allocate some form of unique identifier in the form of URLs, database
keys or filenames which can be used. Also, b) is not too hard to handle in app code if you
always use the IndexWriter.updateDocument(term,doc) method for inserts.
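
As a rough illustration of why that works, the toy model below mimics the delete-then-add behaviour of updateDocument(term, doc) with a plain in-memory list. It is a sketch, not Lucene code: the KeyedIndexSketch class, its string field/value pair standing in for a Term, and the countByKey helper are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of keyed inserts. Hypothetical, NOT the Lucene API. */
class KeyedIndexSketch {
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Plain append: nothing stops two docs from sharing a key. */
    void addDocument(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
    }

    /** Delete every doc whose field equals value, then add the new doc. */
    void updateDocument(String field, String value, Map<String, String> doc) {
        docs.removeIf(existing -> value.equals(existing.get(field)));
        docs.add(new HashMap<>(doc));
    }

    /** Counts the docs currently stored under the given key. */
    long countByKey(String field, String value) {
        return docs.stream().filter(d -> value.equals(d.get(field))).count();
    }

    public static void main(String[] args) {
        KeyedIndexSketch idx = new KeyedIndexSketch();
        // Using the update call for the very first insert is harmless:
        // the delete step simply matches nothing.
        idx.updateDocument("id", "doc-1", Map.of("id", "doc-1", "title", "first draft"));
        idx.updateDocument("id", "doc-1", Map.of("id", "doc-1", "title", "second draft"));
        System.out.println(idx.countByKey("id", "doc-1")); // prints 1
    }
}
```

If the app had called addDocument twice instead, both copies would be kept, which is the duplicate-key situation b) is meant to rule out.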


From: Grant Ingersoll <>
Sent: Mon, 29 March, 2010 13:11:56
Subject: Re: Incremental Field Updates

On Mar 29, 2010, at 2:26 AM, Mark Harwood wrote:

>>Of course introducing the idea of updates also introduces the notion of a primary
>>key and there's probably an entirely separate discussion to be had around user-supplied vs
>>Lucene-generated keys.
>>Not sure I see that need.  Can you explain your reasoning a bit more?
>If you want to update a document you need a way of expressing *which* document you are

Of course, but what about the Lucene doc id doesn't provide that?
