lucene-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Incremental Field Updates
Date Tue, 30 Mar 2010 10:30:16 GMT

On Mar 29, 2010, at 10:11 AM, mark harwood wrote:

> >Of course, but what about the Lucene doc id doesn't provide that?
> 
> The question is how you determine the correct doc id to use in the first place (especially when doc ids are known to be volatile). The current answer is to use a stable identifier term that your app holds in the index, a.k.a. a primary key.
> To support single-doc updates, app developers currently have to:
> a) allocate keys uniquely
> b) ensure they do not store >1 document with the same key.
> 
> My suggestion was that, since these are fundamental requirements for supporting updates, Lucene could, as a convenience, provide some support for them in its API, in the same way a database typically does.

I don't think Lucene needs a primary key.  I don't see why this number can't be determined
in the usual ways.
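To make the doc-id volatility point concrete, here is a minimal Java sketch. It uses a hypothetical in-memory ToyIndex class, not Lucene's real API: internal doc ids are modeled as list positions that shift when deleted slots are compacted (loosely like a segment merge), while the app-held "id" key stays stable. Routing every write, inserts included, through a key-based upsert keeps at most one live doc per key, which is requirement b) above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, NOT Lucene's API: internal doc ids are list
// positions, and compact() drops deleted slots so surviving docs can get
// new ids (loosely like a segment merge). The app-held key never changes.
class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();

    // Appends a copy of the doc; its internal id is its current position.
    int addDocument(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
        deleted.add(false);
        return docs.size() - 1;
    }

    // Marks every live doc whose field equals value as deleted.
    void deleteDocuments(String field, String value) {
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i) && value.equals(docs.get(i).get(field))) {
                deleted.set(i, true);
            }
        }
    }

    // Upsert by primary key: delete any doc carrying the key, then add the
    // new version. Using this for inserts too keeps keys unique (rule b).
    int updateDocument(String keyField, String keyValue, Map<String, String> doc) {
        deleteDocuments(keyField, keyValue);
        Map<String, String> copy = new HashMap<>(doc);
        copy.put(keyField, keyValue);
        return addDocument(copy);
    }

    // Drops deleted slots; later docs shift down, so their ids change.
    void compact() {
        for (int i = docs.size() - 1; i >= 0; i--) {
            if (deleted.get(i)) {
                docs.remove(i);
                deleted.remove(i);
            }
        }
    }

    // Current internal id of the live doc with this key, or -1 if none.
    int lookup(String keyField, String keyValue) {
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i) && keyValue.equals(docs.get(i).get(keyField))) {
                return i;
            }
        }
        return -1;
    }

    // Number of live docs carrying this key (should never exceed 1).
    int liveCount(String keyField, String keyValue) {
        int n = 0;
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i) && keyValue.equals(docs.get(i).get(keyField))) {
                n++;
            }
        }
        return n;
    }

    // Reads a stored field of the doc at the given internal id.
    String get(int docId, String field) {
        return docs.get(docId).get(field);
    }
}

class KeyedUpdateDemo {
    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex();
        idx.updateDocument("id", "a", Map.of("title", "first"));    // doc id 0
        idx.updateDocument("id", "b", Map.of("title", "second"));   // doc id 1
        idx.updateDocument("id", "a", Map.of("title", "first v2")); // old 0 deleted, new doc id 2
        idx.compact(); // "b" shifts from 1 to 0, "a" from 2 to 1
        int a = idx.lookup("id", "a");
        // prints "key a -> doc id 1, title first v2"
        System.out.println("key a -> doc id " + a + ", title " + idx.get(a, "title"));
    }
}
```

In real Lucene the analogous upsert call is IndexWriter.updateDocument(Term, Document), which deletes the documents containing the term and then adds the new one; the toy only mirrors that shape.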

> 
> Earwin has perhaps extended your (and my) original thinking to incorporate set-based updates (a single set of values applied to many documents which match a query).
> His proposal (correct me if I'm wrong, Earwin) is that single and set-based changes could both be supported by a single IndexWriter.updateDocuments(query, changedFields)-style method.
> The benefit of this scheme is that we are providing a simple method that re-uses established concepts (Queries for document selection), but this does not change the fact that many users will still need to use primary keys for single-doc updates and will have to assume responsibility for a) and b) above.

Hmmm, this sounds like the Parallel Incremental Indexing patch that Busch has put up.
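Earwin's set-based idea can be sketched the same way. This is again a hypothetical toy: SetUpdateIndex and its updateDocuments method are illustrative names, not an existing Lucene API, and a java.util.function.Predicate stands in for a Query. Every matching document gets the changed fields merged over its old values, implemented as delete plus re-add; a single-doc update is then just the case where the predicate matches one primary key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical toy, NOT Lucene's API: live docs held as field->value maps.
class SetUpdateIndex {
    final List<Map<String, String>> live = new ArrayList<>();

    void add(Map<String, String> doc) {
        live.add(new HashMap<>(doc));
    }

    // Sketch of a set-based update: every doc matching the "query" gets the
    // changed fields merged over its old values. Done as delete + re-add, so
    // matched docs move to the end (their positional ids change).
    int updateDocuments(Predicate<Map<String, String>> query, Map<String, String> changedFields) {
        List<Map<String, String>> rewritten = new ArrayList<>();
        for (Map<String, String> doc : live) {
            if (query.test(doc)) {
                Map<String, String> copy = new HashMap<>(doc);
                copy.putAll(changedFields);
                rewritten.add(copy);
            }
        }
        live.removeIf(query); // delete the old versions
        live.addAll(rewritten); // add the new versions
        return rewritten.size();
    }
}

class SetUpdateDemo {
    public static void main(String[] args) {
        SetUpdateIndex idx = new SetUpdateIndex();
        idx.add(Map.of("id", "1", "category", "books", "status", "draft"));
        idx.add(Map.of("id", "2", "category", "music", "status", "draft"));
        idx.add(Map.of("id", "3", "category", "books", "status", "draft"));
        // Set-based: publish every book.
        int n = idx.updateDocuments(d -> "books".equals(d.get("category")),
                Map.of("status", "published"));
        // Single-doc: the same call with a primary-key "query".
        idx.updateDocuments(d -> "2".equals(d.get("id")), Map.of("status", "archived"));
        System.out.println(n + " docs published"); // prints "2 docs published"
    }
}
```

Note that even in this form, the single-doc case still relies on the app supplying a unique "id" value, which is Mark's point about a) and b).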

> 
> On reflection, I guess these responsibilities are not too tough.
> a) is catered for by the fact that Lucene is not typically the master data store (yet!), and the filesystem/webserver/database data sources from which document content is sourced usually have the responsibility to allocate some form of unique identifier (URLs, database keys or filenames) which can be used. Also, b) is not too hard to handle in app code if you always use the IndexWriter.updateDocument(term,doc) method for inserts.
> 
> 
> Cheers,
> Mark
> 
> From: Grant Ingersoll <gsingers@apache.org>
> To: java-dev@lucene.apache.org
> Sent: Mon, 29 March, 2010 13:11:56
> Subject: Re: Incremental Field Updates
> 
> 
> On Mar 29, 2010, at 2:26 AM, Mark Harwood wrote:
> 
>> 
>>> 
>>>> Of course introducing the idea of updates also introduces the notion of a primary key and there's probably an entirely separate discussion to be had around user-supplied vs Lucene-generated keys.
>>> 
>>> Not sure I see that need.  Can you explain your reasoning a bit more?
>>>> 
>> 
>> If you want to update a document you need a way of expressing *which* document you are updating.
> 
> Of course, but what about the Lucene doc id doesn't provide that?
> 
> 

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search

