lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Spencer <>
Subject Re: [RFE] IndexWriter.updateDocument()
Date Tue, 14 Dec 2004 19:01:43 GMT
petite_abeille wrote:

> Well, the subject says it all...
> If there is one thing which is overly cumbersome in Lucene, it's 
> updating documents, therefore this Request For Enhancement:
> Please consider enhancing the IndexWriter API to include an 
> updateDocument(...) method to take care of all the gory details involved 
> in such operation.

I agree, this is always a hassle to do right due to having to use 
IndexWriter and IndexReader and properly opening/closing them.

I have a prelim version of a "batched index writer" that I use. The code 
is kinda messy, but for discussion here's what it does:

Briefly the methods are:

// [1]
// the ctr has parameters:
//    'batch size # docs' e.g. it will flush pending updates every 100 docs
//    'batch freq' e.g. auto flush every 60 sec

// [2]
// queue a document to be added to the index
// 'key' is the primary key name e.g. "url"
// 'val' is the primary key val e.g. ""
// 'doc' is the doc to be added
update( String key, String val, Document doc)

// [3]
//  queue a document for removal
// 'key' and 'val' are the params, as from [2]
remove( String key, String val)

// [4]
// periodic flush, called automatically or on demand, 2 stages:
//         1. call IndexReader.delete() on all pending (key,val) pairs
//         2. close IndexReader
//         3. call IndexWriter.add() on all pending documents
//         4. optionally call optimze()
//         5. close IndexWriter


So in normal usage you just keep calling update() and it peridically 
flushes the pending updates to the index. By its nature this uses memory 
however it's tunable as to how many documents it'll queue in memory.

Does the algorithm above, esp flush(), sound correct? It seems to work 
right for me and I can post this if people want to see it...

- Dave

> Thanks in advance.
> Cheers,
> PA.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message