lucene-dev mailing list archives

From Matt Quail
Subject RFC/Proposal: *adding* further fields to a previously-indexed Document
Date Wed, 05 May 2004 06:26:45 GMT
Okay, hear me out for a sec:

This is *not* another "How can I update an existing Document in a Lucene 
Index" question. I fully understand why "update" is not available, and 
why update == delete + re-insert.

My request is not for an "update document" feature, but an "add to
document" feature.

What I would ideally be looking for is a method on IndexWriter like this:

public void addToExistingDocument(int docNum, Document doc)
   throws IOException

Consider this example usage:
Document doc1 = new Document();
doc1.add(Field.Keyword("f1", "john"));
Document doc2 = new Document();
doc2.add(Field.Keyword("f2", "doe"));

IndexWriter w = ...
int docid = ... // compute docid of previous "add"
w.addToExistingDocument(docid, doc2);

Doing a search that returned "docid" would result in a Document that 
contained two field/value pairs: "f1"/"john" and "f2"/"doe".
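
To pin down the semantics being asked for, here is a minimal sketch of
the intended behaviour. It is a toy in-memory model, not Lucene code:
ToyIndex and its methods are hypothetical names, and it assumes one
value per field name for simplicity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for an index; this is NOT the Lucene API.
class ToyIndex {
    // One entry per document; the list position doubles as the doc number.
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Adds a new document and returns its doc number. */
    int addDocument(Map<String, String> fields) {
        docs.add(new LinkedHashMap<>(fields));
        return docs.size() - 1;
    }

    /**
     * Proposed semantics: attach further fields to an existing document.
     * Simplification: a field name that is already present is overwritten
     * here, whereas a real Lucene Document may hold several values per name.
     */
    void addToExistingDocument(int docNum, Map<String, String> fields) {
        docs.get(docNum).putAll(fields);
    }

    /** Returns the fields currently attached to the given doc number. */
    Map<String, String> document(int docNum) {
        return docs.get(docNum);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();

        Map<String, String> doc1 = new LinkedHashMap<>();
        doc1.put("f1", "john");
        int docid = index.addDocument(doc1);

        Map<String, String> doc2 = new LinkedHashMap<>();
        doc2.put("f2", "doe");
        index.addToExistingDocument(docid, doc2);

        // The document now carries both f1 and f2.
        System.out.println(index.document(docid));
    }
}
```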

Firstly, I suspect that it is possible to implement
addToExistingDocument(): can anyone confirm or deny this? If it is
possible, I'm happy to put my hand up for having a go at implementing it.

Why do I want this? I have an index of documents that contain a "primary
key" and a whole bunch of other fields. However, I'm unable to collect
all the fields in one pass. In fact, some of the field values actually
depend on the results of a search over the "partial" documents! (This
may sound weird, but I can explain the use case if anyone wants to hear it.)

My current solution is to have multiple indexes; each "pass" inserts
Documents into a different index, where corresponding documents share
the same "primary key". This is annoying, because queries that span
fields from different passes require me to manually "merge" the results
from separate queries. (I "join" two Hits objects along the shared
primary key; this is way slow as you might imagine.)
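
A minimal sketch of that kind of manual join, for illustration only.
Plain Java maps stand in for the stored fields of each hit (this does
not touch the Lucene API), and PrimaryKeyJoin, join, hit and the "pk"
field name are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hash join of two result lists on a shared primary-key field.
class PrimaryKeyJoin {
    /** Inner-joins two result lists; each map holds one hit's fields. */
    static List<Map<String, String>> join(List<Map<String, String>> left,
                                          List<Map<String, String>> right,
                                          String keyField) {
        // Index the left-hand hits by primary key for constant-time lookup.
        Map<String, Map<String, String>> byKey = new HashMap<>();
        for (Map<String, String> hit : left) {
            String key = hit.get(keyField);
            if (key != null) {
                byKey.put(key, hit);
            }
        }
        // Walk the right-hand hits and merge fields wherever the key matches.
        List<Map<String, String>> joined = new ArrayList<>();
        for (Map<String, String> hit : right) {
            Map<String, String> match = byKey.get(hit.get(keyField));
            if (match == null) {
                continue; // no counterpart on the left: drop this hit
            }
            Map<String, String> merged = new LinkedHashMap<>(match);
            merged.putAll(hit);
            joined.add(merged);
        }
        return joined;
    }

    /** Builds one hit from alternating field names and values. */
    static Map<String, String> hit(String... nameValuePairs) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i + 1 < nameValuePairs.length; i += 2) {
            fields.put(nameValuePairs[i], nameValuePairs[i + 1]);
        }
        return fields;
    }

    public static void main(String[] args) {
        List<Map<String, String>> pass1 = Arrays.asList(
                hit("pk", "1", "f1", "john"),
                hit("pk", "2", "f1", "jane"));
        List<Map<String, String>> pass2 = Arrays.asList(
                hit("pk", "1", "f2", "doe"),
                hit("pk", "3", "f2", "roe"));
        // Only pk=1 appears in both passes, so only that hit survives.
        System.out.println(join(pass1, pass2, "pk"));
    }
}
```

Even with a hash lookup, both result sets have to be read out in full
before anything can be ranked or filtered across passes, which is the
cost a single combined document would avoid.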

So, is this idea implementable? Any and all flamage welcome.

