lucene-java-user mailing list archives

From karl wettin <karl.wet...@gmail.com>
Subject optimization behaviour
Date Thu, 10 May 2007 18:21:05 GMT
I really want to use document numbers as a secondary key in my object
storage. If I have understood it correctly, the main problems are deleted
documents and optimization. Are there any other issues?

All my tests tell me optimization does this:

Legend:
action
docNum	doc.toString()

> for (int i=0; i<4; i++) indexWriter.addDocument(documentFactory(i));
> 0	Document<stored/uncompressed,indexed<f:0>>
> 1	Document<stored/uncompressed,indexed<f:1>>
> 2	Document<stored/uncompressed,indexed<f:2>>
> 3	Document<stored/uncompressed,indexed<f:3>>
>
> indexReader.deleteDocument(1);
> 0	Document<stored/uncompressed,indexed<f:0>>
> 1	DELETED
> 2	Document<stored/uncompressed,indexed<f:2>>
> 3	Document<stored/uncompressed,indexed<f:3>>
>
> indexWriter.addDocument(documentFactory(4));
> 0	Document<stored/uncompressed,indexed<f:0>>
> 1	DELETED
> 2	Document<stored/uncompressed,indexed<f:2>>
> 3	Document<stored/uncompressed,indexed<f:3>>
> 4	Document<stored/uncompressed,indexed<f:4>>
>
> indexWriter.optimize();
> 0	Document<stored/uncompressed,indexed<f:0>>
> 1	Document<stored/uncompressed,indexed<f:2>>
> 2	Document<stored/uncompressed,indexed<f:3>>
> 3	Document<stored/uncompressed,indexed<f:4>>

Given this is true at all times, would it not be fairly easy to  
inspect the index prior to optimization in order to find out how  
document numbers will change during optimization?
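
A minimal sketch of that inspection, assuming optimization always behaves as
in the trace above (deleted documents are dropped and the surviving documents
keep their relative order). The class and method names are hypothetical, and
the deleted flags are passed in as a plain BitSet; in a real index they could
presumably be collected with IndexReader.maxDoc() and IndexReader.isDeleted(int).

```java
import java.util.Arrays;
import java.util.BitSet;

public class DocNumRemap {

    /**
     * Predicts document numbers after optimization.
     * Assumption (from the observed behaviour only): deleted documents are
     * removed and the remaining ones are renumbered in their original order.
     *
     * @param maxDoc  one greater than the largest document number in use
     * @param deleted bit n is set if document n is deleted
     * @return array indexed by old document number; -1 marks a deleted document
     */
    public static int[] remap(int maxDoc, BitSet deleted) {
        int[] newDocNum = new int[maxDoc];
        int next = 0; // next free document number after compaction
        for (int old = 0; old < maxDoc; old++) {
            newDocNum[old] = deleted.get(old) ? -1 : next++;
        }
        return newDocNum;
    }

    public static void main(String[] args) {
        // Same situation as the trace: documents 0..4, document 1 deleted.
        BitSet deleted = new BitSet();
        deleted.set(1);
        System.out.println(Arrays.toString(remap(5, deleted)));
        // prints [0, -1, 1, 2, 3]
    }
}
```

With such a table, only the references to documents numbered above the first
deleted one would need updating in the object storage.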

It might end up being really expensive to update virtually all
references to documents in the object storage, and the current thread
on update/replace document on the dev-list made me look into the
problem.

I don't know too much about the file format and SegmentMerger (as
far as I know, this is the class that handles optimization), but what
is it that makes it so hard to insert a document at the position of a  
deleted one? Tracing the code a bit gave me the feeling it should be  
possible to make exceptions for deleted documents. Something like an  
alternative merge policy (or so) for segments containing the document  
to be assigned specific document numbers. Or? If it's not a waste of  
time, I'd be happy to give it a try.


-- 
karl


