lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Antony Bowesman <...@teamware.com>
Subject Re: How to not overwrite a Document if it 'already exists'?
Date Wed, 06 May 2009 01:33:01 GMT
>> Thanks for that info.  These indexes will be large, in the 10s of millions.
>>  id field is unique and is 29 bytes.  I guess that's still a lot of data to
>> trawl through to get to the term.
> 
> Have you tested how long it takes to look up docs from your id?

Not in indexes that size in a live environment as I don't have the hardware to 
make those sorts of test :( although I know in general, lookup is fast.

> Couldn't you just give the base & full docs different ids?  Then you
> can independently choose which one to update?

I considered that, but as the normal case will not need to worry about this 
scenario.

There is only ever one instance of a mail Doc, whether it is a root mail or part 
of a forward chain and a root mail can of course be part of a forward chain at 
some point, so it should be optimal to just fetch the one Document for the mail 
Id without first trying the true Id, then some pseudo Id if it isn't found.

Unfortunately, I'm having to solve this problem in my Lucene app as the tool 
that's generating this data is unable to know what has or has not been handled 
previously.

I'm implementing it using the IndexReader approach for now and will try to get 
some performance data, so thanks for your comments Mike.

Antony








---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message