lucene-java-user mailing list archives

From Danil ŢORIN <>
Subject Re: Adding segments to an optimized index
Date Wed, 28 Oct 2009 12:50:58 GMT
There is no such thing in Lucene as a "unique" document.

They might be unique from your application's point of view (each has
some ID that is unique), but from Lucene's point of view it's perfectly
fine to have duplicate documents.

So the "deleted" documents in the combined index are coming from your
second index.

Even more: if you search your combined index, you'll see that there are
duplicate documents that came from the 1st index and were not deleted.

That's because Lucene simply adds to the combined index all documents
that aren't marked as deleted.
Remember that a document is (kind of) opaque to Lucene, which doesn't
have (and doesn't need) any logic to handle such situations; these
should be handled by your application.
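
To make the behaviour above concrete, here is a minimal sketch in plain Java. It is a toy in-memory model, not Lucene's API: the names MergeSketch, Doc, naiveMerge and mergeWithIdDedup are hypothetical and exist only for illustration. It shows that a merge which copies every non-deleted document keeps both copies of a document with the same ID, and that the de-duplication has to be done at the application level. In a real Lucene index the usual tools for this are IndexWriter.deleteDocuments(Term) and IndexWriter.updateDocument(Term, Document), keyed on a field that holds your ID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy in-memory model of the merge behaviour; this is NOT Lucene's API. */
class MergeSketch {

    /** Hypothetical document: an application-level ID, a payload, a deleted flag. */
    static final class Doc {
        final String id;
        final String body;
        final boolean deleted;

        Doc(String id, String body, boolean deleted) {
            this.id = id;
            this.body = body;
            this.deleted = deleted;
        }
    }

    /** Copies every document not marked as deleted; IDs are never compared. */
    static List<Doc> naiveMerge(List<Doc> first, List<Doc> second) {
        List<Doc> combined = new ArrayList<>();
        for (Doc d : first) {
            if (!d.deleted) combined.add(d);
        }
        for (Doc d : second) {
            if (!d.deleted) combined.add(d);
        }
        return combined;
    }

    /**
     * Application-level policy (an assumption of this sketch): before merging,
     * drop from the first index every document whose ID also appears among
     * the live documents of the second, so the newer copy wins.
     */
    static List<Doc> mergeWithIdDedup(List<Doc> first, List<Doc> second) {
        Set<String> incomingIds = new HashSet<>();
        for (Doc d : second) {
            if (!d.deleted) incomingIds.add(d.id);
        }
        List<Doc> pruned = new ArrayList<>();
        for (Doc d : first) {
            if (!incomingIds.contains(d.id)) pruned.add(d);
        }
        return naiveMerge(pruned, second);
    }

    public static void main(String[] args) {
        List<Doc> index2 = Arrays.asList(
                new Doc("A", "old", false),
                new Doc("B", "old", false));
        List<Doc> newSegment = Arrays.asList(
                new Doc("A", "new", false),
                new Doc("C", "new", false),
                new Doc("D", "new", true)); // marked deleted, so never copied

        System.out.println(naiveMerge(index2, newSegment).size());       // prints 4 ("A" appears twice)
        System.out.println(mergeWithIdDedup(index2, newSegment).size()); // prints 3
    }
}
```

Which copy should win, and whether deletes should propagate from one index to the other, is a decision for your application; the sketch simply lets the newer copy replace the older one.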

On Wed, Oct 28, 2009 at 13:36, Marc Sturlese <> wrote:
> I am doing some tests with optimize and adding segments, and I am wondering
> whether what I am doing can cause document inconsistency.
> I have 2 folders with one index each. One has a non-optimized index1 with 1
> million docs and mergeFactor=10. The other one, index2, has the same index
> optimized with the compound file format. I add and delete some documents in
> the non-optimized index1, and a few segments disappear and some new ones are
> created. Now I copy the newly created files into the optimized index2 and
> optimize it again. I get no errors doing that, but... will the documents be
> the same in index1 and index2? I am asking because when I added some docs and
> deleted others in index1, some segments disappeared, and index2 is supposed
> to still have those segments optimized with the others... or doesn't it work
> this way?
> What I am trying to explain is:
> index1:
> seg1,seg2,seg3,seg4,seg5
> index2: (index1 optimized with compound)
> seg8
> adding and deleting docs in index1 will give:
> seg1,seg2,seg3,seg6 (seg4 and seg5 have disappeared and seg6 has been
> created)
> now I do in index2:
> seg8+seg6+optimize=seg9 (but seg8 is supposed to still contain seg4 and seg5)
> The question is: will index1 (seg1,seg2,seg3,seg6) and index2 (seg9) contain
> the same docs?
> Thanks in advance, and please let me know if I wasn't clear in my
> explanation.
