lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges
Date Tue, 28 Jul 2009 03:48:14 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735905#action_12735905
] 

Shai Erera commented on LUCENE-1076:
------------------------------------

Can someone please help me understand what's going on here? After I applied the patch to trunk,
TestIndexWriter.testOptimizeMaxNumSegments2() fails. The failure happens only with CMS
(ConcurrentMergeScheduler), not with SMS (SerialMergeScheduler). I dug deeper into the test:
it calls optimize(maxNumSegments) and expects that (1) if the number of segments was
< maxNumSegments, then the resulting number of segments is exactly what it was before, and
(2) otherwise it is exactly maxNumSegments.

First, the javadocs of optimize(maxNumSegments) say that it will result in <= maxNumSegments
segments, but as I understand it, LogMergePolicy ensures that if you ask for maxNumSegments,
that's exactly the number of segments you'll get.
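
To make the two readings concrete, here is a minimal sketch with no Lucene dependency. The class and method names are hypothetical (they are not part of Lucene or of the test); they only encode the strict expectation the test asserts versus the looser guarantee the javadocs state.

```java
/** Hypothetical helper, not part of Lucene: the two readings of optimize(maxNumSegments). */
final class SegmentCountExpectations {

    /** Strict reading used by the test: the exact segment count expected after optimize. */
    static int strictExpected(int segCountBefore, int maxNumSegments) {
        return segCountBefore < maxNumSegments ? segCountBefore : maxNumSegments;
    }

    /** Looser reading from the javadocs: any count up to maxNumSegments is acceptable. */
    static boolean satisfiesJavadoc(int segCountAfter, int maxNumSegments) {
        return segCountAfter <= maxNumSegments;
    }

    public static void main(String[] args) {
        int maxNumSegments = 3;
        // Fewer segments than requested: the strict reading expects the count to stay the same.
        System.out.println(strictExpected(2, maxNumSegments));
        // More segments than requested: the strict reading expects exactly maxNumSegments.
        System.out.println(strictExpected(7, maxNumSegments));
        // A result of 2 segments meets the javadoc guarantee even where the strict reading wants 3.
        System.out.println(satisfiesJavadoc(2, maxNumSegments));
    }
}
```

A result can therefore satisfy the javadocs and still fail the test, which is why it matters which of the two the merge policy is really supposed to guarantee.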

While trying to debug what's wrong with the change so far, I managed to reduce the test to this
code:

{code}
public void test1() throws Exception {
    MockRAMDirectory dir = new MockRAMDirectory();

    final Document doc = new Document();
    doc.add(new Field("content", "aaa", Field.Store.YES, Field.Index.ANALYZED));

    IndexWriter writer  = new IndexWriter(dir, new WhitespaceAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
//    writer.setMergeScheduler(new SerialMergeScheduler());
    LogDocMergePolicy ldmp = new LogDocMergePolicy();
    ldmp.setMinMergeDocs(1);
    writer.setMergePolicy(ldmp);
    writer.setMergeFactor(3);
    writer.setMaxBufferedDocs(2);

    MergeScheduler ms = writer.getMergeScheduler();
//  writer.setInfoStream(System.out);
    
    // Add enough documents to create several (uncommitted) segments and kick off
    // some merge threads.
    for (int i = 0; i < 20; i++) {
      writer.addDocument(doc);
    }
    writer.commit();
    
    if (ms instanceof ConcurrentMergeScheduler) {
      // Wait for all merges to complete
      ((ConcurrentMergeScheduler) writer.getMergeScheduler()).sync();
    }
    
    SegmentInfos sis = new SegmentInfos();
    sis.read(dir);
    
    System.out.println("numSegments after add + commit ==> " + sis.size());
    
    final int segCount = sis.size();
    
    int maxNumSegments = 3;
    writer.optimize(maxNumSegments);
    writer.commit();
    
    if (ms instanceof ConcurrentMergeScheduler) {
      // Wait for all merges to complete
      ((ConcurrentMergeScheduler) writer.getMergeScheduler()).sync();
    }
    
    sis = new SegmentInfos();
    sis.read(dir);
    final int optSegCount = sis.size();
    
    System.out.println("numSegments after optimize (" + maxNumSegments + ") + commit ==> " + sis.size());
    
    if (segCount < maxNumSegments)
      Assert.assertEquals(segCount, optSegCount);
    else
      Assert.assertEquals(maxNumSegments, optSegCount);
}
{code}

This fails almost every time I run it, but not always, so if you try it, make sure to run it a
couple of times. I then switched to a clean trunk, and it fails almost consistently there too!?

Can someone please have a look and tell me what's wrong (is it the test, or did I hit a true
bug in the code?)?

> Allow MergePolicy to select non-contiguous merges
> -------------------------------------------------
>
>                 Key: LUCENE-1076
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1076
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.3
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1076.patch
>
>
> I started work on this but with LUCENE-1044 I won't make much progress
> on it for a while, so I want to checkpoint my current state/patch.
> For backwards compatibility we must leave the default MergePolicy as
> selecting contiguous merges.  This is necessary because some
> applications rely on "temporal monotonicity" of doc IDs, which means
> even though merges can re-number documents, the renumbering will
> always reflect the order in which the documents were added to the
> index.
> Still, for those apps that do not rely on this, we should offer a
> MergePolicy that is free to select the best merges regardless of
> whether they are contiguous.  This requires fixing IndexWriter to
> accept such a merge, and, fixing LogMergePolicy to optionally allow
> it the freedom to do so.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

