lucene-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1012) Problems with maxMergeDocs parameter
Date Mon, 01 Oct 2007 13:26:53 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531510 ]

Yonik Seeley commented on LUCENE-1012:
--------------------------------------

> We could just fix the javadocs to match the current approach?
That sounds like the right approach.

> Problems with maxMergeDocs parameter
> ------------------------------------
>
>                 Key: LUCENE-1012
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1012
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael Busch
>            Priority: Minor
>             Fix For: 2.3
>
>
> I found two possible problems regarding IndexWriter's maxMergeDocs value. I'm using the
> following code to test maxMergeDocs:
> {code:java} 
>   public void testMaxMergeDocs() throws IOException {
>     final int maxMergeDocs = 50;
>     final int numSegments = 40;
>     
>     MockRAMDirectory dir = new MockRAMDirectory();
>     IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
>     writer.setMergePolicy(new LogDocMergePolicy());
>     writer.setMaxMergeDocs(maxMergeDocs);
>     Document doc = new Document();
>     doc.add(new Field("field", "aaa", Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
>     for (int i = 0; i < numSegments * maxMergeDocs; i++) {
>       writer.addDocument(doc);
>       //writer.flush();      // uncomment to avoid the DocumentsWriter bug
>     }
>     writer.close();
>     
>     new SegmentInfos.FindSegmentsFile(dir) {
>       protected Object doBody(String segmentFileName) throws CorruptIndexException, IOException {
>         SegmentInfos infos = new SegmentInfos();
>         infos.read(directory, segmentFileName);
>         for (int i = 0; i < infos.size(); i++) {
>           assertTrue(infos.info(i).docCount <= maxMergeDocs);
>         }
>         return null;
>       }
>     }.run();
>   }
> {code} 
>   
> - It seems that DocumentsWriter does not obey the maxMergeDocs parameter. If I don't
> flush manually, then the index contains only one segment at the end and the test fails.
> - If I flush manually after each addDocument() call, then the index contains more segments.
> But there are still segments that contain more docs than maxMergeDocs, e.g. 55 vs. 50. The
> javadoc in IndexWriter says:
> {code:java}
>    /**
>    * Returns the largest number of documents allowed in a
>    * single segment.
>    *
>    * @see #setMaxMergeDocs
>    */
>   public int getMaxMergeDocs() {
>     return getLogDocMergePolicy().getMaxMergeDocs();
>   }
> {code}
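
The second observation in the quoted report (a segment with 55 docs against a limit of 50) can be illustrated with a toy model. This is a minimal sketch, not Lucene code: it assumes that maxMergeDocs only decides which segments are eligible as merge inputs and does not cap the size of the merged result. The class MaxMergeDocsSketch and the methods eligibleForMerge and mergeEligible are hypothetical names made up for illustration, and the segment sizes are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only. It does not use or reproduce any Lucene API.
public class MaxMergeDocsSketch {

    // Assumption: a segment may take part in a merge only while it is
    // still smaller than maxMergeDocs.
    public static boolean eligibleForMerge(int docCount, int maxMergeDocs) {
        return docCount < maxMergeDocs;
    }

    // Merges every eligible segment into one new segment and leaves the
    // ineligible ones untouched. Note that nothing limits the merged size.
    public static List<Integer> mergeEligible(List<Integer> segments, int maxMergeDocs) {
        List<Integer> result = new ArrayList<Integer>();
        int merged = 0;
        for (int docCount : segments) {
            if (eligibleForMerge(docCount, maxMergeDocs)) {
                merged += docCount;   // input segment is below the limit
            } else {
                result.add(docCount); // too large to be merged again
            }
        }
        if (merged > 0) {
            result.add(merged);
        }
        return result;
    }

    public static void main(String[] args) {
        int maxMergeDocs = 50;
        // Invented sizes: 60 is already over the limit, 30 and 25 are below it.
        List<Integer> segments = Arrays.asList(60, 30, 25);
        System.out.println(mergeEligible(segments, maxMergeDocs)); // prints [60, 55]
    }
}
```

Under that assumption, two inputs that are each below the limit (30 and 25) produce a 55-doc segment, so an assertion like docCount <= maxMergeDocs would fail even though every merge input respected the limit.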

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

