lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
Date Tue, 14 Jun 2011 02:16:47 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3200:
----------------------------------

    Attachment: LUCENE-3200.patch

New patch with some minor issues fixed:
- Fixed the RuntimeException.
- Fixed readByte to throw EOF if we are at the end of the (n-1)th buffer. As buffer n may have
size 0, we would throw BufferUnderflowException in the catch block. I added hasRemaining() there, so
it's consistent with readBytes.
- The check for an invalid power was bogus (0 is allowed; it leads to buffer size 1).
- The check for a RandomAccessFile too big for the maximum buffer size did not respect the additional
buffer. nrBuffers could then overflow easily.
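
To illustrate the last two points, here is a minimal sketch of how the buffer count could be computed with the additional buffer included in the size check. This is not the code from the attached patch: the class name BufferCount, the method numBuffers, and the exact form of the checks are assumptions made for illustration only.

```java
/** Hypothetical helper (not from LUCENE-3200.patch): counts the buffers needed to map a file. */
final class BufferCount {
    private BufferCount() {}

    /**
     * Returns the number of buffers for a file of the given length when each
     * buffer holds 2^chunkSizePower bytes. One additional buffer is always
     * counted, so the last buffer may have size 0 when the length is an exact
     * multiple of the buffer size.
     */
    static int numBuffers(long fileLength, int chunkSizePower) {
        // A power of 0 is valid (buffer size 1); 30 is the maximum named in the issue.
        if (chunkSizePower < 0 || chunkSizePower > 30) {
            throw new IllegalArgumentException("invalid chunkSizePower: " + chunkSizePower);
        }
        if (fileLength < 0) {
            throw new IllegalArgumentException("negative file length: " + fileLength);
        }
        long fullChunks = fileLength >>> chunkSizePower;
        // The check has to leave room for the "+ 1" below; otherwise the int overflows.
        if (fullChunks >= Integer.MAX_VALUE) {
            throw new IllegalArgumentException("file too big for chunk size");
        }
        return (int) fullChunks + 1;
    }
}
```

With chunkSizePower = 4 (16-byte buffers), a 32-byte file yields three buffers under this scheme, the last one empty, which is the case the hasRemaining() check in readByte has to handle.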


> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers
of 2
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3200
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3200
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>         Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch
>
>
> Robert and I discussed a bit after Mike's investigations that using SingleMMapIndexInput
> together with MultiMMapIndexInput sometimes leads to HotSpot slowdowns.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the switching between
>   buffer boundaries is done in exception catch blocks. So the normal code path is always the same
>   as for Single*.
> - Only the seek method uses strange calculations (the modulo is totally bogus; it could
>   simply be: int bufOffset = (int) (pos % maxBufSize); a very strange way of calculating modulo
>   in the original code).
> - For speed, we suggest no longer using arbitrary buffer sizes. We should pass
>   only the power of 2 to the IndexInput as the size. All calculations in seek and anywhere else
>   would then be simple bit shifts and AND operations (the AND masks for the modulo can be calculated
>   in the ctor, like NumericUtils does when calculating precisionSteps).
> - The maximum buffer size will now be 2^30, not 2^31-1. But that's not an issue at all.
>   In my opinion, a buffer size of 2^31-1 is stupid in all cases, as it will no longer fit page
>   boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.
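
The shift-and-mask idea from the description can be sketched as follows. This is only an illustration under stated assumptions, not the patch itself: the class ChunkAddressing and its field and method names are hypothetical. The mask is computed once in the constructor, and each position is then split into a buffer index and an offset inside that buffer.

```java
/** Hypothetical helper (not from the patch): splits a file position into buffer index and offset. */
final class ChunkAddressing {
    final int chunkSizePower;  // each buffer holds 2^chunkSizePower bytes
    final long chunkSizeMask;  // 2^chunkSizePower - 1, computed once in the constructor

    ChunkAddressing(int chunkSizePower) {
        // 0 is allowed (buffer size 1); 2^30 is the maximum buffer size named in the issue.
        if (chunkSizePower < 0 || chunkSizePower > 30) {
            throw new IllegalArgumentException("invalid chunkSizePower: " + chunkSizePower);
        }
        this.chunkSizePower = chunkSizePower;
        this.chunkSizeMask = (1L << chunkSizePower) - 1L;
    }

    /** Index of the buffer containing pos; replaces a division by the buffer size. */
    int bufferIndex(long pos) {
        return (int) (pos >>> chunkSizePower);
    }

    /** Offset of pos inside its buffer; equivalent to pos % (1L << chunkSizePower) for pos >= 0. */
    int bufferOffset(long pos) {
        return (int) (pos & chunkSizeMask);
    }
}
```

For example, with chunkSizePower = 4 (16-byte buffers), position 37 falls into buffer 2 at offset 5, the same result as 37 / 16 and 37 % 16.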

