lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
Date Sun, 01 Jul 2012 15:22:42 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404742#comment-13404742 ]

Michael McCandless commented on LUCENE-4123:
--------------------------------------------

bq. You should make the II correctly throw IOExceptions like MMap does, so catch the AIOOBE
and rethrow as EOFException (just copy the code).

+1.  Are we sure the catch + rethrow adds no cost?
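
For reference, the catch + rethrow being discussed might look roughly like the sketch below. It is not taken from the patch: the class name ByteArrayInputSketch and its fields are hypothetical stand-ins for a byte[]-backed IndexInput. On HotSpot a try block is generally compiled to an exception table and typically adds no work on the non-throwing path, but that would need a benchmark to confirm for this case.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical stand-in for a byte[]-backed IndexInput; not the code from the patch. */
final class ByteArrayInputSketch {
  private final byte[] bytes;
  private int pos;

  ByteArrayInputSketch(byte[] bytes) {
    this.bytes = bytes;
  }

  /** Reads one byte with no explicit bounds check on the normal path. */
  byte readByte() throws IOException {
    try {
      return bytes[pos++];
    } catch (ArrayIndexOutOfBoundsException e) {
      // pos was already incremented before the access failed; clamp it back.
      pos = bytes.length;
      // Surface the overrun as an IOException subtype instead of a RuntimeException.
      throw new EOFException("read past EOF (length=" + bytes.length + ")");
    }
  }
}
```

The array access itself performs the bounds check, so the normal path carries no additional comparison; only a read past the end pays for constructing the exception.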

Though I think tests don't actually fail as is, because I intentionally skip caching segments_N.
We should probably improve that to skip any file that's opened with readOnce=true.
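
To make the two policies concrete, here is a minimal sketch. Everything in it is an assumption for illustration: CachePolicySketch and both method names are hypothetical, and the boolean parameter merely stands in for the readOnce flag a caller would pass when opening a file.

```java
/** Hypothetical policy sketch; the names are illustrative and not from the patch. */
final class CachePolicySketch {

  /** Name-based rule: cache everything except segments_N files. */
  static boolean shouldCacheByName(String fileName) {
    return !fileName.startsWith("segments_");
  }

  /** Flag-based rule: skip any file the caller says it will read only once. */
  static boolean shouldCacheByFlag(boolean readOnce) {
    return !readOnce;
  }
}
```

The flag-based rule avoids hard-coding file names, so any file read a single time bypasses the cache without further special cases.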

bq. Can we make this IndexInput impl extend ByteArrayDataInput somehow?

+1

I won't have time for this any time soon, so if you want to work on it, Uwe, feel free!
                
> Add CachingRAMDirectory
> -----------------------
>
>                 Key: LUCENE-4123
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4123
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/store
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-4123.patch
>
>
> The directory is very simple and useful if you have an index that you
> know fully fits into available RAM.  You could also use FileSwitchDir if
> you want to leave some files (eg stored fields or term vectors) on disk.
> It wraps any other Directory and delegates all writing (IndexOutput) to
> it, but for reading (IndexInput), it allocates a single byte[] and fully
> reads the file in and then serves requests off that single byte[].  It's
> more GC friendly than RAMDir since it only allocates a single array per
> file.
> It has a few nocommits still, but all tests pass if I wrap the delegate
> inside MockDirectoryWrapper using this.
> I tested with a 1M-doc English Wikipedia index (I would like to test with 10M docs
> but I don't have enough RAM...); it seems to give a nice speedup:
> {noformat}
>                 Task    QPS base StdDev base  QPS cached StdDev cached     Pct diff
>              Respell      197.00        7.27      203.19        8.17   -4% -   11%
>             PKLookup      121.12        2.80      125.46        3.20   -1% -    8%
>               Fuzzy2       66.62        2.62       69.91        2.85   -3% -   13%
>               Fuzzy1      206.20        6.47      222.21        6.52    1% -   14%
>        TermGroup100K      160.14        6.62      175.71        3.79    3% -   16%
>               Phrase       34.85        0.40       38.75        0.61    8% -   14%
>       TermBGroup100K      363.75       15.74      406.98       13.23    3% -   20%
>             SpanNear       53.08        1.11       59.53        2.94    4% -   20%
>     TermBGroup100K1P      222.53        9.78      252.86        5.96    6% -   21%
>         SloppyPhrase       70.36        2.05       79.95        4.48    4% -   23%
>             Wildcard      238.10        4.29      272.78        4.97   10% -   18%
>            OrHighMed      123.49        4.85      149.32        4.66   12% -   29%
>              Prefix3      288.46        8.10      350.40        5.38   16% -   26%
>           OrHighHigh       76.46        3.27       93.13        2.96   13% -   31%
>               IntNRQ       92.25        2.12      113.47        5.74   14% -   32%
>                 Term      757.12       39.03      958.62       22.68   17% -   36%
>          AndHighHigh      103.03        4.48      133.89        3.76   21% -   39%
>           AndHighMed      376.36       16.58      493.99       10.00   23% -   40%
> {noformat}
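
The read path described in the issue (allocate one byte[] per file and fill it in a single pass) could be sketched as below. This is an assumption-laden illustration, not the attached patch: SingleArrayCacheSketch and readFully are hypothetical names, and a plain InputStream stands in for the delegate Directory's IndexInput.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch of the "one byte[] per file" read path; not the patch itself. */
final class SingleArrayCacheSketch {

  /** Loads an entire file of the given length into exactly one array. */
  static byte[] readFully(InputStream in, long length) throws IOException {
    // A Java array is indexed by int, so one array cannot hold a file larger than about 2 GB.
    if (length < 0 || length > Integer.MAX_VALUE - 8) {
      throw new IOException("length not representable as a single byte[]: " + length);
    }
    byte[] bytes = new byte[(int) length];
    // DataInputStream.readFully throws EOFException if the stream ends early.
    new DataInputStream(in).readFully(bytes);
    return bytes;
  }
}
```

Because each file maps to a single allocation, the collector tracks one large object per file rather than many small buffers, which is the GC advantage the description claims over RAMDir.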
