lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches
Date Tue, 19 Jun 2012 21:36:43 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397082#comment-13397082 ]

Michael McCandless commented on LUCENE-4069:
--------------------------------------------

Results from last patch:
{noformat}
                Task    QPS base StdDev base   QPS bloom StdDev bloom      Pct diff
              IntNRQ       11.35        1.27       10.14        0.62  -24% -    6%
              Fuzzy1      108.52        3.34      101.82        2.90  -11% -    0%
             Prefix3       64.87        2.17       61.55        1.61  -10% -    0%
            Wildcard       43.18        1.74       41.33        1.17  -10% -    2%
              Fuzzy2       41.76        1.40       40.05        1.00   -9% -    1%
                Term      151.71        4.38      147.24        4.42   -8% -    2%
            SpanNear        5.23        0.09        5.11        0.12   -6% -    1%
           OrHighMed       12.60        0.88       12.34        0.48  -11% -    9%
        SloppyPhrase        8.25        0.20        8.09        0.07   -5% -    1%
        TermBGroup1M       69.98        0.68       68.80        1.13   -4% -    0%
          OrHighHigh       10.06        0.66        9.93        0.39  -11% -    9%
              Phrase       12.73        0.30       12.57        0.35   -6% -    3%
         TermGroup1M       35.44        0.42       35.08        0.67   -4% -    2%
          AndHighMed       63.40        2.27       62.90        1.11   -5% -    4%
             Respell       93.11        3.70       92.81        2.33   -6% -    6%
      TermBGroup1M1P       50.93        1.53       50.96        1.75   -6% -    6%
         AndHighHigh       15.86        0.71       15.93        0.27   -5% -    6%
            PKLookup      127.44        2.15      134.85        8.68   -2% -   14%
{noformat}

Looks like FuzzyN/Respell is good again ... PKLookup is a bit faster ... the rest is likely
noise.
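
For anyone reading the Pct diff column: the two numbers in each row appear to be a worst-case and a best-case percent change that combine the mean QPS with one standard deviation on each side. The sketch below is an assumption about how the benchmark script derives the range, not a copy of it. The class and method names are made up for illustration. The formula does reproduce the IntNRQ and PKLookup rows above when the result is truncated toward zero.

```java
// Illustrative only: assumed formula for the "Pct diff" range in the table above.
final class PctDiffRange {

    // Worst case: candidate one stddev slower, baseline one stddev faster.
    public static double worstPct(double qpsBase, double sdBase, double qpsCand, double sdCand) {
        return 100.0 * ((qpsCand - sdCand) / (qpsBase + sdBase) - 1.0);
    }

    // Best case: candidate one stddev faster, baseline one stddev slower.
    public static double bestPct(double qpsBase, double sdBase, double qpsCand, double sdCand) {
        return 100.0 * ((qpsCand + sdCand) / (qpsBase - sdBase) - 1.0);
    }

    public static void main(String[] args) {
        // IntNRQ row: base 11.35 +/- 1.27, bloom 10.14 +/- 0.62
        System.out.printf("IntNRQ   %d%% - %d%%%n",
                (int) worstPct(11.35, 1.27, 10.14, 0.62),
                (int) bestPct(11.35, 1.27, 10.14, 0.62));
        // PKLookup row: base 127.44 +/- 2.15, bloom 134.85 +/- 8.68
        System.out.printf("PKLookup %d%% - %d%%%n",
                (int) worstPct(127.44, 2.15, 134.85, 8.68),
                (int) bestPct(127.44, 2.15, 134.85, 8.68));
    }
}
```

Read this way, a row whose range spans zero cannot be separated from "no change" by this measure alone, which fits the remark that most rows are likely noise.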
                
> Segment-level Bloom filters for a 2 x speed up on rare term searches
> --------------------------------------------------------------------
>
>                 Key: LUCENE-4069
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4069
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 3.6, 4.0
>            Reporter: Mark Harwood
>            Priority: Minor
>             Fix For: 4.0, 3.6.1
>
>         Attachments: BloomFilterPostingsBranch4x.patch, MHBloomFilterOn3.6Branch.patch, PrimaryKeyPerfTest40.java
>
>
> An addition to each segment which stores a Bloom filter for selected fields in order to give fast-fail to term searches, helping avoid wasted disk access.
> Best suited for low-frequency fields, e.g. primary keys on big indexes with many segments, but also speeds up general searching in my tests.
> Overview slideshow here: http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments
> Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU
> Patch based on 3.6 codebase attached.
> There are no 3.6 API changes currently - to play just add a field with "_blm" on the end of the name to invoke special indexing/querying capability. Clearly a new Field or schema declaration(!) would need adding to APIs to configure the service properly.
> Also, a patch for the Lucene 4.0 codebase introducing a new PostingsFormat.
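
The fast-fail idea in the quoted description can be sketched as a small per-segment filter that is consulted before the term dictionary. This is a minimal illustration and not the code in the attached patches. The class name, the sizing parameters and the hashing scheme (double hashing derived from String.hashCode) are all assumptions chosen to keep the example short.

```java
import java.util.BitSet;

// Hypothetical sketch of a per-segment Bloom filter; not taken from the LUCENE-4069 patches.
final class SegmentBloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    public SegmentBloomSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Double hashing: the i-th probe position is h1 + i * h2, reduced into [0, numBits).
    private int index(String term, int i) {
        int h1 = term.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so successive probes differ
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // Called at index time for every term written to the segment's filtered field.
    public void add(String term) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(term, i));
        }
    }

    // false means the term is definitely absent, so the segment can be skipped
    // without touching its term dictionary; true means "maybe", so look it up.
    public boolean mightContain(String term) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(term, i))) {
                return false;
            }
        }
        return true;
    }
}
```

A primary-key lookup over many segments would call mightContain on each segment's filter first and only seek into the segments that answer true. Because a key normally lives in a single segment, most segments can be rejected from memory. False positives are possible, false negatives are not.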

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


