lucene-dev mailing list archives

From "Robert Muir (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it
Date Fri, 07 Oct 2011 19:04:30 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123072#comment-13123072 ]

Robert Muir commented on LUCENE-1536:
-------------------------------------

Here are the results. F0.1, for example, means a filter accepting a random 0.1% of documents.

{noformat}
                Task   QPS trunk      StdDev   QPS patch      StdDev      Pct diff
          PhraseF0.1       67.61        1.89       29.85        2.52  -60% -  -50%
          PhraseF0.5       20.08        0.72       13.09        1.11  -42% -  -26%
          PhraseF1.0       12.37        0.46        8.84        0.88  -37% -  -18%
      OrHighHighF0.1       78.84        1.19       59.96        2.87  -28% -  -19%
            TermF0.5      133.27        4.80      125.91        7.29  -14% -    3%
          OrHighHigh       12.73        0.45       12.13        0.92  -14% -    6%
              Fuzzy1       57.63        1.70       56.62        2.33   -8% -    5%
              Fuzzy2       96.92        2.25       96.19        2.63   -5% -    4%
   AndHighHighF100.0       16.99        0.50       16.92        1.38  -11% -   10%
    AndHighHighF99.0       17.00        0.48       16.94        1.37  -10% -   10%
    AndHighHighF95.0       17.00        0.48       16.98        1.35  -10% -   10%
          Fuzzy2F0.1      107.24        2.74      107.29        2.68   -4% -    5%
    AndHighHighF90.0       17.04        0.47       17.13        1.36   -9% -   11%
          Fuzzy1F0.1       74.60        1.58       75.03        1.55   -3% -    4%
  SloppyPhraseF100.0        7.82        0.16        7.89        0.24   -4% -    6%
   SloppyPhraseF99.0        7.82        0.16        7.92        0.23   -3% -    6%
        Fuzzy2F100.0       97.16        2.31       98.43        2.19   -3% -    6%
            PKLookup      171.71        6.83      174.15        7.28   -6% -   10%
        WildcardF0.1       67.96        1.06       69.08        1.95   -2% -    6%
            Wildcard       43.40        0.89       44.13        0.92   -2% -    5%
         Fuzzy2F99.0       96.83        2.46       98.49        2.21   -3% -    6%
         Fuzzy2F95.0       97.01        2.47       98.79        2.18   -2% -    6%
      SpanNearF100.0        3.11        0.04        3.18        0.09   -1% -    6%
    AndHighHighF75.0       17.13        0.48       17.57        1.36   -7% -   13%
         Fuzzy2F90.0       97.01        2.53       99.49        2.10   -2% -    7%
      OrHighHighF0.5       31.57        0.45       32.41        1.07   -2% -    7%
   SloppyPhraseF95.0        7.82        0.18        8.03        0.25   -2% -    8%
       SpanNearF99.0        3.11        0.04        3.20        0.09   -1% -    7%
     AndHighHighF0.1      136.96        3.21      140.94        5.15   -3% -    9%
    SloppyPhraseF0.1       56.27        0.88       57.97        1.47   -1% -    7%
          Fuzzy2F0.5      100.39        2.48      103.57        2.47   -1% -    8%
          PhraseF2.0        7.95        0.31        8.20        0.65   -8% -   15%
         AndHighHigh       17.97        0.46       18.55        0.84   -3% -   10%
            TermF0.1      351.76        9.38      363.42       16.25   -3% -   10%
        SloppyPhrase        7.90        0.16        8.19        0.19    0% -    8%
              Phrase        3.69        0.12        3.83        0.13   -3% -   10%
        WildcardF0.5       62.57        0.88       65.31        2.07    0% -    9%
   SloppyPhraseF90.0        7.83        0.16        8.18        0.24    0% -    9%
         Fuzzy2F75.0       96.77        2.46      101.14        2.41    0% -    9%
            SpanNear        3.15        0.04        3.30        0.07    1% -    8%
                Term       71.54        4.98       74.98        5.61   -9% -   21%
       SpanNearF95.0        3.11        0.05        3.26        0.09    0% -    9%
        PhraseF100.0        3.49        0.13        3.68        0.15   -2% -   14%
         PhraseF99.0        3.49        0.12        3.69        0.15   -2% -   14%
        SpanNearF0.1       31.54        0.48       33.49        0.73    2% -   10%
         PhraseF95.0        3.49        0.12        3.72        0.16   -1% -   15%
       SpanNearF90.0        3.12        0.04        3.35        0.09    3% -   11%
         Fuzzy2F50.0       97.08        2.32      104.79        2.66    2% -   13%
         PhraseF90.0        3.49        0.13        3.78        0.16    0% -   17%
        Fuzzy1F100.0       47.68        1.41       52.27        1.08    4% -   15%
         Fuzzy1F99.0       47.57        1.49       52.28        1.19    4% -   16%
    AndHighHighF50.0       17.30        0.48       19.12        1.47    0% -   22%
        WildcardF1.0       58.03        0.81       64.32        2.40    5% -   16%
         Fuzzy1F95.0       47.59        1.50       52.84        1.17    5% -   17%
   SloppyPhraseF75.0        7.85        0.15        8.73        0.24    6% -   16%
          Fuzzy2F1.0       98.59        2.36      110.12        2.89    6% -   17%
         Fuzzy1F90.0       47.51        1.40       53.54        1.09    7% -   18%
         PhraseF75.0        3.51        0.13        3.98        0.18    4% -   22%
            TermF1.0       92.28        3.05      104.56        7.44    1% -   25%
       WildcardF99.0       36.01        0.76       40.88        1.16    8% -   19%
          Fuzzy1F0.5       59.00        1.10       67.10        1.36    9% -   18%
      WildcardF100.0       35.92        0.79       40.86        1.19    8% -   19%
       WildcardF95.0       36.01        0.75       41.02        1.19    8% -   19%
       WildcardF90.0       36.06        0.70       41.14        1.20    8% -   19%
         Fuzzy2F20.0       98.32        2.34      112.69        2.91    9% -   20%
       WildcardF75.0       36.19        0.62       41.69        1.15   10% -   20%
     AndHighHighF0.5       49.93        1.37       57.85        4.13    4% -   27%
         Fuzzy1F75.0       47.25        1.50       55.55        1.11   11% -   23%
         Fuzzy2F10.0       98.47        2.46      116.18        3.00   12% -   24%
       WildcardF50.0       36.77        0.55       43.44        1.29   12% -   23%
      OrHighHighF1.0       24.37        0.38       28.99        1.90    9% -   28%
          Fuzzy1F2.0       52.64        1.05       63.12        1.32   15% -   24%
       SpanNearF75.0        3.11        0.04        3.74        0.10   15% -   24%
          Fuzzy2F5.0       97.96        2.31      118.02        3.48   14% -   27%
          Fuzzy2F2.0       98.02        2.22      119.13        3.42   15% -   27%
    OrHighHighF100.0        7.70        0.34        9.51        0.34   13% -   33%
     OrHighHighF99.0        7.70        0.36        9.56        0.34   14% -   34%
         Fuzzy1F50.0       47.46        1.24       59.15        1.18   19% -   30%
         PhraseF50.0        3.57        0.12        4.45        0.23   14% -   35%
     OrHighHighF95.0        7.73        0.35        9.73        0.35   16% -   36%
   SloppyPhraseF50.0        7.92        0.16       10.09        0.28   21% -   33%
        WildcardF2.0       53.32        0.69       68.29        3.44   20% -   36%
     OrHighHighF90.0        7.77        0.35        9.97        0.35   18% -   39%
       WildcardF20.0       41.13        0.60       54.63        2.12   25% -   39%
     OrHighHighF75.0        7.91        0.32       10.73        0.36   26% -   45%
        WildcardF5.0       47.44        0.57       65.42        3.11   29% -   46%
       WildcardF10.0       44.01        0.53       61.16        2.61   31% -   46%
         Fuzzy1F20.0       49.57        1.20       69.49        1.70   33% -   47%
          Fuzzy1F1.0       54.39        1.07       76.95        2.03   35% -   48%
     AndHighHighF1.0       34.63        1.07       50.01        4.02   28% -   60%
          PhraseF5.0        5.16        0.20        7.61        0.75   27% -   68%
         Fuzzy1F10.0       50.23        1.07       75.36        2.11   42% -   57%
     OrHighHighF50.0        8.36        0.29       12.58        0.48   39% -   61%
      OrHighHighF2.0       19.65        0.34       29.58        2.27   36% -   65%
       SpanNearF50.0        3.11        0.04        4.76        0.12   47% -   58%
            TermF2.0       68.99        2.38      106.22        8.65   36% -   72%
          Fuzzy1F5.0       50.74        1.06       79.90        2.38   49% -   65%
         PhraseF20.0        3.81        0.13        6.10        0.45   43% -   78%
           TermF50.0       42.19        1.41       67.96        4.63   45% -   77%
           TermF75.0       41.36        1.46       67.47        5.30   45% -   82%
           TermF90.0       41.05        1.47       68.08        5.85   46% -   86%
           TermF95.0       41.03        1.49       68.08        6.14   45% -   87%
         PhraseF10.0        4.22        0.16        7.02        0.62   46% -   87%
           TermF99.0       40.99        1.56       68.31        6.21   45% -   89%
          TermF100.0       40.88        1.61       68.28        6.32   45% -   89%
    SloppyPhraseF0.5       18.81        0.30       31.53        0.96   59% -   75%
    AndHighHighF20.0       17.62        0.52       30.63        2.79   53% -   95%
      OrHighHighF5.0       14.99        0.29       27.44        1.98   66% -  100%
        SpanNearF0.5        9.17        0.12       17.12        0.42   79% -   93%
           TermF20.0       45.25        1.50       84.63        6.04   68% -  107%
     OrHighHighF20.0       10.35        0.25       19.60        1.08   74% -  104%
            TermF5.0       52.49        1.71       99.90        8.02   69% -  112%
     AndHighHighF2.0       25.97        0.81       50.45        4.72   70% -  119%
     OrHighHighF10.0       12.36        0.22       24.25        1.56   80% -  112%
           TermF10.0       46.97        1.47       92.60        7.08   76% -  119%
   SloppyPhraseF20.0        8.18        0.16       16.35        0.58   89% -  111%
        SpanNearF1.0        6.05        0.09       12.21        0.28   94% -  109%
    AndHighHighF10.0       18.44        0.55       40.77        4.15   92% -  151%
     AndHighHighF5.0       20.34        0.63       50.83        5.67  115% -  186%
   SloppyPhraseF10.0        8.52        0.17       22.79        0.96  151% -  184%
       SpanNearF20.0        3.15        0.05        9.03        0.24  174% -  198%
    SloppyPhraseF1.0       13.62        0.23       42.77        2.29  192% -  236%
        SpanNearF2.0        4.45        0.06       14.31        0.37  209% -  234%
    SloppyPhraseF5.0        9.12        0.17       29.98        1.41  207% -  250%
    SloppyPhraseF2.0       10.85        0.19       38.31        2.00  229% -  278%
       SpanNearF10.0        3.25        0.05       13.71        0.39  303% -  339%
        SpanNearF5.0        3.52        0.05       19.51        0.67  428% -  481%
{noformat}
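The message does not say how the Pct diff column is derived. As a minimal sketch, assuming it compares the patch's QPS minus/plus one standard deviation against trunk's QPS plus/minus one standard deviation (an assumption, though it does reproduce the PhraseF0.1 and TermF10.0 rows when truncated to whole percents), the range could be computed as below. `PctDiffRange` and `pctDiffRange` are hypothetical names, not part of Lucene or its benchmark tooling.

```java
final class PctDiffRange {
    // Hypothetical helper. ASSUMPTION: the range pairs the patch's QPS -/+ one
    // stddev against trunk's QPS +/- one stddev (worst case, then best case).
    static double[] pctDiffRange(double qpsTrunk, double sdTrunk,
                                 double qpsPatch, double sdPatch) {
        double worst = 100.0 * ((qpsPatch - sdPatch) / (qpsTrunk + sdTrunk) - 1.0);
        double best = 100.0 * ((qpsPatch + sdPatch) / (qpsTrunk - sdTrunk) - 1.0);
        return new double[] {worst, best};
    }

    public static void main(String[] args) {
        // PhraseF0.1 row: trunk 67.61 (stddev 1.89), patch 29.85 (stddev 2.52).
        double[] r = pctDiffRange(67.61, 1.89, 29.85, 2.52);
        // Casting to int truncates toward zero, giving whole percents.
        System.out.printf("%d%% - %d%%%n", (int) r[0], (int) r[1]);
    }
}
```

Running `main` prints `-60% - -50%`, matching the PhraseF0.1 row.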
                
> if a filter can support random access API, we should use it
> -----------------------------------------------------------
>
>                 Key: LUCENE-1536
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1536
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 2.4
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>
>         Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch,
LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch,
LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
LUCENE-1536.patch, LUCENE-1536.patch
>
>
> I ran some performance tests comparing applying a filter via the
> random-access API against current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then found in testing
> that switching deletions to an iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * The index is the first 2M docs of Wikipedia.  The test machine is
>     Mac OS X 10.5.6, quad-core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, e.g. 1-4
>     means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, i.e. 1 AND 2
>     AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
>     95, 98, 99, 99.99999 (filter is non-null but all bits are set),
>     100 (filter=null, control)).
>   * Method high means I use the random-access filter API in
>     IndexSearcher's main loop.  Method low means I use the random-access
>     filter API down in SegmentTermDocs (just like deleted docs
>     today).
>   * Baseline (QPS) is current trunk, where the filter is applied as an
>     iterator up "high" (i.e. in IndexSearcher's search loop).
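The two ways of applying a filter compared in this issue can be sketched in plain Java. This is a simplified stand-in, not Lucene's API: a `java.util.BitSet` plays the filter, a sorted `int[]` plays the query's matching doc IDs, and `FilterStrategies`, `randomFilter`, `countIterator` and `countRandomAccess` are hypothetical names. `randomFilter` builds a filter accepting a random percentage of documents, in the spirit of the F<pct> labels in the results.

```java
import java.util.BitSet;
import java.util.Random;

final class FilterStrategies {
    // Hypothetical helper: marks each of maxDoc docs as accepted with
    // probability pct/100, mimicking the benchmark's "F<pct>" random filters.
    static BitSet randomFilter(int maxDoc, double pct, long seed) {
        Random rnd = new Random(seed);
        BitSet bits = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (rnd.nextDouble() * 100.0 < pct) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Iterator style: leapfrog between the query's sorted matches and the
    // filter's set bits, always moving whichever side is behind.
    static int countIterator(int[] queryDocs, BitSet filter) {
        int count = 0;
        int i = 0;
        int f = filter.nextSetBit(0); // -1 once the filter is exhausted
        while (i < queryDocs.length && f >= 0) {
            int q = queryDocs[i];
            if (q == f) {
                count++;
                i++;
                f = filter.nextSetBit(f + 1);
            } else if (q < f) {
                i++; // a real scorer could jump straight to the first doc >= f
            } else {
                f = filter.nextSetBit(q); // skip the filter ahead to the first set bit >= q
            }
        }
        return count;
    }

    // Random-access style: visit every query match and test it with a
    // constant-time bit lookup, much like a deleted-docs check.
    static int countRandomAccess(int[] queryDocs, BitSet filter) {
        int count = 0;
        for (int doc : queryDocs) {
            if (filter.get(doc)) {
                count++;
            }
        }
        return count;
    }
}
```

Both strategies count the same documents; only the cost differs. The iterator style can skip long runs of non-matching docs on either side, while the random-access style pays one bit lookup per query match. That is consistent with, though not proven by, the results above, where the patch is slower for very sparse filters such as PhraseF0.1 and faster at most middle densities.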

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


