lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4600) Explore facets aggregation during documents collection
Date Mon, 21 Jan 2013 13:04:13 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558738#comment-13558738 ]

Michael McCandless commented on LUCENE-4600:
--------------------------------------------

ALL_PARENTS StandardFacetsCollector (base) vs CountingFacetsCollector (comp):
{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                 Respell       55.89      (3.2%)       55.13      (3.9%)   -1.4% (  -8% -    5%)
                PKLookup      207.52      (1.6%)      206.95      (1.4%)   -0.3% (  -3% -    2%)
                Wildcard       62.22      (3.2%)       62.94      (2.7%)    1.2% (  -4% -    7%)
                  IntNRQ       17.88      (5.2%)       18.16      (5.7%)    1.6% (  -8% -   13%)
                 Prefix3       45.56      (4.9%)       46.48      (4.1%)    2.0% (  -6% -   11%)
        HighSloppyPhrase        0.80      (9.7%)        0.84      (8.5%)    4.9% ( -12% -   25%)
              HighPhrase       13.52      (7.7%)       15.09      (8.1%)   11.6% (  -3% -   29%)
         LowSloppyPhrase       15.02      (3.9%)       17.15      (4.0%)   14.1% (   5% -   22%)
               LowPhrase       14.14      (4.3%)       16.77      (4.9%)   18.6% (   8% -   29%)
         MedSloppyPhrase       14.81      (2.6%)       18.33      (2.7%)   23.7% (  17% -   29%)
                  Fuzzy2       27.57      (2.6%)       34.95      (3.1%)   26.8% (  20% -   33%)
             AndHighHigh        9.39      (1.6%)       11.92      (1.4%)   27.0% (  23% -   30%)
                 MedTerm       14.63      (2.2%)       18.89      (1.7%)   29.1% (  24% -   33%)
                HighTerm        5.28      (1.8%)        7.02      (2.4%)   33.0% (  28% -   37%)
                  Fuzzy1       20.79      (2.1%)       27.71      (2.8%)   33.3% (  27% -   39%)
               OrHighLow        4.82      (1.8%)        6.70      (2.6%)   39.1% (  34% -   44%)
               OrHighMed        4.74      (1.8%)        6.61      (3.0%)   39.4% (  34% -   44%)
              OrHighHigh        2.68      (1.8%)        3.77      (2.9%)   40.9% (  35% -   46%)
               MedPhrase       39.21      (3.6%)       55.35      (3.6%)   41.2% (  32% -   50%)
              AndHighMed       36.29      (3.5%)       51.92      (2.0%)   43.1% (  36% -   50%)
                 LowTerm       27.96      (3.2%)       41.47      (2.2%)   48.3% (  41% -   55%)
              AndHighLow       64.36      (5.4%)      107.94      (5.7%)   67.7% (  53% -   83%)
             MedSpanNear       70.17      (6.1%)      123.23      (7.4%)   75.6% (  58% -   94%)
             LowSpanNear       70.35      (6.0%)      123.59      (7.1%)   75.7% (  58% -   94%)
            HighSpanNear       70.35      (6.1%)      123.69      (7.8%)   75.8% (  58% -   95%)
{noformat}

These are nice gains!
                
> Explore facets aggregation during documents collection
> ------------------------------------------------------
>
>                 Key: LUCENE-4600
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4600
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Shai Erera
>         Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch
>
>
> Today the facet module simply gathers all hits (as a bitset, optionally with a float[]
> to hold scores as well, if you will aggregate them) during collection, and then at the end
> when you call getFacetsResults(), it makes a 2nd pass over all those hits doing the actual
> aggregation.
> We should investigate just aggregating as we collect instead, so we don't have to tie
> up transient RAM (fairly small for the bit set but possibly big for the float[]).
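
To make the two strategies in the issue description concrete, here is a minimal sketch in plain Java. It is not the facet module's code: the class name FacetCountSketch, both method names, and the docOrdinals array (a per-document list of category ordinals) are hypothetical stand-ins, and scores are left out. It assumes each matching doc ID is reported once, as a collector would see it.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Illustrative only: every name here is hypothetical, not the Lucene facet API. */
final class FacetCountSketch {

    /** Two-pass: remember the hits in a bit set, then aggregate in a second pass. */
    static int[] countAfterCollection(int[] hits, int[][] docOrdinals, int numOrdinals) {
        // Transient RAM: one bit per document, held until aggregation is done.
        BitSet matched = new BitSet(docOrdinals.length);
        for (int doc : hits) {
            matched.set(doc);
        }
        // Second pass over the buffered hits does the actual counting.
        int[] counts = new int[numOrdinals];
        for (int doc = matched.nextSetBit(0); doc >= 0; doc = matched.nextSetBit(doc + 1)) {
            for (int ord : docOrdinals[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    /** Single-pass: update the counters as each hit arrives; no hits are buffered. */
    static int[] countDuringCollection(int[] hits, int[][] docOrdinals, int numOrdinals) {
        int[] counts = new int[numOrdinals];
        for (int doc : hits) { // assumes each doc ID appears at most once
            for (int ord : docOrdinals[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical index: doc 0 -> ordinals {0, 1}, doc 1 -> {1}, doc 2 -> {0, 2}, doc 3 -> {2}
        int[][] docOrdinals = { {0, 1}, {1}, {0, 2}, {2} };
        int[] hits = {0, 2, 3};
        System.out.println(Arrays.toString(countAfterCollection(hits, docOrdinals, 3)));
        System.out.println(Arrays.toString(countDuringCollection(hits, docOrdinals, 3)));
    }
}
```

Both methods return the same counts for the same set of hits. The single-pass variant skips the bit set, and in a scoring variant it would also skip the float[] of buffered scores, which is the RAM the description is concerned about.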

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

