lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4769) Add a CountingFacetsAggregator which reads ordinals from a cache
Date Mon, 11 Feb 2013 14:21:13 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575802#comment-13575802 ]

Michael McCandless commented on LUCENE-4769:
--------------------------------------------

Full (6.6M) wikibig index, 7 facet dims:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                 Respell       46.60      (3.4%)       45.82      (4.1%)   -1.7% (  -8% -    6%)
            HighSpanNear        3.49      (1.7%)        3.51      (2.2%)    0.8% (  -3% -    4%)
              HighPhrase       17.13     (10.5%)       17.42     (11.0%)    1.7% ( -17% -   26%)
                  Fuzzy2       53.25      (2.8%)       54.19      (3.1%)    1.8% (  -4% -    7%)
              AndHighLow      587.43      (2.3%)      597.84      (2.6%)    1.8% (  -3% -    6%)
         LowSloppyPhrase       20.30      (2.3%)       20.68      (2.3%)    1.9% (  -2% -    6%)
             LowSpanNear        8.24      (2.3%)        8.42      (2.9%)    2.1% (  -3% -    7%)
             AndHighHigh       23.36      (1.3%)       23.95      (0.9%)    2.5% (   0% -    4%)
        HighSloppyPhrase        0.92      (5.1%)        0.94      (6.1%)    2.8% (  -7% -   14%)
               LowPhrase       21.02      (6.2%)       21.63      (6.7%)    2.9% (  -9% -   16%)
             MedSpanNear       28.31      (1.3%)       29.20      (1.5%)    3.1% (   0% -    6%)
         MedSloppyPhrase       25.98      (1.7%)       26.79      (1.7%)    3.1% (   0% -    6%)
                 MedTerm       47.54      (1.9%)       49.49      (3.4%)    4.1% (  -1% -    9%)
                  Fuzzy1       47.28      (2.2%)       49.27      (2.6%)    4.2% (   0% -    9%)
              AndHighMed      105.55      (0.9%)      112.03      (1.2%)    6.1% (   3% -    8%)
                Wildcard       27.63      (1.2%)       30.03      (1.6%)    8.7% (   5% -   11%)
               MedPhrase      109.43      (5.6%)      122.45      (7.4%)   11.9% (   0% -   26%)
                 LowTerm      110.94      (1.9%)      128.73      (1.8%)   16.0% (  12% -   20%)
               OrHighLow       17.11      (2.2%)       22.44      (3.7%)   31.1% (  24% -   37%)
               OrHighMed       16.63      (2.1%)       21.89      (3.8%)   31.6% (  25% -   38%)
                HighTerm       19.17      (1.9%)       26.30      (3.5%)   37.2% (  31% -   43%)
              OrHighHigh        8.77      (2.4%)       12.45      (4.7%)   42.1% (  34% -   50%)
                 Prefix3       13.06      (1.8%)       18.66      (2.2%)   42.9% (  38% -   47%)
                  IntNRQ        3.59      (1.6%)        6.45      (3.3%)   79.8% (  73% -   86%)
{noformat}
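
For anyone re-deriving the numbers: the Pct diff column appears to be the relative change of the mean QPS from base to comp. Here is a minimal sketch under that assumption (the helper name `pct_diff` is made up for illustration, and the parenthesized range is not reproduced because its formula is not stated in this message). Recomputing from the rounded QPS values shown can differ in the last digit for some rows.

```python
def pct_diff(qps_base: float, qps_comp: float) -> float:
    """Relative change of comp over base, in percent (assumed definition)."""
    return 100.0 * (qps_comp - qps_base) / qps_base


# (QPS base, QPS comp) copied from three rows of the table above.
rows = {
    "LowTerm": (110.94, 128.73),
    "HighTerm": (19.17, 26.30),
    "Prefix3": (13.06, 18.66),
}

for task, (base, comp) in rows.items():
    print(f"{task:>10} {pct_diff(base, comp):6.1f}%")
```

For these three rows the recomputed values agree with the table to one decimal place.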

Trunk DVs take 61.4 MB, while the int[] cache takes 202.9 MB (3.3X as
much).  Also, users of the int[] cache must remember to pair it with a
disk-backed DV (maybe we should check for / warn about this somehow);
otherwise it's silly, since you'd be double-caching in RAM.
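
To make the trade-off concrete, here is a rough sketch of what counting from an int[]-style ordinals cache looks like. This is not the code from the patch: `OrdinalsCache`, `count_ordinals` and the flat ordinals-plus-offsets layout are assumptions chosen for illustration, and it is written in Python rather than Java for brevity. The point is that every ordinal is held as a full, undecoded int in RAM, so counting is a plain array scan, at the cost of the larger footprint noted above.

```python
from array import array


class OrdinalsCache:
    """Hypothetical flat cache: every document's ordinals in one int array."""

    def __init__(self, ords_per_doc):
        self.ordinals = array("i")       # all ordinals, document after document
        self.offsets = array("i", [0])   # offsets[d]..offsets[d+1] delimits doc d
        for ords in ords_per_doc:
            self.ordinals.extend(ords)
            self.offsets.append(len(self.ordinals))

    def ram_bytes(self):
        # Fixed-width ints: no compression, unlike an encoded doc-values field.
        return (len(self.ordinals) + len(self.offsets)) * self.ordinals.itemsize


def count_ordinals(cache, matching_docs, num_ordinals):
    """Increment one counter per ordinal of every matching document."""
    counts = [0] * num_ordinals
    for doc in matching_docs:
        for i in range(cache.offsets[doc], cache.offsets[doc + 1]):
            counts[cache.ordinals[i]] += 1
    return counts


# Three documents with their facet ordinals; documents 0 and 2 match the query.
cache = OrdinalsCache([[0, 2], [1], [0, 1, 2]])
print(count_ordinals(cache, [0, 2], 3))  # prints [2, 1, 2]
```

If the doc values behind such a cache are also held in RAM, the same ordinals sit in memory twice, which is the double-caching concern.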

Curiously these gains are not that much better (except IntNRQ) than
LUCENE-4764, which was only ~31% larger... which is odd because we had
previously tested [close to] LUCENE-4764 against int[] cache and it
was faster.

                
> Add a CountingFacetsAggregator which reads ordinals from a cache
> ----------------------------------------------------------------
>
>                 Key: LUCENE-4769
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4769
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-4769.patch
>
>
> Mike wrote a prototype of a FacetsCollector which reads ordinals from a CachedInts structure
> on LUCENE-4609. I ported it to the new facets API, as a FacetsAggregator. I think we should
> offer users the means to use such a cache, even if it consumes more RAM. Mike's tests show that
> this cache consumed 2x more RAM than if the DocValues were loaded into memory in their raw
> form. Also, a PackedInts version of such cache took almost the same amount of RAM as straight
> int[], but the gains were minor.
> I will post the patch shortly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


