lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-4748) Add DrillSideways helper class to Lucene facets module
Date Fri, 22 Feb 2013 11:14:13 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-4748:
---------------------------------------

    Attachment: LUCENE-4748.patch

New patch, fixing various bugs, beefing up the tests and resolving all
nocommits.  I think it's ready!

I also fixed a consistency issue with the facets API: if you request
faceting for a non-existent category, it now returns an empty
FacetResult instead of skipping it.
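
The consistency fix above can be pictured with a toy model. The names here are hypothetical and this is not the Lucene facets API: every requested dimension yields exactly one result, which is empty when the dimension has no counts, so the result list stays aligned with the request list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Toy model of "empty result instead of skipping"; not the Lucene facets API. */
public class EmptyFacetResultSketch {

    /** Hypothetical stand-in for FacetResult: the requested dim plus its child counts. */
    static final class Result {
        final String dim;
        final Map<String, Integer> childCounts;

        Result(String dim, Map<String, Integer> childCounts) {
            this.dim = dim;
            this.childCounts = childCounts;
        }
    }

    /** Returns exactly one Result per requested dim, in request order. */
    static List<Result> facet(Map<String, Map<String, Integer>> countsByDim, List<String> requestedDims) {
        List<Result> results = new ArrayList<>();
        for (String dim : requestedDims) {
            Map<String, Integer> counts = countsByDim.get(dim);
            // An unknown dim yields an empty Result rather than being dropped,
            // so results.get(i) always answers requestedDims.get(i).
            results.add(new Result(dim, counts == null ? Collections.<String, Integer>emptyMap() : counts));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> countsByDim =
            Map.of("Manufacturer", Map.of("Nikon", 3, "Canon", 5));
        List<Result> results = facet(countsByDim, List.of("Manufacturer", "Color"));
        for (Result r : results) {
            System.out.println(r.dim + " -> " + r.childCounts);
        }
    }
}
```

In this toy model a caller can walk requests and results in lockstep without first checking which requests were silently dropped.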

I tested on a wider variety of drill down / sideways queries.  base =
old patch and comp = this patch:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
          LowTermHardDD2       24.43      (2.0%)       24.43      (2.2%)    0.0% (  -4% -    4%)
         HighTermEasyDD2       18.91      (1.6%)       20.59      (4.3%)    8.9% (   2% -   15%)
          LowTermHardDD1       31.38      (2.0%)       36.21      (1.7%)   15.4% (  11% -   19%)
         LowTermMixedDD2       44.09      (2.1%)       53.93      (0.9%)   22.3% (  18% -   25%)
        LowTermHardOrDD1       25.85      (2.3%)       33.80      (2.0%)   30.7% (  25% -   35%)
          MedTermHardDD2        5.78      (1.4%)        7.71      (5.3%)   33.4% (  26% -   40%)
          LowTermEasyDD2      129.51      (1.7%)      176.27      (3.9%)   36.1% (  30% -   42%)
          MedTermEasyDD2       42.88      (1.8%)       60.03      (3.5%)   40.0% (  34% -   46%)
         MedTermMixedDD2       12.52      (1.4%)       17.59      (4.2%)   40.5% (  34% -   46%)
        LowTermHardOrDD2       18.57      (2.8%)       26.45      (1.3%)   42.4% (  37% -   47%)
          LowTermEasyDD1       71.73      (1.8%)      102.77      (1.8%)   43.3% (  38% -   47%)
        LowTermEasyOrDD2       61.01      (2.7%)       98.57      (6.7%)   61.6% (  50% -   73%)
         HighTermHardDD2        1.22      (1.8%)        1.97      (6.8%)   61.7% (  52% -   71%)
          MedTermHardDD1        8.77      (2.6%)       14.47      (5.1%)   65.1% (  55% -   74%)
        HighTermMixedDD2        2.69      (1.6%)        4.50      (6.8%)   67.4% (  58% -   76%)
          MedTermEasyDD1       18.61      (2.6%)       32.34      (6.1%)   73.8% (  63% -   84%)
        LowTermEasyOrDD1       51.31      (2.2%)       91.48      (2.1%)   78.3% (  72% -   84%)
       HighTermEasyOrDD2        8.96      (3.1%)       16.17      (5.4%)   80.5% (  69% -   91%)
       HighTermEasyOrDD1        3.47      (4.1%)        6.40      (7.5%)   84.8% (  70% -  100%)
        MedTermHardOrDD2        4.31      (3.3%)        8.03      (6.4%)   86.6% (  74% -   99%)
         HighTermEasyDD1        3.16      (3.0%)        5.89      (7.7%)   86.6% (  73% -  100%)
        MedTermEasyOrDD1       15.63      (3.4%)       30.05      (6.5%)   92.2% (  79% -  105%)
         HighTermHardDD1        1.61      (3.1%)        3.13      (7.6%)   94.3% (  81% -  108%)
        MedTermHardOrDD1        6.75      (3.5%)       13.76      (6.0%)  103.9% (  91% -  117%)
       HighTermHardOrDD2        1.14      (4.2%)        2.41      (9.2%)  111.6% (  94% -  130%)
        MedTermEasyOrDD2       19.92      (3.0%)       45.44      (6.3%)  128.1% ( 115% -  141%)
       HighTermHardOrDD1        0.96      (3.5%)        2.54     (10.4%)  163.6% ( 144% -  183%)
{noformat}
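
The Pct diff column is the relative QPS change from base to comp. A minimal sketch of that arithmetic follows; the helper name is hypothetical, and the range in parentheses is not reproduced here.

```java
/** Recomputes the "Pct diff" column from the two QPS columns (hypothetical helper). */
public class PctDiffSketch {

    /** Percent change from base QPS to comp QPS; positive means the new patch is faster. */
    static double pctDiff(double baseQps, double compQps) {
        return (compQps - baseQps) / baseQps * 100.0;
    }

    public static void main(String[] args) {
        // LowTermHardDD1 row: 31.38 QPS (base) vs 36.21 QPS (comp).
        System.out.printf("%.1f%%%n", pctDiff(31.38, 36.21)); // prints "15.4%"
    }
}
```

Recomputing from the rounded QPS values printed in the table can be off in the last digit for low-QPS rows: HighTermHardOrDD1 gives (2.54 - 0.96) / 0.96 = 164.6% rather than the reported 163.6%, presumably because the benchmark works from unrounded averages.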

DD2 means drilling down on 2 dims and DD1 on 1 dim.  Hard means the
1 or 2 drill-down dims have high counts, Easy means they have low
counts, and Mixed means one high and one low.  OrDD1/OrDD2 mean I OR
two values within each drilled-down dim.

The new patch's speedup is especially large for the OR case (ie, when
you drill down on more than one value in a single dim), I think because
it handles that case directly instead of recursing into a nested
BooleanQuery.
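
The benchmarked drill-down shape (AND across dims, OR across values within one dim) can be illustrated with a toy matcher that checks a doc's values for each dim against the set of requested values in one step. This is only a sketch assuming in-memory docs with set-valued dims; it is not the patch's scorer, and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/** Toy drill-down matcher: AND across dims, OR across the values of one dim. Not Lucene code. */
public class DrillDownMatchSketch {

    /**
     * Returns true if, for every drilled-down dim, the doc has at least one of that
     * dim's requested values. The OR within a dim is a single set-overlap check
     * rather than a nested sub-query.
     */
    static boolean matches(Map<String, Set<String>> docValues, Map<String, Set<String>> drillDowns) {
        for (Map.Entry<String, Set<String>> dd : drillDowns.entrySet()) {
            Set<String> values = docValues.getOrDefault(dd.getKey(), Collections.emptySet());
            if (Collections.disjoint(values, dd.getValue())) {
                return false; // the doc has none of this dim's requested values
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // OrDD1-style request: one dim, two ORed values.
        Map<String, Set<String>> drillDowns = Map.of("Manufacturer", Set.of("Nikon", "Canon"));
        Map<String, Set<String>> doc = Map.of("Manufacturer", Set.of("Canon"), "Resolution", Set.of("10"));
        System.out.println(matches(doc, drillDowns)); // prints "true"
    }
}
```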

                
> Add DrillSideways helper class to Lucene facets module
> ------------------------------------------------------
>
>                 Key: LUCENE-4748
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4748
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.2, 5.0
>
>         Attachments: DrillSideways-alternative.tar.gz, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch
>
>
> This came out of a discussion on the java-user list with subject
> "Faceted search in OR": http://markmail.org/thread/jmnq6z2x7ayzci5k
> The basic idea is to count "near misses" during collection, ie
> documents that matched the main query and also all except one of the
> drill down filters.
> Drill sideways makes for a very nice faceted search UI because you
> don't "lose" the facet counts after drilling in.  Eg maybe you do a
> search for "cameras", and you see facets for the manufacturer, so you
> drill into "Nikon".
> With drill sideways, even after drilling down, you'll still get the
> counts for all the other brands, where each count tells you how many
> hits you'd get if you changed to a different manufacturer.
> This becomes more fun if you add further drill-downs, eg maybe I next drill
> down into "Resolution=10 megapixels", and then I can see how many 10
> megapixel cameras all other manufacturers have, and what other resolutions
> Nikon cameras offer.
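
The near-miss counting described in the issue can be sketched as a toy in-memory model. Assumptions that are not from the patch: the main query's hits are already collected, each doc has exactly one value per dim, each drilled-down dim has a single selected value, and only drilled-down dims are counted. `DrillSidewaysSketch` and `sidewaysCounts` are hypothetical names, not Lucene's API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of drill-sideways counting; not the DrillSideways class from the patch. */
public class DrillSidewaysSketch {

    /**
     * Counts facet values for each drilled-down dim over docs that already matched the
     * main query. Each doc maps dim to its single value; drillDowns maps each
     * drilled-down dim to the one value the user selected.
     */
    static Map<String, Map<String, Integer>> sidewaysCounts(
            List<Map<String, String>> mainQueryHits, Map<String, String> drillDowns) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (Map<String, String> doc : mainQueryHits) {
            int misses = 0;
            String missedDim = null;
            for (Map.Entry<String, String> dd : drillDowns.entrySet()) {
                if (!dd.getValue().equals(doc.get(dd.getKey()))) {
                    misses++;
                    missedDim = dd.getKey();
                }
            }
            if (misses == 0) {
                // Matches every drill-down: counts toward every drilled-down dim.
                for (String dim : drillDowns.keySet()) {
                    increment(counts, dim, doc.get(dim));
                }
            } else if (misses == 1) {
                // Near miss: counts only toward the one dim it failed, i.e. the hit
                // you would gain by switching that dim to this doc's value.
                increment(counts, missedDim, doc.get(missedDim));
            }
            // Docs failing two or more drill-downs are not counted.
        }
        return counts;
    }

    private static void increment(Map<String, Map<String, Integer>> counts, String dim, String value) {
        if (value != null) { // a doc with no value for this dim has nothing to count
            counts.computeIfAbsent(dim, k -> new HashMap<>()).merge(value, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        // Hits for "cameras"; drill down on Manufacturer=Nikon and Resolution=10 megapixels.
        List<Map<String, String>> hits = List.of(
            Map.of("Manufacturer", "Nikon", "Resolution", "10"),
            Map.of("Manufacturer", "Canon", "Resolution", "10"),
            Map.of("Manufacturer", "Canon", "Resolution", "12"),
            Map.of("Manufacturer", "Nikon", "Resolution", "12"));
        Map<String, Map<String, Integer>> counts =
            sidewaysCounts(hits, Map.of("Manufacturer", "Nikon", "Resolution", "10"));
        System.out.println(counts.get("Manufacturer").get("Canon")); // 1
        System.out.println(counts.get("Resolution").get("12"));      // 1
    }
}
```

With both drill-downs applied only the first doc is a full hit, yet the Manufacturer counts still show Canon (one 10 megapixel camera) and the Resolution counts still show 12 megapixels (one Nikon camera), which is the "don't lose the facet counts" behavior the issue describes.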
