lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
Date Thu, 03 Apr 2014 18:49:17 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959078#comment-13959078 ]

Michael McCandless commented on LUCENE-5527:
--------------------------------------------

+1 for LeafCollector and the patch.

I tested whether this has any impact on search performance:

{noformat}
Report after iter 10:
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                 Respell       49.44      (3.3%)       48.10      (3.7%)   -2.7% (  -9% -    4%)
                  Fuzzy2       46.74      (3.2%)       45.73      (3.1%)   -2.2% (  -8% -    4%)
                  Fuzzy1       59.25      (3.7%)       58.08      (3.5%)   -2.0% (  -8% -    5%)
                  IntNRQ        3.42      (3.8%)        3.40      (3.8%)   -0.7% (  -7% -    7%)
                 Prefix3       86.67      (2.6%)       86.17      (2.6%)   -0.6% (  -5% -    4%)
         LowSloppyPhrase       44.44      (2.3%)       44.42      (2.5%)   -0.1% (  -4% -    4%)
                Wildcard       19.08      (3.5%)       19.07      (3.0%)   -0.1% (  -6% -    6%)
              AndHighMed       34.38      (1.0%)       34.38      (1.0%)   -0.0% (  -2% -    2%)
             LowSpanNear       10.41      (3.1%)       10.41      (2.3%)    0.0% (  -5% -    5%)
        HighSloppyPhrase        3.49      (7.9%)        3.49      (6.6%)    0.1% ( -13% -   15%)
             AndHighHigh       28.35      (1.1%)       28.39      (1.0%)    0.1% (  -1% -    2%)
             MedSpanNear       31.06      (2.8%)       31.12      (2.7%)    0.2% (  -5% -    5%)
              AndHighLow      391.44      (2.9%)      392.73      (2.6%)    0.3% (  -5% -    6%)
         MedSloppyPhrase        3.54      (5.2%)        3.56      (4.6%)    0.4% (  -8% -   10%)
               OrHighMed       26.51      (4.0%)       26.66      (5.7%)    0.6% (  -8% -   10%)
            OrHighNotLow       24.84      (4.1%)       24.98      (5.8%)    0.6% (  -9% -   10%)
               LowPhrase       13.19      (1.6%)       13.27      (2.3%)    0.6% (  -3% -    4%)
               OrHighLow       18.78      (4.1%)       18.91      (5.8%)    0.7% (  -8% -   11%)
           OrNotHighHigh        8.87      (4.5%)        8.93      (6.0%)    0.7% (  -9% -   11%)
            OrHighNotMed       30.63      (4.1%)       30.85      (5.5%)    0.7% (  -8% -   10%)
              OrHighHigh        8.21      (4.1%)        8.27      (5.8%)    0.7% (  -8% -   11%)
               MedPhrase      203.10      (6.6%)      204.77      (6.3%)    0.8% ( -11% -   14%)
           OrHighNotHigh       11.09      (4.5%)       11.18      (5.9%)    0.8% (  -9% -   11%)
                 LowTerm      322.74      (5.6%)      325.67      (5.6%)    0.9% (  -9% -   12%)
                HighTerm       63.88     (12.8%)       64.55     (12.2%)    1.1% ( -21% -   29%)
                 MedTerm      100.19      (9.8%)      101.31      (9.5%)    1.1% ( -16% -   22%)
            HighSpanNear        8.09      (4.0%)        8.18      (4.9%)    1.1% (  -7% -   10%)
              HighPhrase        4.27      (7.1%)        4.32      (6.5%)    1.2% ( -11% -   15%)
            OrNotHighMed       19.00      (7.0%)       19.30      (7.6%)    1.6% ( -12% -   17%)
            OrNotHighLow       19.63      (7.4%)       19.96      (8.0%)    1.7% ( -12% -   18%)
{noformat}

Looks like just noise!

> Make the Collector API work per-segment
> ---------------------------------------
>
>                 Key: LUCENE-5527
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5527
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5299 proposes several changes, some of them controversial, but there is one that I really, really like: refactoring the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer, since it is now obvious that {{setScorer}} and {{acceptsDocsOutOfOrder}} need to be called after {{setNextReader}}, which is otherwise unclear.
> It also makes things more flexible. For example, a collector could much more easily decide to use different strategies on different segments. In particular, it makes the early-termination collector much cleaner, since it can return different atomic collector implementations depending on whether the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it easier to migrate by having a class that implements both Collector and AtomicCollector and returns {{this}} in {{setNextReader}}, and by making current concrete Collector implementations extend this class instead of directly extending Collector.
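
As a rough illustration of the description above, here is a minimal, self-contained Java sketch. It is not the attached patch. {{SegmentContext}} and {{ScorerStub}} are hypothetical stand-ins for {{AtomicReaderContext}} and {{Scorer}} so that the example runs without Lucene on the classpath, and the {{sorted}} flag, the {{BridgingCollector}} name, its {{doSetNextReader}} hook and the {{search}} loop are all assumptions made for this example. It shows both ideas: a bridge class that returns {{this}} from {{setNextReader}}, and a collector that returns a different per-segment implementation depending on whether the segment is sorted.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class PerSegmentCollectorSketch {

    // Hypothetical stand-ins for AtomicReaderContext and Scorer (NOT the Lucene classes).
    static final class SegmentContext {
        final int docBase;     // offset of this segment's doc IDs within the index
        final boolean sorted;  // assumed flag: whether this segment is sorted
        SegmentContext(int docBase, boolean sorted) { this.docBase = docBase; this.sorted = sorted; }
    }
    static final class ScorerStub { }

    // The two interfaces from the issue description, rewritten against the stand-in types.
    interface Collector {
        AtomicCollector setNextReader(SegmentContext context) throws IOException;
    }
    interface AtomicCollector {
        void setScorer(ScorerStub scorer) throws IOException;
        void collect(int doc) throws IOException;
        boolean acceptsDocsOutOfOrder();
    }

    // Migration aid (class name and hook are assumptions): implements both interfaces
    // and returns itself for every segment, so old collectors only change their superclass.
    abstract static class BridgingCollector implements Collector, AtomicCollector {
        @Override
        public final AtomicCollector setNextReader(SegmentContext context) throws IOException {
            doSetNextReader(context);
            return this;
        }
        protected void doSetNextReader(SegmentContext context) throws IOException { }
        @Override
        public void setScorer(ScorerStub scorer) throws IOException { }
    }

    // An existing-style collector, migrated by extending the bridge.
    static final class DocIdCollector extends BridgingCollector {
        final List<Integer> hits = new ArrayList<>();
        private int docBase;
        @Override protected void doSetNextReader(SegmentContext context) { docBase = context.docBase; }
        @Override public void collect(int doc) { hits.add(docBase + doc); }
        @Override public boolean acceptsDocsOutOfOrder() { return true; }
    }

    // A new-style collector that returns a different implementation per segment.
    static final class FirstNOnSortedSegments implements Collector {
        final List<Integer> hits = new ArrayList<>();
        private final int limit;
        FirstNOnSortedSegments(int limit) { this.limit = limit; }

        @Override
        public AtomicCollector setNextReader(SegmentContext context) {
            final int docBase = context.docBase;
            if (!context.sorted) {
                // Unsorted segment: every hit has to be recorded.
                return new AtomicCollector() {
                    @Override public void setScorer(ScorerStub scorer) { }
                    @Override public void collect(int doc) { hits.add(docBase + doc); }
                    @Override public boolean acceptsDocsOutOfOrder() { return true; }
                };
            }
            // Sorted segment: only the first `limit` hits matter. This sketch simply ignores
            // the rest; a real collector would signal termination to the caller instead.
            return new AtomicCollector() {
                private int seen;
                @Override public void setScorer(ScorerStub scorer) { }
                @Override public void collect(int doc) {
                    if (seen < limit) { seen++; hits.add(docBase + doc); }
                }
                @Override public boolean acceptsDocsOutOfOrder() { return false; }
            };
        }
    }

    // Stand-in for the search loop: asks for one AtomicCollector per segment.
    static void search(Collector collector, SegmentContext[] segments, int[][] docs) {
        try {
            for (int i = 0; i < segments.length; i++) {
                AtomicCollector leaf = collector.setNextReader(segments[i]);
                leaf.setScorer(new ScorerStub());
                for (int doc : docs[i]) {
                    leaf.collect(doc);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Two made-up segments: a sorted one at docBase 0 and an unsorted one at docBase 10.
    static final SegmentContext[] SEGMENTS = {
        new SegmentContext(0, true), new SegmentContext(10, false)
    };
    static final int[][] DOCS = { {0, 1, 2}, {0, 5} };

    static List<Integer> demoBridge() {
        DocIdCollector collector = new DocIdCollector();
        search(collector, SEGMENTS, DOCS);
        return collector.hits;
    }

    static List<Integer> demoPerSegment() {
        FirstNOnSortedSegments collector = new FirstNOnSortedSegments(2);
        search(collector, SEGMENTS, DOCS);
        return collector.hits;
    }

    public static void main(String[] args) {
        System.out.println(demoBridge());      // [0, 1, 2, 10, 15]
        System.out.println(demoPerSegment());  // [0, 1, 10, 15]
    }
}
```

Note that the per-segment state ({{docBase}}, {{seen}}) lives in the object returned by {{setNextReader}}, so the top-level collector does not have to reset anything between segments, while the bridge class keeps the old single-object style working unchanged.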



--
This message was sent by Atlassian JIRA
(v6.2#6252)


