lucene-java-user mailing list archives

From Michael Sokolov <msoko...@gmail.com>
Subject Re: filtering and chaining Collectors
Date Thu, 16 Aug 2018 16:45:09 GMT
Right, that makes sense usually. But there are use cases for
post-filtering. A good example is when a collector performs grouping or
windowing and we want to apply filters based on the grouped or windowed
values.

On Thu, Aug 16, 2018 at 4:22 AM Adrien Grand <jpountz@gmail.com> wrote:

> I think one reason that we don't want to encourage filtering at the
> collector level is that it is much slower than filtering in the query. The
> former needs to check hits one by one while the latter can use leap frog to
> skip documents that don't match.
>
> On Wed, Aug 15, 2018 at 11:27 PM Michael Sokolov <msokolov@gmail.com> wrote:
>
> > Hmm the more I root around, the more crazy it seems to try to thread a
> > return value through all the different places collect() gets called from.
> > Somehow I thought it would just be one place in IndexSearcher somewhere.
> >
> > On Wed, Aug 15, 2018 at 5:18 PM Michael Sokolov <msokolov@gmail.com>
> > wrote:
> >
> > > We have MultiCollector to enable running multiple Collectors on the same
> > > hits, in sequence for each hit. I think a nice extension would be to
> > > enable filtering so that earlier collectors could reject a hit,
> > > preventing later collectors from seeing it. This way you could have a
> > > post-filter implemented in one collector, and some other collection, like
> > > faceting, in the next one, that wants to ignore hits that are filtered in
> > > this post-filter.
> > >
> > > The implementation idea would be to return a "status" value from
> > > LeafCollector.collect() indicating how to proceed. This could also
> > > naturally be used for early termination (you could have
> > > status=TERMINATE | SKIP | COLLECT, say).
> > >
> > > I was trying to understand why this wasn't done before for early
> > > termination since it seemed so natural to me, and thought - there must be
> > > a reason. But I went and read through (skimmed, really) the original
> > > EarlyTerminatingCollector issue
> > > (https://issues.apache.org/jira/browse/LUCENE-4858) and didn't see any
> > > discussion of that.
> > >
> > > Am I missing something here?
> > >
> > > -Mike
> > >
> >
>
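
The status-returning collect() proposed in the original message might look roughly like the sketch below. Everything here is hypothetical: the Status enum, StatusCollector, ChainCollector, and the search() driver are made-up stand-ins written without any Lucene dependency. In Lucene itself, LeafCollector.collect(int) returns void, so this shows only the idea, not an existing API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the proposal; none of these types exist in Lucene. */
final class FilteringChainSketch {

    /** Hypothetical status values, named after the ones floated in the thread. */
    enum Status { COLLECT, SKIP, TERMINATE }

    /** Hypothetical stand-in for a LeafCollector whose collect() returns a status. */
    interface StatusCollector {
        Status collect(int doc);
    }

    /** Runs collectors in order for each hit; the first non-COLLECT status hides the hit from later ones. */
    static final class ChainCollector implements StatusCollector {
        private final List<StatusCollector> chain;

        ChainCollector(List<StatusCollector> chain) {
            this.chain = chain;
        }

        @Override
        public Status collect(int doc) {
            for (StatusCollector c : chain) {
                Status s = c.collect(doc);
                if (s != Status.COLLECT) {
                    return s; // later collectors never see this hit
                }
            }
            return Status.COLLECT;
        }
    }

    /** Offers each hit to the collector and stops on TERMINATE; returns how many hits were offered. */
    static int search(int[] hits, StatusCollector collector) {
        int offered = 0;
        for (int doc : hits) {
            offered++;
            if (collector.collect(doc) == Status.TERMINATE) {
                break;
            }
        }
        return offered;
    }

    /** Example chain: a post-filter keeping even doc IDs, then a recorder that stops after `limit` hits. */
    static List<Integer> demo(int[] hits, int limit) {
        List<Integer> seen = new ArrayList<>();
        StatusCollector postFilter = doc -> doc % 2 == 0 ? Status.COLLECT : Status.SKIP;
        StatusCollector recorder = doc -> {
            seen.add(doc);
            return seen.size() >= limit ? Status.TERMINATE : Status.COLLECT;
        };
        search(hits, new ChainCollector(Arrays.asList(postFilter, recorder)));
        return seen;
    }
}
```

Here the recorder plays the role of the "next" collector (faceting, say): it only ever sees hits the post-filter accepted, and its TERMINATE status propagates up through the chain to stop the driver loop. The sketch also hints at the difficulty raised in the follow-up: every caller of collect() would have to inspect the returned status.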

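The performance point in the reply (checking hits one by one versus leap-frogging) can be illustrated with plain sorted arrays of doc IDs. This is a toy model under stated assumptions: the class and method names are hypothetical, and a binary search stands in for whatever skipping structure a real index uses. It is not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Toy comparison of per-hit filtering and leap-frog intersection over sorted doc IDs. */
final class LeapFrogSketch {

    /** Smallest index i >= from with sorted[i] >= target, or sorted.length if none (binary search). */
    static int advance(int[] sorted, int from, int target) {
        int lo = from;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Query-style filtering: each list jumps directly to the other's current doc, skipping non-matches. */
    static List<Integer> leapFrog(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i = advance(a, i, b[j]);
            } else {
                j = advance(b, j, a[i]);
            }
        }
        return out;
    }

    /** Collector-style filtering: every hit is visited and tested individually. */
    static List<Integer> perHit(int[] hits, IntPredicate accept) {
        List<Integer> out = new ArrayList<>();
        for (int doc : hits) {
            if (accept.test(doc)) {
                out.add(doc);
            }
        }
        return out;
    }
}
```

With a large main list and a sparse filter, leapFrog() jumps over long runs of non-matching documents, whereas perHit() has to test every hit. The post-filtering cases mentioned in the reply (grouping, windowing) are ones where the accept test depends on collected state, so it cannot be expressed as a second sorted list up front.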