lucene-solr-user mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject Re: Configurable collectors for custom ranking
Date Sat, 21 Dec 2013 16:08:36 GMT
Hi Peter,

The fastest approach would be to keep a parallel hppc FloatArrayList for the
scores and an IntArrayList for the docs. Just add the docs and scores at
collect time and iterate over them in finish(). You'll use more memory, but if
you're looking for the best possible performance, this might be the way to go.

Joel
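
A minimal sketch of the parallel-list idea above, assuming plain Java arrays as a stand-in for the hppc IntArrayList and FloatArrayList so that it runs without dependencies. The class name ParallelHitBuffer, the HitConsumer callback, and the min/max scaling done in finish() are illustrative assumptions, not Solr or hppc APIs:

```java
import java.util.Arrays;

/** Plain-Java stand-in for parallel hppc IntArrayList / FloatArrayList buffers. */
public class ParallelHitBuffer {
    private int[] docs = new int[16];
    private float[] scores = new float[16];
    private int size = 0;
    private float min = Float.POSITIVE_INFINITY;
    private float max = Float.NEGATIVE_INFINITY;

    /** Hypothetical callback standing in for the lower (delegate) collector. */
    public interface HitConsumer {
        void accept(int doc, float scaledScore);
    }

    /** Called once per hit at collect time; nothing is delegated yet. */
    public void collect(int doc, float score) {
        if (size == docs.length) {
            // Grow both arrays together so index i always refers to the same hit.
            docs = Arrays.copyOf(docs, size * 2);
            scores = Arrays.copyOf(scores, size * 2);
        }
        docs[size] = doc;
        scores[size] = score;
        size++;
        min = Math.min(min, score);
        max = Math.max(max, score);
    }

    /** Called once after collection: replay hits in collection order, scaled into [lo, hi]. */
    public void finish(float lo, float hi, HitConsumer delegate) {
        float range = max - min;
        for (int i = 0; i < size; i++) {
            float scaled = range == 0f ? lo : lo + (scores[i] - min) / range * (hi - lo);
            delegate.accept(docs[i], scaled);
        }
    }

    public static void main(String[] args) {
        ParallelHitBuffer buf = new ParallelHitBuffer();
        buf.collect(3, 2.0f);
        buf.collect(7, 4.0f);
        buf.collect(9, 3.0f);
        buf.finish(0f, 1f, (doc, s) -> System.out.println(doc + " -> " + s));
    }
}
```

Compared with a bitset plus score map, this trades memory (one int and one float per hit) for a single sequential pass in finish() with no per-document lookup.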


On Thu, Dec 19, 2013 at 3:25 PM, Peter Keegan <peterlkeegan@gmail.com> wrote:

> I implemented the PostFilter approach described by Joel. Just iterating
> over the OpenBitSet, even without the scaling or the HashMap lookup, added
> 30ms to a query time, which kinda surprised me. There were about 150K hits
> out of a total of 500K. Is OpenBitSet the best way to do this?
>
> Thanks,
> Peter
>
>
> On Thu, Dec 19, 2013 at 9:51 AM, Peter Keegan <peterlkeegan@gmail.com
> >wrote:
>
> > In order to size the PriorityQueue, the result window size for the query
> > is needed. This has been computed in the SolrIndexSearcher and is available
> > in QueryCommand.getSupersetMaxDoc(), but it doesn't seem to be available to
> > the PostFilter in either the SolrParams or the SolrQueryRequest. Is there a
> > way to get this precomputed value, or do I have to duplicate the logic from
> > SolrIndexSearcher?
> >
> > Thanks,
> > Peter
> >
> >
> > On Thu, Dec 12, 2013 at 1:53 PM, Joel Bernstein <joelsolr@gmail.com
> >wrote:
> >
> >> Thanks, I agree this is powerful stuff. One of the reasons that I haven't
> >> gotten back to pluggable collectors is that I've been using PostFilters
> >> instead.
> >>
> >> When you start doing stuff with scores in PostFilters you'll run into the
> >> bug in SOLR-5416. This will affect you when you use facets in combination
> >> with the QueryResultCache or tag and exclude faceting.
> >>
> >> The patch in SOLR-5416 resolves this issue. You'll just need your
> >> PostFilter to implement ScoreFilter and the SolrIndexSearcher will know
> >> how to handle things.
> >>
> >> The DelegatingCollector.finish() method is so new that these kinds of bugs
> >> are still being cleaned out of the system. SOLR-5416 should be in Solr 4.7.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Thu, Dec 12, 2013 at 12:54 PM, Peter Keegan <peterlkeegan@gmail.com
> >> >wrote:
> >>
> >> > This is pretty cool, and worthy of adding to Solr in Action (v2) and the
> >> > other books. With function queries, flexible filter processing and
> >> > caching, custom collectors, and post filters, there's a lot of flexibility
> >> > here.
> >> >
> >> > Btw, the query times using a custom collector to scale/recompute scores
> >> > are excellent (will have to see how they compare to your outlined
> >> > solution).
> >> >
> >> > Thanks,
> >> > Peter
> >> >
> >> >
> >> > On Thu, Dec 12, 2013 at 11:13 AM, Joel Bernstein <joelsolr@gmail.com>
> >> > wrote:
> >> >
> >> > > The sorting is going to happen in the lower level collectors. You need a
> >> > > value source that returns the score of the document being collected.
> >> > >
> >> > > Here is how you can make this happen:
> >> > >
> >> > > 1) Create an object in your PostFilter that simply holds the current
> >> > > score. Place this object in the SearchRequest context map. Update
> >> > > object.score as you pass the docs and scores to the lower collectors.
> >> > >
> >> > > 2) Create a value source that checks the SearchRequest context for the
> >> > > object that's holding the current score. Use this object to return the
> >> > > current score when called. For example, if you give the value source a
> >> > > handle called "score", a compound function call will look like this:
> >> > > sum(score(), field(x))
> >> > >
> >> > > Joel
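
The two steps above can be sketched in plain Java, with no Solr dependencies. Everything named here is a hypothetical stand-in: a HashMap plays the request context map, ScoreHolder is the object holding the current score, and IntToDoubleFunction plays the role of a value source evaluated per document. A real implementation would use Solr's own value source and context classes instead:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

public class CurrentScoreSketch {
    /** Mutable holder that the post filter updates just before delegating each doc. */
    public static final class ScoreHolder {
        public float score;
    }

    /** Hypothetical key under which the holder is stored in the context map. */
    public static final String KEY = "currentScore";

    /** Stand-in for a value source registered as "score": reads the shared holder. */
    public static IntToDoubleFunction scoreSource(Map<Object, Object> context) {
        ScoreHolder holder = (ScoreHolder) context.get(KEY);
        // The doc id is ignored: the holder always carries the doc being collected.
        return doc -> holder.score;
    }

    /** Stand-in for the sum(a, b) function. */
    public static IntToDoubleFunction sum(IntToDoubleFunction a, IntToDoubleFunction b) {
        return doc -> a.applyAsDouble(doc) + b.applyAsDouble(doc);
    }

    public static void main(String[] args) {
        Map<Object, Object> context = new HashMap<>();
        ScoreHolder holder = new ScoreHolder();
        context.put(KEY, holder);                 // step 1: publish the holder

        float[] x = {10f, 20f, 30f};              // made-up field(x) values by doc id
        IntToDoubleFunction fn = sum(scoreSource(context), doc -> x[doc]);

        holder.score = 0.5f;                      // post filter sets the score for doc 1...
        System.out.println(fn.applyAsDouble(1));  // ...then the function is evaluated for it
    }
}
```

The sketch only works because the holder is updated immediately before each document is passed down, which is the ordering the steps above rely on.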
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Thu, Dec 12, 2013 at 9:58 AM, Peter Keegan <
> peterlkeegan@gmail.com
> >> > > >wrote:
> >> > >
> >> > > > Regarding my original goal, which is to perform a math function using
> >> > > > the scaled score and a field value, and sort on the result, how does
> >> > > > this fit in? Must I implement another custom PostFilter with a higher
> >> > > > cost than the scale PostFilter?
> >> > > >
> >> > > > Thanks,
> >> > > > Peter
> >> > > >
> >> > > >
> >> > > > On Wed, Dec 11, 2013 at 4:01 PM, Peter Keegan <
> >> peterlkeegan@gmail.com
> >> > > > >wrote:
> >> > > >
> >> > > > > Thanks very much for the guidance. I'd be happy to donate a working
> >> > > > > solution.
> >> > > > >
> >> > > > > Peter
> >> > > > >
> >> > > > >
> >> > > > > On Wed, Dec 11, 2013 at 3:53 PM, Joel Bernstein <
> >> joelsolr@gmail.com
> >> > > > >wrote:
> >> > > > >
> >> > > > >> SOLR-5020 has the commit info; it's mainly changes to
> >> > > > >> SolrIndexSearcher, I believe. They might apply to 4.3.
> >> > > > >> I think as long as you have the finish method, that's all you'll
> >> > > > >> need. If you can get this working it would be excellent if you could
> >> > > > >> donate back the Scale PostFilter.
> >> > > > >>
> >> > > > >>
> >> > > > >> On Wed, Dec 11, 2013 at 3:36 PM, Peter Keegan <
> >> > peterlkeegan@gmail.com
> >> > > > >> >wrote:
> >> > > > >>
> >> > > > >> > This is what I was looking for, but the DelegatingCollector 'finish'
> >> > > > >> > method doesn't exist in 4.3.0 :(   Can this be patched in, and are
> >> > > > >> > there any other PostFilter dependencies on 4.5?
> >> > > > >> >
> >> > > > >> > Thanks,
> >> > > > >> > Peter
> >> > > > >> >
> >> > > > >> >
> >> > > > >> > On Wed, Dec 11, 2013 at 3:16 PM, Joel Bernstein
<
> >> > joelsolr@gmail.com
> >> > > >
> >> > > > >> > wrote:
> >> > > > >> >
> >> > > > >> > > Here is one approach to use in a PostFilter:
> >> > > > >> > >
> >> > > > >> > > 1) In the collect() method, call score() for each doc. Use the scores to
> >> > > > >> > > create your scaleInfo.
> >> > > > >> > > 2) Keep a bitset of the hits and a priorityQueue of your top X ScoreDocs.
> >> > > > >> > > 3) Don't delegate any documents to lower collectors in the collect()
> >> > > > >> > > method.
> >> > > > >> > > 4) In the finish method, create a score mapping (use the hppc
> >> > > > >> > > IntFloatOpenHashMap) with your top X docIds pointing to their score,
> >> > > > >> > > using the priorityQueue created in step 2. Then iterate the bitset (also
> >> > > > >> > > created in step 2), sending each doc down to the lower collectors and
> >> > > > >> > > retrieving and scaling its score from the score map. If the document is
> >> > > > >> > > not in the score map, then send down 0.
> >> > > > >> > >
> >> > > > >> > > You'll have to set up a dummy scorer to feed to the lower collectors. The
> >> > > > >> > > CollapsingQParserPlugin has an example of how to do this.
> >> > > > >> > >
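
Steps 1 to 4 above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: java.util.HashMap stands in for the hppc IntFloatOpenHashMap, java.util.BitSet for the hit bitset, the Delegate interface is a hypothetical stand-in for the lower collector (no dummy scorer is modelled), scores are scaled into [0, 1], and doc ids are treated as global rather than per-segment:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class ScalePostFilterSketch {
    /** Hypothetical stand-in for the lower (delegate) collector. */
    public interface Delegate {
        void collect(int doc, float score);
    }

    private static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int topX;
    private final BitSet hits = new BitSet();
    // Min-heap on score, so the weakest of the current top X is evicted first.
    private final PriorityQueue<Hit> top =
        new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    private float min = Float.POSITIVE_INFINITY;
    private float max = Float.NEGATIVE_INFINITY;

    public ScalePostFilterSketch(int topX) { this.topX = topX; }

    /** Steps 1-3: record scale info, the hit, and the top X; delegate nothing. */
    public void collect(int doc, float score) {
        min = Math.min(min, score);
        max = Math.max(max, score);
        hits.set(doc);
        top.add(new Hit(doc, score));
        if (top.size() > topX) top.poll();
    }

    /** Step 4: build the score map from the top X, then replay every hit. */
    public void finish(Delegate delegate) {
        Map<Integer, Float> scoreMap = new HashMap<>();
        for (Hit h : top) scoreMap.put(h.doc, h.score);
        float range = max - min;
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            Float raw = scoreMap.get(doc);
            // Docs outside the top X (or a degenerate score range) are sent down with 0.
            float scaled = (raw == null || range == 0f) ? 0f : (raw - min) / range;
            delegate.collect(doc, scaled);
        }
    }

    public static void main(String[] args) {
        ScalePostFilterSketch filter = new ScalePostFilterSketch(2);
        filter.collect(5, 1.0f);
        filter.collect(2, 3.0f);
        filter.collect(8, 2.0f);
        filter.finish((doc, s) -> System.out.println(doc + " -> " + s));
    }
}
```

Note that the replay in finish() visits documents in doc id order, not score order; the actual sorting is left to the lower collectors.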
> >> > > > >> > >
> >> > > > >> > >
> >> > > > >> > >
> >> > > > >> > > On Wed, Dec 11, 2013 at 2:05 PM, Peter Keegan
<
> >> > > > peterlkeegan@gmail.com
> >> > > > >> > > >wrote:
> >> > > > >> > >
> >> > > > >> > > > Hi Joel,
> >> > > > >> > > >
> >> > > > >> > > > I thought about using a PostFilter, but the problem is that the 'scale'
> >> > > > >> > > > function must be done after all matching docs have been scored but before
> >> > > > >> > > > adding them to the PriorityQueue that sorts just the rows to be returned.
> >> > > > >> > > > Doing the 'scale' function wrapped in a 'query' is proving to be too slow
> >> > > > >> > > > when it visits every document in the index.
> >> > > > >> > > >
> >> > > > >> > > > In the Collector, I can see how to get the field values like this:
> >> > > > >> > > >
> >> > > > >> > > > indexSearcher.getSchema().getField("field(myfield").getType().getValueSource(SchemaField, QParser).getValues()
> >> > > > >> > > >
> >> > > > >> > > > But 'getValueSource' needs a QParser, which isn't available.
> >> > > > >> > > > And I can't create a QParser without a SolrQueryRequest, which isn't
> >> > > > >> > > > available.
> >> > > > >> > > >
> >> > > > >> > > > Thanks,
> >> > > > >> > > > Peter
> >> > > > >> > > >
> >> > > > >> > > >
> >> > > > >> > > > On Wed, Dec 11, 2013 at 1:48 PM, Joel
Bernstein <
> >> > > > joelsolr@gmail.com
> >> > > > >> >
> >> > > > >> > > > wrote:
> >> > > > >> > > >
> >> > > > >> > > > > Peter,
> >> > > > >> > > > >
> >> > > > >> > > > > It sounds like you could achieve what you want to do in a PostFilter
> >> > > > >> > > > > rather than extending the TopDocsCollector. Is there a reason why a
> >> > > > >> > > > > PostFilter won't work for you?
> >> > > > >> > > > >
> >> > > > >> > > > > Joel
> >> > > > >> > > > >
> >> > > > >> > > > >
> >> > > > >> > > > > On Tue, Dec 10, 2013 at 3:24 PM,
Peter Keegan <
> >> > > > >> > peterlkeegan@gmail.com
> >> > > > >> > > > > >wrote:
> >> > > > >> > > > >
> >> > > > >> > > > > > Quick question:
> >> > > > >> > > > > > In the context of a custom collector, how does one get the values of a
> >> > > > >> > > > > > field of type 'ExternalFileField'?
> >> > > > >> > > > > >
> >> > > > >> > > > > > Thanks,
> >> > > > >> > > > > > Peter
> >> > > > >> > > > > >
> >> > > > >> > > > > >
> >> > > > >> > > > > > On Tue, Dec 10, 2013 at 1:18
PM, Peter Keegan <
> >> > > > >> > > peterlkeegan@gmail.com
> >> > > > >> > > > > > >wrote:
> >> > > > >> > > > > >
> >> > > > >> > > > > > > Hi Joel,
> >> > > > >> > > > > > >
> >> > > > >> > > > > > > This is related to another thread on function query matching (
> >> > > > >> > > > > > > http://lucene.472066.n3.nabble.com/Function-query-matching-td4099807.html#a4105513
> >> > > > >> > > > > > > ).
> >> > > > >> > > > > > > The patch in SOLR-4465 will allow me to extend TopDocsCollector and perform
> >> > > > >> > > > > > > the 'scale' function on only the documents matching the main dismax query.
> >> > > > >> > > > > > > As you mention, it is a slightly intrusive design and requires that I
> >> > > > >> > > > > > > manage my own PriorityQueue (and a local duplicate of HitQueue), but it
> >> > > > >> > > > > > > should work. I think a better design would hide the PQ from the plugin.
> >> > > > >> > > > > > >
> >> > > > >> > > > > > > Thanks,
> >> > > > >> > > > > > > Peter
> >> > > > >> > > > > > >
> >> > > > >> > > > > > >
> >> > > > >> > > > > > > On Sun, Dec 8, 2013 at
5:32 PM, Joel Bernstein <
> >> > > > >> > joelsolr@gmail.com
> >> > > > >> > > >
> >> > > > >> > > > > > wrote:
> >> > > > >> > > > > > >
> >> > > > >> > > > > > >> Hi Peter,
> >> > > > >> > > > > > >>
> >> > > > >> > > > > > >> I've been meaning to revisit configurable ranking collectors, but I
> >> > > > >> > > > > > >> haven't yet had a chance. It's on the shortlist of things I'd like to
> >> > > > >> > > > > > >> tackle, though.
> >> > > > >> > > > > > >>
> >> > > > >> > > > > > >>
> >> > > > >> > > > > > >>
> >> > > > >> > > > > > >> On Fri, Dec 6, 2013
at 4:17 PM, Peter Keegan <
> >> > > > >> > > > peterlkeegan@gmail.com>
> >> > > > >> > > > > > >> wrote:
> >> > > > >> > > > > > >>
> >> > > > >> > > > > > >> > I looked at SOLR-4465 and SOLR-5045, where it appears that there is a
> >> > > > >> > > > > > >> > goal to be able to do custom sorting and ranking in a PostFilter. So far,
> >> > > > >> > > > > > >> > it looks like only custom aggregation can be implemented in a PostFilter
> >> > > > >> > > > > > >> > (5045). Custom sorting/ranking can be done in a pluggable collector
> >> > > > >> > > > > > >> > (4465), but this patch is no longer in dev.
> >> > > > >> > > > > > >> >
> >> > > > >> > > > > > >> > Is there any other dev. being done on adding custom sorting (after
> >> > > > >> > > > > > >> > collection) via a plugin?
> >> > > > >> > > > > > >> >
> >> > > > >> > > > > > >> > Thanks,
> >> > > > >> > > > > > >> > Peter
> >> > > > >> > > > > > >> >
> >> > > > >> > > > > > >>
> >> > > > >> > > > > > >>
> >> > > > >> > > > > > >>
> >> > > > >> > > > > > >> --
> >> > > > >> > > > > > >> Joel Bernstein
> >> > > > >> > > > > > >> Search Engineer at Heliosearch
> >> > > > >> > > > > > >>
> >> > > > >> > > > > > >
> >> > > > >> > > > > > >
> >> > > > >> > > > > >
> >> > > > >> > > > >
> >> > > > >> > > > >
> >> > > > >> > > > >
> >> > > > >> > > > > --
> >> > > > >> > > > > Joel Bernstein
> >> > > > >> > > > > Search Engineer at Heliosearch
> >> > > > >> > > > >
> >> > > > >> > > >
> >> > > > >> > >
> >> > > > >> > >
> >> > > > >> > >
> >> > > > >> > > --
> >> > > > >> > > Joel Bernstein
> >> > > > >> > > Search Engineer at Heliosearch
> >> > > > >> > >
> >> > > > >> >
> >> > > > >>
> >> > > > >>
> >> > > > >>
> >> > > > >> --
> >> > > > >> Joel Bernstein
> >> > > > >> Search Engineer at Heliosearch
> >> > > > >>
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Joel Bernstein
> >> > > Search Engineer at Heliosearch
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Joel Bernstein
> >> Search Engineer at Heliosearch
> >>
> >
> >
>



-- 
Joel Bernstein
Search Engineer at Heliosearch
