lucene-solr-user mailing list archives

From Doug Turnbull <dturnb...@opensourceconnections.com>
Subject Re: Solr Support for BM25F
Date Tue, 19 Apr 2016 00:17:46 GMT
It's worth adding that Lucene's BlendedTermQuery (used in Elasticsearch's
cross_fields search) attempts to blend the fields' document frequencies
together. So I wonder what BlendedTermQuery plus a per-field BM25 similarity
would do? It might be close to true BM25F, aside from the length issue.

(You'd have to write a QParserPlugin and build the BlendedTermQuery
yourself; AFAIK there's no direct Solr interface to it yet.)
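
To make the comparison concrete, here is a rough Python sketch (not Lucene
or Solr code) of the three scoring variants under discussion for a single
query term: per-field BM25 with each field's own docFreq, per-field BM25
with a shared docFreq, and a simple BM25F that combines the length-normalized
term frequencies before saturating once. All names and statistics are made up
for illustration. Taking the maximum docFreq as the "blended" value is an
assumption of this sketch, not a claim about how BlendedTermQuery computes it.

```python
import math


def idf(doc_freq, num_docs):
    """Smoothed BM25 inverse document frequency (always positive)."""
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def length_norm(length, avg_length, b):
    """BM25 length normalization factor for one field."""
    return 1.0 - b + b * length / avg_length


def saturate(x, k1):
    """BM25 term-frequency saturation."""
    return x * (k1 + 1.0) / (x + k1)


def per_field_bm25(fields, num_docs, k1=1.2, blend_df=False):
    """Score each field as its own mini-document, then sum the field scores.

    With blend_df=True every field uses one shared docFreq (here the maximum
    across fields, which is an assumption made for this sketch).
    """
    shared_df = max(f["df"] for f in fields.values())
    score = 0.0
    for f in fields.values():
        df = shared_df if blend_df else f["df"]
        x = f["tf"] / length_norm(f["len"], f["avg_len"], f["b"])
        score += idf(df, num_docs) * saturate(x, k1)
    return score


def bm25f(fields, num_docs, doc_level_df, k1=1.2):
    """Simple BM25F: combine weighted, normalized tf first, saturate once."""
    x = sum(
        f["weight"] * f["tf"] / length_norm(f["len"], f["avg_len"], f["b"])
        for f in fields.values()
    )
    return idf(doc_level_df, num_docs) * saturate(x, k1)


# Hypothetical statistics for one term in one document.
NUM_DOCS = 10_000
fields = {
    "title": {"tf": 1, "len": 8, "avg_len": 10.0,
              "df": 40, "b": 0.5, "weight": 1.0},
    "description": {"tf": 3, "len": 120, "avg_len": 100.0,
                    "df": 300, "b": 0.75, "weight": 1.0},
}

print("per-field, own df    :", per_field_bm25(fields, NUM_DOCS))
print("per-field, blended df:", per_field_bm25(fields, NUM_DOCS, blend_df=True))
print("BM25F, one saturation:", bm25f(fields, NUM_DOCS, doc_level_df=300))
```

Even with a shared docFreq, summing per-field scores saturates each field
separately, so a term that appears in several fields is rewarded more than
under BM25F, which saturates the combined frequency only once.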

Best
-Doug

On Mon, Apr 18, 2016 at 4:52 PM Tom Burton-West <tburtonw@umich.edu> wrote:

> Hi David,
>
> It may not matter for your use case, but in case you really are
> interested in the "real" BM25F: there is a difference between configuring k1
> and b for different fields in Solr and a "real" BM25F implementation. This
> has to do with Solr's model of fields being mini-documents (i.e. each field
> has its own length, idf and tf). See the discussion in
> https://issues.apache.org/jira/browse/LUCENE-2959, particularly these
> comments by Robert Muir:
>
> "Actually as far as BM25f, this one presents a few challenges (some already
> discussed on LUCENE-2091 <https://issues.apache.org/jira/browse/LUCENE-2091>).
>
> To summarize:
>
>    - for any field, Lucene has a per-field terms dictionary that contains
>    that term's docFreq. To compute BM25f's IDF method would be challenging,
>    because it wants a docFreq "across all the fields". (its not clear to me
>    at a glance either from the original paper, if this should be across
>    only the fields in the query, across all the fields in the document, and
>    if a "static" schema is implied in this scoring system (in lucene
>    document 1 can have 3 fields and document 2 can have 40 different ones,
>    even with different properties).
>    - the same issue applies to length normalization, lucene has a "field
>    length" but really no concept of document length."
>
> Tom
>
> On Thu, Apr 14, 2016 at 12:41 PM, David Cawley <david.cawley5@mail.dcu.ie>
> wrote:
>
> > Hello,
> > I am developing an enterprise search engine for a project and I was hoping
> > to implement the BM25F ranking algorithm so that I can configure the tuning
> > parameters on a per-field basis. I understand BM25 similarity is now
> > supported in Solr, but I was hoping to be able to configure k1 and b for
> > different fields such as title, description, anchor etc., as they are
> > structured documents.
> > I am fairly new to Solr, so any help would be appreciated. If this is
> > possible, any steps on how I can go about implementing it would be
> > greatly appreciated.
> >
> > Regards,
> >
> > David
> >
> > Current Solr Version 5.4.1
> >
>
