lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene
Date Fri, 11 Mar 2011 15:54:06 GMT


Robert Muir commented on LUCENE-2091:

> your attachment (BM25SimilarityProvider) seems to rely on some other code (Stats.DocFieldStats)
> & AggregatesProvider .. which I guess is part of your DFR patch.. can you provide a pointer
> to that.

Yeah, this is from LUCENE-2392. Unfortunately it won't work with the most recent patch there,
but both patches are really just explorations to see how we can divide the work into subtasks.

For an update, the JIRA issues aren't well linked but we have actually made pretty good progress
on some major portions (imo these are the most interesting):
* Collection term stats: LUCENE-2862
* per-field similarity: LUCENE-2236
* termstate, to avoid redundant i/o for stats: LUCENE-2694
* norms cleanup: LUCENE-2771, LUCENE-2846

The next big step is to separate scoring from matching (see the latest patch on LUCENE-2392)
so that similarity has full responsibility for all calculations, and so we get full integration
with all queries, etc.

This isn't that complicated. However, in order to do this, we first need to refactor Explanations,
so that a Similarity has the capability (and responsibility!) to fully explain its calculations.
So I think this is the next issue to resolve before going any further.
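
To make the idea concrete, here is a rough sketch of a scorer that owns both the calculation and
its explanation, using BM25 as the example. Everything in it is hypothetical and for illustration
only: the class and method names (Bm25Sketch, Stats, score, explain) are not Lucene's Similarity API
and not the code from any patch, the k1 = 1.2 and b = 0.75 values are just commonly used defaults,
and the idf shown is one common non-negative variant.

```java
import java.util.Locale;

/** Hypothetical sketch: a scorer that both computes and explains a BM25 score. Not Lucene's API. */
final class Bm25Sketch {
    // Free parameters (assumed values; commonly used defaults).
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** Collection-level statistics the scorer needs (names are illustrative). */
    static final class Stats {
        final long docCount;     // number of documents in the collection
        final long docFreq;      // number of documents containing the term
        final double avgDocLen;  // average field length, in terms

        Stats(long docCount, long docFreq, double avgDocLen) {
            this.docCount = docCount;
            this.docFreq = docFreq;
            this.avgDocLen = avgDocLen;
        }
    }

    /** Inverse document frequency; the "1 +" inside the log keeps it non-negative. */
    static double idf(Stats s) {
        return Math.log(1.0 + (s.docCount - s.docFreq + 0.5) / (s.docFreq + 0.5));
    }

    /** Saturating, length-normalized term frequency component. */
    static double tfNorm(double tf, double docLen, Stats s) {
        double lengthNorm = 1.0 - B + B * docLen / s.avgDocLen;
        return tf * (K1 + 1.0) / (tf + K1 * lengthNorm);
    }

    /** The score depends only on statistics, not on how the document was matched. */
    static double score(double tf, double docLen, Stats s) {
        return idf(s) * tfNorm(tf, docLen, s);
    }

    /** The same component that scores also describes how the score was derived. */
    static String explain(double tf, double docLen, Stats s) {
        return String.format(Locale.ROOT,
            "%.4f = idf(%.4f) * tfNorm(%.4f) [tf=%.1f, docLen=%.1f, avgDocLen=%.1f, docFreq=%d, docCount=%d]",
            score(tf, docLen, s), idf(s), tfNorm(tf, docLen, s),
            tf, docLen, s.avgDocLen, s.docFreq, s.docCount);
    }
}
```

The point of the sketch is that explain() reuses exactly the same idf() and tfNorm() pieces as
score(), so the explanation cannot drift from the calculation, and neither depends on which query
produced the match.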

> Add BM25 Scoring to Lucene
> --------------------------
>                 Key: LUCENE-2091
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>            Reporter: Yuval Feinstein
>            Priority: Minor
>             Fix For: 4.0
>         Attachments:, LUCENE-2091.patch, persianlucene.jpg
>   Original Estimate: 48h
>  Remaining Estimate: 48h
> describes an implementation of Okapi-BM25 scoring in the Lucene framework,
> as an alternative to the standard Lucene scoring (which is a version of mixed boolean/TFIDF).
> I have refactored this a bit, added unit tests and improved the runtime somewhat.
> I would like to contribute the code to Lucene under contrib. 

This message is automatically generated by JIRA.