lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Muir <rcm...@gmail.com>
Subject Re: BM25 Scoring Patch
Date Wed, 17 Feb 2010 16:46:31 GMT
I tend to agree with you Marvin, you are right, the different scoring
mechanisms need different information available and this is the problem.

although last I checked, one hard part of BM25 rotates around fields versus
documents... e.g. BM25's IDF calculation.

but maybe this is just an extreme form of your example :)

On Wed, Feb 17, 2010 at 11:39 AM, Marvin Humphrey <marvin@rectangular.com>wrote:

> On Wed, Feb 17, 2010 at 10:31:19AM -0500, Robert Muir wrote:
> > yet if we don't do the hard work up front to make it easy to plug in
> things
> > like BM25, then no one will implement additional scoring formulas for
> > Lucene, we currently make it terribly difficult to do this.
>
> FWIW... Similarity and posting format spec are so closely tied that I'm
> considering linking them in Lucy.
>
>  Schema schema = new Schema();
>  FullTextType bm25Type = new FullTextType(new BM25Similarity());
>  schema.specField("content", bm25Type);
>  schema.specField("title", bm25Type);
>  StringType matchType = new StringType(new MatchSimilarity());
>  schema.specField("category", matchType);
>
> That way, custom scoring implementations can guarantee that they always
> have
> the posting information they need available to make their similarity
> judgments.  Similarity also becomes a more generalized notion, with the
> TF/IDF-specific functionality moving into a subclass.
>
> Maybe something similar could be made to work in Lucene.  Dunno how
> McCandless
> has things set up for spec'ing codecs on the flex branch.
>
> Marvin Humphrey
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
Robert Muir
rcmuir@gmail.com

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message