lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doron Cohen <cdor...@gmail.com>
Subject Re: Please help me with a basic question...
Date Fri, 20 May 2011 18:46:05 GMT
Hi Rich,

SeetSpotSimilarity looks promising. Does it not favor shorter docs by not
> normalizing or does it make some attempt to standardized.
>
> > - using e.g. SeetSpotSimilarity which do not favor shorter documents.
>

SweetSpotSimilarity (I misspelled it previously) defines a range of lengths
which is not normalized (this is the sweet spot) and a slop by which longer
or shorter documents are "punished" relative to the gap from the defined
range. It takes parameters for defining the range and the slop. See here:
http://lucene.apache.org/java/3_1_0/api/contrib-misc/org/apache/lucene/misc/SweetSpotSimilarity.html

I stumbled upon the 'Explain' function yesterday though it returns a crowded
> message using debug in SOLR admin. Is there another method or interface
> which returns more or cleaner info?
>

I am not familiar with the use of Solr for this, I hope someone else will
answer this...


> >>>> (Number of times the term appears in a document)
> >>>>    /   (Total Number of terms in that document)
> >>>>    * Log10(Number of total documents
> >>>>                  / Number of times search term appears in all
> documents)
>

If I read this correctly it seems the part about dividing by
    (Total Number of terms in that document)
does not describe the default scoring in Lucene, where this value, aka the
doc length, only takes part in the length normalization. This is also
explained here:
http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/search/Similarity.html

Once you are able to actually compare the score computation details
(Explain) for such two documents I think you'll be in better state for
identifying what really hurts the search quality.

Regards,
Doron

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message