lucene-solr-user mailing list archives

From Dikshant Shahi <contacts...@gmail.com>
Subject Re: Lucene cosine similarity score for more like this query
Date Tue, 03 Feb 2015 06:09:43 GMT
Conceptually, your understanding of the VSM and cosine similarity is correct.
In text analysis the range is 0 to 1 rather than -1 to 1, because term
weights are never negative, so there is no negative similarity.

The scores from handlers that internally use Lucene's similarity can
still go beyond 1. The reason is that these scores are computed for each
field and then go through further computation, for example
summation or multiplication of the per-field scores, to come up with the
final score for the document. Correct me if my understanding is wrong.
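
To make the point concrete, here is a minimal sketch in Python (not Lucene
code). The field names and term weights are made up for illustration, and
plain summation stands in for whatever combination the handler really
applies. Each per-field cosine stays within 0 to 1, but the combined value
does not have to.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical per-field term weights; all non-negative, as with TF-IDF.
doc1 = {"title": {"solr": 2.0, "score": 1.0},
        "body": {"solr": 1.0, "cosine": 3.0}}
doc2 = {"title": {"solr": 1.0, "score": 1.0},
        "body": {"cosine": 2.0, "lucene": 1.0}}

# Each per-field cosine is bounded by 1 ...
per_field = {f: cosine(doc1[f], doc2[f]) for f in doc1}

# ... but summing the per-field values (an assumed combination step)
# gives a document-level number that is no longer bounded by 1.
combined = sum(per_field.values())

print(per_field)
print(combined)
```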

Thanks,
Dikshant



On Tue, Feb 3, 2015 at 2:53 AM, Markus Jelsma <markus.jelsma@openindex.io>
wrote:

> Hi - MoreLikeThis is not based on cosine similarity. The idea is that rare
> terms - high IDF - are extracted from the source document, and then used to
> build a regular Query(). That query follows the same rules as regular
> queries, the rules of your similarity implementation, which is TFIDF by
> default. So, as suggested, if you enable debugging, you can clearly see why
> scores can be above 1, or even much higher if queryNorm is disabled when
> using BM25 as similarity.
>
> If you really need cosine similarity between documents, you have to enable
> term vectors for the source fields, and use them to calculate the angle.
> The problem is that this does not scale well: you would need to calculate
> angles with virtually all other documents.
>
> M.
>
> -----Original message-----
> > From:Ali Nazemian <alinazemian@gmail.com>
> > Sent: Monday 2nd February 2015 21:39
> > To: solr-user@lucene.apache.org
> > Subject: Re: Lucene cosine similarity score for more like this query
> >
> > Dear Erik,
> > Thank you for your response. Would you please tell me why this score could
> > be higher than 1, given that cosine similarity cannot be higher than 1?
> > On Feb 2, 2015 7:32 PM, "Erik Hatcher" <erik.hatcher@gmail.com> wrote:
> >
> > > The scoring is the same as Lucene.  To get deeper insight into how a score
> > > is computed, use Solr’s debug=true mode to see the explain details in the
> > > response.
> > >
> > >         Erik
> > >
> > > > On Feb 2, 2015, at 10:49 AM, Ali Nazemian <alinazemian@gmail.com> wrote:
> > > >
> > > > Hi,
> > > > I was wondering what range of scores the more like this query in
> > > > Solr returns. I know that Lucene uses cosine similarity in the vector
> > > > space model for calculating similarity between two documents. I also
> > > > know that cosine similarity is between -1 and 1, but what I don't
> > > > understand is why the score returned by a more like this query could
> > > > be "12", for example. Would you please explain what the calculation
> > > > process is in Solr?
> > > > Thank you very much.
> > > >
> > > > Best regards.
> > > >
> > > > --
> > > > A.Nazemian
> > >
> > >
> >
>
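
Markus's description of MoreLikeThis in the quoted thread can be sketched
roughly as follows. This is a toy illustration in Python, not Lucene's
actual implementation: the corpus, the function names, and the simplified
unnormalised TF-IDF scoring are all assumptions. The idf formula is
modelled on the classic 1 + log(numDocs / (docFreq + 1)) form. Nothing
bounds the resulting score at 1, because no vector-length normalisation is
applied.

```python
import math
from collections import Counter

def idf(term, docs):
    """Classic-style IDF: 1 + log(numDocs / (docFreq + 1))."""
    df = sum(1 for d in docs if term in d)
    return 1.0 + math.log(len(docs) / (df + 1))

def interesting_terms(source, docs, max_terms=3):
    """Pick the source document's terms with the highest tf * idf."""
    tf = Counter(source)
    weight = {t: tf[t] * idf(t, docs) for t in tf}
    # Sort by descending weight, breaking ties alphabetically.
    return sorted(weight, key=lambda t: (-weight[t], t))[:max_terms]

def score(query_terms, doc, docs):
    """Simplified, unnormalised TF-IDF sum over the matching query terms."""
    tf = Counter(doc)
    return sum(math.sqrt(tf[t]) * idf(t, docs) ** 2
               for t in query_terms if t in tf)

# Hypothetical tokenised corpus.
docs = [
    ["solr", "search", "lucene", "score"],
    ["lucene", "score", "similarity", "similarity", "cosine"],
    ["solr", "search", "faceting"],
    ["cosine", "similarity", "vector", "similarity"],
]

# Use the second document as the "more like this" source.
terms = interesting_terms(docs[1], docs)
mlt_score = score(terms, docs[3], docs)

print(terms)
print(mlt_score)
```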
