lucene-solr-user mailing list archives

From "Diego Ceccarelli (BLOOMBERG/ LONDON)" <>
Subject Re: Personalized search parameters
Date Mon, 08 Jan 2018 14:00:24 GMT
I'm assuming that you are implementing the cosine similarity and that you have two vectors
containing the pairs <term, tfidf>. The two vectors can have different sizes because they only
contain the terms that have tfidf != 0.
If you want to compute the cosine similarity between the two lists, the dot product only needs
the pairs that appear in **both the vectors**: if a term doesn't appear in one of the two, its
product is 0, so it does not contribute to the final score. (Each vector's norm is still
computed over all of its own terms.)
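
A minimal sketch of that idea in Python, assuming each document's term vector has already
been read into a dict mapping term to tfidf (the function name and the sample weights are
hypothetical, not something Solr returns in this shape):

```python
import math

def sparse_cosine(a, b):
    """Cosine similarity of two sparse vectors given as {term: tfidf} dicts."""
    # Iterate over the smaller dict so the lookup loop is as short as possible.
    if len(a) > len(b):
        a, b = b, a
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    # Each norm uses all the terms of its own vector, shared or not.
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector has no direction
    return dot / (norm_a * norm_b)

# Hypothetical tfidf weights; the vectors have different terms and could differ in size.
doc1 = {"solr": 3.0, "search": 4.0}
doc2 = {"search": 5.0, "lucene": 12.0}
score = sparse_cosine(doc1, doc2)
```

Here only "search" is shared, so the dot product is 4.0 * 5.0, while the norms are 5.0 and 13.0.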

(Really old) Example:

From: At: 01/06/18 17:24:07 To:
Subject: Re: Personalized search parameters

Don't we need vectors of the same size to calculate the cosine similarity?
Maybe I missed something, but following that example it looks like I have to
manually recreate the sparse vectors, because the term vector of a document
should (I may be wrong) contain only the terms that appear in that document.
Am I wrong?

Given that, I assumed (and that example goes in that direction) that we have
to manually create the sparse vector by first collecting all the terms and
then calculating the tf-idf value for each term in each document.
That's what I did, and I obtained vectors of the same dimension for each
document. I was just wondering if there was a better-optimized way to obtain
those sparse vectors.
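
The manual approach described above (collect every term, then give each document a weight
for every term) might look roughly like this in Python. The dict layout, helper names, and
weights are assumptions made for illustration only:

```python
import math

def to_dense(vocabulary, sparse):
    """Expand a {term: tfidf} dict into a list aligned with the shared vocabulary."""
    return [sparse.get(term, 0.0) for term in vocabulary]

def dense_cosine(u, v):
    """Cosine similarity of two equal-length lists of weights."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical per-document tfidf weights.
docs = [
    {"solr": 3.0, "search": 4.0},
    {"search": 5.0, "lucene": 12.0},
]
# Step 1: collect all the terms seen in any document.
vocabulary = sorted({term for doc in docs for term in doc})
# Step 2: build one same-sized vector per document, with 0.0 for absent terms.
dense = [to_dense(vocabulary, doc) for doc in docs]
similarity = dense_cosine(dense[0], dense[1])
```

The zero entries add nothing to the dot product or to the norms, so this yields the same
value as working only with the terms each document actually contains, at the cost of
vectors as long as the whole vocabulary.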

