I would like to know whether there is a simple way to force Lucene to rank
results by the plain cosine similarity between the term-frequency vector of
each document and that of the query. In practice, the score sc_i of document
i should be:
sc_i = (D_i*Q)/(|D_i|*|Q|)
where
  D_i = vector of the term frequencies of document i;
  Q   = vector of the term frequencies of the query;
  *   = scalar (dot) product;
  |x| = norm of vector x (the square root of the sum of the squares of its
        entries).
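To make the requested score concrete, here is a minimal plain-Java sketch of the formula above. It does not use any Lucene API: the term-frequency vectors are assumed to be simple maps from term to count, and the class and method names (CosineTf, termFrequencies, norm, dot, cosine) are hypothetical, chosen for illustration only. It also shows that |D_i| depends only on document i's own term frequencies, so one option would be to compute it once per document (for example at index time) rather than at query time.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plain-Java sketch (not Lucene code): cosine similarity of raw
// term-frequency vectors, each represented as a Map from term to frequency.
public class CosineTf {

    // Builds a term-frequency vector from already tokenized text.
    static Map<String, Integer> termFrequencies(String[] tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) {
            tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    // |V| = square root of the sum of the squares of the entries of V.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }

    // D*Q: only terms present in both vectors contribute to the scalar product.
    static double dot(Map<String, Integer> d, Map<String, Integer> q) {
        double sum = 0.0;
        for (Map.Entry<String, Integer> e : q.entrySet()) {
            Integer fd = d.get(e.getKey());
            if (fd != null) {
                sum += (double) fd * e.getValue();
            }
        }
        return sum;
    }

    // sc = (D*Q) / (|D|*|Q|); returns 0 when either vector is empty.
    static double cosine(Map<String, Integer> d, Map<String, Integer> q) {
        double denom = norm(d) * norm(q);
        return denom == 0.0 ? 0.0 : dot(d, q) / denom;
    }

    public static void main(String[] args) {
        Map<String, Integer> doc = termFrequencies("to be or not to be".split(" "));
        Map<String, Integer> query = termFrequencies("to be".split(" "));
        System.out.println(cosine(doc, query));
    }
}
```

For the document "to be or not to be" and the query "to be", D*Q = 2*1 + 2*1 = 4, |D| = sqrt(10) and |Q| = sqrt(2), so the score is 4/sqrt(20), approximately 0.894.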
However, I wasn't able to find a way to compute |D_i| in Lucene.
Thank you
Claudio
