Lucene uses the vector space model. To understand it:
Read section 2.1 of the paper "Space Optimizations for Total Ranking" (linked
from http://lucene.sourceforge.net/publications.html).
Read sections 6 to 6.4 of
http://www.csee.umbc.edu/cadip/readings/IR.report.120600.book.pdf
Read section 1 of
http://www.cs.utexas.edu/users/inderjit/courses/dm2004/lecture5.ps
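
On difference 1 in the question quoted below: when norm_q and norm_d do not
depend on the term t, dividing inside sum_t or outside it gives the same
number. Here is a minimal sketch of the textbook formula (***) written both
ways. The function names, the toy idf values and the toy term frequencies are
made up for illustration; this is not Lucene code. It also does not model
norm_d_t from (*), which, as far as I understand, is stored per field and so
is not the same quantity as norm_d.

```python
import math


def _weights(tf, idf):
    """tf*idf weight for every term in a term-frequency map."""
    return {t: f * idf[t] for t, f in tf.items()}


def _norm(weights):
    """Euclidean length: sqrt(sum_t(weight_t^2))."""
    return math.sqrt(sum(w * w for w in weights.values()))


def cosine_score(tf_q, tf_d, idf):
    """Formula (***): sum_t(tf_q*idf_t * tf_d*idf_t) / (norm_d * norm_q)."""
    w_q, w_d = _weights(tf_q, idf), _weights(tf_d, idf)
    norm_q, norm_d = _norm(w_q), _norm(w_d)
    if norm_q == 0 or norm_d == 0:
        return 0.0
    dot = sum(w * w_d.get(t, 0.0) for t, w in w_q.items())
    return dot / (norm_q * norm_d)


def cosine_score_inside_sum(tf_q, tf_d, idf):
    """Same score, with the division by the norms moved inside sum_t."""
    w_q, w_d = _weights(tf_q, idf), _weights(tf_d, idf)
    norm_q, norm_d = _norm(w_q), _norm(w_d)
    if norm_q == 0 or norm_d == 0:
        return 0.0
    # The norms are computed once, before the loop; they are constants here.
    return sum((w / norm_q) * (w_d.get(t, 0.0) / norm_d) for t, w in w_q.items())


# Toy data (hypothetical values, only for illustration).
idf = {"lucene": 2.0, "score": 1.0, "vector": 1.5}
tf_q = {"lucene": 1, "score": 1}
tf_d = {"lucene": 2, "vector": 1}

outside = cosine_score(tf_q, tf_d, idf)
inside = cosine_score_inside_sum(tf_q, tf_d, idf)
print(outside, inside)
```

Both calls print the same value up to floating-point rounding, so in this
textbook form the norms do not need to be recomputed for each term.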
Vikas
On Tue, 14 Dec 2004, Nhan Nguyen Dang wrote:
> Hi all,
> Lucene scores documents based on the correlation between
> the query q and document d:
> (this is the raw function; I am ignoring the
> boost_t and coord_q_d factors)
>
> score_d = sum_t( tf_q * idf_t / norm_q * tf_d * idf_t
> / norm_d_t) (*)
>
> Could anybody explain it in detail? Or are there any
> papers or documents about this function? Because:
>
> I have also read the book Modern Information
> Retrieval by Ricardo Baeza-Yates and Berthier
> Ribeiro-Neto, Addison Wesley (I hope you have read it
> too). On page 27, they also suggest a scoring function
> for the vector model based on the correlation between
> query q and document d, as follows (I use different
> symbols):
>
> score_d(d, q) = sum_t( weight_t_d * weight_t_q )
>                 / (norm_d * norm_q)                  (**)
>
> where weight_t_d = tf_d * idf_t
> weight_t_q = tf_q * idf_t
> norm_d = sqrt( sum_t( (tf_d * idf_t)^2 ) )
> norm_q = sqrt( sum_t( (tf_q * idf_t)^2 ) )
>
> From (**):
> score_d(d, q) = sum_t( tf_q*idf_t * tf_d*idf_t )
>                 / (norm_d * norm_q)                  (***)
>
> The two functions, (*) and (***), differ in two ways:
> 1. In (***), sum_t covers only the numerator, but
> in (*), sum_t covers the whole expression. So, with
> norm_q = sqrt(sum_t((tf_q*idf_t)^2)), sum_t is
> calculated twice. Is this right? Please explain.
>
> 2. Function (*) has no factor that defines the norm of
> the document, norm_d. Can you explain this? What is the
> role of the factor norm_d_t?
>
> One more question: could anybody point me to documents or
> papers that explain this function in detail, so that when I
> apply Lucene to my system, I can adapt the documents
> and the fields and still receive correct
> scoring information from Lucene?
>
> Best regards,
> Thanks everybody,
>
> =====
> Đặng Nhân

