lucene-dev mailing list archives

From Roxana Angheluta <>
Subject Re: Study Group (WAS Re: Normalized Scoring)
Date Mon, 07 Feb 2005 17:42:39 GMT

>I think I see what you are after.  I'm after the same knowledge. :)
>The only things that I can recommend are books:
>  Modern Information Retrieval
>  Managing Gigabytes
>And online resources like:
> (note the weird host name)
>There is a pile of stuff in Citeseer, but those papers never really dig
>into the details and always require solid previous knowledge of the
>field.  They are no replacement for a class or a textbook.
>If you find a good forum for IR, please share.
I don't know about IR forums, but maybe the following link will help as an
introduction for those not familiar with the field of IR.
It gives an overview of possible weighting schemes used with the vector
space model:

These weights have been implemented in SMART, a famous retrieval system
developed at Cornell University by Gerald Salton, one of the big names in
the history of IR (see

The weighting methods used in SMART are encoded with three characters:

  First char: the term-frequency procedure to be used.
  Second char: the inverse-document-frequency procedure to be used.
  Third char: the normalization procedure to be used.
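To make the three-letter codes concrete, here is a minimal Python sketch that
decodes a code such as `bnn` or `ltc` and weights the terms of one document.
It covers only a small, assumed subset of letters following a common textbook
convention (term frequency: `n` natural, `b` boolean, `l` logarithmic,
`a` augmented; idf: `n` none, `t` ln(N/df); normalization: `n` none,
`c` cosine). The function names are hypothetical, and the exact formulas
(log base, variants) differ between sources and SMART versions.

```python
import math
from typing import Dict

# Hypothetical sketch of a SUBSET of SMART weighting letters.
# Formulas follow a common textbook convention; real SMART versions
# offer more letters and slightly different variants.

def tf_weight(letter: str, tf: int, max_tf: int) -> float:
    """First letter: term-frequency component."""
    if tf == 0:
        return 0.0
    if letter == "n":  # natural: raw term frequency
        return float(tf)
    if letter == "b":  # boolean: 1 if the term occurs at all
        return 1.0
    if letter == "l":  # logarithmic: 1 + ln(tf)
        return 1.0 + math.log(tf)
    if letter == "a":  # augmented: 0.5 + 0.5 * tf / max_tf
        return 0.5 + 0.5 * tf / max_tf
    raise ValueError(f"unsupported tf letter: {letter!r}")

def idf_weight(letter: str, df: int, n_docs: int) -> float:
    """Second letter: inverse-document-frequency component."""
    if letter == "n":  # no idf factor
        return 1.0
    if letter == "t":  # idf = ln(N / df)
        return math.log(n_docs / df)
    raise ValueError(f"unsupported idf letter: {letter!r}")

def smart_weights(code: str, term_freqs: Dict[str, int],
                  doc_freqs: Dict[str, int], n_docs: int) -> Dict[str, float]:
    """Weight one document's terms according to a 3-letter SMART code."""
    tf_letter, idf_letter, norm_letter = code
    max_tf = max(term_freqs.values())
    weights = {
        term: tf_weight(tf_letter, tf, max_tf)
        * idf_weight(idf_letter, doc_freqs[term], n_docs)
        for term, tf in term_freqs.items()
    }
    # Third letter: normalization component.
    if norm_letter == "n":  # no normalization
        return weights
    if norm_letter == "c":  # cosine: divide by the vector's Euclidean length
        length = math.sqrt(sum(w * w for w in weights.values()))
        return {t: w / length for t, w in weights.items()} if length > 0 else weights
    raise ValueError(f"unsupported normalization letter: {norm_letter!r}")

# Illustrative term and document frequencies (not real data).
tfs = {"lucene": 3, "scoring": 1}
dfs = {"lucene": 10, "scoring": 50}
print(smart_weights("bnn", tfs, dfs, 100))  # {'lucene': 1.0, 'scoring': 1.0}
print(smart_weights("ltc", tfs, dfs, 100))
```

Keeping one function per position mirrors the code itself: each letter picks
one component independently, which is why any combination is possible.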

In theory, any combination of the three letters is acceptable. The scheme
covers the Boolean model (e.g. with the bnn scheme) as well as more
sophisticated weights.
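With bnn, every occurring term gets weight 1 and absent terms get 0, so a
document vector is just its set of terms. As a minimal sketch (the function
names are hypothetical), the inner product of bnn query and document vectors
counts shared terms, and requiring that count to equal the number of query
terms reproduces a Boolean AND:

```python
from typing import Dict, Iterable, List

def bnn_vector(terms: Iterable[str]) -> Dict[str, float]:
    """bnn weighting: boolean tf (1 if present), no idf, no normalization."""
    return {term: 1.0 for term in terms}

def dot(q: Dict[str, float], d: Dict[str, float]) -> float:
    """Inner product of two sparse term-weight vectors."""
    return sum(w * d.get(term, 0.0) for term, w in q.items())

def boolean_and(query_terms: Iterable[str],
                docs: Dict[str, List[str]]) -> List[str]:
    """Return ids of docs containing every query term, via bnn dot products."""
    q = bnn_vector(query_terms)
    return [doc_id for doc_id, terms in docs.items()
            if dot(q, bnn_vector(terms)) == len(q)]

docs = {
    "d1": ["lucene", "scoring", "idf"],
    "d2": ["lucene", "index"],
    "d3": ["scoring", "lucene"],
}
print(boolean_and(["lucene", "scoring"], docs))  # ['d1', 'd3']
```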

While these schemes are theoretically attractive, other weightings seem to
have proven more useful empirically (e.g. not squaring the idf term).
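One way to see where a squared idf comes from: if both the query vector and
the document vector carry an idf factor, a term matched in both contributes
idf x idf to their inner product. The Python sketch below uses illustrative
(made-up) collection statistics and a hypothetical helper name to compare how
strongly a rare term outweighs a common one under idf and under idf squared:

```python
import math

def idf(df: int, n_docs: int) -> float:
    """idf = ln(N / df), one common form of the idf factor."""
    return math.log(n_docs / df)

N_DOCS = 1_000_000            # illustrative collection size
rare, common = 10, 100_000    # illustrative document frequencies

idf_rare, idf_common = idf(rare, N_DOCS), idf(common, N_DOCS)

# Contribution of one matching term (tf = 1, no normalization) when idf
# appears once (document side only) vs. twice (query and document side).
once = idf_rare / idf_common
squared = idf_rare ** 2 / idf_common ** 2

print(f"rare/common with idf:   {once:.1f}")     # 5.0
print(f"rare/common with idf^2: {squared:.1f}")  # 25.0
```

Squaring widens the gap between rare and common terms (here from 5x to 25x);
dropping one of the two idf factors is one example of the alternative
weightings mentioned above.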

Hope this helps,
