lucene-java-user mailing list archives

From Halácsy Péter <halacsy.pe...@axelero.com>
Subject RE: Normalization of Documents
Date Sat, 13 Apr 2002 16:03:00 GMT

> Therefore we would need an interface where we could change the lucene
> document boost factor during runtime. For example, a document's ranking
> could be based on:
>     links pointing to that document (like Google)
>     last modification date,
>     size of the document,
>     term frequency,
>     how often was it displayed by other users, sending the same query
>     terms to the system
>     .....

Four of these five are based on a pre-calculated document
value/weight/score (I don't understand exactly what term frequency means
in this context). If I could assign a value to every document (as I
proposed in an earlier mail), we could start implementing algorithms to
calculate different values (for example, calculating link
popularity/page rank needs a matrix inversion, which isn't too simple).
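
To make the proposal concrete, here is a minimal sketch of a per-document value store, assuming the pre-calculated value is applied as a plain multiplier on the text score. The class `StaticDocWeights` and its methods are hypothetical names invented for illustration; they are not part of Lucene, and how such a value would actually be wired into the scorer is left open.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: keeps one pre-calculated
// weight per document and folds it into a text relevance score.
final class StaticDocWeights {
    // Document id -> pre-calculated weight (e.g. from link popularity).
    private final Map<Integer, Float> weights = new HashMap<>();

    // Store or replace the weight of a document; negative weights are rejected.
    void setWeight(int docId, float weight) {
        if (weight < 0f) {
            throw new IllegalArgumentException("weight must be non-negative");
        }
        weights.put(docId, weight);
    }

    // Documents without an assigned weight get the neutral value 1.0.
    float getWeight(int docId) {
        return weights.getOrDefault(docId, 1.0f);
    }

    // Assumption: the weight acts as a simple multiplier on the text score.
    float combine(int docId, float textScore) {
        return textScore * getWeight(docId);
    }

    public static void main(String[] args) {
        StaticDocWeights w = new StaticDocWeights();
        w.setWeight(7, 2.0f);                    // document 7 is boosted
        System.out.println(w.combine(7, 0.5f));  // boosted text score
        System.out.println(w.combine(8, 0.5f));  // no weight set, score unchanged
    }
}
```

Because the weights live outside the index in this sketch, they could be recalculated (from link counts, modification dates, or display statistics) without re-indexing; whether a multiplier is the right way to combine them is a separate design question.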


> Let me know if you find that idea interesting, I would like to work on
> that topic.
I find it very interesting.

peter


On 4/13/02 6:05 AM, "Bernhard Messer" <Bernhard.Messer@intrafind.de> wrote:
>
> > The topic you are focusing on is a never-ending story in content
> > retrieval in general. There is no perfect solution that fits every
> > environment. Retrieving a document's context based on a single query
> > term also seems to be very difficult. In Lucene it isn't very
> > difficult to change the ranking algorithm. If you don't like the field
> > normalization, you could comment out the following line in the
> > TermScorer class:
> >
> > score *= Similarity.norm(norms[d]);
> >
> > If you comment out this line, your scoring is based on the term
> > frequency.
> >
> > If more people are interested, we could think about a somewhat more
> > flexible ranking system within Lucene. There would be several
> > parameters from the environment that could be used to rank a document.
> > Therefore we would need an interface where we could change the lucene
> > document boost factor during runtime. For example, a document's ranking
> > could be based on:
> >   links pointing to that document (like Google)
> >   last modification date,
> >   size of the document,
> >   term frequency,
> >   how often was it displayed by other users, sending the same query
> >   terms to the system
> >   .....
> 
> 
> --
> To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


