lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Peter Carlson <carl...@bookandhammer.com>
Subject Re: Normalization of Documents
Date Sat, 13 Apr 2002 15:44:25 GMT
Hi Bernhard,

I think this is a very interesting issue.

I think that changing the scoring algorithm is one part of it, the other is
to get the information from the Document to use in the ranking. Since this
is an expensive operation, there will have to be an alternative approach.

Do you have any suggestions to start?

Thanks

--Peter

On 4/13/02 6:05 AM, "Bernhard Messer" <Bernhard.Messer@intrafind.de> wrote:


> 
> the topic you are focusing on is a never ending story in content
> retrieval in general. There is no perfect solution which fits in every
> environment. Retrieving a document's context based on a single query
> term seems to be very difficult also. In Lucene it isn't de very
> difficult to change the ranking algorithm. If you don't like the field
> normalization, you could comment the following in line in the TermScorer
> class.
> 
> score *= Similarity.norm(norms[d]);
> 
> If you put a comment around this line, youre scoring is based on the
> term frequency.
> 
> If more people are interested, we could think on a little bit more
> flexible ranking system within Lucene. There would be several parameters
> which from the environment which could be used to rank a document.
> Therefore we would need an interface where we could change the lucene
> document boost factor during runtime. For example, a document's ranking
> could be based on:
>   links pointing to that document (like Google)
>   last modification date,
>   size of the document,
>   term frequency,
>   how often was it displayed by other users, sending the same query
> terms to the system
>   .....


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message