Mailing-List: contact lucene-user-help@jakarta.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
From: "Christian Schrader" <schrader.news@evendi.de>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Subject: document boost factor
Date: Thu, 6 Jun 2002 12:44:17 +0200
Message-ID: <LMEALKPOKMELLBJGEBEBOEGGIHAA.schrader.news@evendi.de>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-2"
Content-Transfer-Encoding: 8bit
Importance: Normal
In-reply-to: <50EA669584662B498F13A5F24630A0C0DB175B@peach.mnet.private>

Is it possible to set a document boost factor in the current CVS? And if
not, is anybody working on it?
I am VERY interested and would gladly test performance issues :-)
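
To make the request concrete, here is a rough sketch of what I mean by a
document boost factor. It is not Lucene code: the class BoostSketch and its
methods are hypothetical names made up for illustration. It only assumes that
a per-document factor, changeable at runtime, would be multiplied into the
score next to the existing norm.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a runtime document boost; not part of Lucene. */
class BoostSketch {
    // Per-document boost factors keyed by document number.
    // A document without an entry has the neutral boost 1.0f.
    private final Map<Integer, Float> boosts = new HashMap<>();

    /** Sets or replaces the boost of a document; may be called at any time. */
    void setBoost(int doc, float boost) {
        if (boost <= 0f) {
            throw new IllegalArgumentException("boost must be positive");
        }
        boosts.put(doc, boost);
    }

    /** Returns the stored boost, or 1.0f if none was set. */
    float getBoost(int doc) {
        return boosts.getOrDefault(doc, 1.0f);
    }

    /**
     * Multiplies a raw term score by the field norm and the document boost.
     * Both rawScore and norm are assumed to come from the existing scorer.
     */
    float score(int doc, float rawScore, float norm) {
        return rawScore * norm * getBoost(doc);
    }

    public static void main(String[] args) {
        BoostSketch sketch = new BoostSketch();
        sketch.setBoost(7, 2.0f);
        System.out.println(sketch.score(7, 0.5f, 0.25f)); // document with a boost
        System.out.println(sketch.score(8, 0.5f, 0.25f)); // document without one
    }
}
```

The boost values themselves could be pre-calculated from any of the signals
listed in the quoted mail below (links, modification date, and so on).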

Christian

> -----Original Message-----
> From: Halácsy Péter [mailto:halacsy.peter@axelero.com]
> Sent: 13 April 2002 18:03
> To: Lucene Users List
> Subject: RE: Normalization of Documents
>
>
>
> > Therefore we would need an interface where we could change the Lucene
> > document boost factor during runtime. For example, a document's ranking
> > could be based on:
> >     links pointing to that document (like Google)
> >     last modification date,
> >     size of the document,
> >     term frequency,
> >     how often it was displayed to other users sending the same query
> > terms to the system
> >     .....
>
> 4 of these 5 are based on a pre-calculated document value/weight/score
> (I don't exactly understand what term frequency means in this context).
> If I could assign a value to every document (as I proposed in a mail), we
> could start to implement some algorithms to calculate different values
> (for example, calculating link popularity/PageRank needs a matrix
> inversion that isn't too simple).
>
>
> > Let me know if you find that idea interesting, I would like to work on
> > that topic.
> I find it very interesting.
>
> peter
>
>
> On 4/13/02 6:05 AM, "Bernhard Messer" <Bernhard.Messer@intrafind.de> wrote:
> >
> >
> > >
> > > > the topic you are focusing on is a never ending story in content
> > > > retrieval in general. There is no perfect solution which fits in every
> > > > environment. Retrieving a document's context based on a single query
> > > > term seems to be very difficult also. In Lucene it isn't very
> > > > difficult to change the ranking algorithm. If you don't like the field
> > > > normalization, you could comment out the following line in the
> > > > TermScorer class:
> > >
> > > score *= Similarity.norm(norms[d]);
> > >
> > > > If you comment out this line, your scoring is based on the
> > > > term frequency.
> > >
> > > > If more people are interested, we could think about a somewhat more
> > > > flexible ranking system within Lucene. There would be several
> > > > parameters from the environment which could be used to rank a document.
> > > > Therefore we would need an interface where we could change the Lucene
> > > > document boost factor during runtime. For example, a document's ranking
> > > > could be based on:
> > > >   links pointing to that document (like Google)
> > > >   last modification date,
> > > >   size of the document,
> > > >   term frequency,
> > > >   how often it was displayed to other users sending the same query
> > > > terms to the system
> > > >   .....
> >
> >
> > --
> > To unsubscribe, e-mail:
> > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> > <mailto:lucene-user-help@jakarta.apache.org>
> >
> >
>

