From: "Christian Schrader"
To: "Lucene Users List"
Subject: document boost factor
Date: Thu, 6 Jun 2002 12:44:17 +0200
In-reply-to: <50EA669584662B498F13A5F24630A0C0DB175B@peach.mnet.private>

Is it possible to set a document boost factor in the current CVS? And if not,
is anybody working on it? I am VERY interested and would gladly test
performance issues :-)

Christian

> -----Original Message-----
> From: Halácsy Péter [mailto:halacsy.peter@axelero.com]
> Sent: 13 April 2002 18:03
> To: Lucene Users List
> Subject: RE: Normalization of Documents
>
> > Therefore we would need an interface where we could change the Lucene
> > document boost factor during runtime.
> > For example, a document's ranking could be based on:
> > links pointing to that document (like Google)
> > last modification date,
> > size of the document,
> > term frequency,
> > how often it was displayed by other users sending the same query
> > terms to the system
> > .....
>
> 4 of these 5 are based on a pre-calculated document value/weight/score
> (I don't exactly understand what term frequency means in this context).
> If I could assign a value to every document (as I proposed in a mail) we
> could start to implement some algorithm to calculate different values
> (for example, calculating link popularity/page rank needs a matrix
> inversion that isn't too simple)
>
> > Let me know if you find that idea interesting, I would like to work on
> > that topic.
>
> I find it very interesting.
>
> peter
>
> > On 4/13/02 6:05 AM, "Bernhard Messer" wrote:
> >
> > > The topic you are focusing on is a never ending story in content
> > > retrieval in general. There is no perfect solution which fits in every
> > > environment. Retrieving a document's context based on a single query
> > > term seems to be very difficult also. In Lucene it isn't very
> > > difficult to change the ranking algorithm. If you don't like the field
> > > normalization, you could comment out the following line in the
> > > TermScorer class.
> > >
> > > score *= Similarity.norm(norms[d]);
> > >
> > > If you comment out this line, your scoring is based on the
> > > term frequency.
> > >
> > > If more people are interested, we could think about a somewhat more
> > > flexible ranking system within Lucene. There would be several
> > > parameters from the environment which could be used to rank a document.
> > > Therefore we would need an interface where we could change the Lucene
> > > document boost factor during runtime.
> > > For example, a document's ranking could be based on:
> > > links pointing to that document (like Google)
> > > last modification date,
> > > size of the document,
> > > term frequency,
> > > how often was it displayed by other users, sending the same query
> > > terms to the system
> > > .....
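
The idea discussed in this thread, a pre-calculated value per document that
can be changed at runtime and is multiplied into the query score, might be
sketched roughly as follows. This is only an illustration under stated
assumptions: the class DocumentBoostSketch, its methods, and the weighting
formula (logarithm of inbound links combined with a half-life decay on
document age) are hypothetical and are not part of Lucene or of anything
proposed on the list.

```java
import java.util.Arrays;

/** Hypothetical sketch; none of these names are part of Lucene. */
public class DocumentBoostSketch {
    // One pre-calculated weight per document id; 1.0f means "no boost".
    private final float[] docBoost;

    public DocumentBoostSketch(int numDocs) {
        docBoost = new float[numDocs];
        Arrays.fill(docBoost, 1.0f);
    }

    /** Replace a document's weight at runtime, without re-indexing. */
    public void setBoost(int doc, float boost) {
        if (boost < 0.0f) {
            throw new IllegalArgumentException("boost must be >= 0");
        }
        docBoost[doc] = boost;
    }

    /**
     * Assumed weighting, chosen only for illustration: grows with the
     * logarithm of inbound links and halves every halfLifeDays of age.
     */
    public static float weight(int inboundLinks, double ageDays, double halfLifeDays) {
        double linkPart = 1.0 + Math.log(1.0 + inboundLinks);
        double agePart = Math.pow(0.5, ageDays / halfLifeDays);
        return (float) (linkPart * agePart);
    }

    /** Scale the query-dependent score by the query-independent weight. */
    public float boostedScore(int doc, float rawScore) {
        return rawScore * docBoost[doc];
    }

    public static void main(String[] args) {
        DocumentBoostSketch boosts = new DocumentBoostSketch(2);
        // Document 1: 9 inbound links, 30 days old, 30-day half-life.
        boosts.setBoost(1, weight(9, 30.0, 30.0));
        System.out.println(boosts.boostedScore(0, 0.5f));
        System.out.println(boosts.boostedScore(1, 0.5f));
    }
}
```

Keeping the weights in a plain array beside the index is one possible design
choice: the values can then be recalculated (for example after a new link
analysis run) without rewriting the stored field norms.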