lucene-java-user mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: Per-token weighting / attribute data in index
Date Sat, 03 Jun 2006 01:05:59 GMT
On Fri, Jun 02, 2006 at 03:47:10PM -0700, Chris Hostetter wrote:
> You may want to check out the java-dev list ... there's been some talk
> among the people who really understand the low levels of lucene's file
> formats about adding arbitrary "payload" data with each term/doc pair .. a
> proposal that started (as far as i can tell) from a desire to have
> individual term/doc boosting...

It's funny that you don't seem to include yourself in that group yet, Hoss. I
imagine it won't be long. If you haven't yet read the 1998 Brin/Page paper
("The Anatomy of a Large-Scale Hypertextual Web Search Engine"), you should
check it out.

Enabling individual positions to be boosted is indeed one of the main targets
of the current discussion. A slightly easier-to-understand application would
be boosting individual tokens according to relative font size. For instance,
we might assume that text between <h1> tags is more important than text
between <p> tags and boost it accordingly. Lucene has no good way to handle
that right now.
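To make the idea concrete, here is a minimal sketch of per-position boosting, assuming a hypothetical scheme in which each occurrence of a term in a document carries a one-byte payload holding a quantized boost derived from the enclosing HTML tag. The class name `PositionBoostSketch`, the tag-to-boost table, and the 0.1-step byte encoding are all illustrative assumptions; this is plain Java, not Lucene's API and not a design settled in the java-dev discussion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: per-position boosts stored as one payload byte each.
public class PositionBoostSketch {

    // Assumed boost table: bigger headings count for more. Values are illustrative.
    static final Map<String, Float> TAG_BOOSTS = Map.of(
        "h1", 3.0f,
        "h2", 2.0f,
        "p", 1.0f);

    // Boost for a token found inside the given tag; unknown tags get 1.0.
    static float boostFor(String tag) {
        return TAG_BOOSTS.getOrDefault(tag, 1.0f);
    }

    // Quantize a boost into a single unsigned byte: 0.1 steps, range 0.0 to 25.5.
    static byte encode(float boost) {
        int q = Math.round(boost * 10f);
        q = Math.max(0, Math.min(255, q));
        return (byte) q;
    }

    // Undo encode(); mask with 0xFF so values above 127 stay positive.
    static float decode(byte payload) {
        return (payload & 0xFF) / 10f;
    }

    // One occurrence of a term in a document: its position plus a payload byte.
    record Posting(int position, byte payload) {}

    // Sum of decoded per-position boosts for one term/doc pair,
    // used here as a boosted stand-in for raw term frequency.
    static float boostedFrequency(List<Posting> postings) {
        float sum = 0f;
        for (Posting p : postings) {
            sum += decode(p.payload());
        }
        return sum;
    }

    public static void main(String[] args) {
        // A term that appears once inside <h1> and twice inside <p>.
        List<Posting> postings = new ArrayList<>();
        postings.add(new Posting(0, encode(boostFor("h1"))));
        postings.add(new Posting(7, encode(boostFor("p"))));
        postings.add(new Posting(12, encode(boostFor("p"))));
        System.out.println(boostedFrequency(postings)); // prints 5.0
    }
}
```

With plain term frequency the three occurrences would count equally; with the payload, the single <h1> occurrence outweighs both <p> occurrences combined. A single byte per position keeps the index overhead small at the cost of coarse boost resolution.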

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

