lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: Per-token weighting / attribute data in index
Date Fri, 02 Jun 2006 20:57:25 GMT

I may be misunderstanding your goal. It sounds like what you want to do
is say that for certain documents (which you trust), matching on the title
is "worth more" than matching on the title of other documents (which you
don't trust).

If that's the case, then at index time you can add a field boost on the
title just for the documents you trust, and add no boost for the documents
you don't trust.
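
A minimal sketch of that per-document decision, in plain Java with no
Lucene dependency. The class TrustBoostSketch, the "trusted" flag, and the
value 2.0 are all hypothetical and only illustrate the idea; in an actual
indexer the computed value would be passed to Field.setBoost(float) on the
title field before the document is added, assuming a Lucene version that
still supports index-time field boosts.

```java
/**
 * Illustrative only: decides which index-time boost a document's title
 * field should get. Names and values here are assumptions, not Lucene API.
 */
final class TrustBoostSketch {

    /** Hypothetical boost for titles of trusted documents. */
    static final float TRUSTED_TITLE_BOOST = 2.0f;

    /** 1.0 is a neutral boost, i.e. "add no boost". */
    static final float NEUTRAL_BOOST = 1.0f;

    /** Returns the boost to apply to the title field of one document. */
    static float titleBoost(boolean trusted) {
        return trusted ? TRUSTED_TITLE_BOOST : NEUTRAL_BOOST;
    }

    public static void main(String[] args) {
        // With Lucene on the classpath, this value would be handed to
        // Field.setBoost(float) on the title field at index time.
        System.out.println("trusted title boost:   " + titleBoost(true));
        System.out.println("untrusted title boost: " + titleBoost(false));
    }
}
```

Because the boost is chosen per document, the same "title" field can carry
a different weight in each document without creating extra fields.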

If I've misunderstood your question, could you provide a use case
describing your goal, and where Lucene fails to meet it?

: Date: Fri, 2 Jun 2006 13:14:41 -0700
: From: Scott Davies
: Subject: Per-token weighting / attribute data in index
: Hi...reasonably experienced web search programmer but total Lucene newbie here.
: After poking through Lucene for a while, I still haven't figured out a
: decent way to tweak the scoring based on per-token data.  For example,
: as far as I can tell so far, the only reasonable way to have words in
: the titles or headers of HTML documents be "worth more" for scoring
: purposes than ordinary body text is to make "title" and "header"
: fields and apply appropriate field boosts across all documents.  That
: works OK if you only have a few special fields you want to boost by
: some consistent amount each, but falls down if, say, you wanted to
: include some sort of "tags" or anchortext in the scoring of documents
: where there's a high degree of variability in how much any given tag
: or anchor should be "trusted" and thus influence the score.  (I could
: conceivably discretize the boosts and, say, put all the anchortext
: with boost 2.5 in a special "anchortext-boost2.5" field, but that
: would be extremely awkward and presumably cause major performance
: issues as the number of fields increases.)
: Have I just failed to notice the right way to do this, or is there
: really no decent way to do it in Lucene at this time?  If the latter,
: are there any plans to add this feature at some point semi-soon?  This
: seems to me like a major scoring limitation for applications not just
: indexing and searching over plain text documents...
: Thanks,
: -- Scott

