lucene-java-user mailing list archives

From: Mek
Subject: Re: Very high fieldNorm for a field resulting in bad results
Date: Thu, 28 Sep 2006 08:00:08 GMT
> it depends on your goal.  index time field boosts are a way to express
> things like "this document's title is worth twice as much as the title of
> most documents"; query time boosts are a way to express "i care about
> matches on this clause of my query twice as much as i do about matches to
> other clauses of my query."

Assuming I want to boost a field by the same value for all documents,
can this be replaced by query-time boosting?

> : 2. When searching through the archive I had read a post by you, saying
> : it's possible to give exact matches much higher weightage by indexing
> : the START & END
> : from :
> the context was that even if you turn off field norms you can still get some
> score benefits/restrictions of matches on shorter fields vs longer fields
> by indexing marker tokens (things you wouldn't expect to be regular
> tokens; i used START and END just for convenience) at the beginning and
> ending of the field, and then including them in your phrase or span near
> query with lots of slop ... so values like...

Cool, that was a good solution.

I, though, am storing the norms & yet do not get exact matches ranking
higher than others.

I do change  "user query"  to  "user query"^PHRASE_BOOST user query
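
A minimal sketch of this rewrite as plain string manipulation, so it does not depend on any particular Lucene version. The class name `PhraseBoostRewrite` and the `phraseBoost` parameter are placeholders for illustration (the parameter stands in for PHRASE_BOOST above), and stripping embedded double quotes from the input is an assumption made only to keep the generated query string balanced.

```java
public final class PhraseBoostRewrite {
    private PhraseBoostRewrite() {}

    /**
     * Turns a raw user query into: "<query>"^<phraseBoost> <query>
     * The result is a query-parser string, not a parsed Query object.
     */
    public static String rewrite(String userQuery, float phraseBoost) {
        // Assumption: drop embedded double quotes so the phrase clause stays balanced.
        String bare = userQuery.replace("\"", "").trim();
        if (bare.isEmpty()) {
            return "";
        }
        // Boosted exact phrase first, then the loose terms.
        return "\"" + bare + "\"^" + phraseBoost + " " + bare;
    }
}
```

For the input `britney spears` with a boost of 10 this yields `"britney spears"^10.0 britney spears`.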

This helps to some extent, but my problem comes from the fact that,
given a user query, I don't know which field to search in and hence
construct a boolean query that searches in all fields.

The issue with this is that an exact match in field A ends up ranking
lower than a broader match in field A of another doc which also
matches partially in field B.
Are there any easy solutions that have been used before?
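
To make the setup concrete, here is a rough sketch of the all-fields expansion described above, again as a query-parser string. The class name `AllFieldsQuery`, the field names passed in, and the whitespace tokenization are assumptions for illustration; a real analyzer may split terms differently, and special query-syntax characters are not escaped here.

```java
import java.util.ArrayList;
import java.util.List;

public final class AllFieldsQuery {
    private AllFieldsQuery() {}

    /**
     * For every field, emits a boosted phrase clause plus one clause per term.
     * All clauses are optional (OR), so a doc matching more clauses scores higher.
     */
    public static String build(String userQuery, List<String> fields, float phraseBoost) {
        String bare = userQuery.replace("\"", "").trim();
        if (bare.isEmpty()) {
            return "";
        }
        List<String> clauses = new ArrayList<>();
        for (String field : fields) {
            // Exact phrase in this field, boosted.
            clauses.add(field + ":\"" + bare + "\"^" + phraseBoost);
            // Loose terms in this field (assumption: whitespace-separated).
            for (String term : bare.split("\\s+")) {
                clauses.add(field + ":" + term);
            }
        }
        return String.join(" ", clauses);
    }
}
```

With fields A and B and a boost of 10, the query `britney spears` expands to `A:"britney spears"^10.0 A:britney A:spears B:"britney spears"^10.0 B:britney B:spears`. Because every clause is optional, a document that matches loosely in both A and B can collect more clause scores than one that matches the phrase exactly in A alone.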

e.g. Britney spears -> "Britney spear"^N britney spears
will match:

doc1 - A:"BritneySpears, and other celebrities"  B:William Spears
doc2 - A:"Britney Spears"

The problem is aggravated by the fact that a query for "spears
william" should get me doc1 (that is, it should actually query and score
based on all fields).

The only good solution I can think of is giving a very high
weightage to the phrase, but even that did not seem to work.

Another possible idea: make the length norm play a greater role in the
score (use length norm ^ 2?).
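
A small sketch of what squaring would do numerically. As far as I know, Lucene's DefaultSimilarity computes the length norm as 1/sqrt(numTerms), so the squared variant is simply 1/numTerms. The class `LengthNormSketch` and its method names are hypothetical and standalone; wiring this into Lucene would presumably mean overriding `lengthNorm` in a custom Similarity and re-indexing, since norms are written at index time.

```java
public final class LengthNormSketch {
    private LengthNormSketch() {}

    /** Default-style length norm: 1 / sqrt(numTerms). */
    public static float defaultNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Squared variant: (1 / sqrt(numTerms))^2 = 1 / numTerms. */
    public static float squaredNorm(int numTerms) {
        return (float) (1.0 / numTerms);
    }
}
```

Taking hypothetical field lengths of 2 and 8 terms, the default norm favours the shorter field by a factor of 2, while the squared norm favours it by a factor of 4. Note that stored norms are encoded with limited precision, so small differences in field length may not survive encoding.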

What else can I try?

Thanks a lot for the responses, Chris.
