Message-ID: <865c77680609261944h335583acy6b6900ecb898e8eb@mail.gmail.com>
Date: Wed, 27 Sep 2006 08:14:48 +0530
From: Mek <mekin.m@gmail.com>
To: java-user@lucene.apache.org
Subject: Re: Very high fieldNorm for a field resulting in bad results
In-Reply-To: <Pine.LNX.4.58.0609261155270.13298@hal.rescomp.berkeley.edu>
References: <865c77680609250327j265039aeo43b8d771e405a5dc@mail.gmail.com>
	 <Pine.LNX.4.58.0609261155270.13298@hal.rescomp.berkeley.edu>

Thanks a lot, Chris, for the detailed and patient response.


>
> The value of the field norm for any field named "A" is typically the
> lengthNorm of the field, times the document boost, times the field boost
> for *each* Field instance added to the document with the name "A".
> (lengthNorm is by default 1/sqrt(num of terms))

That explains the very high value for the fieldNorm: the effective boost
became boost_value^(number of values added to the field).
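To check that arithmetic, here is a small plain-Java model of the unencoded norm as described in the quote above: document boost, times the boost of every Field instance named "A", times 1/sqrt(total terms across those instances). The class and method names are made up for illustration and this is not Lucene's own code; real Lucene also squeezes the norm into a single byte (Similarity.encodeNorm), so the stored value is coarser than what this computes.

```java
/** Illustrative model of an unencoded field norm; not Lucene's implementation. */
public class FieldNormSketch {

    /** Default-style length norm: 1 / sqrt(number of terms). */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /**
     * Norm for one field name: the document boost, times the boost of every
     * Field instance with that name, times lengthNorm of the total token count.
     */
    static float fieldNorm(float docBoost, float[] instanceBoosts, int[] instanceLengths) {
        float boost = docBoost;
        int totalTerms = 0;
        for (int i = 0; i < instanceBoosts.length; i++) {
            boost *= instanceBoosts[i];        // boosts multiply; they are not averaged
            totalTerms += instanceLengths[i];  // lengths of all instances add up
        }
        return boost * lengthNorm(totalTerms);
    }

    public static void main(String[] args) {
        // Hypothetical document: field "A" added 5 times, each instance boosted 2.0, 2 tokens each.
        float[] boosts = {2f, 2f, 2f, 2f, 2f};
        int[] lengths = {2, 2, 2, 2, 2};
        // 2^5 / sqrt(10), i.e. about 10.12 instead of 2 / sqrt(10) for a single boost of 2.0
        System.out.println(fieldNorm(1.0f, boosts, lengths));
    }
}
```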

A couple more questions:

1. Can I do away with index-time boosting for fields and tweak
query-time boosts for them instead? I understand that document-level
boosting is very useful at index time. But for fields, both the
index-time boost and the query-time boost are multiplicative factors in
the score, so would it be safe to say that I can replace the index-time
boost with a query-time boost? That would give me a lot of freedom to
test different values without re-indexing, which takes me about 6
hours.
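One way to reason about question 1 is a toy model of the classic TF-IDF formula for a two-clause query (title:term body:term). This is a hypothetical sketch, not Lucene code: it ignores coord and norm encoding, and all idf values and document statistics are invented. It puts a factor of 3 either into the title fieldNorm (index time) or onto the title clause weight (query time), where it also enters queryNorm.

```java
/**
 * Toy model of classic TF-IDF scoring for a two-clause query (title, body).
 * Ignores coord and the lossy one-byte norm encoding; all inputs are hypothetical.
 */
public class BoostPlacementSketch {

    static double tf(int freq) { return Math.sqrt(freq); }

    static double lengthNorm(int numTerms) { return 1.0 / Math.sqrt(numTerms); }

    /**
     * indexBoost: boost folded into the title fieldNorm at index time.
     * queryBoost: boost set on the title clause at query time.
     */
    static double score(double idfTitle, double idfBody,
                        int freqTitle, int lenTitle, int freqBody, int lenBody,
                        double indexBoost, double queryBoost) {
        double wTitle = idfTitle * queryBoost;  // raw weight of the title clause
        double wBody = idfBody;                 // raw weight of the body clause
        double queryNorm = 1.0 / Math.sqrt(wTitle * wTitle + wBody * wBody);
        double title = wTitle * queryNorm * idfTitle * tf(freqTitle)
                * lengthNorm(lenTitle) * indexBoost;
        double body = wBody * queryNorm * idfBody * tf(freqBody) * lengthNorm(lenBody);
        return title + body;
    }

    public static void main(String[] args) {
        // Hypothetical idf values: corpus-level, so identical for every document of one query.
        double idfTitle = 2.0, idfBody = 1.5;
        // Hypothetical per-document stats: {freqTitle, lenTitle, freqBody, lenBody}.
        int[][] docs = { {1, 4, 3, 200}, {0, 6, 8, 150}, {2, 10, 1, 50} };
        for (int[] d : docs) {
            double a = score(idfTitle, idfBody, d[0], d[1], d[2], d[3], 3.0, 1.0); // index-time
            double b = score(idfTitle, idfBody, d[0], d[1], d[2], d[3], 1.0, 3.0); // query-time
            System.out.printf("index-time=%.4f query-time=%.4f ratio=%.4f%n", a, b, a / b);
        }
    }
}
```

In this model the two scores differ only by the queryNorm factor, which is the same for every document, so the ranking does not change. That equivalence relies on the index-time boost being identical for every document; a per-document field boost has no single query-time counterpart. Also, because query boosts enter queryNorm, a boost on a single-clause query cancels out entirely; only the relative boosts between clauses matter.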

2. When searching through the archive I read a post of yours saying
it's possible to give exact matches a much higher weight by indexing
START and END tokens. From
http://www.nabble.com/What-are-norms--tf1919250.html#a5335856 :
"it is possible to score exact matches on (tokenized) fields very high
without using lengthNorm by indexing START and END tokens for the field as
well, and then including them in your sloppy phrase queries -- the
"tighter" match will score highest."

Can you please elaborate on this?
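A rough model of the START/END idea quoted above: wrap every field value in sentinel tokens at index time, wrap the query phrase the same way, and let a sloppy phrase match reward the tightest fit. In Lucene this would presumably be a PhraseQuery built with add(Term) for each token plus setSlop(int); the sentinel strings _start_/_end_ and all class and method names below are assumptions for illustration. The match-length measure approximates a sloppy phrase scorer (the spread of position-minus-offset across the phrase terms, scored as 1/(distance+1)); it is simplified and does not handle repeated terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative model of START/END sentinels with sloppy phrase matching. */
public class SentinelPhraseSketch {

    // Hypothetical sentinel tokens; they must never occur in real field text.
    static final String START = "_start_";
    static final String END = "_end_";

    /** Wrap a token sequence with the sentinels (used for both field and query). */
    static List<String> withSentinels(String... tokens) {
        List<String> out = new ArrayList<>();
        out.add(START);
        out.addAll(Arrays.asList(tokens));
        out.add(END);
        return out;
    }

    /**
     * Smallest match length: pick a position p_i for each phrase term i and
     * measure max(p_i - i) - min(p_i - i). 0 means an exact phrase match.
     * Returns -1 if some term is missing. Brute force, fine for short fields.
     */
    static int minMatchLength(List<String> doc, List<String> phrase) {
        return search(doc, phrase, 0, Integer.MAX_VALUE, Integer.MIN_VALUE);
    }

    private static int search(List<String> doc, List<String> phrase, int i, int lo, int hi) {
        if (i == phrase.size()) return hi - lo;
        int best = -1;
        for (int p = 0; p < doc.size(); p++) {
            if (!doc.get(p).equals(phrase.get(i))) continue;
            int rel = p - i; // position relative to the term's offset in the phrase
            int r = search(doc, phrase, i + 1, Math.min(lo, rel), Math.max(hi, rel));
            if (r >= 0 && (best < 0 || r < best)) best = r;
        }
        return best;
    }

    /** Sloppy-phrase frequency contribution: 1 / (distance + 1). */
    static double sloppyFreq(int matchLength) {
        return 1.0 / (matchLength + 1);
    }

    public static void main(String[] args) {
        List<String> query = withSentinels("new", "york");
        List<List<String>> docs = Arrays.asList(
                withSentinels("new", "york"),
                withSentinels("new", "york", "city"),
                withSentinels("the", "new", "york", "times"));
        for (List<String> doc : docs) {
            int len = minMatchLength(doc, query);
            System.out.println(doc + " matchLength=" + len + " sloppyFreq=" + sloppyFreq(len));
        }
    }
}
```

With the phrase "new york", the field "new york" needs no slop, "new york city" needs 1 and "the new york times" needs 2, so the field whose whole value equals the query gets the largest sloppy frequency without relying on lengthNorm.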

Thanks a ton for the response,
mekin
