lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Terry Steichen" <te...@net-frame.com>
Subject Re: Lucene Scoring Behavior
Date Sun, 21 Sep 2003 21:24:08 GMT
Doug,

Well, my good intentions (to reindex on Thursday night) were interrupted by
Hurricaine Isabel (followed by a 44 hour power outtage).

Well, excuses aside, I did get the reindex done today and the scores for all
hits from a single date query come out to be the same score (as they
should).  Don't have any idea what screwed up the previous index - though as
promised, I'll keep an eye on it as I continue to merge new stuff over the
next few days/weeks.

Is there a way, using standard Lucene configuration parameters and/or API's,
to force the hit scores to come out so the highest one is set to 1, and the
others are proportionately lower?

Regards,

Terry

----- Original Message -----
From: "Terry Steichen" <terry@net-frame.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Thursday, September 18, 2003 10:10 AM
Subject: Re: Lucene Scoring Behavior


> Doug,
>
> I just extracted a portion of the database, reindexed and found the scores
> to come out much more like we'd expect.  Appears this may be an indexing
> issue - I index new stuff each day and merge the new index with the master
> index.  Only redo the master when I can't avoid it (because it takes so
> long).  I probably merge 100 times or more before reindexing.  This
evening
> I'll reindex and let you know if the apparent problem clears up.  If so,
> I'll keep track of it as I continue to merge and see if there's any issue
> there.
>
> Thanks for the input (and from Erik, pointing me to the Explanation - it's
> pretty neat).
>
> Question: The new scores for the test database portion mentioned above all
> seem to come out in the range of .06 to .07.  I assume this is because
they
> never get normalized.  If this is the case, (a) would it hurt anything to
> "normalize up" (so the scores range up to 1), and if so (b) is there an
> easy, non-disruptive (to the source code) way to do this?
>
> Regards,
>
> Terry
>
>
> ----- Original Message -----
> From: "Doug Cutting" <cutting@lucene.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Wednesday, September 17, 2003 11:15 PM
> Subject: Re: Lucene Scoring Behavior
>
>
> > Hmm.  This makes no sense to me.  Can you supply a reproducible
> > standalone test case?
> >
> > Doug
> >
> > Terry Steichen wrote:
> > > Doug,
> > >
> > > (1) No, I did *not* boost the pub_date field, either in the indexing
> process
> > > or in the query itself.
> > >
> > > (2) And, each pub_date field of each document (which is in XML format)
> > > contains only one instance of the date string.
> > >
> > > (3) And only the pub_date field itself is indexed.  There are other
> > > attributes of this field that may contain the date string, but they
> aren't
> > > indexed - that is, they are not included in the instantiated Document
> class.
> > >
> > > Regards,
> > >
> > > Terry
> > >
> > > ----- Original Message -----
> > > From: "Doug Cutting" <cutting@lucene.com>
> > > To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > > Sent: Wednesday, September 17, 2003 5:51 PM
> > > Subject: Re: Lucene Scoring Behavior
> > >
> > >
> > >
> > >>Terry Steichen wrote:
> > >>
> > >>>  0.03125 = fieldNorm(field=pub_date, doc=90992)
> > >>>  1.0 = fieldNorm(field=pub_date, doc=90970)
> > >>
> > >>It looks like the fieldNorm's are what differ, not the IDFs.  These
are
> > >>the product of the document and/or field boost, and 1/sqrt(numTerms)
> > >>where numTerms is the number of terms in the "pub_date" field of the
> > >>document.  Thus if each document is only assigned one date, and you
> > >>didn't boost the field or the document when you indexed it, this
should
> > >>be 1.0.  But if the document has two dates, then this would be
> > >>1/sqrt(2).  Or if you boosted this document pub_date field, then this
> > >>will have whatever boost you provided.
> > >>
> > >>So, did you boost anything when indexing?  Or could a single document
> > >>have two or more different values for pub_date?  Either would explain
> > >
> > > this.
> > >
> > >>Doug
> > >>
> > >>
> > >>---------------------------------------------------------------------
> > >>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > >>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > >>
> > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>


Mime
View raw message