lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: SV: SV: SV: Changing the Scoring api
Date Thu, 14 Sep 2006 17:36:59 GMT

I obviously misunderstood your goal ... my reading of your question was
that you wanted the sum of the scores of individual terms (based on the tf
and idf) to matter, and you wanted the field norm values of the docs to be
taken into account (for "date boosting" purposes), but you did not want
documents to be 'penalized' by only matching some (but not all) of the
terms.  If you don't want the tf or idf to be taken into account either,
so that *only* the sums of the field norms are used, that can
still be done by overriding the corresponding Similarity class methods
without modifying any Scorers ... and if you really don't want the sum
(just the max) that can be done using DisjunctionMaxQuery instead of
BooleanQuery ... but i'm just speculating now because as i said, i
obviously misunderstood your question, so what exactly do you want, in
concrete terms?
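
To make those options concrete, here is a toy model in plain Java of how
per-clause scores get combined. It is only a sketch, not Lucene's scorer
code: the class and method names are made up for illustration, query
normalization is left out, and the "max" variant ignores the tie-breaker
that DisjunctionMaxQuery also supports. The per-clause scores stand in for
whatever is left once tf and idf are neutralized (e.g. just the field norm).

```java
/** Toy model of score combination; NOT Lucene's actual scorer code. */
final class ScoreCombinationSketch {

    /**
     * BooleanQuery-style combination: sum the scores of the clauses that
     * matched, optionally scaled by a coordination factor (matched / total).
     * Passing useCoord=false models a Similarity whose coord always returns 1.
     */
    static float sumScore(float[] matchedClauseScores, int totalClauses, boolean useCoord) {
        float sum = 0f;
        for (float s : matchedClauseScores) {
            sum += s;
        }
        float coord = useCoord ? (float) matchedClauseScores.length / totalClauses : 1f;
        return sum * coord;
    }

    /** DisjunctionMax-style combination (tie-breaker assumed to be 0): keep only the best clause. */
    static float maxScore(float[] matchedClauseScores) {
        float max = 0f;
        for (float s : matchedClauseScores) {
            max = Math.max(max, s);
        }
        return max;
    }

    public static void main(String[] args) {
        float[] matched = {0.5f, 0.25f}; // hypothetical doc matching 2 of 4 query terms
        System.out.println(sumScore(matched, 4, true));  // sum, penalized for partial match
        System.out.println(sumScore(matched, 4, false)); // plain sum, no penalty
        System.out.println(maxScore(matched));           // best single clause only
    }
}
```

With the coordination factor switched off, a document matching two of four
terms keeps the full sum of its clause scores; the max variant keeps only
the largest one.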

: Yeah Hoss you are right this isn't java it's the .NET port. But I have
: to ask at this mail list since it contains a lot of people with a lot
: more insight in lucene then on the .NET user list.

Nothing personal, but that's the worst justification i've ever heard.
The Lucene.Net community is never going to grow/thrive if people don't
participate in it.  Looking at the archives, it doesn't appear you ever
attempted to post this (or any other) question to either of the Lucene.Net
mailing lists, so how can you say that you *have* to ask the Java Users
list in order to reach people with insight?  How do you know what kinds of
insight the Lucene.Net subscribers have? How do you expect the Lucene.Net
community as a whole to gain insight if no one participates?


: And I have a hard time to believe that they wouldn't have ported the scoring parts correctly.

I wasn't suggesting it wouldn't be ported properly, i was pointing out
that different languages (and different ports of APIs) have different
nuances.  The first thing i thought when looking at your Similarity class
was that it wasn't getting used at all because all of your method names
started with Capital letters -- it seemed like a very simple mistake for a
novice Java programmer to make.

: I haven't looked as much at the FunctionQuery in solr since I can't find
: any good documentation for it.  But if I write a function for a field
: don't the field values have to be in the field cache for applying this
: function? And since I'm dealing with a lot of data this will severely
: affect the overall performance.

1) if you have suggestions for improving the FunctionQuery javadocs, i'm
all ears ... it's not always easy for people who work with things daily to
realize how the documentation can be viewed as lacking by people less
familiar with it.  For me, i see that FunctionQueries are built from
ValueSources, and that the classes which implement ValueSource are
FieldCacheSource, LinearFloatFunction, etc... and just go from there.

2) having the values you want to compute functions on in the FieldCache
has no significantly greater impact on the performance of a query than the
fieldNorms you are currently using: in both cases there is an array with
one entry per doc; the only difference is that fieldNorms are stored in a
byte[], and the FieldCache uses either int[] or float[] -- but you already
said you modified your fieldNorms to be float[] didn't you? ... so the
performance of FunctionQuery shouldn't be any different -- just easier to
maintain.
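
A rough sketch of that comparison in plain Java, under stated assumptions:
the class, method names and the linear decode table below are hypothetical
stand-ins, not Lucene or Solr code. The point is only that both approaches
cost one array read per document at scoring time, and differ in bytes per
entry.

```java
/** Toy comparison of per-document value arrays; NOT Lucene/Solr code. */
final class PerDocArraySketch {

    // Hypothetical decode table mapping each norm byte to a float.
    // The linear values here are placeholders, not Lucene's real encoding.
    private static final float[] DECODE = new float[256];
    static {
        for (int i = 0; i < 256; i++) {
            DECODE[i] = i / 255f;
        }
    }

    /** Norm-style lookup: one byte per document, decoded through a table. */
    static float fromNorms(byte[] norms, int docId) {
        return DECODE[norms[docId] & 0xFF];
    }

    /** FieldCache-style lookup: one float per document, read directly. */
    static float fromCache(float[] values, int docId) {
        return values[docId];
    }

    /** Memory needed for one array covering every document in the index. */
    static long bytesFor(int maxDoc, int bytesPerEntry) {
        return (long) maxDoc * bytesPerEntry;
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000; // hypothetical index size
        System.out.println("byte[]  : " + bytesFor(maxDoc, 1) + " bytes");
        System.out.println("float[] : " + bytesFor(maxDoc, 4) + " bytes");
    }
}
```

If the norms have already been widened to float[], the two arrays are the
same size and the lookups are equivalent.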



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

