lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Niraj Alok" <ni...@emacmillan.com>
Subject Re: score and frequency
Date Thu, 24 Jun 2004 14:14:46 GMT
Hi Franck,

That involves a lot of work and most importantly, reindexing, which takes 4
hours for the data I have (approx. 800 MB) .
I will try the same and will get back to you. You really deserve a treat frm
my side :)


Regards,
Niraj
----- Original Message -----
From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Thursday, June 24, 2004 7:38 PM
Subject: Re: score and frequency


> I don't think your field title can be used to search for the exact
> match, because in your field, the title is probably tokenized.
> The other field for the title should contain the title as a keyword, ie
> not tokenized. So, if the title is 'sea-lion', the term
> ("title","sea-lion") is store.
>
> The transformation I told you is to avoid storing the raw term, ie with
> the case and special characters.
> Something like that:
> --
> StandardAnalyzer analyzer = new StandardAnalyzer();
> Token token;
> StringBuffer buf = new StringBuffer();
> try {
>    stream = analyzer.tokenStream("title", new StringReader(title));
>    while ((token = stream.next()) != null) {
>      qBuf.append(' ');
>      qBuf.append(token.termText());
>    }
> } catch(IOException ioe) {
>    ioe.printStackTrace();
> }
> String transformedTitle = buf.toString().trim();
> --
> So, if you search for "sea-lion", "sea  lion" or "Sea lion", the
> transformed text will be "sea lion", the TermQuery will be new
> TermQuery(new Term("newField", "sea lion")) and you could then search
> for an exact match.
>
> Franck
>
> Niraj Alok wrote:
> > Hi Franck,
> >
> > I already had a field separately for the title. Replacing the
PhraseQuery
> > with the TermQuery did not help.
> > I haven't tried the transformation part. What kind of transformation are
you
> > talking about ? How to do this transformation?
> > Can you provide some more details please?
> >
> >
> > Regards,
> > Niraj
> > ----- Original Message -----
> > From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
> > To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > Sent: Thursday, June 24, 2004 7:04 PM
> > Subject: Re: score and frequency
> >
> >
> >
> >>Forget about the PhraseQuery, I'm stupid, it can't work like that.
> >>Because the phrase query will boost the documents which contain the
> >>search and not the documents which match exactly the search. So, the
> >>exact matches will come down. :-/
> >>
> >>You need to have some information about the lucene documents to know if
> >>it's an exact match. Such as the number of terms in the documents. The
> >>problem is that this number is store in the lengthNorm and as it's
> >>encoded on 1 byte, you can't have it precisely. So, you should shunt the
> >>problem.
> >>Here's another suggestion (a good one I hope):
> >>Add another field containing the title as a Keyword. Then you just have
> >>to replace the PhraseQuery I told you to use by a TermQuery searching
> >>for the term (newField,search)
> >>Of course, it will be a bit too restrictive to store the title without
> >>any transformation. You can for example store in this field the
> >>concatenation of the token given by your analyzer. Just don't forget to
> >>do the same transformation also for the search.
> >>Sorry for the previous posts.
> >>
> >>Franck
> >>
> >>Niraj Alok wrote:
> >>
> >>>Hi Franck,
> >>>
> >>>You seem to be a genius in lucene !
> >>>
> >>>I have done finally all that which you have suggested, but now when I
am
> >>>searching for "lion", those terms are coming much below in terms of
> >
> > scores.
> >
> >>>This is despite me setting the boost for the phrase query. Infact, this
> >
> > is
> >
> >>>resulting in almost all the exact matches to come down.
> >>>
> >>>
> >>>Regards,
> >>>Niraj
> >>>----- Original Message -----
> >>>From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
> >>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>>Sent: Thursday, June 24, 2004 6:02 PM
> >>>Subject: Re: score and frequency
> >>>
> >>>
> >>>
> >>>
> >>>>It may come from the boolean clauses. If you add your sub-queries with
a
> >>>>'required' flag, you'll only get the results matching all the words in
> >>>>your query.
> >>>>It can also come from the score which is different. If you set up a
> >>>>threshold to return the results, it can be the problem.
> >>>>
> >>>>Franck
> >>>>
> >>>>
> >>>>Niraj Alok wrote:
> >>>>
> >>>>
> >>>>>Hi Franck,
> >>>>>
> >>>>>Thank you so much for the detailed explanation.
> >>>>>However, when I tried to break up my MultiFieldQueryParser into a
> >
> > series
> >
> >>>of
> >>>
> >>>
> >>>>>BooleanQueries, the result set has got reduced drastically.
> >>>>>Any idea why this could be happening?
> >>>>>
> >>>>>Regards,
> >>>>>Niraj
> >>>>>----- Original Message -----
> >>>>>From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
> >>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>>>>Sent: Thursday, June 24, 2004 2:54 PM
> >>>>>Subject: Re: score and frequency
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>>The MultiFieldQueryParser give you a BooleanQuery containing
1 query
> >
> > for
> >
> >>>>>>each field.
> >>>>>>Something like:
> >>>>>>          BooleanQuery
> >>>>>>          /   |   |   \
> >>>>>>        QF1  QF2 QF3  QF4    (QFx=Query for field x)
> >>>>>>
> >>>>>>You can still use the MultiFieldQueryParser and create a
BooleanQuery
> >
> > to
> >
> >>>>>>encapsulate the one parsed + the PhraseQuery, ie:
> >>>>>>           BooleanQuery(created by you)
> >>>>>>            /       \
> >>>>>>          BQ      PhraseQuery
> >>>>>>
> >>>>>>Or create the whole query (I think you should do that) and have
> >>>>>>something like that:
> >>>>>>           _BooleanQuery__
> >>>>>>          /   |   |   \   \
> >>>>>>        QF1  QF2 QF3  QF4  PhraseQuery      (QFx=Query for field
x)
> >>>>>>
> >>>>>>It's like parsing the following query:
> >>>>>>(field1:query) (field2:query) (field3:query)...(fieldx:query)
> >>>>>>(title:"query")~boost
> >>>>>>
> >>>>>>
> >>>>>>Franck
> >>>>>>
> >>>>>>
> >>>>>>Niraj Alok wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>I asked the previous question since I do not know how to
use
> >>>
> >>>PhraseQuery
> >>>
> >>>
> >>>>>>>I have one booleanquery and one query.
> >>>>>>>The query is Query query =  MultiFieldQueryParser.parse(
qs,
> >
> > searchLoc,
> >
> >>>>>>>flags, new StandardAnalyzer(stop));
> >>>>>>>
> >>>>>>>where qs is the word to be searched upon and searchLoc contains
all
> >
> > the
> >
> >>>>>four
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>fields.
> >>>>>>>
> >>>>>>>How do I insert a PhraseQuery here for title field only,
and that
too
> >>>>>
> >>>>>with
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>its boosted value?
> >>>>>>>
> >>>>>>>
> >>>>>>>Regards,
> >>>>>>>Niraj
> >>>>>>>----- Original Message -----
> >>>>>>>From: "Niraj Alok" <niraj@emacmillan.com>
> >>>>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>>>>>>Sent: Thursday, June 24, 2004 2:00 PM
> >>>>>>>Subject: Re: score and frequency
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>Does it mean that I would need to abandon MultiFieldQueryParser?
> >>>>>>>>
> >>>>>>>>Regards,
> >>>>>>>>Niraj
> >>>>>>>>----- Original Message -----
> >>>>>>>>From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
> >>>>>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>>>>>>>Sent: Thursday, June 24, 2004 1:22 PM
> >>>>>>>>Subject: Re: score and frequency
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>Hi,
> >>>>>>>>>first, what do you consider as an 'exact matching'
? It seems
that
> >>>
> >>>you
> >>>
> >>>
> >>>>>>>>>treat the search word by word, so 'lion sea' will
be an 'exact
> >
> > match'
> >
> >>>>>of
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>>>'sea-lion'.
> >>>>>>>>>I think you should add a PhraseQuery to your query
containing the
> >>>
> >>>title
> >>>
> >>>
> >>>>>>>>>and with a big boost. So, you don't need to boost
your title
field.
> >>>>>
> >>>>>Only
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>>>the results matching exactly (for the PhraseQuery)
will be
boosted.
> >>>>>>>>>
> >>>>>>>>>Franck
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>Niraj Alok wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>Hi Guys,
> >>>>>>>>>>
> >>>>>>>>>>I seem to have run into rough weather again.
> >>>>>>>>>>To describe the problem as concisely as possible,
I have four
> >
> > fields
> >
> >>>>>>>to
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>search upon : title , first para, rest of the paras and
content
> >
> > (equal
> >
> >>>>>to
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>>title + first para + rest of the para) .  I am doing
this by using
> >>>>>>>>MultiFieldQueryParser.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>Now there is a very complicated ranking algrorithm
specified by
> >
> > the
> >
> >>>>>>>>client and I have met most of them except one or two
and really
need
> >>>>>
> >>>>>your
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>>help as all my other efforts have failed.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>The most important rule is that exact matching
titles should
come
> >>>>>>>
> >>>>>>>first
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>, i.e. get higher scores.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>I have given the highest boost factor to the
title than the rest
> >
> > but
> >
> >>>>>>>the
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>problem comes up when there is some other title which
has got just
> >
> > one
> >
> >>>>>>>word
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>matching. For e.g., if I search for lion, there is a
title
sea-lion
> >>>>>
> >>>>>which
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>>also has the same boost factor as that of "lion" in the
index.
Also,
> >>>>>>>>sea-lion has got some more "lion" in its first para or
rest of the
> >>>
> >>>paras
> >>>
> >>>
> >>>>>>>>etc. such that its score comes higher than "lion".
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>Is there some way to get the exact matching titles
higher
scores?
> >>>>>>>>>>Please reply soon.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>Regards,
> >>>>>>>>>>Niraj
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>----- Original Message -----
> >>>>>>>>>>From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
> >>>>>>>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>>>>>>>>>Sent: Monday, June 07, 2004 12:50 PM
> >>>>>>>>>>Subject: Re: score and frequency
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>It seems that you don't the length norm to
be used. It's a
factor
> >>>>>>>
> >>>>>>>which
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>>>>normalize the score of a doc depending on
the size of the
> >
> > searched
> >
> >>>>>>>field
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>>>>of the doc. It's the field which make that
'ground ice' has a
> >>>
> >>>higher
> >>>
> >>>
> >>>>>>>>>>>score than 'ice hockey: British Sekonda Superleague
Play-Off
> >>>>>>>>>>>Championship: finals' because it only has
2 terms.
> >>>>>>>>>>>So, I suggest you to override the lengthNorm
method and to
ignore
> >>>
> >>>the
> >>>
> >>>
> >>>>>>>>>>>numTokens parameter.
> >>>>>>>>>>>NB: The length norm is computed during the
indexation and the
> >
> > norm
> >
> >>>>>are
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>>>>>store in the index (in the _aaa.f# files).
So, you need to do
> >>>>>
> >>>>>re-index
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>>>>>your data, and use this similarity during
the indexation.
> >>>>>>>>>>>
> >>>>>>>>>>>Cheers,
> >>>>>>>>>>>Franck
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>Niraj Alok wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>I have set the searcher.setSimilarity
 as well as also tried
> >>>
> >>>setting
> >>>
> >>>
> >>>>>>>>the
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>>coord factor to 1.
> >>>>>>>>>>>>
> >>>>>>>>>>>>The problem as given by an example is
: Lets say I have titles
> >
> > to
> >
> >>>be
> >>>
> >>>
> >>>>>>>>>>>>displayed depending upon the search.
> >>>>>>>>>>>>E.g if i have "ice hockey" as the search
item and if it is
> >
> > default
> >
> >>>>>>>>>>>>similarity, my results are :
> >>>>>>>>>>>>
> >>>>>>>>>>>>ice hockey0.99999994
> >>>>>>>>>>>>ice hockey0.75
> >>>>>>>>>>>>ice hockey0.75
> >>>>>>>>>>>>winter Olympics: hockey, ice, medallists0.17402513
> >>>>>>>>>>>>ice age0.073680125
> >>>>>>>>>>>>National Hockey League0.020266924
> >>>>>>>>>>>>Cracking the Ice Age0.018420031
> >>>>>>>>>>>>ground-ice0.011512519
> >>>>>>>>>>>>ice hockey: British Sekonda Superleague
Play-Off Championship:
> >>>>>>>>>>>>finals0.0069075115
> >>>>>>>>>>>>(the numbers indicating the score).
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>But if i set the similarity as my overridden
one, the results
> >>>>>
> >>>>>become:
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>>>>>>ice hockey0.99999994
> >>>>>>>>>>>>ice hockey0.75
> >>>>>>>>>>>>ice hockey0.75
> >>>>>>>>>>>>ice age0.22104037
> >>>>>>>>>>>>winter Olympics: hockey, ice, medallists0.17402513
> >>>>>>>>>>>>National Hockey League0.060800765
> >>>>>>>>>>>>Cracking the Ice Age0.055260092
> >>>>>>>>>>>>ground-ice0.034537554
> >>>>>>>>>>>>ice hockey: British Sekonda Superleague
Play-Off Championship:
> >>>>>>>>>>>>finals0.020722535
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>I want all the titles which have both
"ice" and "hockey" to
come
> >>>>>>>
> >>>>>>>above
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>the
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>>rest (to have higher scores)
> >>>>>>>>>>>>Meaning i would wish the results to appear
like:
> >>>>>>>>>>>>
> >>>>>>>>>>>>ice hockey
> >>>>>>>>>>>>ice hockey
> >>>>>>>>>>>>ice hockey
> >>>>>>>>>>>>winter Olympics: hockey, ice, medallists
> >>>>>>>>>>>>ice hockey: British Sekonda Superleague
Play-Off Championship:
> >>>>>
> >>>>>finals
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>>>>>>ice age
> >>>>>>>>>>>>National Hockey League
> >>>>>>>>>>>>Cracking the Ice Age
> >>>>>>>>>>>>ground-ice
> >>>>>>>>>>>>
> >>>>>>>>>>>>My overriden similarity class contains
just this method:
> >>>>>>>>>>>>public float coord(int overlap, int maxOverlap)
{
> >>>>>>>>>>>>
> >>>>>>>>>>>>return 1.0f;
> >>>>>>>>>>>>
> >>>>>>>>>>>>}
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>I feel it is the weight factor which
is producing indesirable
> >>>>>>>
> >>>>>>>results.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>Any
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>>help in this regard would be highly appreciated.
> >>>>>>>>>>>>
> >>>>>>>>>>>>Regards,
> >>>>>>>>>>>>Niraj
> >>>>>>>>>>>>
> >>>>>>>>>>>>----- Original Message -----
> >>>>>>>>>>>>From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
> >>>>>>>>>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>>>>>>>>>>>Sent: Friday, June 04, 2004 8:46 PM
> >>>>>>>>>>>>Subject: Re: score and frequency
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>>Hi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>Be careful to set the default similarity
> >>>>>>>>>>>>>'Similarity.setDefault(similarity)'
before creating your
search
> >>>>>>>>
> >>>>>>>>instance
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>>>(IndexSearcher).
> >>>>>>>>>>>>>If you change the default similarity
after, you'll still use
> >
> > the
> >
> >>>>>old
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>>one.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>>>You'd better use the 'searcher.setSimilarity'
method on your
> >>>>>>>
> >>>>>>>searcher.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>>>>>>Franck
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>Phil brunet wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>Hi to all.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>Maybe the term frequency is not
the only parameter you need
to
> >>>>>>>>
> >>>>>>>>override
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>>>>to "customize" the score attributed
by Lucene.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>Maybe you should consider the
normalisation factor, the idf
> >
> > and
> >
> >>>>>the
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>>>>>>>>coord factor ?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>Philippe
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>From: "Niraj Alok" <niraj@emacmillan.com>
> >>>>>>>>>>>>>>>Reply-To: "Lucene Users List"
> >
> > <lucene-user@jakarta.apache.org>
> >
> >>>>>>>>>>>>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>>>>>>>>>>>>>>Subject: Re: score and frequency
> >>>>>>>>>>>>>>>Date: Fri, 4 Jun 2004 15:13:32
+0530
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>Hi Erik,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>Thanks for the suggestion.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>I tried this:
> >>>>>>>>>>>>>>>public class RelevanceSimilarity
extends DefaultSimilarity
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>{
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>public float tf(float freq)
{
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>System.out.println("discounting
frequency");
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>return (float)1;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>}
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>}
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>and in my query class, I
used :
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>Similarity.setDefault(similarity);
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>Hits hits = is.search(query);
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>for(i = 0; i < hits.length();
i ++)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>result = result + hits.score(i);
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>However, this is still not
giving me the expected result.
Do
> >
> > I
> >
> >>>>>>>need
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>to
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>>do
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>>>>something else?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>Regards,
> >>>>>>>>>>>>>>>Niraj
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>----- Original Message -----
> >>>>>>>>>>>>>>>From: "Erik Hatcher" <erik@ehatchersolutions.com>
> >>>>>>>>>>>>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>>>>>>>>>>>>>>Sent: Friday, June 04, 2004
1:55 PM
> >>>>>>>>>>>>>>>Subject: Re: score and frequency
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>On Jun 4, 2004, at 2:52
AM, Niraj Alok wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>Hi,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>I am having some
problems with the score of lucene.
> >>>>>>>>>>>>>>>>>I am trying to get
the results displayed according to
> >>>>>
> >>>>>hits.score
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>>>>>>and
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>it is giving the
results correctly.
> >>>>>>>>>>>>>>>>>However I do not
want the frequency factor to be used for
> >
> > the
> >
> >>>>>>>>>>>>>>>>>computation of the
score.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>Is it possible to
get the score which does not have the
> >>>>>>>
> >>>>>>>frequency
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>>>>>>>>>>factor in it ?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>Have a look at the javadocs
for Similarity.
> >
> > DefaultSimilarity
> >
> >>>>>is
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>>>>>>used
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>>>>>unless otherwise specified.
 You could subclass that and
> >>>>>
> >>>>>override
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>>>>>>this:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>>>>>public float tf(float
freq) {
> >>>>>>>>>>>>>>>>return (float)Math.sqrt(freq);
> >>>>>>>>>>>>>>>>}
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>and return 1.0.  This
might give you the effect you want.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>Erik
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>
>
>>>>>>>>>>>-----------------------------------------------------------------
-
> >
> > -
> >
> >>>-
> >>>
> >>>
> >>>>>-
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>>>>>>>>>>To unsubscribe, e-mail:
> >>>>>>>
> >>>>>>>lucene-user-unsubscribe@jakarta.apache.org
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>>>>>>>>>For additional commands,
e-mail:
> >>>>>>>>
> >>>>>>>>lucene-user-help@jakarta.apache.org
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
>
>>>>>>>>>>------------------------------------------------------------------
-
> >
> > -
> >
> >>>-
> >>>
> >>>
> >>>>>>>>>>>>>>>To unsubscribe, e-mail:
> >>>>>
> >>>>>lucene-user-unsubscribe@jakarta.apache.org
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>>>>>>>>>For additional commands,
e-mail:
> >>>>>>>
> >>>>>>>lucene-user-help@jakarta.apache.org
> >>>>>>>
> >>>>>>>
> >>>>
>
>>>>>>>>>>>>________________________________________________________________
_
> >>>>>>>>>>>>
> >>>>>>>>>>>>>>Bloquez les fenĂȘtres pop-up,
c'est gratuit !
> >>>
> >>>http://toolbar.msn.fr
> >>>
> >>>
>
>>>>>>>>>-------------------------------------------------------------------
-
> >
> > -
> >
> >>>>>>>>>>>>>>To unsubscribe, e-mail:
> >>>
> >>>lucene-user-unsubscribe@jakarta.apache.org
> >>>
> >>>
> >>>>>>>>>>>>>>For additional commands, e-mail:
> >>>>>>>
> >>>>>>>lucene-user-help@jakarta.apache.org
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>>>>>>--
> >>>>>>>>>>>>>Franck Brisbart
> >>>>>>>>>>>>>R&D
> >>>>>>>>>>>>>http://www.kelkoo.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>
>
>>>>>>>>>-------------------------------------------------------------------
-
> >
> > -
> >
> >>>>>>>>>>>>>To unsubscribe, e-mail:
> >>>
> >>>lucene-user-unsubscribe@jakarta.apache.org
> >>>
> >>>
> >>>>>>>>>>>>>For additional commands, e-mail:
> >>>>>
> >>>>>lucene-user-help@jakarta.apache.org
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
>
>>>>>>>>>-------------------------------------------------------------------
-
> >
> > -
> >
> >>>>>>>>>>>>To unsubscribe, e-mail:
> >
> > lucene-user-unsubscribe@jakarta.apache.org
> >
> >>>>>>>>>>>>For additional commands, e-mail:
> >>>
> >>>lucene-user-help@jakarta.apache.org
> >>>
> >>>
> >>>>>>>>>>>--
> >>>>>>>>>>>Franck Brisbart
> >>>>>>>>>>>R&D
> >>>>>>>>>>>http://www.kelkoo.com
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>
>
>>>>>>>>>-------------------------------------------------------------------
-
> >
> > -
> >
> >>>>>>>>>>>To unsubscribe, e-mail:
> >
> > lucene-user-unsubscribe@jakarta.apache.org
> >
> >>>>>>>>>>>For additional commands, e-mail:
> >>>
> >>>lucene-user-help@jakarta.apache.org
> >>>
> >>>
> >>>>>>>>>>
> >>>>>>>>>--
> >>>>>>>>>Franck Brisbart
> >>>>>>>>>R&D
> >>>>>>>>>http://www.kelkoo.com
> >>>>>>>>>
> >>>>>>>>>
> >>
>
>>>>>>>>--------------------------------------------------------------------
-
> >>>>>>>>
> >>>>>>>>>To unsubscribe, e-mail:
lucene-user-unsubscribe@jakarta.apache.org
> >>>>>>>>>For additional commands, e-mail:
> >
> > lucene-user-help@jakarta.apache.org
> >
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
>
>>>>>>>---------------------------------------------------------------------
> >>>>>>>
> >>>>>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> >>>>>>>>For additional commands, e-mail:
lucene-user-help@jakarta.apache.org
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
>
>>>>>>>---------------------------------------------------------------------
> >>>>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> >>>>>>>For additional commands, e-mail:
lucene-user-help@jakarta.apache.org
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>--
> >>>>>>Franck Brisbart
> >>>>>>R&D
> >>>>>>http://www.kelkoo.com
> >>>>>>
> >>>>>>
>
>>>>>>---------------------------------------------------------------------
> >>>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> >>>>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >>>>>>
> >>>>>
> >>>>>
> >>>>--
> >>>>Franck Brisbart
> >>>>R&D
> >>>>http://www.kelkoo.com
> >>>>
> >>>>
> >>>>---------------------------------------------------------------------
> >>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> >>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >>>>
> >>>
> >>>
> >>
> >>--
> >>Franck Brisbart
> >>R&D
> >>http://www.kelkoo.com
> >>
> >>
> >>---------------------------------------------------------------------
> >>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> >>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >>
> >
> >
>
>
> --
> Franck Brisbart
> R&D
> http://www.kelkoo.com
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message