lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: weightage of each word according to precedence in document
Date Mon, 06 Feb 2012 11:13:04 GMT
At least it no longer gives the same score to a doc that doesn't contain
all the terms, which I think you claimed at one point.

So to try and simplify this, you've got one field called content and

doc1: pqrst uvwx abcd
doc2: abcd pqrst uvwx

and the query "content:abcd^10.0 content:pqrst^5.0" gives the same score for
doc1 and doc2.  That is to be expected: both docs are the same length, both
contain both search terms, and nothing in the default scoring of a term
query looks at where in the doc the term occurs.
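
To see why position never enters into it, the explain output can be
reproduced by hand. This is a minimal sketch, assuming the DefaultSimilarity
formulas from Lucene 3.x (idf = 1 + ln(maxDocs / (docFreq + 1)),
tf = sqrt(termFreq), queryNorm = 1 / sqrt(sum of squared boost * idf)).
The class and method names are made up for illustration, and fieldNorm is
copied from the explain output rather than computed.

```java
// Hypothetical helper, not part of Lucene: redoes the explain arithmetic.
class ScoreCheck {
    static final int MAX_DOCS = 5;
    // Taken as-is from the explain output (Lucene stores norms lossily).
    static final double FIELD_NORM = 0.5;

    // DefaultSimilarity idf in Lucene 3.x.
    static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // queryNorm = 1 / sqrt(sum of squared (boost * idf)) over the query terms.
    static double queryNorm(double... boostedIdfs) {
        double sumOfSquares = 0.0;
        for (double w : boostedIdfs) {
            sumOfSquares += w * w;
        }
        return 1.0 / Math.sqrt(sumOfSquares);
    }

    // One term's contribution: queryWeight * fieldWeight.
    // Note that no term position appears anywhere in this product.
    static double termScore(double boost, double idf, double queryNorm,
                            int termFreq, double fieldNorm) {
        double queryWeight = boost * idf * queryNorm;
        double fieldWeight = Math.sqrt(termFreq) * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    // Docs containing both abcd and pqrst (explain reports 0.6175326).
    static double scoreBothTerms() {
        double idfAbcd = idf(4, MAX_DOCS);
        double idfPqrst = idf(5, MAX_DOCS);
        double norm = queryNorm(10.0 * idfAbcd, 5.0 * idfPqrst);
        return termScore(10.0, idfAbcd, norm, 1, FIELD_NORM)
             + termScore(5.0, idfPqrst, norm, 1, FIELD_NORM);
    }

    // Doc containing only pqrst: same term score, scaled by coord(1/2)
    // (explain reports 0.07735918).
    static double scorePqrstOnly() {
        double idfAbcd = idf(4, MAX_DOCS);
        double idfPqrst = idf(5, MAX_DOCS);
        double norm = queryNorm(10.0 * idfAbcd, 5.0 * idfPqrst);
        return termScore(5.0, idfPqrst, norm, 1, FIELD_NORM) * 0.5;
    }

    public static void main(String[] args) {
        System.out.println("both terms: " + scoreBothTerms());
        System.out.println("pqrst only: " + scorePqrstOnly());
    }
}
```

Lucene does this arithmetic in floats, so the last digits may differ
slightly from the doubles used here.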

As I said before, if you want the order of matched terms to matter,
see PhraseQuery or SpanQuery.

Or store positional info in a Payload and factor that in somehow.
Powerful but complicated.  See
http://www.lucidimagination.com/blog/2010/04/18/refresh-getting-started-with-payloads/
for an example.

I can't think of another way to make, in your case, abcd score higher
if it is the first rather than the third term in the doc.  I'd try a
SpanQuery with some reasonable slop value and add it as an optional
clause to your query, possibly with a boost.
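
To make that suggestion concrete: in Lucene 3.x the optional clause would
be a SpanNearQuery over two SpanTermQuery clauses with inOrder=true, added
to the BooleanQuery with Occur.SHOULD. The sketch below is not Lucene code.
It is a plain-Java model (hypothetical class and method names, whitespace
tokenization assumed) of which docs such an ordered clause would match,
assuming abcd must come before pqrst with at most `slop` other tokens in
between.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of an ordered "near" match; not Lucene's implementation.
class OrderedNearSketch {
    // True if `first` occurs before `second` with at most `slop`
    // other tokens between them.
    static boolean inOrderWithin(List<String> tokens, String first,
                                 String second, int slop) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) {
                continue;
            }
            // j - i - 1 is the number of tokens between positions i and j.
            for (int j = i + 1; j < tokens.size() && j - i - 1 <= slop; j++) {
                if (tokens.get(j).equals(second)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Would the optional "abcd before pqrst" clause match this doc?
    static boolean matches(String doc, int slop) {
        List<String> tokens = Arrays.asList(doc.trim().split("\\s+"));
        return inOrderWithin(tokens, "abcd", "pqrst", slop);
    }

    public static void main(String[] args) {
        String[] docs = {
            "pqrst uvwx abcd",
            "abcd pqrst uvwx",
            "pqrst uvwx lmn",
            "pqrst uvwx lmn abcd",
            "pqrst abcd uvwx lmn"
        };
        for (String doc : docs) {
            System.out.println(doc + " -> " + matches(doc, 1));
        }
    }
}
```

Under these assumptions only "abcd pqrst uvwx" matches the extra clause, so
it is the only doc that would pick up the additional score.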


--
Ian.


On Sat, Feb 4, 2012 at 10:11 AM, A Z <4azfriend@gmail.com> wrote:
> Hi Ian,
>
> Sorry for the late reply.
>
> It is a simple search with the default Similarity only.
> It gives the same score to every doc that contains both tokens, abcd and
> pqrst; there is no extra weight for a doc in which abcd comes earlier.
>
> Here is the output with the score and searcher.explain:
>
>
> Query content:abcd^10.0 content:pqrst^5.0
>
> title -> pqrst uvwx abcd ::: content -> pqrst uvwx abcd ::: Score -> 0.6175326
>
> Searcher.explain -> 0.6175326 = (MATCH) sum of:
>   0.46281427 = (MATCH) weight(content:abcd^10.0 in 0), product of:
>     0.92562854 = queryWeight(content:abcd^10.0), product of:
>       10.0 = boost
>       1.0 = idf(docFreq=4, maxDocs=5)
>       0.092562854 = queryNorm
>     0.5 = (MATCH) fieldWeight(content:abcd in 0), product of:
>       1.0 = tf(termFreq(content:abcd)=1)
>       1.0 = idf(docFreq=4, maxDocs=5)
>       0.5 = fieldNorm(field=content, doc=0)
>   0.15471835 = (MATCH) weight(content:pqrst^5.0 in 0), product of:
>     0.37843326 = queryWeight(content:pqrst^5.0), product of:
>       5.0 = boost
>       0.81767845 = idf(docFreq=5, maxDocs=5)
>       0.092562854 = queryNorm
>     0.40883923 = (MATCH) fieldWeight(content:pqrst in 0), product of:
>       1.0 = tf(termFreq(content:pqrst)=1)
>       0.81767845 = idf(docFreq=5, maxDocs=5)
>       0.5 = fieldNorm(field=content, doc=0)
>
> title -> abcd pqrst uvwx ::: content -> abcd pqrst uvwx ::: Score -> 0.6175326
>
> Searcher.explain -> 0.6175326 = (MATCH) sum of:
>   0.46281427 = (MATCH) weight(content:abcd^10.0 in 1), product of:
>     0.92562854 = queryWeight(content:abcd^10.0), product of:
>       10.0 = boost
>       1.0 = idf(docFreq=4, maxDocs=5)
>       0.092562854 = queryNorm
>     0.5 = (MATCH) fieldWeight(content:abcd in 1), product of:
>       1.0 = tf(termFreq(content:abcd)=1)
>       1.0 = idf(docFreq=4, maxDocs=5)
>       0.5 = fieldNorm(field=content, doc=1)
>   0.15471835 = (MATCH) weight(content:pqrst^5.0 in 1), product of:
>     0.37843326 = queryWeight(content:pqrst^5.0), product of:
>       5.0 = boost
>       0.81767845 = idf(docFreq=5, maxDocs=5)
>       0.092562854 = queryNorm
>     0.40883923 = (MATCH) fieldWeight(content:pqrst in 1), product of:
>       1.0 = tf(termFreq(content:pqrst)=1)
>       0.81767845 = idf(docFreq=5, maxDocs=5)
>       0.5 = fieldNorm(field=content, doc=1)
>
> title -> pqrst uvwx lmn abcd ::: content -> pqrst uvwx lmn abcd ::: Score -> 0.6175326
>
> Searcher.explain -> 0.6175326 = (MATCH) sum of:
>   0.46281427 = (MATCH) weight(content:abcd^10.0 in 3), product of:
>     0.92562854 = queryWeight(content:abcd^10.0), product of:
>       10.0 = boost
>       1.0 = idf(docFreq=4, maxDocs=5)
>       0.092562854 = queryNorm
>     0.5 = (MATCH) fieldWeight(content:abcd in 3), product of:
>       1.0 = tf(termFreq(content:abcd)=1)
>       1.0 = idf(docFreq=4, maxDocs=5)
>       0.5 = fieldNorm(field=content, doc=3)
>   0.15471835 = (MATCH) weight(content:pqrst^5.0 in 3), product of:
>     0.37843326 = queryWeight(content:pqrst^5.0), product of:
>       5.0 = boost
>       0.81767845 = idf(docFreq=5, maxDocs=5)
>       0.092562854 = queryNorm
>     0.40883923 = (MATCH) fieldWeight(content:pqrst in 3), product of:
>       1.0 = tf(termFreq(content:pqrst)=1)
>       0.81767845 = idf(docFreq=5, maxDocs=5)
>       0.5 = fieldNorm(field=content, doc=3)
>
> title -> pqrst abcd uvwx lmn ::: content -> pqrst abcd uvwx lmn ::: Score -> 0.6175326
>
> Searcher.explain -> 0.6175326 = (MATCH) sum of:
>   0.46281427 = (MATCH) weight(content:abcd^10.0 in 4), product of:
>     0.92562854 = queryWeight(content:abcd^10.0), product of:
>       10.0 = boost
>       1.0 = idf(docFreq=4, maxDocs=5)
>       0.092562854 = queryNorm
>     0.5 = (MATCH) fieldWeight(content:abcd in 4), product of:
>       1.0 = tf(termFreq(content:abcd)=1)
>       1.0 = idf(docFreq=4, maxDocs=5)
>       0.5 = fieldNorm(field=content, doc=4)
>   0.15471835 = (MATCH) weight(content:pqrst^5.0 in 4), product of:
>     0.37843326 = queryWeight(content:pqrst^5.0), product of:
>       5.0 = boost
>       0.81767845 = idf(docFreq=5, maxDocs=5)
>       0.092562854 = queryNorm
>     0.40883923 = (MATCH) fieldWeight(content:pqrst in 4), product of:
>       1.0 = tf(termFreq(content:pqrst)=1)
>       0.81767845 = idf(docFreq=5, maxDocs=5)
>       0.5 = fieldNorm(field=content, doc=4)
>
> title -> pqrst uvwx lmn ::: content -> pqrst uvwx lmn ::: Score -> 0.07735918
>
> Searcher.explain -> 0.07735918 = (MATCH) product of:
>   0.15471835 = (MATCH) sum of:
>     0.15471835 = (MATCH) weight(content:pqrst^5.0 in 2), product of:
>       0.37843326 = queryWeight(content:pqrst^5.0), product of:
>         5.0 = boost
>         0.81767845 = idf(docFreq=5, maxDocs=5)
>         0.092562854 = queryNorm
>       0.40883923 = (MATCH) fieldWeight(content:pqrst in 2), product of:
>         1.0 = tf(termFreq(content:pqrst)=1)
>         0.81767845 = idf(docFreq=5, maxDocs=5)
>         0.5 = fieldNorm(field=content, doc=2)
>   0.5 = coord(1/2)
>
>
> On Mon, Jan 30, 2012 at 2:59 PM, Ian Lea <ian.lea@gmail.com> wrote:
>
>> They all give exactly the same score, even the 3rd doc which doesn't
>> contain abcd at all?  Surprising.  What does searcher.explain() say?
>> Is this a simple search with default Similarity or is there stuff
>> you're not telling us?
>>
>> --
>> Ian.
>>
>>
>> On Sat, Jan 28, 2012 at 4:44 AM, A Z <4azfriend@gmail.com> wrote:
>> > Hi Ian,
>> >
>> > Thanks for your reply.
>> >
>> > When I boost each term while searching (abcd with a boost factor of 10
>> > and pqrst with a boost factor of 5), it still gives the same score for
>> > all the documents.
>> >
>> > Query content:abcd^10.0 content:pqrst^5.0
>> >
>> > title -> pqrst uvwx abcd ::: content -> pqrst uvwx abcd ::: Score -> 0.40883923
>> >
>> > title -> abcd pqrst uvwx ::: content -> abcd pqrst uvwx ::: Score -> 0.40883923
>> >
>> > title -> pqrst uvwx lmn ::: content -> pqrst uvwx lmn ::: Score -> 0.40883923
>> >
>> > title -> pqrst uvwx lmn abcd ::: content -> pqrst uvwx lmn abcd ::: Score -> 0.40883923
>> >
>> > title -> pqrst abcd uvwx lmn ::: content -> pqrst abcd uvwx lmn ::: Score -> 0.40883923
>> >
>> > Thanks
>> >
>> > On Wed, Jan 25, 2012 at 8:38 PM, Ian Lea <ian.lea@gmail.com> wrote:
>> >
>> >> If you want particular search terms to be more important than others
>> >> you can use boosting.  See
>> >> http://lucene.apache.org/java/3_5_0/queryparsersyntax.html#Boosting a Term
>> >>
>> >> If you want the order of matched terms to matter, see PhraseQuery or
>> >> SpanQuery.  The latter is more flexible. See
>> >> http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/ for a
>> >> good writeup.
>> >>
>> >> And you can of course use combinations of everything.
>> >>
>> >>
>> >> --
>> >> Ian.
>> >>
>> >>
>> >>
>> >> On Tue, Jan 24, 2012 at 5:08 PM, A Z <4azfriend@gmail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > How can we assign a custom score to each token/word?
>> >> >
>> >> > For example, I have these documents:
>> >> >
>> >> > 1    pqrst uvwx abcd
>> >> > 2    abcd pqrst uvwx
>> >> > 3    pqrst uvwx lmn
>> >> > 4    pqrst uvwx lmn abcd
>> >> > 5    pqrst abcd uvwx lmn
>> >> >
>> >> > Now I am searching for: abcd pqrst
>> >> >
>> >> > It should give a higher score to the 2nd document than to the 1st.
>> >> >
>> >> > What I want is:
>> >> >
>> >> > document 1: pqrst has more weight than uvwx, which has more weight than abcd
>> >> > document 2: abcd has more weight than pqrst, which has more weight than uvwx
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>
>> >>
>>
>>
>>
