lucene-java-user mailing list archives

From Gururaja H <guru_h...@yahoo.com>
Subject Re: Relevance and ranking ...
Date Sat, 18 Dec 2004 12:56:15 GMT
Hi Erik,
 
I created my own subclass of Similarity. When I printed the values of the coord() factor,
I got the same value for all 4 documents, so the score is NOT getting boosted.
I want this boost because I want a document that contains,
e.g., all three terms of a three-word query to rank above those that contain just two
of the words.
 
Please let me know how I should go about doing this. Could you also explain the coordination factor?
 
The default order of documents that I get for my example given in this thread is as follows:
Doc#2
Doc#4
Doc#3
Doc#1
 
Any input would be helpful. Thanks,
 
Gururaja

Erik Hatcher <erik@ehatchersolutions.com> wrote:

On Dec 17, 2004, at 6:09 AM, Gururaja H wrote:
> Thanks for the reply. Is there any sample code that shows how to
> change the coord() factor, term overlap, length normalization, etc.?
> If there is any, please provide it.

Have a look at Lucene's DefaultSimilarity code itself. Use that as a 
starting point - in fact you should subclass it and only override the 
one or two methods you want to tweak.

There are probably some other examples in Lucene's test cases, or that 
have been posted to the list but I don't have handy pointers to them.

Erik
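
To make the two factors discussed in this thread concrete, here is a minimal standalone sketch. It is not Lucene code and does not extend DefaultSimilarity. It only reuses the two formulas DefaultSimilarity applies, coord = overlap / maxOverlap and lengthNorm = 1 / sqrt(numTerms), and runs them on the four documents from the original question. Everything else is an assumption made for illustration: the class and method names (CoordSketch, steepCoord, rank) are hypothetical, tf and idf are treated as 1 for every term, queryNorm is dropped, and the lossy one-byte encoding of norms is ignored. The "steep" coord, which raises the ratio to a power, is just one possible tweak and not something recommended in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Toy model of the coord() and lengthNorm() factors. NOT Lucene code:
 * tf, idf, queryNorm and norm quantization are deliberately ignored.
 */
public class CoordSketch {

    /** Hypothetical holder for one document of the example in this thread. */
    public static final class Doc {
        final String name;
        final int overlap;   // number of query terms found in the document
        final int numTerms;  // total number of terms in the document

        Doc(String name, int overlap, int numTerms) {
            this.name = name;
            this.overlap = overlap;
            this.numTerms = numTerms;
        }
    }

    /** The query has 5 terms: ibm, risc, tape, drive, manual. */
    public static final int MAX_OVERLAP = 5;

    /** Same formula as DefaultSimilarity.coord(): overlap / maxOverlap. */
    public static double defaultCoord(int overlap, int maxOverlap) {
        return overlap / (double) maxOverlap;
    }

    /** Assumed tweak: raise the ratio to a power so missing terms cost more. */
    public static double steepCoord(int overlap, int maxOverlap, double exponent) {
        return Math.pow(defaultCoord(overlap, maxOverlap), exponent);
    }

    /** Same formula as DefaultSimilarity.lengthNorm(): 1 / sqrt(numTerms). */
    public static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /** Toy score: coord times one lengthNorm contribution per matched term. */
    public static double score(Doc d, double coordExponent) {
        return steepCoord(d.overlap, MAX_OVERLAP, coordExponent)
                * d.overlap * lengthNorm(d.numTerms);
    }

    /** Returns document names sorted by descending toy score. */
    public static List<String> rank(List<Doc> docs, double coordExponent) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble(
                (Doc d) -> score(d, coordExponent)).reversed());
        List<String> names = new ArrayList<>();
        for (Doc d : sorted) {
            names.add(d.name);
        }
        return names;
    }

    /** The four documents described in the original question. */
    public static List<Doc> exampleDocs() {
        return Arrays.asList(
                new Doc("Doc#1", 2, 100),
                new Doc("Doc#2", 4, 30),
                new Doc("Doc#3", 4, 100),
                new Doc("Doc#4", 5, 300));
    }

    public static void main(String[] args) {
        System.out.println("exponent 1: " + rank(exampleDocs(), 1.0));
        System.out.println("exponent 5: " + rank(exampleDocs(), 5.0));
    }
}
```

In this toy model the default coord (exponent 1) puts the short Doc#2 ahead of Doc#4, because length normalization outweighs the extra matched term. With an exponent of 5 the order becomes Doc#4, Doc#2, Doc#3, Doc#1, which is the order asked for. Real Lucene scores also depend on tf and idf, so the ordering from an actual index can differ from the toy model; IndexSearcher.explain() remains the way to see the real numbers. In a real application the same two formulas would go into coord() and lengthNorm() overrides of a DefaultSimilarity subclass.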


>
> Thanks,
> Gururaja
>
>
> Erik Hatcher wrote:
> The coord() factor of Similarity is what controls a multiplier factor
> for overlapping query terms in a document. The DefaultSimilarity
> already contains factors that allow documents with overlapping terms to
> get boosted. Is this not working for you? You may also need to adjust
> length normalization factors. Check the javadocs on Similarity for
> details on implementing your own formulas. Also become familiar with
> IndexSearcher.explain() and the Explanation so that you can see how
> adjusting things affects the details.
>
> Erik
>
> On Dec 17, 2004, at 3:42 AM, Gururaja H wrote:
>
>> Hi,
>>
>> How can I implement the following? Please provide inputs.
>>
>>
>> For example, if the search query has 5 terms (ibm, risc, tape, drive,
>> manual) and there are 4 matching documents with the following
>> attributes, then the order should be as described below.
>>
>> Doc#1: contains terms (ibm, drive) and has a total of 100 terms in the
>> document.
>>
>> Doc#2: contains terms (ibm, risc, tape, drive) and has a total of 30
>> terms in the document.
>>
>> Doc#3: contains terms (ibm, risc, tape, drive) and has a total of 100
>> terms in the document.
>>
>> Doc#4: contains terms (ibm, risc, tape, drive, manual) and has a total
>> of 300 terms in the document.
>>
>> The search results should include all four documents since each has
>> one or more of the search terms, however, the order should be returned
>> as:
>>
>> Doc#4
>>
>> Doc#2
>>
>> Doc#3
>>
>> Doc#1
>>
>> Doc#4 should be first, since of the 5 search terms, it contains all 5.
>>
>> Doc#2 should be second, since it has 4 of the 5 search terms and its
>> ratio of matched terms to total terms (4/30) is higher than that of
>> Doc#3, which also has 4 of the 5 terms but a ratio of 4/100.
>>
>> Doc#1 is last since it only has 2 of the 5 terms.
>>
>>
>> ----
>>
>> Thanks,
>> Gururaja
>>
>>
>> __________________________________________________
>> Do You Yahoo!?
>> Tired of spam? Yahoo! Mail has the best spam protection around
>> http://mail.yahoo.com
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
> 



