lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: short document is more important.
Date Mon, 30 Aug 2004 16:39:08 GMT
Ernesto,

Don't add "no" as a stop word; that's not going to solve your problem.
My suggestion would be to use the Explanation class
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Explanation.html
to see what's making that short document score higher.

The higher ranking is probably due to this method:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#lengthNorm(java.lang.String,%20int)

One solution is to create your own Similarity subclass and override
that lengthNorm method.  Perhaps you want it to return 1.0, so that
field length no longer influences the score.
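
To illustrate why a constant lengthNorm helps, here is a minimal plain-Java
sketch. It does not use Lucene: the classes below are hypothetical stand-ins
that only mirror the shape of lengthNorm(String, int), and the 1/sqrt(numTokens)
formula is my assumption about the default behaviour. The token counts are
made up for the example.

```java
// Hypothetical stand-in, NOT Lucene's Similarity class; it only mirrors
// the shape of lengthNorm(String fieldName, int numTokens).
abstract class LengthNormSketch {
    abstract float lengthNorm(String fieldName, int numTokens);
}

// Assumed default behaviour: 1 / sqrt(numTokens), so a shorter field
// receives a larger factor than a longer one.
class DefaultLengthNormSketch extends LengthNormSketch {
    @Override
    float lengthNorm(String fieldName, int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }
}

// The override suggested above: every field gets the same factor,
// regardless of how many tokens it contains.
class FlatLengthNormSketch extends DefaultLengthNormSketch {
    @Override
    float lengthNorm(String fieldName, int numTokens) {
        return 1.0f;
    }
}

class LengthNormDemo {
    public static void main(String[] args) {
        LengthNormSketch byLength = new DefaultLengthNormSketch();
        LengthNormSketch flat = new FlatLengthNormSketch();

        int shortDoc = 8;   // made-up count: "mi teclado no funciona" twice
        int longDoc = 400;  // made-up count for a longer document

        // With the assumed default, the short field gets the larger factor.
        System.out.println("default, short: " + byLength.lengthNorm("body", shortDoc));
        System.out.println("default, long:  " + byLength.lengthNorm("body", longDoc));

        // With the flat version both fields get the same factor.
        System.out.println("flat, short:    " + flat.lengthNorm("body", shortDoc));
        System.out.println("flat, long:     " + flat.lengthNorm("body", longDoc));
    }
}
```

Under that assumption an 8-token field gets a factor roughly seven times
larger than a 400-token field, which is enough to push a barely matching
short document to the top. In Lucene itself the subclass would extend the
real Similarity (or DefaultSimilarity) class rather than these stand-ins.
As far as I know the length norm is stored when a document is indexed, so
the custom Similarity has to be in place at indexing time and existing
documents need to be re-indexed before the change shows up.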

See the top of Similarity javadocs for how to set your own Similarity
subclass.

Otis


--- Ernesto De Santis <ernesto.desantis@colaborativa.net> wrote:

> Hi
> 
> In my index I have one very short document whose body repeats
> "no funciona" (doesn't work) two times. The problem is that this
> document often appears in the first positions, even when it does not
> have much in common with the search input text.
> 
> For example:
> 
> The very short document A contains "mi teclado no funciona" (my
> keyboard doesn't work) two times.
> 
> If I search for "mi mouse no funciona" (my mouse doesn't work),
> document A appears in the first positions. This is very bad for me.
> 
> I am thinking of adding "no" to the stop words, but is this a good
> solution? I would lose this important word for many other documents.
> 
> Which is the best solution?
> 
> Thanks,
> Ernesto.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org