lucene-java-user mailing list archives

From "Tom Conlon" <>
Subject RE: Hits.score mystery
Date Thu, 01 Nov 2007 15:30:42 GMT
The reason seems to be that I needed to implement an analyser that lowercases terms
but does *not* strip trailing characters such as # and +
(i.e. I needed to match C# and C++).

// LowercaseWhitespaceTokenizer is my own tokenizer class (not shown here).
public final class LowercaseWhitespaceAnalyzer extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new LowercaseWhitespaceTokenizer(reader);
  }
}

The problem now is that a token with trailing punctuation, such as "system,", is no longer
matched against "system".

Can anyone point to an example of a combination of analyser/tokeniser (or other method) that
gets around this please?
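
One possible way around this, sketched in plain Java so that it runs without Lucene on the classpath: split on whitespace, lowercase, and trim punctuation from both ends of each token while keeping a trailing '#' or '+'. The class name SymbolAwareTokens and its methods are hypothetical and are not part of Lucene; inside Lucene the same rule would presumably live in a custom Tokenizer or TokenFilter. Treat it as a sketch of the token rules, not a tested analyser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not a Lucene class: it only demonstrates the token rules.
final class SymbolAwareTokens {

    // Splits on whitespace, lowercases, and trims punctuation from both ends,
    // except that a trailing '#' or '+' is kept so "C#" and "C++" survive.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.trim().split("\\s+")) {
            String token = strip(raw.toLowerCase(Locale.ROOT));
            if (token.length() > 0) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    private static String strip(String s) {
        int start = 0;
        int end = s.length();
        // Drop leading characters that are not letters or digits, e.g. '(' or '"'.
        while (start < end && !Character.isLetterOrDigit(s.charAt(start))) {
            start++;
        }
        // Drop trailing characters unless they are letters, digits, '#' or '+'.
        while (end > start && !isKeptAtEnd(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    private static boolean isKeptAtEnd(char c) {
        return Character.isLetterOrDigit(c) || c == '#' || c == '+';
    }

    public static void main(String[] args) {
        // Prints [the, system, written, in, c#, and, c++]
        System.out.println(tokenize("The System, written in C# and C++."));
    }
}
```

With these rules "system," and "system" produce the same token, while "c#" and "c++" keep their symbols. Terms that start with punctuation (".net", for instance) would lose it, so the leading-character rule may need adjusting.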


-----Original Message-----
From: Tom Conlon [] 
Sent: 01 November 2007 09:18
Subject: RE: Hits.score mystery

Thanks Daniel,

I'm using Searcher.explain() and Luke to try to understand the reasons for the score.

-----Original Message-----
From: Daniel Naber []
Sent: 01 November 2007 08:19
Subject: Re: Hits.score mystery

On Wednesday 31 October 2007 19:14, Tom Conlon wrote:

> 119.txt  17.865013  97%  (13 occurences)
> 45.txt    8.600986  47%  (18 occurences)

45.txt might be a document with more terms, so that its score is lower although it contains
more matches.
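
A rough sketch of why that can happen, assuming the default similarity, which (as far as I know) uses tf = sqrt(freq) and lengthNorm = 1/sqrt(numTerms). The match counts 13 and 18 come from the output quoted above; the document lengths of 400 and 2500 terms are invented purely for illustration, and the class LengthNormSketch is hypothetical, not Lucene code. idf, boosts, queryNorm and coord are left out because they do not depend on document length.

```java
// Hypothetical illustration, not Lucene code: the match counts come from the
// thread, the document lengths are invented for the example.
final class LengthNormSketch {

    // Term-frequency factor: square root of the number of occurrences.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Length normalisation: 1 / sqrt(number of terms in the field).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Only the two factors that depend on the document itself.
    static double partialScore(int freq, int numTerms) {
        return tf(freq) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        double shortDoc = partialScore(13, 400);   // assumed length: 400 terms
        double longDoc = partialScore(18, 2500);   // assumed length: 2500 terms
        System.out.println("13 matches in 400 terms:  " + shortDoc);
        System.out.println("18 matches in 2500 terms: " + longDoc);
    }
}
```

Under these assumed lengths the shorter document gets the larger value despite having fewer matches. Searcher.explain() shows the real tf and fieldNorm factors for each hit.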



To unsubscribe, e-mail:
For additional commands, e-mail:
