lucene-java-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: Hits.score mystery
Date Fri, 02 Nov 2007 00:09:43 GMT
One of many options is to copy the StandardAnalyzer but change it so 
that + and # are considered letters.

Just add + and # to the LETTER definition in the JavaCC file if you are 
using a release, or in the JFlex file if you are working off trunk (you're 
probably using a release, but the new JFlex analyzer is much faster).
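
To illustrate the rule described above, here is a minimal plain-Java sketch. It does not use the Lucene API: the class and method names (PlusHashTokenizerSketch, isTokenChar, tokenize) are made up for this example, and it leaves out everything else StandardAnalyzer does (stop words, acronyms, e-mail addresses and so on). It only shows the character rule: letters, digits, + and # belong to a token, anything else separates tokens, and tokens are lowercased.

```java
import java.util.ArrayList;
import java.util.List;

public class PlusHashTokenizerSketch {

    // Assumption for this sketch: letters, digits, '+' and '#' are token
    // characters; every other character acts as a separator.
    static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || c == '+' || c == '#';
    }

    // Splits text into lowercased tokens using the rule above.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A separator ends the token collected so far.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [the, system, written, in, c#, and, c++]
        System.out.println(tokenize("The system, written in C# and C++."));
    }
}
```

With this rule "system," yields the token "system" while "C#" and "C++" keep their trailing characters. One side effect to be aware of: text such as "a+b" stays a single token, because + no longer separates anything.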

Tom Conlon wrote:
> The reason seems to be that I found I needed to implement an analyser that lowercases
> terms as well as *not* ignoring trailing characters such as #, +. 
> (i.e. I needed to match C# and C++)
>
> public final class LowercaseWhitespaceAnalyzer extends Analyzer 
> {
>   public TokenStream tokenStream(String fieldName, Reader reader) {
>     return new LowercaseWhitespaceTokenizer(reader);
>   }
> } 
>
> The problem now is that "system," etc. is not matched against "system".
>
> Can anyone point to an example of a combination of analyser/tokeniser (or other method)
> that gets around this please?
>
> Thanks,
> Tom
>
>
> -----Original Message-----
> From: Tom Conlon [mailto:tomc@2ls.com] 
> Sent: 01 November 2007 09:18
> To: java-user@lucene.apache.org
> Subject: RE: Hits.score mystery
>
> Thanks Daniel,
>
> I'm using Searcher.explain() & Luke to try to understand the reasons for the score.
>
> -----Original Message-----
> From: Daniel Naber [mailto:lucenelist2007@danielnaber.de]
> Sent: 01 November 2007 08:19
> To: java-user@lucene.apache.org
> Subject: Re: Hits.score mystery
>
> On Wednesday 31 October 2007 19:14, Tom Conlon wrote:
>
>> 119.txt  17.865013  97%  (13 occurrences)
>> 45.txt    8.600986  47%  (18 occurrences)
>
> 45.txt might be a document with more terms, so its score is lower although it
> contains more matches.
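
A rough sketch of the effect Daniel describes, assuming the default similarity, where the term-frequency factor is sqrt(freq) and the length norm is 1/sqrt(number of terms in the field). The occurrence counts come from the thread, but the document lengths (500 and 4000 terms) are invented purely for illustration, and idf, query normalization, boosts and the lossy one-byte norm encoding are all ignored, so the numbers are relative only and will not match the scores reported above.

```java
public class LengthNormSketch {

    // Term-frequency factor as in the default similarity: sqrt(freq).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Length normalization as in the default similarity: 1 / sqrt(numTerms).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Relative contribution of one term to a document's score
    // (idf, query norm and boosts deliberately left out).
    static double relativeScore(int freq, int numTerms) {
        return tf(freq) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // Occurrence counts are from the thread; the lengths are HYPOTHETICAL.
        double doc119 = relativeScore(13, 500);
        double doc45 = relativeScore(18, 4000);
        System.out.printf("119.txt: %.4f%n", doc119);
        System.out.printf("45.txt:  %.4f%n", doc45);
        System.out.println("119.txt ranks higher: " + (doc119 > doc45));
    }
}
```

Under these assumed lengths the shorter document ranks higher despite having fewer matches, which is the behaviour seen in the scores above. Searcher.explain() shows the actual tf and fieldNorm values for each hit.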
>
> Regards
>  Daniel
>
> --
> http://www.danielnaber.de
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>