lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: More Analyzer Question
Date Mon, 21 Feb 2005 20:39:25 GMT
The problem is your KeywordSynonymAnalyzer is not truly a "keyword" 
analyzer in that it is tokenizing the field into parts.  So Document 1 
has [test] and [mario] as tokens that come from the LowerCaseTokenizer.

Look at Lucene's svn repository under contrib/analyzers and you'll see 
a KeywordTokenizer and corresponding KeywordAnalyzer you can use.

	Erik


On Feb 18, 2005, at 5:44 PM, Luke Shannon wrote:
> I have created an Analyzer that I think should just be converting to 
> lower
> case and add synonyms in the index (it is at the end of the email).
>
> The problem is, after running it I get one more result than I was 
> expecting
> (Document 1 should not be there):
>
> Running testNameCombination1, expecting: 1 result
> The query: +(type:138) +(name:mario*) returned 2
>
> Start Listing documents:
>
> Document: 0 contains:
> Name: Text<name:mario test>
> Desc: Text<desc:this is test from mario>
>
>
> Document: 1 contains:
> Name: Text<name:test mario>
> Desc: Text<desc:retro>
>
> End Listing documents
>
> Those same 2 documents in Luke look like this:
>
> Document 0
> Text<name:mario test>
> Text<desc:this is test from mario>
>
> Document 1
> Text<name:test mario>
> Text<desc:retro>
>
> That looks correct to me. The query shouldn't match Document 1.
>
> The analzyer used on this field is below and is applied like so:
>
> //set the default
> PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new
> SynonymAnalyzer(new FBSynonymEngine()));
>
> //the analyzer for the name field (only converts to lower case and adds
> synonyms
> analyzer.addAnalyzer("name", new KeywordSynonymAnalyzer(new
> FBSynonymEngine()));
>
> Any help would be appreciated.
>
> Thanks,
>
> Luke
>
>
> import org.apache.lucene.analysis.*;
> import java.io.Reader;
>
> public class KeywordSynonymAnalyzer extends Analyzer {
>     private SynonymEngine engine;
>
>     public KeywordSynonymAnalyzer(SynonymEngine engine) {
>         this.engine = engine;
>     }
>
>     public TokenStream tokenStream(String fieldName, Reader reader) {
>         TokenStream result = new SynonymFilter(new
> LowerCaseTokenizer(reader), engine);
>         return result;
>     }
> }
>
>
>
>
>
>
>
> Luke Shannon | Software Developer
> FutureBrand Toronto
>
> 207 Queen's Quay, Suite 400
> Toronto, ON, M5J 1A7
> 416 642 7935 (office)
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message