lucene-java-user mailing list archives

From <>
Subject RE: Searching for keywords .net,c#,...
Date Mon, 25 Feb 2013 23:04:34 GMT
I searched Google for a Lucene TokenFilter example and found this link,
which seems to override incrementToken() (a guess, as I don't know Java).
However, using 3.0.3, I can override
           public override Token Next(Token result)
           public override Token Next()
but I have not been able to figure out how to proceed from there. I tried to debug using
            public override Token Next(Token result)
            {
                Debug.WriteLine(string.Format(" --- {0}", result));
                return result;
            }
But that went nowhere. Any help on how to write my custom TokenFilter?

Also, the analyzer I have is set up as below, without the use of
ReusableTokenStream() as in the example in your link. I'm not sure whether
that makes a difference.

    class MyAnalyzer : Analyzer
    {
        public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
        {
            TokenStream result = new WhitespaceTokenizer(reader);
            result = new LowerCaseFilter(result);
            result = new StandardFilter(result);
            result = new StopFilter(true, result,
            return result;
        }
    }

-----Original Message-----
From: Naresh [] 
Sent: Monday, February 25, 2013 1:18 AM
Subject: Re: Searching for keywords .net,c#,...

You can write your own token filter to split on some characters (comma, |,
etc.) and then build an analyzer using the WhitespaceTokenizer,
LowerCaseFilter and your CustomTokenFilter.
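
As a rough illustration of the token output such a chain would need to
produce, here is a minimal sketch in plain C# that does not use any Lucene.Net
classes. The class name KeywordSplitter and the delimiter set are hypothetical,
chosen only for this example. It splits on whitespace first, then on commas and
pipes, and lowercases each piece. A real custom filter would apply the same
splitting to each token it receives from the tokenizer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Lucene.Net: it only mimics the tokens
// that whitespace tokenizing + delimiter splitting + lowercasing would yield.
public static class KeywordSplitter
{
    // Assumed delimiter set; adjust to the separators found in your documents.
    private static readonly char[] Delimiters = { ',', '|' };

    public static List<string> Split(string text)
    {
        var tokens = new List<string>();

        // Step 1: split on any whitespace (a null separator array means whitespace).
        foreach (var chunk in text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            // Step 2: split each whitespace-delimited chunk on the extra delimiters.
            foreach (var part in chunk.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries))
            {
                // Step 3: lowercase, keeping characters such as '.' and '#'.
                tokens.Add(part.ToLowerInvariant());
            }
        }

        return tokens;
    }
}
```

With this sketch, KeywordSplitter.Split("oracle,.net,C#,java") yields the four
tokens oracle, .net, c# and java, so a query for .net or c# has a matching term.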


On Mon, Feb 25, 2013 at 11:24 AM, kumar <> wrote:

> Hello all,
> I am a Lucene novice, trying to set up Lucene in a .NET app using 
> for searching through documents. So far it has been 
> fantastic; however, given that users expect a "Google"-like 
> search, I am running into issues searching for .net and c#.
> I initially tried the StandardAnalyzer, which of course does not work for 
> searching for .net and c#.
> I changed that to a custom analyzer using WhitespaceTokenizer and
> LowerCaseFilter, and it works.
> However, some of the documents have the keywords as
> oracle,.net,C#,java etc. (i.e. separated by commas without any space),
> and this custom analyzer fails there.
> I am looking for suggestions on how this might work, as I'm sure it's 
> possible considering both Lucene and .NET/C# have been around for a 
> long while.
> It looks like PatternAnalyzer might be of some use in this case; 
> however, I'm not quite sure how to use it and have found scant 
> references to it.
> Any help is appreciated.
> Thanks,
> kumar

