lucene-java-user mailing list archives

From <x10...@gmail.com>
Subject RE: Searching for keywords .net,c#,...
Date Mon, 25 Feb 2013 23:04:34 GMT
I searched Google for a Lucene TokenFilter example and found this link:
http://sujitpal.blogspot.com/2011/07/lucene-token-concatenating-tokenfilter_30.html
It seems to override incrementToken() (a guess, as I don't know Java).
However, using Lucene.Net 3.0.3, I can override
           public override Token Next(Token result)
           public override Token Next()
but I cannot figure out how to proceed from there. I tried to debug using
            public override Token Next(Token result)
            {
                Debug.WriteLine(string.Format(" --- {0}", result));
                return result;
            }
But that went nowhere. Any help on how to write my custom TokenFilter would
be appreciated.

Also, the analyzer I have is set up as below, without the use of
ReusableTokenStream() as in the example in your link. I am not sure whether
that makes a difference.

    class MyAnalyzer : Analyzer
    {
        public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
        {
            TokenStream result = new WhitespaceTokenizer(reader);
            result = new LowerCaseFilter(result);
            result = new StandardFilter(result);
            result = new StopFilter(true, result, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
            return result;
        }
    }

-----Original Message-----
From: Naresh [mailto:nnaresh@gmail.com] 
Sent: Monday, February 25, 2013 1:18 AM
To: java-user@lucene.apache.org
Subject: Re: Searching for keywords .net,c#,...

Hi,
You can write your own token filter to split on certain characters (comma, |,
etc.) and then build an analyzer using the WhitespaceTokenizer,
LowerCaseFilter and your CustomTokenFilter.

See
http://stackoverflow.com/questions/9015348/lucene-custom-analyzer/9015658#9015658
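
The suggestion above could be sketched roughly as follows in C#. This is a
minimal sketch, not a tested implementation: it assumes the attribute-based
API of Lucene.Net 3.0.x (IncrementToken() and ITermAttribute), and the names
CommaSplitFilter and CommaAwareAnalyzer are hypothetical. If your build still
exposes Next(Token), the calls below may not match it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;

// Hypothetical filter: splits each incoming token on commas, so that
// "oracle,.net,C#,java" becomes "oracle", ".net", "C#", "java".
public sealed class CommaSplitFilter : TokenFilter
{
    private static readonly char[] Separators = { ',' };
    private readonly ITermAttribute termAtt;
    // Parts of the current upstream token that have not been emitted yet.
    private readonly Queue<string> pending = new Queue<string>();

    public CommaSplitFilter(TokenStream input) : base(input)
    {
        termAtt = AddAttribute<ITermAttribute>();
    }

    public override bool IncrementToken()
    {
        // Pull upstream tokens until one yields at least one non-empty part.
        while (pending.Count == 0)
        {
            if (!input.IncrementToken())
                return false; // upstream is exhausted
            foreach (string part in termAtt.Term.Split(
                         Separators, StringSplitOptions.RemoveEmptyEntries))
            {
                pending.Enqueue(part);
            }
        }
        // Emit the next part. Offsets are left as those of the whole
        // original token, which is a simplification (it affects highlighting).
        termAtt.SetTermBuffer(pending.Dequeue());
        return true;
    }

    public override void Reset()
    {
        base.Reset();
        pending.Clear();
    }
}

// Hypothetical analyzer: whitespace split, then comma split, then lowercase.
public class CommaAwareAnalyzer : Analyzer
{
    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        TokenStream result = new WhitespaceTokenizer(reader);
        result = new CommaSplitFilter(result);
        result = new LowerCaseFilter(result);
        result = new StopFilter(true, result, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        return result;
    }
}
```

The same analyzer would need to be used both when indexing and when parsing
queries, so that a search for c# or .net produces the same terms that were
indexed.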

On Mon, Feb 25, 2013 at 11:24 AM, kumar <x10179@gmail.com> wrote:

> Hello all
>
> I am a Lucene novice trying to set up Lucene in a .NET app using
> Lucene.Net to search through documents. So far it has been
> fantastic; however, given that the users' expectations are for
> "google"-like search, I am running into issues searching for .net and c#.
>
> Initially I tried the StandardAnalyzer, which of course does not work for
> searching for .net and c#.
> I changed that to a custom analyzer using WhitespaceTokenizer and
> LowerCaseFilter, and it works.
> However, some of the documents have the keywords as
>
> oracle,.net,C#,java etc. (i.e. separated by commas without any spaces)
>
> and this custom analyzer fails here.
>
> Looking for suggestions on how this might work as i'm sure it's 
> possible considering both lucene and .net/c# have been around for a 
> long long while
>
> It looks like PatternAnalyzer might be of some use in this case, 
> however i'm not quite sure how to use it and have found scant 
> references to it
>
>
> Any help is appreciated
>
> Thanks
> kumar
>
>


--
Regards
Naresh

