lucenenet-user mailing list archives

From "Digy" <>
Subject RE: Unwanted Tokens
Date Thu, 06 May 2010 15:49:56 GMT
For input "This is a b text", your code would output [this] [is] [b] [text].
Call "return this.Next()" instead of "return input.Next()", so that the token following a skipped one-char token also goes through the length check.
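To make the difference concrete, here is a minimal, self-contained C# sketch. It does not use Lucene.Net itself: `Token`, `TokenStream`, `SimpleLowerCaseTokenizer`, `BuggyShortTokenFilter`, `ShortTokenFilter` and `FilterDemo` are hypothetical stand-ins that only mimic the old pull-style `Token Next()` API, and the tokenizer simply splits on spaces and lower-cases each word (an assumption standing in for the real analyzer chain).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the Lucene.Net 2.x Token/TokenStream classes,
// reduced to what this example needs. They are NOT the real library types.
public class Token
{
    private readonly string text;
    public Token(string text) { this.text = text; }
    public string TermText() { return text; }
}

public abstract class TokenStream
{
    // Returns the next token, or null when the stream is exhausted.
    public abstract Token Next();
}

// Hypothetical source stream: splits on spaces and lower-cases each word.
public class SimpleLowerCaseTokenizer : TokenStream
{
    private readonly Queue<string> words;

    public SimpleLowerCaseTokenizer(string input)
    {
        words = new Queue<string>(
            input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                 .Select(w => w.ToLowerInvariant()));
    }

    public override Token Next()
    {
        return words.Count == 0 ? null : new Token(words.Dequeue());
    }
}

// Same logic as the filter in the question: a one-char token is skipped by
// returning input.Next(), so the token right after it is never checked.
public class BuggyShortTokenFilter : TokenStream
{
    private readonly TokenStream input;
    public BuggyShortTokenFilter(TokenStream input) { this.input = input; }

    public override Token Next()
    {
        Token t = input.Next();
        if (t == null) return null;
        if (t.TermText().Length == 1)
            return input.Next();   // following token passes through unchecked
        return t;
    }
}

// The suggested fix: call this.Next() so the following token is pulled
// through the same length check.
public class ShortTokenFilter : TokenStream
{
    private readonly TokenStream input;
    public ShortTokenFilter(TokenStream input) { this.input = input; }

    public override Token Next()
    {
        Token t = input.Next();
        if (t == null) return null;
        if (t.TermText().Length == 1)
            return this.Next();    // re-enter the filter: skip, then check the next token
        return t;
    }
}

public static class FilterDemo
{
    // Drains a stream into a "[a] [b]"-style string.
    public static string Render(TokenStream stream)
    {
        var parts = new List<string>();
        for (Token t = stream.Next(); t != null; t = stream.Next())
            parts.Add("[" + t.TermText() + "]");
        return string.Join(" ", parts);
    }
}
```

For "This is a b text", `BuggyShortTokenFilter` yields [this] [is] [b] [text] and `ShortTokenFilter` yields [this] [is] [text]. The recursive `this.Next()` call mirrors the suggestion above; a `while` loop that keeps pulling from `input` until it gets a token longer than one char would behave the same and avoids deep recursion on long runs of one-char tokens.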


-----Original Message-----
From: Markus Doerig [] 
Sent: Thursday, May 06, 2010 4:43 PM
Subject: Unwanted Tokens

I'm trying to write my own Filter (using StandardFilter as a template)...

Most of the cases I could solve by adding my own StopWords, but for some
kinds of Terms I want to avoid, I guess I need to write my own Filter.

Here is what I wrote so far:

        public override Lucene.Net.Analysis.Token Next()
        {
            Lucene.Net.Analysis.Token t = input.Next();

            if (t == null)
                return null;

            System.String text = t.TermText();
            System.String type = t.Type();

            if (text.Length == 1)  // Remove all Terms with only one char
                return input.Next();

            return t;
        }

When I check in Luke, it seems that some one-char terms are removed, but not
all. Am I doing something wrong?

Thanks for any help.
