lucenenet-user mailing list archives

From "Digy" <digyd...@gmail.com>
Subject RE: Unwanted Tokens
Date Thu, 06 May 2010 15:49:56 GMT
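For input "This is a b text", your code would output [this] [is] [b] [text]:
after dropping [a], it returns the next token from input.Next() without
checking its length.
Call "return this.Next()" instead of "return input.Next()", so that the
replacement token goes through the same check.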
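
A minimal sketch of that fix as a complete filter, assuming the Lucene.Net 2.x API used in the question (Token Next(), TermText()). The class name SingleCharFilter is hypothetical. It uses a loop rather than the recursive this.Next() call; both re-check every replacement token, but the loop avoids deep recursion on long runs of one-character terms.

```csharp
using Lucene.Net.Analysis;

// Hypothetical filter name; assumes the Lucene.Net 2.x Token/Next() API.
public class SingleCharFilter : TokenFilter
{
    public SingleCharFilter(TokenStream input) : base(input)
    {
    }

    public override Token Next()
    {
        Token t;

        // Keep pulling tokens until one passes the length check,
        // so a skipped token is never replaced by an unchecked one.
        while ((t = input.Next()) != null)
        {
            if (t.TermText().Length > 1)
                return t;
        }

        // The underlying stream is exhausted.
        return null;
    }
}
```

With this version, both [a] and [b] in "This is a b text" are dropped, because every token read from the underlying stream is tested before it is returned.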

DIGY

-----Original Message-----
From: Markus Doerig [mailto:farangkao@gmail.com] 
Sent: Thursday, May 06, 2010 4:43 PM
To: lucene-net-user@lucene.apache.org
Subject: Unwanted Tokens

I'm trying to write my own filter, using StandardFilter as a template.

I could solve most cases by adding my own stop words, but for some kinds
of terms that I want to avoid,
I guess I need to write my own filter.

Here is what I wrote so far:

        public override Lucene.Net.Analysis.Token Next()
        {
            Lucene.Net.Analysis.Token t = input.Next();

            if (t == null)
                return null;

            System.String text = t.TermText();
            System.String type = t.Type();

            if (text.Length == 1)  // Remove all Terms with only one char
                return input.Next();
            else
                return t;
        }


When I check in Luke, it seems that some one-char terms are removed, but not
all.
Am I doing something wrong?

Thanks for any help.
Markus.

