lucene-dev mailing list archives

From "Digy (JIRA)" <j...@apache.org>
Subject [jira] Created: (LUCENE-1581) LowerCaseFilter should be able to be configured to use a specific locale.
Date Sun, 29 Mar 2009 00:14:50 GMT
LowerCaseFilter should be able to be configured to use a specific locale.
-------------------------------------------------------------------------

                 Key: LUCENE-1581
                 URL: https://issues.apache.org/jira/browse/LUCENE-1581
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Digy


// Since I am a .NET programmer, the sample code is in C#, but it should still be easy to follow.

Assume an input text like "İ" and an analyzer like the one below:
{code}
    public class SomeAnalyzer : Analyzer
    {
        public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
        {
            TokenStream t = new SomeTokenizer(reader);
            // Folding runs first, so "İ" becomes "I" ...
            t = new Lucene.Net.Analysis.ASCIIFoldingFilter(t);
            // ... and lowercasing then depends on the current locale.
            t = new LowerCaseFilter(t);
            return t;
        }
    }
{code}
	

ASCIIFoldingFilter will return "I", and LowerCaseFilter will then return
	"i" (if the locale is "en-US")
	or
	"ı" (if the locale is "tr-TR"), which means the token would have to be passed through another
instance of ASCIIFoldingFilter.
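
The locale dependence described above can be reproduced outside Lucene. The following is a minimal Java sketch (Java because the issue is filed against Lucene Java; the class and method names are hypothetical and not part of any library) that lowercases the already folded token "I" under two locales with the standard String.toLowerCase(Locale) call.

```java
import java.util.Locale;

public class LocaleLowerCaseDemo {
    // Lowercases text using the case-mapping rules of the given locale.
    static String lower(String text, Locale locale) {
        return text.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Same input, different result depending on the locale.
        System.out.println(lower("I", Locale.US)); // prints "i"
        System.out.println(lower("I", turkish));   // prints "ı" (U+0131, dotless i)
    }
}
```

With the Turkish locale the result is no longer ASCII, which is why the token would need a second folding pass.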



So calling LowerCaseFilter before ASCIIFoldingFilter would be one solution, but a better approach
would be to add a new constructor to LowerCaseFilter that forces it to use a specific locale.
{code}
    public sealed class LowerCaseFilter : TokenFilter
    {
        /* +++ */System.Globalization.CultureInfo CultureInfo = System.Globalization.CultureInfo.CurrentCulture;

        // "in" is a reserved word in C#, so the parameter is named "input" here.
        public LowerCaseFilter(TokenStream input) : base(input)
        {
        }

        /* +++ */  public LowerCaseFilter(TokenStream input, System.Globalization.CultureInfo CultureInfo) : base(input)
        /* +++ */  {
        /* +++ */      this.CultureInfo = CultureInfo;
        /* +++ */  }
		
        public override Token Next(Token result)
        {
            result = Input.Next(result);
            if (result != null)
            {

                char[] buffer = result.TermBuffer();
                int length = result.termLength;
                for (int i = 0; i < length; i++)
                    /* +++ */ buffer[i] = System.Char.ToLower(buffer[i],CultureInfo);

                return result;
            }
            else
                return null;
        }
    }
{code}
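
For readers on the Java side, the same idea can be sketched roughly as follows. This is only an illustration under stated assumptions: LocaleLowerCaser is a hypothetical standalone helper, not a Lucene class, and it works on a plain char buffer rather than a Token. Java has no per-character lowercasing call that takes a locale, so the sketch goes through String.toLowerCase(Locale) one character at a time and leaves a character untouched when its lowercase form is longer than one char.

```java
import java.util.Locale;

// Hypothetical helper (not part of Lucene) mirroring the two proposed constructors.
public class LocaleLowerCaser {
    private final Locale locale;

    // Default: use the platform locale, like the CurrentCulture default in the C# sample.
    public LocaleLowerCaser() {
        this(Locale.getDefault());
    }

    // Proposed addition: force a specific locale.
    public LocaleLowerCaser(Locale locale) {
        this.locale = locale;
    }

    // Lowercases the first `length` chars of the buffer in place.
    public void lowerCase(char[] buffer, int length) {
        for (int i = 0; i < length; i++) {
            String lowered = String.valueOf(buffer[i]).toLowerCase(locale);
            // Skip mappings that expand to several chars; the buffer size stays fixed.
            if (lowered.length() == 1) {
                buffer[i] = lowered.charAt(0);
            }
        }
    }

    public static void main(String[] args) {
        char[] term = "DIYARBAKIR".toCharArray();
        new LocaleLowerCaser(Locale.forLanguageTag("tr-TR")).lowerCase(term, term.length);
        System.out.println(new String(term)); // prints "dıyarbakır"
    }
}
```

Passing Locale.US instead would yield "diyarbakir" for the same input, so an analyzer that fixes the locale behaves the same regardless of the machine it runs on.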

DIGY

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



