lucene-java-user mailing list archives

From Chitra <chithu.r...@gmail.com>
Subject Re: ClassicAnalyzer Behavior on accent character
Date Fri, 20 Oct 2017 08:34:17 GMT
Hi Robert,

Yes, StandardTokenizer solves my case. Could you please explain the
difference between ClassicTokenizer and StandardTokenizer? How does
StandardTokenizer solve my case? I searched the web but was unable to
understand it.


Any help is greatly appreciated.

On Fri, Oct 20, 2017 at 12:10 AM, Robert Muir <rcmuir@gmail.com> wrote:

> Easy: don't use ClassicTokenizer; use StandardTokenizer instead.
>
> On Thu, Oct 19, 2017 at 9:37 AM, Chitra <chithu.r111@gmail.com> wrote:
> > Hi,
> > I indexed the term 'ⒶeŘꝋꝒɫⱯŋɇ' (aeroplane), and it was indexed
> > as "er l n"; some characters were trimmed while indexing.
> >
> > Here is my code
> >
> > protected Analyzer.TokenStreamComponents createComponents(final String fieldName, final Reader reader)
> > {
> >     final ClassicTokenizer src = new ClassicTokenizer(getVersion(), reader);
> >     src.setMaxTokenLength(ClassicAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);
> >
> >     TokenStream tok = new ClassicFilter(src);
> >     tok = new LowerCaseFilter(getVersion(), tok);
> >     tok = new StopFilter(getVersion(), tok, stopwords);
> >     tok = new ASCIIFoldingFilter(tok); // to enable accent-insensitive search
> >
> >     return new Analyzer.TokenStreamComponents(src, tok)
> >     {
> >         @Override
> >         protected void setReader(final Reader reader) throws IOException
> >         {
> >             src.setMaxTokenLength(ClassicAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);
> >             super.setReader(reader);
> >         }
> >     };
> > }
> >
> >
> >
> > Am I missing anything? Is this the expected behavior for my input, or is
> > there a reason behind this abnormal behavior?
> >
> > --
> > Regards,
> > Chitra


-- 
Regards,
Chitra
