lucene-java-user mailing list archives

From Piotr Idzikowski <piotridzikow...@gmail.com>
Subject Re: StandardTokenizer#setMaxTokenLength
Date Mon, 20 Jul 2015 08:21:24 GMT
Hello.
By the way, I think ClassicAnalyzer has the same problem.

Regards

On Fri, Jul 17, 2015 at 4:40 PM, Steve Rowe <sarowe@gmail.com> wrote:

> Hi Piotr,
>
> Thanks for reporting!
>
> See https://issues.apache.org/jira/browse/LUCENE-6682
>
> Steve
> www.lucidworks.com
>
> > On Jul 16, 2015, at 4:47 AM, Piotr Idzikowski <piotridzikowski@gmail.com> wrote:
> >
> > Hello.
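> > I am developing my own analyzer based on StandardAnalyzer.
> > I noticed that tokenizer.setMaxTokenLength is called many times: once in
> > createComponents and then again on every setReader call, i.e. every time
> > the analyzer's components are reused.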
> >
> >   protected TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
> >     final StandardTokenizer src = new StandardTokenizer(getVersion(), reader);
> >     src.setMaxTokenLength(maxTokenLength);
> >     TokenStream tok = new StandardFilter(getVersion(), src);
> >     tok = new LowerCaseFilter(getVersion(), tok);
> >     tok = new StopFilter(getVersion(), tok, stopwords);
> >     return new TokenStreamComponents(src, tok) {
> >       @Override
> >       protected void setReader(final Reader reader) throws IOException {
> >         src.setMaxTokenLength(StandardAnalyzer.this.maxTokenLength);
> >         super.setReader(reader);
> >       }
> >     };
> >   }
> >
> > Does it make sense to do this when the length stays the same? As far as I
> > can see, it eventually calls this method (in StandardTokenizerImpl):
> > public final void setBufferSize(int numChars) {
> >   ZZ_BUFFERSIZE = numChars;
> >   char[] newZzBuffer = new char[ZZ_BUFFERSIZE];
> >   System.arraycopy(zzBuffer, 0, newZzBuffer, 0, Math.min(zzBuffer.length, ZZ_BUFFERSIZE));
> >   zzBuffer = newZzBuffer;
> > }
> > So it allocates a new array and copies the old array's contents into it,
> > even when the size has not changed.
> >
> > Regards
> > Piotr Idzikowski
>
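The cost described in the quoted message can be illustrated without Lucene on the classpath. The following is a minimal sketch, not Lucene code: `BufferResizeDemo`, `ScannerBuffer`, `setBufferSizeAlways`, `setBufferSizeGuarded` and the `allocations` counter are all hypothetical names. The first setter mirrors the quoted `setBufferSize` logic; the second adds a size check so that re-applying an unchanged `maxTokenLength` on every `setReader` call allocates nothing. Whether the fix tracked in LUCENE-6682 takes this exact shape is an assumption, not something stated in this thread.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration only (not Lucene source). ScannerBuffer mimics the
 * buffer handling of the quoted StandardTokenizerImpl.setBufferSize and counts
 * how many new arrays are allocated.
 */
public class BufferResizeDemo {

    static final class ScannerBuffer {
        char[] zzBuffer;
        int allocations = 0; // number of times a new array was created

        ScannerBuffer(int size) {
            zzBuffer = new char[size];
        }

        // Mirrors the quoted method: always allocates a new array and copies.
        void setBufferSizeAlways(int numChars) {
            char[] newZzBuffer = new char[numChars];
            System.arraycopy(zzBuffer, 0, newZzBuffer, 0, Math.min(zzBuffer.length, numChars));
            zzBuffer = newZzBuffer;
            allocations++;
        }

        // Assumed guarded variant: does nothing if the size is unchanged.
        // (Comparing against the current array length is an assumption of this sketch.)
        void setBufferSizeGuarded(int numChars) {
            if (numChars == zzBuffer.length) {
                return; // keep the existing array, no allocation or copy
            }
            // Arrays.copyOf truncates or zero-pads, same as the arraycopy above.
            zzBuffer = Arrays.copyOf(zzBuffer, numChars);
            allocations++;
        }
    }

    public static void main(String[] args) {
        int maxTokenLength = 255;
        ScannerBuffer always = new ScannerBuffer(maxTokenLength);
        ScannerBuffer guarded = new ScannerBuffer(maxTokenLength);

        // Simulate reusing the analyzer for 1000 documents: each setReader
        // call re-applies the same maxTokenLength.
        for (int doc = 0; doc < 1000; doc++) {
            always.setBufferSizeAlways(maxTokenLength);
            guarded.setBufferSizeGuarded(maxTokenLength);
        }

        System.out.println("always:  " + always.allocations);
        System.out.println("guarded: " + guarded.allocations);
    }
}
```

Running `main` prints 1000 allocations for the unconditional version and 0 for the guarded one, which is the redundant work the original question points at.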
