lucene-java-user mailing list archives

From Piotr Idzikowski <piotridzikow...@gmail.com>
Subject StandardTokenizer#setMaxTokenLength
Date Thu, 16 Jul 2015 08:47:58 GMT
Hello.
I am developing my own analyzer based on StandardAnalyzer.
I noticed that tokenizer.setMaxTokenLength is called many times:

protected TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
    final StandardTokenizer src = new StandardTokenizer(getVersion(), reader);
    src.setMaxTokenLength(maxTokenLength);
    TokenStream tok = new StandardFilter(getVersion(), src);
    tok = new LowerCaseFilter(getVersion(), tok);
    tok = new StopFilter(getVersion(), tok, stopwords);
    return new TokenStreamComponents(src, tok) {
      @Override
      protected void setReader(final Reader reader) throws IOException {
        src.setMaxTokenLength(StandardAnalyzer.this.maxTokenLength);
        super.setReader(reader);
      }
    };
}
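
As I understand it, setReader runs again for every new document or field because the analyzer caches and reuses its TokenStreamComponents, so setMaxTokenLength is re-applied each time. Roughly this usage pattern (the field name and documents are just made up for illustration):

Analyzer analyzer = new StandardAnalyzer();
String[] documents = { "first document", "second document" };
for (String text : documents) {
  // Each call reuses the cached components; setReader runs again,
  // so setMaxTokenLength is called once per document.
  try (TokenStream ts = analyzer.tokenStream("content", text)) {
    ts.reset();
    while (ts.incrementToken()) {
      // consume tokens
    }
    ts.end();
  }
}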

Does it make sense to set it again when the length stays the same? I see it
ultimately calls this method in StandardTokenizerImpl:
public final void setBufferSize(int numChars) {
    ZZ_BUFFERSIZE = numChars;
    char[] newZzBuffer = new char[ZZ_BUFFERSIZE];
    System.arraycopy(zzBuffer, 0, newZzBuffer, 0, Math.min(zzBuffer.length, ZZ_BUFFERSIZE));
    zzBuffer = newZzBuffer;
}
So it allocates a new array and copies the old buffer content into it on every call, even when the size has not changed.
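
If the size is unchanged, that allocation and copy look avoidable. Just to illustrate what I mean, here is a rough sketch of a guard (my own idea, not the actual Lucene code):

public final void setBufferSize(int numChars) {
    // Skip the reallocation and copy when the buffer already has the requested size.
    if (numChars == zzBuffer.length) {
      ZZ_BUFFERSIZE = numChars;
      return;
    }
    ZZ_BUFFERSIZE = numChars;
    char[] newZzBuffer = new char[ZZ_BUFFERSIZE];
    System.arraycopy(zzBuffer, 0, newZzBuffer, 0, Math.min(zzBuffer.length, ZZ_BUFFERSIZE));
    zzBuffer = newZzBuffer;
}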

Regards
Piotr Idzikowski
