lucene-java-user mailing list archives

From Steve Rowe <sar...@gmail.com>
Subject Re: StandardTokenizer#setMaxTokenLength
Date Fri, 17 Jul 2015 14:40:01 GMT
Hi Piotr,

Thanks for reporting!

See https://issues.apache.org/jira/browse/LUCENE-6682

Steve
www.lucidworks.com

> On Jul 16, 2015, at 4:47 AM, Piotr Idzikowski <piotridzikowski@gmail.com> wrote:
> 
> Hello.
> I am developing my own analyzer based on StandardAnalyzer.
> I noticed that tokenizer.setMaxTokenLength is called many times.
> 
> protected TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
>   final StandardTokenizer src = new StandardTokenizer(getVersion(), reader);
>   src.setMaxTokenLength(maxTokenLength);
>   TokenStream tok = new StandardFilter(getVersion(), src);
>   tok = new LowerCaseFilter(getVersion(), tok);
>   tok = new StopFilter(getVersion(), tok, stopwords);
>   return new TokenStreamComponents(src, tok) {
>     @Override
>     protected void setReader(final Reader reader) throws IOException {
>       src.setMaxTokenLength(StandardAnalyzer.this.maxTokenLength);
>       super.setReader(reader);
>     }
>   };
> }
> 
> Does this make sense if the length stays the same? I see that it ends up
> calling this method (in StandardTokenizerImpl):
> public final void setBufferSize(int numChars) {
>   ZZ_BUFFERSIZE = numChars;
>   char[] newZzBuffer = new char[ZZ_BUFFERSIZE];
>   System.arraycopy(zzBuffer, 0, newZzBuffer, 0, Math.min(zzBuffer.length, ZZ_BUFFERSIZE));
>   zzBuffer = newZzBuffer;
> }
> So on every call it just allocates a new array and copies the old array's
> content into it, even when the size has not changed.
> 
> Regards
> Piotr Idzikowski
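
The concern in the quoted message can be illustrated with a small standalone sketch. It assumes a buffer holder that returns early when the requested size equals the current one, so repeated calls with an unchanged length allocate nothing. The class name, the allocation counter, and the initial size of 255 are hypothetical and chosen for illustration; this is not the actual StandardTokenizerImpl code, and it is not necessarily how LUCENE-6682 was resolved.

```java
// Hypothetical stand-in for a scanner buffer; not Lucene code.
final class BufferResizeSketch {
    // Illustrative initial size, chosen arbitrarily for this sketch.
    private char[] buffer = new char[255];
    // Counts how many times a new array was actually allocated.
    private int allocations = 0;

    void setBufferSize(int numChars) {
        // Guard: nothing to do when the size is unchanged.
        if (numChars == buffer.length) {
            return;
        }
        char[] newBuffer = new char[numChars];
        // Preserve as much of the old content as fits in the new array.
        System.arraycopy(buffer, 0, newBuffer, 0, Math.min(buffer.length, numChars));
        buffer = newBuffer;
        allocations++;
    }

    int getBufferSize() {
        return buffer.length;
    }

    int getAllocations() {
        return allocations;
    }
}
```

With this guard, calling setBufferSize repeatedly with the same value, as setReader does in the analyzer above, would cost only one comparison per call instead of an allocation and a copy.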

