lucene-dev mailing list archives

From Adriano Crestani <adrianocrest...@gmail.com>
Subject Re: Cloning TermAttribute objects
Date Thu, 15 Jul 2010 20:00:51 GMT
Keeping this thread alive.

I would appreciate a response from the community about this issue.

Thanks in advance,
Adriano Crestani

On Tue, Jul 13, 2010 at 3:59 AM, Adriano Crestani
<adrianocrestani@apache.org> wrote:
> Hi,
>
> Why does the TermAttributeImpl.clone() method use buff.clone() instead
> of System.arraycopy to clone its internal buffer? Performance reasons?
>
> I have the following scenario:
>
> ...
> public boolean incrementToken() {
> ...
> String twoHundredKCharsString = "abc....";
> String smallString = "test";
>
> termAttribute.setTermBuffer(twoHundredKCharsString);
> State largeStringState = captureState();
>
> termAttribute.setTermBuffer(smallString);
> State smallStringState = captureState();
>
> ...
> }
> ...
>
> And guess what?! smallStringState has a TermAttribute object that
> holds an internal buffer of 200k chars in size!!!
>
> I googled this and found that clone() and System.arraycopy perform
> about the same for small arrays, and that clone() only performs better
> for large arrays.
>
> So, if large string inputs are not a realistic scenario, why not use
> System.arraycopy instead of clone()? And if they are a realistic
> scenario, Lucene should definitely not be copying the entire buffer
> for small strings.
>
> Maybe the TermAttribute interface could expose a method like
> shrinkBuffer() that users could invoke when they need to.
>
> Thoughts?
>
> Best Regards,
> Adriano Crestani
>
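
A minimal, self-contained sketch of the effect described in the quoted
message. It does not use Lucene at all: TermBuffer is a hypothetical
stand-in for an attribute that reuses a growable char[] buffer, and the
names cloneWholeBuffer(), cloneTrimmed() and shrinkBuffer() are made up
for illustration (shrinkBuffer() only mirrors the method proposed above).
It assumes the buffer grows on demand and is never shrunk when a shorter
term is set, which is what the observed behaviour suggests.

```java
import java.util.Arrays;

public class TermBufferSketch {

    // Hypothetical stand-in for a term attribute: a reusable buffer plus a logical length.
    static final class TermBuffer {
        char[] buf = new char[16];
        int length = 0;

        // Copies the string into the buffer, growing it if needed; it never shrinks.
        void setTermBuffer(String s) {
            if (buf.length < s.length()) {
                buf = new char[s.length()];
            }
            s.getChars(0, s.length(), buf, 0);
            length = s.length();
        }

        // Clones the whole backing array, including the unused tail.
        TermBuffer cloneWholeBuffer() {
            TermBuffer t = new TermBuffer();
            t.buf = buf.clone();
            t.length = length;
            return t;
        }

        // Copies only the chars that are actually in use.
        TermBuffer cloneTrimmed() {
            TermBuffer t = new TermBuffer();
            t.buf = new char[length];
            System.arraycopy(buf, 0, t.buf, 0, length);
            t.length = length;
            return t;
        }

        // Hypothetical shrinkBuffer(): reallocates the buffer to the current term length.
        void shrinkBuffer() {
            buf = Arrays.copyOf(buf, length);
        }

        String term() {
            return new String(buf, 0, length);
        }
    }

    public static void main(String[] args) {
        char[] big = new char[200_000];
        Arrays.fill(big, 'a');

        TermBuffer term = new TermBuffer();
        term.setTermBuffer(new String(big)); // buffer grows to 200k chars
        term.setTermBuffer("test");          // buffer stays at 200k chars

        System.out.println("whole clone capacity:   " + term.cloneWholeBuffer().buf.length);
        System.out.println("trimmed clone capacity: " + term.cloneTrimmed().buf.length);

        term.shrinkBuffer();
        System.out.println("capacity after shrink:  " + term.buf.length);
    }
}
```

In this sketch the whole-array clone keeps the 200k-char capacity for the
4-char term, while the trimmed copy and the shrunken buffer hold only 4
chars. Whether trimming on every clone is worth the extra allocation is a
separate trade-off that the sketch does not measure.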


