avro-user mailing list archives

From "Irving, Dave" <dave.irv...@baml.com>
Subject Utf8 byte[] reuse
Date Thu, 01 Mar 2012 13:48:44 GMT
Hi,

I'm using a BinaryDecoder to read some Utf8s - and I'm reusing the same Utf8 instance.
I found there was a huge amount of allocation going on - and tracked it down to Utf8#setByteLength:

...
  public Utf8 setByteLength(int newLength) {
    if (this.length < newLength) {
      byte[] newBytes = new byte[newLength];
      System.arraycopy(bytes, 0, newBytes, 0, this.length);
      this.bytes = newBytes;
    }
    this.length = newLength;
    this.string = null;
    return this;
  }
...

So, say I've got 4 Utf8s A,B,C and D lined up, with byte lengths A=1, B=10, C=5 and D=6 respectively,
and do a read reusing the same Utf8 instance each time.
Read A: Causes allocation from empty buffer to buffer size 1 (ok)
Read B: Causes allocation from buffer size 1 to 10 (ok)
Read C: Reuses the buffer (ok)
Read D: Reallocates a buffer again (this.length is 5 after reading C, and 5 < 6), even though
we've already got a 10 byte buffer (???)
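
To make the walkthrough concrete, here is a minimal sketch that counts allocations for the A, B, C, D sequence. LengthCheckedBuffer is a hypothetical stand-in that copies only the comparison from the quoted method; it is not the real Utf8 class, and the allocations counter exists purely for illustration.

```java
// Hypothetical stand-in for the quoted logic; NOT org.apache.avro.util.Utf8.
final class LengthCheckedBuffer {
    private byte[] bytes = new byte[0];
    private int length = 0;
    int allocations = 0; // illustration-only counter, not part of the quoted code

    LengthCheckedBuffer setByteLength(int newLength) {
        // Same comparison as the quoted method: logical length, not array capacity.
        if (this.length < newLength) {
            byte[] newBytes = new byte[newLength];
            allocations++;
            System.arraycopy(bytes, 0, newBytes, 0, this.length);
            this.bytes = newBytes;
        }
        this.length = newLength;
        return this;
    }

    int capacity() {
        return bytes.length;
    }
}

final class AllocationDemo {
    // Resizes one reused buffer to each length in turn and reports the allocation count.
    static int countAllocations(int... lengths) {
        LengthCheckedBuffer buf = new LengthCheckedBuffer();
        for (int n : lengths) {
            buf.setByteLength(n);
        }
        return buf.allocations;
    }

    public static void main(String[] args) {
        // Byte lengths A=1, B=10, C=5, D=6 from the example above.
        System.out.println(countAllocations(1, 10, 5, 6)); // prints 3
    }
}
```

Under these assumptions the third allocation happens on D, and it also shrinks the backing array from 10 bytes to 6, so any later value longer than 6 bytes allocates yet again.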

A simple 'fix' would be to compare against bytes.length (the capacity of the byte[]) rather
than this.length before doing a reallocation.
The only issue I can see with this is that a byte[] sized for the largest Utf8 you've read
with that instance stays in memory. If that's a concern, you could always provide a 'limit'
on construction of the Utf8: if the allocated byte[] grows beyond the limit, drop it and
reallocate on the next resize to a length below the limit.
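
A rough sketch of what I mean, written as a standalone class rather than a patch to Utf8. CapacityCheckedBuffer, retainLimit and the allocations counter are all hypothetical names, and the exact rule for when to drop an oversized array is just one reading of the 'limit' idea.

```java
// Hypothetical sketch of the proposed change; NOT the real org.apache.avro.util.Utf8.
final class CapacityCheckedBuffer {
    private final int retainLimit; // assumed constructor parameter: largest array worth keeping
    private byte[] bytes = new byte[0];
    private int length = 0;
    int allocations = 0; // illustration-only counter

    CapacityCheckedBuffer(int retainLimit) {
        this.retainLimit = retainLimit;
    }

    CapacityCheckedBuffer setByteLength(int newLength) {
        // Grow only when the backing array is actually too small.
        boolean tooSmall = bytes.length < newLength;
        // Drop an over-limit array once a request below the limit arrives.
        boolean oversized = bytes.length > retainLimit && newLength < retainLimit;
        if (tooSmall || oversized) {
            byte[] newBytes = new byte[newLength];
            allocations++;
            // When shrinking, newLength can be smaller than the old logical length.
            System.arraycopy(bytes, 0, newBytes, 0, Math.min(this.length, newLength));
            this.bytes = newBytes;
        }
        this.length = newLength;
        return this;
    }

    int capacity() {
        return bytes.length;
    }
}

final class CapacityDemo {
    public static void main(String[] args) {
        // With no effective limit, the A, B, C, D sequence allocates twice.
        CapacityCheckedBuffer buf = new CapacityCheckedBuffer(Integer.MAX_VALUE);
        for (int n : new int[] {1, 10, 5, 6}) {
            buf.setByteLength(n);
        }
        System.out.println(buf.allocations + " allocations, capacity " + buf.capacity());
    }
}
```

With Integer.MAX_VALUE as the limit the 10 byte array is kept for C and D. With a small limit, one unusually long value costs an extra allocation afterwards but does not pin a large array for the life of the instance.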

Is this something that would be considered for changing if I submit a patch / JIRA?

Many thanks in advance,

Dave

