avro-user mailing list archives

From Yang <teddyyyy...@gmail.com>
Subject Re: why Utf8 (vs String)?
Date Fri, 12 Aug 2011 00:08:40 GMT
Thanks a lot, Doug.

On Thu, Aug 11, 2011 at 5:02 PM, Doug Cutting <cutting@apache.org> wrote:
> This is for performance.
>
> A Utf8 may be efficiently compared to other Utf8's, e.g., when sorting,
> without decoding the UTF-8 bytes into characters.  A Utf8 may also be
> reused, so when iterating through a large number of values (e.g., in a
> MapReduce job) only a single instance need be allocated, while String
> would require an allocation per iteration.
>
> Note that String may be used when writing data, but that data is
> generally read as Utf8.  The toString() method may be called whenever a
> String is required.  If only equality or ordering is needed, and not
> substring operations, then leaving values as Utf8 is generally faster
> than converting to String.
>
> Doug
>
> On 08/11/2011 04:36 PM, Yang wrote:
>> If I declare a field to be "string", the generated Java implementation
>> uses avro...Utf8 for it.
>>
>> I was wondering what the thinking behind this is, and what the proper
>> way to use the Utf8 value is. Often in my logic I need to compare the
>> value against other Strings, or store it in other databases, which of
>> course do not know about Utf8, so I have to convert it to a String.
>> So it seems that Utf8 requires a lot of unnecessary conversions.
>>
>> Or am I just not using it correctly?
>>
>> Thanks,
>> Yang
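
The two points in Doug's reply (ordering without decoding, and reusing one instance across many values) can be illustrated with a small self-contained sketch. The class below, ByteText, is a hypothetical stand-in written only for this illustration; it is not Avro's Utf8 class and makes no claim about how Avro implements it. It assumes values arrive as UTF-8 byte arrays, as they would from a decoder.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a reusable UTF-8 value; NOT Avro's Utf8 class. */
final class ByteText implements Comparable<ByteText> {
    private byte[] bytes = new byte[0];
    private int length;

    /** Overwrites the contents in place; the buffer grows only when too small. */
    public ByteText set(byte[] src, int offset, int len) {
        if (len > bytes.length) {
            bytes = new byte[len];
        }
        System.arraycopy(src, offset, bytes, 0, len);
        length = len;
        return this;
    }

    public ByteText set(byte[] src) {
        return set(src, 0, src.length);
    }

    /** Orders by unsigned byte value, so no characters are ever decoded. */
    @Override
    public int compareTo(ByteText other) {
        int n = Math.min(length, other.length);
        for (int i = 0; i < n; i++) {
            int a = bytes[i] & 0xff;
            int b = other.bytes[i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return length - other.length;
    }

    /** Decodes to a String; call this only when a String is really needed. */
    @Override
    public String toString() {
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }

    /** Helper for the demo: encodes a String as UTF-8 bytes. */
    public static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[][] records = { utf8("pear"), utf8("apple"), utf8("zebra") };
        ByteText threshold = new ByteText().set(utf8("banana"));
        ByteText reused = new ByteText(); // one instance for the whole loop

        for (byte[] record : records) {
            reused.set(record);                      // refill, no new object
            if (reused.compareTo(threshold) > 0) {   // compare raw bytes
                System.out.println(reused);          // decode only on demand
            }
        }
    }
}
```

With Avro itself, the equivalent pattern would be to keep the value as a Utf8 for equality and ordering checks, and to call toString() at the point where a String is handed to other code, such as a database client.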
