hive-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <>
Subject [jira] [Commented] (HIVE-2380) Add ByteArray Datatype
Date Tue, 06 Sep 2011 15:26:09 GMT


Ashutosh Chauhan commented on HIVE-2380:

bq. Is there a design doc somewhere?
Not yet. I can put together some design notes quickly over here. Is there an example of a design
doc for a feature previously added to Hive that I can look at to get an idea of what I should
cover in it?

bq. Since Hive already has an array type, but this feature is independent, we probably want
a different type name than bytearray.
OK. Binary?

bq. For conversions, is going through string for all types a good default behavior? An alternative
would be to prevent implicit conversions altogether, and force users to pick the UDF with
the desired behavior. E.g. for string/binary conversion, it's a good idea to be able to specify
an encoding rather than always using the JVM default.
I also thought about casting and wasn't inclined to add implicit casting. But I went with
it so as to make easy things easier. This way users can use this type easily in scripts without
needing to insert casts every time, and in cases where this doesn't work they can always write UDFs.
Further, in many cases, the JVM encoding is a good default. But if you think that's not a good
idea, I can take away implicit casting.
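
To make the encoding concern concrete, here is a minimal Java sketch of why the choice of charset matters when converting between string and binary. The class and method names (EncodingSketch, toBinary, toText) are hypothetical and are not taken from the attached patch; the sketch only assumes that a conversion UDF would take an explicit charset.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of hive-2380.patch: string/binary conversion
// with an explicit charset instead of the JVM default.
class EncodingSketch {
    // Always encodes as UTF-8, so the result is the same on every JVM.
    static byte[] toBinary(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes bytes that were produced by toBinary.
    static String toText(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "caf\u00e9"; // four characters, the last one non-ASCII
        byte[] utf8 = toBinary(s);
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(utf8.length);            // 5
        System.out.println(latin1.length);          // 4
        System.out.println(toText(utf8).equals(s)); // true
    }
}
```

The same string yields different bytes under different charsets, so an implicit cast that relies on the JVM default can give different results on differently configured machines.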

bq. How does the new type work with 
bq. TRANSFORM scripts, 
I am assuming that providing toString() will be good enough to make sure we can send data in string
form and, after receiving it, convert it back into a bytearray. Is there anything else?
bq. UDF's, 
Like other types. Do I need to think about anything here?
bq. saving to textfile, etc?
I assume you mean a file containing text data in other columns. If the user does so, it will be
the user's responsibility to escape and format the data appropriately so that it can be loaded later,
potentially with a serde that understands the format and escaping.
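
As an illustration of the escaping question for TRANSFORM scripts and text files, the sketch below compares a naive string round trip with a Base64 round trip. Base64 is an assumption chosen for illustration, not something proposed in this issue, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical sketch: moving arbitrary bytes through a text channel.
class TextRoundTripSketch {
    // Base64 output contains no tabs or newlines, so it cannot collide
    // with the usual field and row delimiters of a text file.
    static String encodeForText(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    static byte[] decodeFromText(String field) {
        return Base64.getDecoder().decode(field);
    }

    // Naive approach: decode the bytes as UTF-8 text and encode them again.
    // Invalid sequences are replaced on decoding, so this can lose data.
    static byte[] naiveRoundTrip(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xff is not valid UTF-8; 0x09 and 0x0a are tab and newline.
        byte[] raw = {(byte) 0xff, 0x09, 0x0a, 0x00};
        System.out.println(Arrays.equals(raw, decodeFromText(encodeForText(raw)))); // true
        System.out.println(Arrays.equals(raw, naiveRoundTrip(raw)));                // false
    }
}
```

Whether toString() alone is sufficient therefore depends on which text representation it produces for bytes that are not valid text.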
bq. Don't we need more accessor functions (e.g. making the existing string functions such
as LENGTH work)?
Length should be possible. Any other accessor functions?
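
For LENGTH, one design point is that a binary value has a byte count while a string has a character count. The sketch below assumes that LENGTH on the new type would return the number of bytes; that is an assumption for illustration, not a decision recorded in this issue, and the names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of LENGTH semantics for binary versus string values.
class LengthSketch {
    // Assumed semantics for the binary type: number of bytes.
    static int binaryLength(byte[] value) {
        return value.length;
    }

    // String semantics: number of Unicode code points.
    static int stringLength(String value) {
        return value.codePointCount(0, value.length());
    }

    public static void main(String[] args) {
        String s = "\u65e5\u672c"; // two CJK characters
        System.out.println(stringLength(s));                                   // 2
        System.out.println(binaryLength(s.getBytes(StandardCharsets.UTF_8))); // 6
    }
}
```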

> Add ByteArray Datatype
> ----------------------
>                 Key: HIVE-2380
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>         Attachments: hive-2380.patch
> Add bytearray as a primitive data type.

This message is automatically generated by JIRA.
For more information on JIRA, see:

