hive-dev mailing list archives

From Brock Noland <br...@cloudera.com>
Subject Re: Parquet support (HIVE-5783)
Date Tue, 18 Feb 2014 18:31:51 GMT
Hi Alan,

Responses are inline below:

On Tue, Feb 18, 2014 at 11:49 AM, Alan Gates <gates@hortonworks.com> wrote:
> Gunther, is it the case that there is anything extra that needs to be done to ship Parquet
> code with Hive right now?  If I read the patch correctly the Parquet jars were added to the
> pom and thus will be shipped as part of Hive.  As long as it works out of the box when a user
> says "create table ... stored as parquet" why do we care whether the parquet jar is owned
> by Hive or another project?
>
> The concern about feature mismatch in Parquet versus Hive is valid, but I'm not sure
> what to do about it other than assure that there are good error messages.  Users will often
> want to use non-Hive based storage formats (Parquet, Avro, etc.).  This means we need a good
> way to detect at SQL compile time that the underlying storage doesn't support the indicated
> data type and throw a good error.

Agreed, the error messages should absolutely be good. I will ensure
this is the case via https://issues.apache.org/jira/browse/HIVE-6457

>
> Also, it's important to be clear going forward about what Hive as a project is signing
> up for.  If tomorrow someone decides to add a new datatype or feature we need to be clear
> that we expect the contributor to make this work for Hive owned formats (text, RC, sequence,
> ORC) but not necessarily for external formats

This makes sense to me.

I'd just like to add that I have a patch available to improve the
hive-exec uber jar and general query speed:
https://issues.apache.org/jira/browse/HIVE-860
Additionally, I have a patch available to finish the generic
STORED AS functionality:
https://issues.apache.org/jira/browse/HIVE-5976

Brock
