hive-dev mailing list archives

From Lefty Leverenz
Subject Re: Parquet support (HIVE-5783)
Date Fri, 21 Feb 2014 07:27:46 GMT
This is in the Terminology section of the Storage Handlers doc:

> Storage handlers introduce a distinction between *native* and
> *non-native* tables.
> A native table is one which Hive knows how to manage and access without a
> storage handler; a non-native table is one which requires a storage handler.

It goes on to say that non-native tables are created with a STORED BY
clause (as opposed to a STORED AS clause).
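
As a rough illustration of the two clauses (a sketch only: the table and
column names are made up, and the HBase handler is just one example of a
storage handler, with property values borrowed from the style of the
HBase integration doc):

```sql
-- Native table: Hive manages and reads the storage itself, so the
-- format is named with STORED AS (PARQUET needs Hive 0.13.0 or later).
CREATE TABLE page_views_native (user_id BIGINT, url STRING)
STORED AS PARQUET;

-- Non-native table: access goes through a storage handler, so the
-- handler class is named with STORED BY.
CREATE TABLE page_views_hbase (key INT, value STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
```

By that definition a Parquet table created with STORED AS is native, even
though the Parquet jars come from another project.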

Does that clarify or muddy the waters?

-- Lefty

On Thu, Feb 20, 2014 at 7:37 PM, Lefty Leverenz wrote:

> Some of these issues can be addressed in the documentation.  The "File
> Formats" section of the Language Manual needs an overview, and that might
> be a good place to explain the differences between Hive-owned formats and
> external formats.  Or the SerDe doc could be beefed up:  Built-In SerDes.
> In the meantime, I've added a link to the Avro doc in the "File Formats"
> list and mentioned Parquet in DDL's Row Format, Storage Format, and SerDe
> section:
>> Use STORED AS PARQUET (without ROW FORMAT SERDE) for the Parquet
>> storage format in Hive 0.13.0 and later; 0.10, 0.11, or 0.12.
> Does that work?
> -- Lefty
> On Tue, Feb 18, 2014 at 1:31 PM, Brock Noland wrote:
>> Hi Alan,
>> Response is inline, below:
>> On Tue, Feb 18, 2014 at 11:49 AM, Alan Gates wrote:
>> > Gunther, is it the case that there is anything extra that needs to be
>> > done to ship Parquet code with Hive right now?  If I read the patch
>> > correctly the Parquet jars were added to the pom and thus will be shipped
>> > as part of Hive.  As long as it works out of the box when a user says
>> > "create table ... stored as parquet" why do we care whether the parquet jar
>> > is owned by Hive or another project?
>> >
>> > The concern about feature mismatch in Parquet versus Hive is valid, but
>> > I'm not sure what to do about it other than assure that there are good
>> > error messages.  Users will often want to use non-Hive based storage
>> > formats (Parquet, Avro, etc.).  This means we need a good way to detect at
>> > SQL compile time that the underlying storage doesn't support the indicated
>> > data type and throw a good error.
>> Agreed, the error messages should absolutely be good. I will ensure
>> this is the case via
>> >
>> > Also, it's important to be clear going forward about what Hive as a
>> > project is signing up for.  If tomorrow someone decides to add a new
>> > datatype or feature we need to be clear that we expect the contributor to
>> > make this work for Hive owned formats (text, RC, sequence, ORC) but not
>> > necessarily for external formats
>> This makes sense to me.
>> I'd just like to add that I have a patch available to improve the
>> hive-exec uber jar and general query speed:
>> Additionally I have a
>> patch available to finish the generic STORED AS functionality:
>> Brock
