hadoop-common-user mailing list archives

From "Jeff Hammerbacher" <jeff.hammerbac...@gmail.com>
Subject Re: Questions regarding Hive metadata schema
Date Wed, 08 Oct 2008 00:53:02 GMT
For translation purposes, SerDes in Hive correspond to
StoreFunc/LoadFunc pairs in Pig and Producers/Extractor pairs in

I claim SCOPE's terminology is the most elegant and we should all
standardize on their terminology, in this case at least. Joy claims
that SerDe is a common term in the hardware community. Since Hive was
mainly intended for hardware developers, ...wait a second, that's not

(seriously though, we need some way to keep these things straight, and
being able to reuse serialization/deserialization libraries would be

On Tue, Oct 7, 2008 at 3:49 PM, Prasad Chakka <prasad@facebook.com> wrote:
> Hi Alan,
> The objects are very closely associated with the Thrift API objects defined
> in src/contrib/hive/metastore/if/hive_metastore.thrift . It contains
> descriptions as to what each field is and it should answer most of your questions.
> ORM for this is at s/c/h/metastore/src/java/model/package.jdo.
> 2) SD is storage descriptor (look at SDS table)
> 3) SERDES contains information for Hive serializers and deserializers
> 5) Tables and Partitions have Storage Descriptors. Storage Descriptors
> contain physical storage info and how to read the data (serde info). Storage
> Descriptor object actually contains the columns. This means that different
> partitions can have different column sets.
> 6) 1-1
> Thanks,
> Prasad
> From: Alan Gates <gates@yahoo-inc.com>
> Reply-To: <core-user@hadoop.apache.org>
> Date: Tue, 7 Oct 2008 15:28:50 -0700
> To: <core-user@hadoop.apache.org>
> Subject: Questions regarding Hive metadata schema
> Hi,
> I've been looking over the db schema that hive uses to store its
> metadata (package.jdo) and I had some questions:
>   1.  What do the field names in the TYPES table mean? TYPE1, TYPE2,
> and TYPE_FIELDS are all unclear to me.
>   2. In the TBLS (tables) table, what is sd?
>   3. What does the SERDES table store?
>   4. What does the SORT_ORDER table store? It appears to describe the
> ordering within a storage descriptor, which in turn appears to be
> related to a partition. Do you envision having a table where different
> partitions have different orders?
>   5. SDS (storage descriptor) table has a list of columns. Does this
> imply that columnar storage is supported?
>   6. What is the relationship between a storage descriptor and a
> partition? 1-1, 1-n?
> Thanks.
> Alan.
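
The relationships described in answers 2, 3, 5 and 6 above can be sketched roughly as follows. This is only an illustration, not the actual metastore model: every class, field, and path name here is hypothetical and chosen for readability, and the real definitions live in hive_metastore.thrift and package.jdo. The sketch assumes only what the thread states: tables and partitions each have a storage descriptor, the descriptor carries the column list and the serde info, and a partition maps 1-1 to its descriptor.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SerDeInfo:
    # How rows are serialized and deserialized (hypothetical fields).
    name: str
    serialization_lib: str


@dataclass
class StorageDescriptor:
    # Physical storage info plus the column list, per answer 5.
    location: str
    columns: List[str]
    serde: SerDeInfo


@dataclass
class Partition:
    values: List[str]
    sd: StorageDescriptor  # one descriptor per partition, per answer 6


@dataclass
class Table:
    name: str
    sd: StorageDescriptor
    partitions: List[Partition] = field(default_factory=list)


def partitions_with_different_columns(table: Table) -> List[Partition]:
    """Return the partitions whose column list differs from the table's."""
    return [p for p in table.partitions if p.sd.columns != table.sd.columns]


# Made-up example data: the second partition adds a column.
serde = SerDeInfo(name="example", serialization_lib="example.HypotheticalSerDe")
table = Table(
    name="page_views",
    sd=StorageDescriptor("/warehouse/page_views", ["user_id", "url"], serde),
)
table.partitions.append(
    Partition(
        ["2008-10-06"],
        StorageDescriptor("/warehouse/page_views/ds=2008-10-06", ["user_id", "url"], serde),
    )
)
table.partitions.append(
    Partition(
        ["2008-10-07"],
        StorageDescriptor(
            "/warehouse/page_views/ds=2008-10-07", ["user_id", "url", "referrer"], serde
        ),
    )
)

changed = partitions_with_different_columns(table)
print([p.values for p in changed])
```

Because the columns hang off the storage descriptor rather than the table, a partition can carry a column set that differs from its table's, which is the point made in answer 5.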
