hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/Design" by AllenSmith
Date Sat, 14 May 2011 19:35:36 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/Design" page has been changed by AllenSmith.
http://wiki.apache.org/hadoop/Hive/Design?action=diff&rev1=13&rev2=14

--------------------------------------------------

  The Metastore provides two important but often overlooked features of a data warehouse:
data abstraction and data discovery. Without the data abstractions provided in Hive, a user
has to provide information about data formats, extractors, and loaders along with the query.
In Hive, this information is given during table creation and reused every time the table is referenced.
This is very similar to traditional warehousing systems. The second feature, data
discovery, enables users to discover and explore relevant and specific data in the warehouse.
Other tools can be built using this metadata to expose and possibly enhance the information
about the data and its availability. Hive accomplishes both of these features by providing
a metadata repository that is tightly integrated with the Hive query processing system so that
data and metadata are in sync.
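
  The data abstraction idea above can be illustrated with a minimal sketch. This is not Hive's
Metastore API; the class names (TableFormat, MetaStore), the method names, and the table
name and path are all hypothetical, chosen only to show format details being recorded once at
table creation and then looked up by name, along with a simple form of data discovery.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableFormat:
    """Hypothetical record of what a reader must know to decode a table."""
    location: str
    file_format: str
    serde: str


class MetaStore:
    """Toy in-memory registry; a stand-in for a real metadata repository."""

    def __init__(self):
        self._tables = {}

    def create_table(self, name, fmt):
        # Format details are supplied once, at table creation.
        self._tables[name] = fmt

    def describe(self, name):
        # Later queries reference the table by name only.
        return self._tables[name]

    def list_tables(self):
        # Data discovery: enumerate what the warehouse holds.
        return sorted(self._tables)


store = MetaStore()
store.create_table(
    "page_views",
    TableFormat("/warehouse/page_views", "text", "delimited"),
)
fmt = store.describe("page_views")
tables = store.list_tables()
```

  A query layer built on such a registry would call describe() for each referenced table
instead of asking the user to restate formats and loaders with every query.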
  
  === Metadata Objects ===
-  * Database - is a namespace for tables. It can be used as an administrative unit in future.
The database 'default' is used for tables with no user supplied database name.
+  * Database - is a namespace for tables. It can be used as an <span class="plainlinks">[http://www.outdoorfountains.com/
<span style="color:black;font-weight:normal; text-decoration:none!important; background:none!important;
text-decoration:none;">outdoor fountains</span>] administrative unit in future. The
database 'default' is used for tables with no user supplied database name.
   * Table - Metadata for a table contains the list of columns, owner, storage and SerDe information.
It can also contain any user supplied key and value data. Storage information includes the location
of the underlying data, file input and output formats and bucketing information. SerDe metadata
includes the implementation class of the serializer and deserializer and any supporting information
required by the implementation. All of this information can be provided during the creation
of the table.
   * Partition - Each partition can have its own columns and SerDe and storage information.
This facilitates schema changes without affecting older partitions.
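
  The three metadata objects listed above can be sketched as plain data classes. This is an
illustrative model only, not the Metastore's actual schema: every class, field, and method
name here (StorageInfo, SerDeInfo, add_partition, and so on) is an assumption made for the
example, as are the table name, owner, and paths. It shows one way a partition could keep
its own copy of the columns so that a later schema change leaves older partitions untouched.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass
class StorageInfo:
    location: str
    input_format: str
    output_format: str
    num_buckets: int = 0


@dataclass
class SerDeInfo:
    implementation_class: str
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class Partition:
    values: Dict[str, str]
    columns: List[str]      # the partition's own column list
    storage: StorageInfo
    serde: SerDeInfo


@dataclass
class Table:
    name: str
    owner: str
    columns: List[str]
    storage: StorageInfo
    serde: SerDeInfo
    properties: Dict[str, str] = field(default_factory=dict)  # user key/value data
    partitions: List[Partition] = field(default_factory=list)

    def add_partition(self, values):
        # Snapshot the table's current schema into the new partition.
        suffix = "/".join(f"{k}={v}" for k, v in values.items())
        part = Partition(
            values=dict(values),
            columns=list(self.columns),
            storage=replace(self.storage, location=f"{self.storage.location}/{suffix}"),
            serde=replace(self.serde, parameters=dict(self.serde.parameters)),
        )
        self.partitions.append(part)
        return part


@dataclass
class Database:
    name: str = "default"   # used when no database name is supplied
    tables: Dict[str, Table] = field(default_factory=dict)


db = Database()
table = Table(
    name="page_views",
    owner="hypothetical_user",
    columns=["user_id", "url"],
    storage=StorageInfo("/warehouse/page_views", "text", "text"),
    serde=SerDeInfo("example.DelimitedSerDe", {"field.delim": "\t"}),
)
db.tables[table.name] = table

old_part = table.add_partition({"ds": "2011-05-13"})
table.columns.append("referrer")            # schema change on the table
new_part = table.add_partition({"ds": "2011-05-14"})
```

  After the column is added, old_part still lists only the original two columns while
new_part includes the new one, which mirrors the point made in the Partition entry.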
  
