hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/PartitionedViews" by JohnSichi
Date Tue, 01 Feb 2011 19:58:43 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/PartitionedViews" page has been changed by JohnSichi.
http://wiki.apache.org/hadoop/Hive/PartitionedViews?action=diff&rev1=1&rev2=2

--------------------------------------------------

  
  = Use Cases =
  
- * An administrator wants to create a set of views as a table/column renaming layer on top
of an existing set of base tables, without disturbing the ETL processes which load those tables.
 To read-only users, the views should behave exactly the same as the underlying tables in
every way.  Among other things, this means users should be able to browse available partitions.
+  1. An administrator wants to create a set of views as a table/column renaming layer on
top of an existing set of base tables, without disturbing the ETL processes which load those
tables.  To read-only users, the views should behave exactly the same as the underlying tables
in every way.  Among other things, this means users should be able to browse available partitions.
- * A base table is partitioned on columns (ds,hr) for date and hour.  Besides this fine-grained
partitioning, users would also like to see a virtual table of coarse-grained (date-only) partitioning
in which the partition for a given date only appears once all of the hour-level partitions
of that day have been fully loaded.
+  1. A base table is partitioned on columns (ds,hr) for date and hour.  Besides this fine-grained
partitioning, users would also like to see a virtual table of coarse-grained (date-only) partitioning
in which the partition for a given date only appears after all of the hour-level partitions
of that day have been fully loaded.
- * A view is defined on a complex join+union+aggregation of a number of underlying base tables
and other views, all of which are themselves partitioned.  The top-level view should also
be partitioned accordingly, with a new partition not appearing until corresponding partitions
have been loaded for all of the underlying tables.
+  1. A view is defined on a complex join+union+aggregation of a number of underlying base
tables and other views, all of which are themselves partitioned.  The top-level view should
also be partitioned accordingly, with a new partition not appearing until corresponding partitions
have been loaded for all of the underlying tables.
  
+ = Approaches =
+ 
+  1. One possible approach mentioned in [[https://issues.apache.org/jira/browse/HIVE-1079|HIVE-1079]]
is to infer view partitions automatically based on the partitions of the underlying tables.
 A command such as SHOW PARTITIONS could then synthesize virtual partition descriptors on
the fly.  This is fairly easy to do for use case #1, but potentially very difficult for use
cases #2 and #3.  So for now, we are punting on this approach.
+  1. Instead, we will require users to explicitly declare view partitioning as part of CREATE
VIEW, and explicitly manage partition metadata via ALTER VIEW {ADD|DROP} PARTITION.  This
allows all of the use cases to be satisfied (while placing more burden on the user, and taking
up more metastore space).
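
As a rough illustration of what explicit partition management might look like for use case #2, the sketch below computes which dates have all of their hour-level partitions loaded and emits one ALTER VIEW ... ADD PARTITION statement per complete date. Everything here is an assumption for illustration only: the helper names, the view name `daily_view`, the 24-hours-per-day completeness rule, and the exact partition-spec syntax `(ds='...')` are not specified on this page, which only names ALTER VIEW {ADD|DROP} PARTITION.

```python
from collections import defaultdict

# Assumption: a day is "fully loaded" once hr partitions 0..23 all exist.
HOURS_PER_DAY = 24


def complete_dates(base_partitions, hours_per_day=HOURS_PER_DAY):
    """Return the sorted dates whose hour-level partitions are all present.

    base_partitions is an iterable of (ds, hr) pairs, as might be obtained
    by listing the partitions of the base table.
    """
    hours_by_date = defaultdict(set)
    for ds, hr in base_partitions:
        hours_by_date[ds].add(int(hr))
    expected = set(range(hours_per_day))
    return sorted(ds for ds, hours in hours_by_date.items() if hours >= expected)


def add_partition_statements(view_name, dates, existing=()):
    """Build ADD PARTITION statements for dates the view does not yet expose.

    The statement text is a hypothetical rendering of the proposed
    ALTER VIEW ... ADD PARTITION command, not a confirmed syntax.
    """
    already_declared = set(existing)
    return [
        f"ALTER VIEW {view_name} ADD PARTITION (ds='{ds}')"
        for ds in dates
        if ds not in already_declared
    ]


if __name__ == "__main__":
    loaded = [("2011-01-31", h) for h in range(24)]
    loaded += [("2011-02-01", h) for h in range(12)]  # day still loading
    for stmt in add_partition_statements("daily_view", complete_dates(loaded)):
        print(stmt)
```

The same pattern would extend to use case #3 by intersecting the complete partitions of every underlying table before declaring a view partition, which is the kind of user-side bookkeeping the second approach accepts as its cost.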
+ 
