hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/PartitionedViews" by JohnSichi
Date Wed, 02 Feb 2011 01:25:33 GMT

The "Hive/PartitionedViews" page has been changed by JohnSichi.
http://wiki.apache.org/hadoop/Hive/PartitionedViews?action=diff&rev1=8&rev2=9

--------------------------------------------------

  = Approaches =
  
   1. One possible approach mentioned in [[https://issues.apache.org/jira/browse/HIVE-1079|HIVE-1079]]
is to infer view partitions automatically based on the partitions of the underlying tables.
 A command such as SHOW PARTITIONS could then synthesize virtual partition descriptors on
the fly.  This is fairly easy to do for use case #1, but potentially very difficult for use
cases #2 and #3.  So for now, we are punting on this approach.
-  1. Instead, per [[https://issues.apache.org/jira/browse/HIVE-1941|HIVE-1941]], we will
require users to explicitly declare view partitioning as part of CREATE VIEW, and explicitly
manage partition metadata via ALTER VIEW {ADD|DROP} PARTITION.  This allows all of the use
cases to be satisfied (while placing more burden on the user, and taking up more metastore
space).  With this approach, there is no real connection between view partitions and underlying
table partitions; it's even possible to create a partitioned view on an unpartitioned table,
or to have data in the view which is not covered by any view partition.  One downside here
is that a UI will not be able to show physical information such as file size when browsing
available partitions.
+  1. Instead, per [[https://issues.apache.org/jira/browse/HIVE-1941|HIVE-1941]], we will
require users to explicitly declare view partitioning as part of CREATE VIEW, and explicitly
manage partition metadata via ALTER VIEW {ADD|DROP} PARTITION.  This allows all of the use
cases to be satisfied (while placing more burden on the user, and taking up more metastore
space).  With this approach, there is no real connection between view partitions and underlying
table partitions; it's even possible to create a partitioned view on an unpartitioned table,
or to have data in the view which is not covered by any view partition.  One downside here
is that a UI will not be able to show last access time and physical information such as file
size when browsing available partitions.  (And stats won't work without an explicit ANALYZE.)
  
  = Syntax =
  
