hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/ViewDev" by JohnSichi
Date Wed, 20 Jan 2010 23:47:09 GMT
The "Hive/ViewDev" page has been changed by JohnSichi.
http://wiki.apache.org/hadoop/Hive/ViewDev?action=diff&rev1=10&rev2=11

--------------------------------------------------

  
  '''Update 30-Dec-2009''':  Based on a design review meeting, we're going to go with the
flat model.  Prasad pointed out that in the future, for materialized views, we may need the
view definition to be tracked at the partition level as well, so that when we change the view
definition, we don't have to discard existing materialized partitions if the new view result
can be derived from the old one.  So it may make sense to add the view definition as a new
attribute of StorageDescriptor (since that is already present at both table and partition
level).
  
+ '''Update 20-Jan-2010''':  After further discussion with Prasad, we decided to put the view
definition on the table object instead; for details, see JIRA.
+ 
  == Dependency Tracking ==
  
  It's necessary to track dependencies from a view to objects it references in the metastore:
@@ -97, +99 @@

  
  However, if later we want to introduce persistent functions, or track column dependencies,
this model will be insufficient, and we may need to introduce inheritance, with a DependencyParticipant
base class from which tables, columns, functions, etc. all derive.  (Again, need to verify that
JDO inheritance will actually support what we want here.)
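
As a rough illustration of the inheritance idea above, here is a minimal Python sketch (not JDO, and not Hive metastore code). Only the name DependencyParticipant comes from the text; the subclasses, the `depends_on` list, and the `add_dependency` and `describe` methods are hypothetical names chosen for the example. It does not answer whether JDO inheritance can support this model.

```python
class DependencyParticipant:
    """Hypothetical common base for anything that can appear in a dependency edge."""

    def __init__(self, name):
        self.name = name
        self.depends_on = []  # objects this participant references

    def add_dependency(self, other):
        # Any participant may depend on any other participant type.
        self.depends_on.append(other)

    def describe(self):
        return f"{type(self).__name__}:{self.name}"


# Illustrative subclasses; the real metastore model may differ.
class Table(DependencyParticipant):
    pass


class Column(DependencyParticipant):
    pass


class Function(DependencyParticipant):
    pass


class View(DependencyParticipant):
    pass


view = View("v")
view.add_dependency(Table("t"))
view.add_dependency(Function("f"))
print([dep.describe() for dep in view.depends_on])
```

With a shared base class, a single dependency list can hold tables, columns, and functions alike, which is the property the flat view-to-table model lacks.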
  
- '''Update 30-Dec-2009''':  Based on a design review meeting, we'll start with the bare-minimum
MySQL approach (with no metastore support for dependency tracking), then if time allows, add
dependency analysis and storage, followed by CASCADE support.
+ '''Update 30-Dec-2009''':  Based on a design review meeting, we'll start with the bare-minimum
MySQL approach (with no metastore support for dependency tracking), then if time allows, add
dependency analysis and storage, followed by CASCADE support.  See HIVE-1073 and HIVE-1074.
  
  == Dependency Invalidation ==
  
@@ -108, +110 @@

  
  Note that besides table modifications, other operations such as CREATE OR REPLACE VIEW have
similar issues (since views can reference other views).  The lenient approach provides a reasonable
solution for the related issue of external tables whose schemas may be dynamic (not sure if
we currently support this).
  
- '''Update 30-Dec-2009''':  Based on a design review meeting, we'll start with the lenient
approach, without any support for marking objects invalid in the metastore, then if time allows,
follow up with strict support and possibly metastore support for tracking object validity.
+ '''Update 30-Dec-2009''':  Based on a design review meeting, we'll start with the lenient
approach, without any support for marking objects invalid in the metastore, then if time allows,
follow up with strict support and possibly metastore support for tracking object validity.
 See HIVE-1077.
  
  == View Modification ==
  
@@ -119, +121 @@

  
  Note that supporting view modification requires detection of cyclic view definitions, which
should be invalid.  Whether this detection is carried out at the time of view modification
versus reference is dependent on the strict versus lenient approaches to dependency invalidation
described above.
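
The cycle check mentioned above can be sketched as a depth-first search over view references. This is a minimal Python illustration under stated assumptions, not Hive's implementation: the `references` mapping (view name to the names it references) and the function name `find_view_cycle` are hypothetical, and base tables are simply names with no entry in the mapping.

```python
from typing import Dict, List, Optional


def find_view_cycle(references: Dict[str, List[str]], start: str) -> Optional[List[str]]:
    """Return one cycle reachable from `start` as a list of names, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(name: str) -> Optional[List[str]]:
        color[name] = GRAY
        path.append(name)
        # Names without an entry (e.g. base tables) reference nothing.
        for ref in references.get(name, []):
            state = color.get(ref, WHITE)
            if state == GRAY:
                # `ref` is already on the current path, so we closed a loop.
                return path[path.index(ref):] + [ref]
            if state == WHITE:
                cycle = visit(ref)
                if cycle is not None:
                    return cycle
        path.pop()
        color[name] = BLACK
        return None

    return visit(start)
```

Under a strict approach, a check like this would run when the view is modified; under the lenient approach, it would run when the view is referenced.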
  
- '''Update 30-Dec-2009''':  Based on a design review meeting, we'll start with an Oracle-style
ALTER VIEW v RECOMPILE, which can be used to revalidate a view definition, as well as to re-expand
the original definition for clauses such as select *.  Then if time allows, we'll follow up
with CREATE OR REPLACE VIEW support.  (The latter is less important since we're going with
the lenient invalidation model, making DROP and re-CREATE possible without having to deal
with downstream dependencies.)
+ '''Update 30-Dec-2009''':  Based on a design review meeting, we'll start with an Oracle-style
ALTER VIEW v RECOMPILE, which can be used to revalidate a view definition, as well as to re-expand
the original definition for clauses such as select *.  Then if time allows, we'll follow up
with CREATE OR REPLACE VIEW support.  (The latter is less important since we're going with
the lenient invalidation model, making DROP and re-CREATE possible without having to deal
with downstream dependencies.)  See HIVE-1077 and HIVE-1078.
  
  == Fast Path Execution ==
  
@@ -135, +137 @@

  
  == Underlying Partition Dependencies ==
  
- '''Update 30-Dec-2009''':  Prasad pointed out that even without supporting materialized
views, it may be necessary to provide users with metadata about data dependencies between
views and underlying table partitions so that users can avoid seeing inconsistent results
during the window when not all partitions have been refreshed with the latest data.  One option
is to attempt to derive this information automatically (using an overconservative guess in
cases where the dependency analysis can't be made smart enough); another is to allow view
creators to declare the dependency rules in some fashion as part of the view definition. 
Based on a design review meeting, we will probably go with the automatic analysis approach
once dependency tracking is implemented.  The analysis will be performed on-demand, perhaps
as part of describing the view or submitting a query job against it.  Until this becomes available,
users may be able to do their own analysis either via empirical lineage tools or via view->table
dependency tracking metadata once it is implemented.
+ '''Update 30-Dec-2009''':  Prasad pointed out that even without supporting materialized
views, it may be necessary to provide users with metadata about data dependencies between
views and underlying table partitions so that users can avoid seeing inconsistent results
during the window when not all partitions have been refreshed with the latest data.  One option
is to attempt to derive this information automatically (using an overconservative guess in
cases where the dependency analysis can't be made smart enough); another is to allow view
creators to declare the dependency rules in some fashion as part of the view definition. 
Based on a design review meeting, we will probably go with the automatic analysis approach
once dependency tracking is implemented.  The analysis will be performed on-demand, perhaps
as part of describing the view or submitting a query job against it.  Until this becomes available,
users may be able to do their own analysis either via empirical lineage tools or via view->table
dependency tracking metadata once it is implemented.  See HIVE-1079.
  
