hadoop-common-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/ViewDev" by JohnSichi
Date Thu, 28 Jan 2010 23:33:27 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/ViewDev" page has been changed by JohnSichi.
http://wiki.apache.org/hadoop/Hive/ViewDev?action=diff&rev1=11&rev2=12

--------------------------------------------------

  
  '''Update 30-Dec-2009''':  Prasad pointed out that even without supporting materialized
views, it may be necessary to provide users with metadata about data dependencies between
views and underlying table partitions so that users can avoid seeing inconsistent results
during the window when not all partitions have been refreshed with the latest data.  One option
is to attempt to derive this information automatically (using an overconservative guess in
cases where the dependency analysis can't be made smart enough); another is to allow view
creators to declare the dependency rules in some fashion as part of the view definition. 
Based on a design review meeting, we will probably go with the automatic analysis approach
once dependency tracking is implemented.  The analysis will be performed on-demand, perhaps
as part of describing the view or submitting a query job against it.  Until this becomes available,
users may be able to do their own analysis either via empirical lineage tools or via view->table
dependency tracking metadata once it is implemented.  See HIVE-1079.
  
+ = Metastore Upgrades =
+ 
+ Since we are adding new columns to the TBLS table in the metastore schema, existing metastore
deployments will need to be upgraded.  There are two ways this can happen.
+ 
+ == Automatic ALTER TABLE ==
+ 
+ If the following property is set in the Hive configuration file, JDO will notice the difference
between the persistent schema and the model and ALTER the tables automatically:
+ 
+ {{{
+ <property>
+ <name>datanucleus.autoCreateSchema</name>
+ <value>true</value>
+ </property>
+ }}}
+ 
+ == Explicit ALTER TABLE ==
+ 
+ However, if the {{{datanucleus.autoCreateSchema}}} property is set to {{{false}}}, then
the ALTER statements must be executed explicitly.  (An administrator may have set this property
for safety in production configurations.)
+ 
+ In this case, execute a script such as the following against the metastore database:
+ 
+ {{{
+ ALTER TABLE TBLS ADD COLUMN VIEW_ORIGINAL_TEXT MEDIUMTEXT;
+ ALTER TABLE TBLS ADD COLUMN VIEW_EXPANDED_TEXT MEDIUMTEXT;
+ ALTER TABLE TBLS ADD COLUMN TBL_TYPE VARCHAR(128);
+ }}}
+ 
+ The syntax here is for MySQL, so you may need to adjust it (particularly for CLOB datatype).
+ 
+ Note that it should be safe to execute this script and continue operations BEFORE upgrading
Hive; the old Hive version will simply ignore/nullify the columns it doesn't recognize.
+ 
+ == Existing Row UPDATE ==
+ 
+ After the tables are altered, the new columns will contain NULL values for existing rows
describing previously created tables.  This is correct for VIEW_ORIGINAL_TEXT and VIEW_EXPANDED_TEXT
(since views did not previously exist), but is incorrect for the TBL_TYPE column introduced
by [[https://issues.apache.org/jira/browse/HIVE-1068|HIVE-1068]].  The new Hive code is capable
of handling this (automatically filling in the correct value for the new field when a descriptor
is retrieved), but it does not "fix" the stored rows.  This could be an issue if in the future
other tools are used to retrieve information directly from the metastore database rather than
accessing the metastore API.
+ 
+ The script below can be used to fix existing rows after the tables have been altered.  It
should be run in the specified order, and AFTER all Hive instances directly accessing the
metastore database have been upgraded.  For safety, it is view-aware just in case a CREATE
VIEW statement has already been executed, meaning it can be rerun any time after the upgrade.
+ 
+ {{{
+ UPDATE TBLS SET TBL_TYPE='MANAGED_TABLE'
+ WHERE VIEW_ORIGINAL_TEXT IS NULL
+ AND NOT EXISTS(
+     SELECT * FROM TABLE_PARAMS 
+     WHERE TABLE_PARAMS.TBL_ID=TBLS.TBL_ID
+     AND PARAM_KEY='EXTERNAL'
+     AND PARAM_VALUE='TRUE'
+ );
+ UPDATE TBLS SET TBL_TYPE='EXTERNAL_TABLE'
+ WHERE EXISTS(
+     SELECT * FROM TABLE_PARAMS 
+     WHERE TABLE_PARAMS.TBL_ID=TBLS.TBL_ID
+     AND PARAM_KEY='EXTERNAL'
+     AND PARAM_VALUE='TRUE'
+ );
+ UPDATE TBLS SET TBL_TYPE='VIRTUAL_VIEW'
+ WHERE VIEW_ORIGINAL_TEXT IS NOT NULL;
+ }}}
+ 
+ For MySQL, note that the "safe updates" feature will need to be disabled since these are
full-table updates.
+ 

Mime
View raw message