hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Carl Steinbach (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-2989) Adding Table Links to Hive
Date Wed, 06 Jun 2012 00:59:23 GMT

    [ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289887#comment-13289887
] 

Carl Steinbach commented on HIVE-2989:
--------------------------------------

@Bhushan/Namit: Any thoughts on this?
{quote}
I think it would be a good idea to require the user to name the table link (using a valid
SQL identifier) instead of tying the name of the link to the table/db it points to. Table
links (like views) can provide a useful level of indirection, but we lose that if the name
of link has to map directly to the target table. For example, suppose table t1 exists in the
default db, and the table link for t1 is created in database db1. Suppose at some later point
t1 is moved to a different db, or that the name of t1 is changed. With the current implementation
we would then also have to change the name of the link, which would require us to also change
any applications or scripts that refer to this link. With the other approach we would only
need to ALTER the details of the link but would be able leave the name of link unchanged.
{quote}

I'll add to this that using the tab_name@db_name syntax also requires us to weaken the grammar
and add lots of special-case logic to the methods that validate table names.
                
> Adding Table Links to Hive
> --------------------------
>
>                 Key: HIVE-2989
>                 URL: https://issues.apache.org/jira/browse/HIVE-2989
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore, Query Processor, Security
>    Affects Versions: 0.10.0
>            Reporter: Bhushan Mandhani
>            Assignee: Bhushan Mandhani
>         Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt,
HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, HIVE-2989.6.patch.txt
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> This will add Table Links to Hive. This will be an alternate mechanism for a user to
access tables and data in a database that is different from the one he is associated with.
This feature can be used to provide access control (if access to databasename.tablename in
queries and "use database X" is turned off in conjunction).
> If db X wants to access one or more partitions from table T in db Y, the user will issue:
> CREATE [STATIC] LINK TO T@Y LINKPROPERTIES ('RETENTION'='N')
> New partitions added to T will automatically be added to the link as well and become
available to X. However, if the link is specified to be static, that will not be the case.
The X user will then have to explicitly import each partition of T that he needs. The command
above will not actually make any existing partitions of T available to X. Instead, we provide
the following command to add an existing partition to a link:
> ALTER LINK T@Y ADD PARTITION (ds='2012-04-27')
> The user will need to execute the above for each existing partition that needs to be
imported. For future partitions, Hive will take care of this. An imported partition can be
dropped from a link using a similar command. We just specify "DROP" instead of "ADD". For
querying the linked table, the X user will refer to it as T@Y. Link Tables will only have
read access and not be writable. The entire Table Link alongwith all its imported partitions
can be dropped as follows:
> DROP LINK TO T@Y
> The above commands are purely MetaStore operations. The implementation will rely on replicating
the entire partition metadata when a partition is added to a link.  For every link that is
created, we will add a new row to table TBLS. The TBL_TYPE column will have a new kind of
value "LINK_TABLE" (or "STATIC_LINK_TABLE" if the link has been specified as static). A new
column LINK_TBL_ID will be added which will contain the id of the imported table. It will
be NULL for all other table types including the regular managed tables. When a partition is
added to a link, the new row in the table PARTITIONS will point to the LINK_TABLE in the same
database  and not the master table in the other database. We will replicate all the metadata
for this partition from the master database. The advantage of this approach is that fewer
changes will be needed in query processing and DDL for LINK_TABLEs. Also, commands like "SHOW
TABLES" and "SHOW PARTITIONS" will work as expected for LINK_TABLEs too. Of course, even though
the metadata is not shared, the underlying data on disk is still shared. Hive still needs
to know that when dropping a partition which belongs to a LINK_TABLE, it should not drop the
underlying data from HDFS. Views and external tables cannot be imported from one database
to another.
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

Mime
View raw message