hadoop-hive-dev mailing list archives

From "John Sichi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-417) Implement Indexing in Hive
Date Fri, 16 Jul 2010 21:47:56 GMT

    [ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889345#action_12889345 ]

John Sichi commented on HIVE-417:
---------------------------------

Here are some preliminary comments on the metastore work.  We can move on to the plugin design
next week and start getting all of this into a doc.

* We should support a property on the index which controls the name of the index table, and
only generate an index table name automatically in the case where the user doesn't supply
the property.  For this, we'll need to add property key/values to the grammar (IDXPROPERTIES
like TBLPROPERTIES and SERDEPROPERTIES?).

* The grammar supports control over the tableFileFormat for the index table; what about other
attributes such as row format, location, and TBLPROPERTIES?  Some of these may be dictated
by the index implementation, but it may be useful to override in some cases (same as tableFileFormat).

* Is the partitioning for the index independent of the partitioning for the table?  Don't
we need to allow control over this in the grammar?

* I think we should track the status of the index (when was the last time it was rebuilt,
if ever) so that we know whether it is fresh with respect to the base table data.  How should
we model this in such a way that it takes per-partition indexing into account?

* Some metastore follow-ups to be logged separately:  COMMENT clause on index definition; DESCRIBE
INDEX; SHOW INDEXES; dealing with base table columns being dropped/renamed out from under
the index.

* Generating the index table structure will need to move into the plugin (rather than staying
in Hive.java), since each index type will need a different table structure (or no table structure
at all).

* Test queries:  remember to add ORDER BY for determinism.  Also, I'm not sure whether it
is safe to use /tmp in the local file system (it may not exist, e.g. on Windows).  I used
it in hbase_bulk.m, but that uses a mini HDFS cluster (not the local file system).

* Dropping a table with an index on it currently gives the exception below (in Derby; I didn't
test MySQL yet).  Same for attempting to drop an index table directly (instead of dropping
the index).  The second case should either fail with a meaningful exception, or implicitly
drop the index definition as a side effect of dropping the table.

hive> create table t1(i int);
OK
hive> create index q type compact on table t1(i);
OK
hive> drop table t1;
FAILED: Error in metadata: javax.jdo.JDODataStoreException: Exception thrown flushing changes to datastore
NestedThrowables:
java.sql.BatchUpdateException: DELETE on table 'TBLS' caused a violation of foreign key constraint 'INDEXS_FK3' for key (12).  The statement has been rolled back.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

hive> create table t5(i int);
OK
hive> create index r type compact on table t5(i);
OK
hive> drop table default__t5_r__;
FAILED: Error in metadata: javax.jdo.JDODataStoreException: Exception thrown flushing changes to datastore
NestedThrowables:
java.sql.BatchUpdateException: DELETE on table 'TBLS' caused a violation of foreign key constraint 'INDEXS_FK2' for key (17).  The statement has been rolled back.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
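
The two outcomes proposed for the second case, plus one possible handling of the first, can be sketched against a toy in-memory model. This is Python rather than Hive's Java, and every name in it (ToyMetastore, MetastoreError, drop_index_definition) is hypothetical. Dropping a base table's indexes along with the table is an assumption, since the comment above does not say what that case should do. The index table name pattern is copied from the default__t5_r__ example.

```python
class MetastoreError(Exception):
    """Readable error raised instead of a raw foreign-key violation."""


class ToyMetastore:
    """In-memory stand-in for the metastore; not Hive's real API."""

    def __init__(self):
        self.tables = set()
        # (base table, index name) -> index table name
        self.indexes = {}

    def create_table(self, name):
        self.tables.add(name)

    def create_index(self, index_name, base_table):
        # Naming pattern taken from the session above (default__t5_r__).
        index_table = f"default__{base_table}_{index_name}__"
        self.tables.add(index_table)
        self.indexes[(base_table, index_name)] = index_table
        return index_table

    def drop_table(self, name, drop_index_definition=False):
        for key, index_table in list(self.indexes.items()):
            base_table, index_name = key
            if index_table == name:
                # Second case: the table being dropped is an index table.
                if not drop_index_definition:
                    raise MetastoreError(
                        f"{name} is the index table for index {index_name} "
                        f"on {base_table}; drop the index instead"
                    )
                del self.indexes[key]
            elif base_table == name:
                # First case (assumed behavior): indexes go with the base table.
                del self.indexes[key]
                self.tables.discard(index_table)
        self.tables.discard(name)
```

With drop_index_definition left at False the direct drop fails with a message that names the index; with True the index definition is removed together with the index table.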

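The first comment above (use a user-supplied index table name, generate one only when none is given) amounts to a simple fallback. A minimal Python sketch, assuming a hypothetical property key "index.table.name" and assuming the generated pattern is db__table_index__, which is inferred only from the single default__t5_r__ example in the session above:

```python
# Hypothetical property key; the comment above does not name one.
INDEX_TABLE_NAME_KEY = "index.table.name"


def resolve_index_table_name(db, base_table, index_name, idx_properties=None):
    """Return the user-supplied index table name, or generate one."""
    props = idx_properties or {}
    explicit = props.get(INDEX_TABLE_NAME_KEY)
    if explicit:
        return explicit
    # Fallback pattern inferred from default__t5_r__ (an assumption).
    return f"{db}__{base_table}_{index_name}__"
```
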

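One possible answer to the index status question above is to record a rebuild time per partition and compare it with the base partition's last modification time. The sketch below is Python, not Hive code; IndexStatus and its methods are hypothetical, and it assumes a last-modified time is available for each base table partition, which the comment above does not establish.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexStatus:
    # Partition spec (e.g. "ds=2010-07-16") -> time of the last index rebuild.
    last_rebuilt: Dict[str, float] = field(default_factory=dict)

    def record_rebuild(self, partition: str, when: float) -> None:
        self.last_rebuilt[partition] = when

    def is_fresh(self, partition: str, base_last_modified: float) -> bool:
        # A partition that was never indexed is never fresh.
        rebuilt = self.last_rebuilt.get(partition)
        return rebuilt is not None and rebuilt >= base_last_modified

    def stale_partitions(self, base_last_modified: Dict[str, float]) -> List[str]:
        # Partitions whose index is missing or older than the base data.
        return sorted(
            p for p, modified in base_last_modified.items()
            if not self.is_fresh(p, modified)
        )
```

An unpartitioned table could be handled in this model as a single partition with an empty spec.
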
> Implement Indexing in Hive
> --------------------------
>
>                 Key: HIVE-417
>                 URL: https://issues.apache.org/jira/browse/HIVE-417
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>    Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
>            Reporter: Prasad Chakka
>            Assignee: He Yongqiang
>         Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, hive-indexing-8-thrift-metastore-remodel.patch, hive-indexing.3.patch, hive-indexing.5.thrift.patch, idx2.png, indexing_with_ql_rewrites_trunk_953221.patch
>
>
> Implement indexing on Hive so that lookup and range queries are efficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

