hadoop-hive-dev mailing list archives

From "John Sichi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-417) Implement Indexing in Hive
Date Sat, 03 Jul 2010 00:25:54 GMT

    [ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884869#action_12884869 ]

John Sichi commented on HIVE-417:
---------------------------------

Had a chat with Ashish and Yongqiang offline, and came up with three alternatives.

1)  "Shortest path to checkin":  Treat current code as prototype and move it into contrib,
providing a utility for creating/updating the index, and keeping changes to core classes to
a minimum.  As Yongqiang pointed out, this makes it harder to follow up with automatic use
of the index due to the lack of metadata.  If we do this, we should create a new JIRA issue
for its limited scope.

2) "Full-fledged index support":  change the JDO metamodel to add support for indexes as
first-class objects, and come up with a pluggable index creation+access design framework which can
encompass a variety of index types likely to be needed in the future.  Code from this patch
would become the first such index implementation provided.  If we do this, we should continue
on in this truly epic JIRA issue.

3) "Rework as materialized view":  keep the JDO metamodel as is (adding a new table type for
MATERIALIZED_VIEW) but change the DDL to CREATE MATERIALIZED VIEW AS SELECT ... and then come
up with the system functions needed (e.g. for accessing file offsets) in order to be able
to express the index construction as SQL.  We would then execute view materialization in a
fashion similar to CREATE TABLE AS SELECT.  This approach best reflects the way the current
code models an index as an ordinary table, but requires some other changes (e.g. CTAS + dynamic
partitioning, something we want anyway).  If we do this, we should create a new JIRA issue
since it's a different feature from the user POV.
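
To make the "index as an ordinary table" idea in alternative 3 concrete, here is a rough
sketch of the data shape such a view might hold. It is written in Python purely for
illustration; in Hive this would be expressed as SQL, and the file names, offsets, and
function names below are all hypothetical, not anything from the patch or from Hive.

```python
from collections import defaultdict

# Hypothetical base-table rows: (file_name, byte_offset, key_value).
# In Hive, the file name and offset would have to come from the new
# system functions mentioned above; here they are simply given as data.
ROWS = [
    ("part-00000", 0, "us"),
    ("part-00000", 128, "uk"),
    ("part-00001", 0, "us"),
    ("part-00001", 64, "us"),
]


def build_index(rows):
    """Group offsets by (key, file), much as a SELECT ... GROUP BY would."""
    grouped = defaultdict(list)
    for file_name, offset, key in rows:
        grouped[(key, file_name)].append(offset)
    # Each entry is one row of the "index table": key, file, sorted offsets.
    return sorted((key, f, sorted(offs)) for (key, f), offs in grouped.items())


def lookup(index, key):
    """Return {file: offsets} for one key, i.e. where to seek in the base table."""
    return {f: offs for k, f, offs in index if k == key}


INDEX = build_index(ROWS)
```

The point of the sketch is only that the result is an ordinary table of (key, file, offsets)
rows, which is why it could plausibly be produced by something like CREATE TABLE AS SELECT.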

We're aiming to reach a decision next week; input is welcome on whether these alternatives
make sense (and on others we should consider).

Since this JIRA issue is already so overloaded, we would also like to treat the following
two items as separate followup JIRA issues rather than trying to address them all at once:

* rewrite framework
* automatic usage of index or materialized view by optimizer


> Implement Indexing in Hive
> --------------------------
>
>                 Key: HIVE-417
>                 URL: https://issues.apache.org/jira/browse/HIVE-417
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>    Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
>            Reporter: Prasad Chakka
>            Assignee: He Yongqiang
>         Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, hive-indexing.3.patch,
>                      hive-indexing.5.thrift.patch, indexing_with_ql_rewrites_trunk_953221.patch
>
>
> Implement indexing on Hive so that lookup and range queries are efficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

