hadoop-hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Prafulla Tekawade (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-417) Implement Indexing in Hive
Date Wed, 09 Jun 2010 11:41:18 GMT

    [ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877049#action_12877049
] 

Prafulla Tekawade commented on HIVE-417:
----------------------------------------

I was thinking of adding something called query rewrite module.
It would be rule-based query rewrite system and it would 
rewrite the query into semantically equivalent query which is 
more optimized and/or uses indexes (not just for scans, but
for other query operators, e.g. GroupBy etc.)

Eg.

select distinct c1
from t1;

This query, if we have densed index ('compact summary index' in this
hive indexing patch) on c1 can be replaced with query on index table 
itself.

select idx_key
from t1_cmpct_sum_idx;

Similar query transformation can happen for other queries.

Module will be placed just before optimizer and will help optimizer.
Module structure looks like below.

[Query parser]
[Query rewrites] --> new phase
[Query optimization]
[Query execution planner]
[Query execution engine]

The rewrite module is 'generic', not just for above indexing case,
but for other cases too, e.g. OR predicates to union (for efficiency?), outer join
to union of anti & semi joins, moving out 'order by' out of union
subquery etc etc.

The aim is to implement a very simple, light-weight rewrite support,
implement the indexing related rewrites (above rewrite does not
even need a new run-time map-red operator) and integrate indexing
support quickly and cleanly. As noted above, this rewrite phase
is rule-based (and not cost-based), sort of early optimization.

Let me know what u think. I'll start with reading ur patch.
This would do most part from TODO 1, 
TODO 2 and 3 will have to be looked into. 

> Implement Indexing in Hive
> --------------------------
>
>                 Key: HIVE-417
>                 URL: https://issues.apache.org/jira/browse/HIVE-417
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>    Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
>            Reporter: Prasad Chakka
>            Assignee: He Yongqiang
>         Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, hive-indexing.3.patch
>
>
> Implement indexing on Hive so that lookup and range queries are efficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message