hadoop-hive-dev mailing list archives

From "Prasad Chakka (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-417) Implement Indexing in Hive
Date Sun, 17 May 2009 15:41:45 GMT

    [ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710204#action_12710204 ]

Prasad Chakka commented on HIVE-417:

The question you raised applies only to B+Tree indexes. The index that I defined above is
not really a traditional database index but a kind of summary table (or view), and any lookup
or range query on the table requires reading the whole index. So you can apply all predicates
as long as the columns referenced in the predicates exist in the index. So we should be able
to use an index on (col1, col2, col3) for all the queries above. Sort order has no impact
here since the whole index is read into memory anyway.
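
A minimal sketch of the lookup described above, in Python for illustration only. It assumes
the index is a small list of rows holding the indexed column values plus a hypothetical
`location` field naming where the matching table rows live; none of these names come from
the Hive design.

```python
from typing import Any, Callable, Dict, List, Set

# Hypothetical index rows: the indexed column values plus the location
# (for example a file or block id) of the matching rows in the base table.
INDEX: List[Dict[str, Any]] = [
    {"col1": 1, "col2": "a", "col3": 10, "location": "block-0"},
    {"col1": 1, "col2": "b", "col3": 20, "location": "block-1"},
    {"col1": 2, "col2": "a", "col3": 30, "location": "block-1"},
    {"col1": 3, "col2": "c", "col3": 40, "location": "block-2"},
]


def lookup(index: List[Dict[str, Any]],
           predicate: Callable[[Dict[str, Any]], bool]) -> Set[str]:
    """Scan every index row and collect the locations that satisfy the predicate."""
    return {row["location"] for row in index if predicate(row)}


# A predicate on col2 alone works even though col2 is not the leading column,
# because the whole index is scanned rather than searched by key prefix.
only_col2 = lookup(INDEX, lambda r: r["col2"] == "a")

# A range predicate on col3 is handled by the same full scan.
range_col3 = lookup(INDEX, lambda r: 15 <= r["col3"] <= 35)
```

Because every index row is visited, the order of the columns in (col1, col2, col3) does not
restrict which predicates can be applied.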

Since this index can be created in sorted order, we can create a sparse index (similar to the
non-leaf nodes of a B+-Tree) if the index itself is too big (i.e., index sizes are orders of
magnitude larger than the HDFS block size). But this can be done as a later optimization.
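
The sparse-index idea can be sketched as follows, again in Python for illustration. This
assumes the sorted index holds distinct integer keys and is split into fixed-size blocks;
`build_sparse` and `candidate_block` are hypothetical helper names, not part of any Hive API.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def build_sparse(sorted_keys: List[int], block_size: int) -> List[Tuple[int, int]]:
    """Keep one (first_key, start_offset) entry per block of the sorted index."""
    return [(sorted_keys[i], i) for i in range(0, len(sorted_keys), block_size)]


def candidate_block(sparse: List[Tuple[int, int]], key: int,
                    block_size: int) -> Optional[Tuple[int, int]]:
    """Return the (start, end) offsets of the one block that may contain key."""
    first_keys = [first for first, _ in sparse]
    pos = bisect_right(first_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every key in the index
    start = sparse[pos][1]
    return start, start + block_size


keys = [2, 4, 6, 8, 10, 12, 14]           # the full (dense) sorted index
sparse = build_sparse(keys, block_size=3)  # one entry per block of 3 keys
block = candidate_block(sparse, 10, block_size=3)
```

Only the block returned by `candidate_block` has to be read, instead of the whole index,
which is the same role the non-leaf nodes play in a B+-Tree.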

With the design above, indexes on joins will come for free, since predicate pushdown will push
the 'user.name="user_name"' predicate to above the join and only index-filtered rows participate
in the join.

However, creating indexes on the joined output may increase the index size enough to decrease
the overall effectiveness. Sparse indexes might mitigate this problem, so we can support
this kind of join index along with support for sparse indexes.

Yes, for some aggregation queries it may make sense to read the index (since it is a summary
table as well). Aggregations or any queries that involve only columns from the index can operate
only on the index and not the main table.
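
As a rough illustration of an index-only query, the Python sketch below checks whether a
query touches only indexed columns and, if so, computes the aggregate from the index rows
alone. The column names, sample rows, and the `covered_by_index` helper are assumptions made
for the example. It uses MAX and DISTINCT, which give the same answer whether the index
stores one entry per table row or one per distinct combination of values.

```python
from typing import Any, Dict, Iterable, List, Set

INDEX_COLUMNS: Set[str] = {"col1", "col2", "col3"}

# Hypothetical index contents (a summary of the base table).
INDEX_ROWS: List[Dict[str, Any]] = [
    {"col1": 1, "col2": "a", "col3": 10},
    {"col1": 1, "col2": "b", "col3": 20},
    {"col1": 2, "col2": "a", "col3": 30},
    {"col1": 3, "col2": "c", "col3": 40},
]


def covered_by_index(query_columns: Iterable[str],
                     index_columns: Set[str] = INDEX_COLUMNS) -> bool:
    """True when every column the query references is present in the index."""
    return set(query_columns) <= index_columns


# Roughly: SELECT MAX(col3) FROM t WHERE col2 = 'a'
if covered_by_index(["col2", "col3"]):
    max_col3 = max(r["col3"] for r in INDEX_ROWS if r["col2"] == "a")

# Roughly: SELECT DISTINCT col1 FROM t
distinct_col1 = sorted({r["col1"] for r in INDEX_ROWS})
```

A query that references a column outside the index (say a hypothetical `col4`) fails the
check and would have to read the main table.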

I also looked at Katta and am not sure how it fits into Hive. Katta is more like a distributed
index server.

> Implement Indexing in Hive
> --------------------------
>                 Key: HIVE-417
>                 URL: https://issues.apache.org/jira/browse/HIVE-417
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>    Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.4.0
>            Reporter: Prasad Chakka
>            Assignee: He Yongqiang
> Implement indexing on Hive so that lookup and range queries are efficient.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
