hive-dev mailing list archives

From "Ning Zhang (JIRA)" <>
Subject [jira] Commented: (HIVE-417) Implement Indexing in Hive
Date Thu, 22 Jul 2010 18:34:55 GMT


Ning Zhang commented on HIVE-417:

Based on some internal discussions, below are some comments on the design doc:

1) The staleness (inconsistency) between the index and the base table should be addressed
more precisely.
   Since the current implementation allows the user to query the index table directly, we
should guarantee that the index is consistent with the base table at query time. This
means that at the query START time, the index was built completely from the data stored in
the base table. The current design does not satisfy this criterion: it only records the
last_modification_time (LMT) of the base table and of the index table, and checks whether
the latter is larger than the former. The following example breaks that check:

timestamp0: last update of partition P1
timestamp1: start create index on partition P1
timestamp2: start insert overwrite P1
timestamp3: finish insert overwrite P1
timestamp4: finish index creation on P1
timestamp5: query on P1

The LMTs of the index and the base table are timestamp4 and timestamp3 respectively, so the
optimizer will conclude that the index is consistent with the base table. However, at
timestamp5 the index is based on stale data (P1 as it was before the insert overwrite),
so the index should not be used.

Instead of recording the LMT of the index table, we probably should record the LMT of the
base table in the index metadata at the beginning of the index creation. In the above example,
the timestamp recorded in the index metadata should be timestamp0. This means the index was
created based on the base table at timestamp0. At query time, we should check timestamp0
against timestamp3, which correctly concludes that the index is stale.
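
The two checks can be compared on the timeline above with a minimal sketch. The function
names and the integer timestamps are illustrative assumptions, not part of the design doc
or of the metastore API, and the proposed check is assumed here to be a simple equality
comparison.

```python
# Timeline from the example, with small integers standing in for timestamps.
T0, T1, T2, T3, T4, T5 = range(6)


def fresh_by_index_lmt(index_lmt, base_lmt):
    """Current design: the index counts as fresh if it was modified after the base table."""
    return index_lmt > base_lmt


def fresh_by_base_snapshot(base_lmt_at_build_start, base_lmt_now):
    """Proposed: the index is fresh only if the base table is unchanged since the build began."""
    return base_lmt_at_build_start == base_lmt_now


# State seen by the query at T5: the base LMT is T3, the index LMT is T4,
# and the base LMT recorded when index creation started was T0.
current_design_says_fresh = fresh_by_index_lmt(index_lmt=T4, base_lmt=T3)
proposed_says_fresh = fresh_by_base_snapshot(base_lmt_at_build_start=T0, base_lmt_now=T3)
```

Under these assumptions the current check accepts the index while the proposed check rejects it.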

BTW, all the timestamps should come from some centralized clock, such as the DFS directory
update time (from the namenode).

2) The above consistency problem is not only present in the case of "DEFERRED REBUILD".
Even if the index rebuild starts right after INSERT OVERWRITE, there is still a time
window in which the index is stale (before the index creation is complete). So we need the
same mechanism to detect stale indexes.

3) I think lock-based concurrency control may not be the best choice either. If the index
creation takes a long time, it delays the availability of the base table. If we have the
optimizer, we should always query against the base tables and let the optimizer figure out
whether an index is available and fresh. So if an index creation is not finished, we can just
use the base table; otherwise we can use the index if its cost is lower.
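
The decision described in 3) could look roughly like the sketch below. IndexInfo,
choose_access_path, and the cost fields are hypothetical names used for illustration only;
they are not taken from the design doc or the Hive optimizer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexInfo:
    build_finished: bool            # False while index creation is still running
    base_lmt_at_build_start: int    # base table LMT recorded when the build began
    estimated_cost: float           # hypothetical cost of answering via the index


def choose_access_path(base_lmt_now: int, base_cost: float,
                       index: Optional[IndexInfo]) -> str:
    """Pick the index only if it exists, is complete, is fresh, and is cheaper."""
    if index is None or not index.build_finished:
        return "base_table"         # no lock needed: just ignore an unfinished index
    if index.base_lmt_at_build_start != base_lmt_now:
        return "base_table"         # base table changed since the build: index is stale
    return "index" if index.estimated_cost < base_cost else "base_table"
```

With this shape the base table stays available during a long index build, because queries
never wait on the index.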

4) Another case: the index creation has finished and a query is using the index, and then
a DML statement runs on the base table and finishes before the query finishes. Here we only
guarantee snapshot consistency (results consistent with the data at the beginning of the
query, not after the query).

5) If we have a mechanism to check the consistency of the index, then the "index rebuild"
command could just return if the index is consistent. We can also allow a "force" option in
case we need to compensate for bad metadata.
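
A rough sketch of the rebuild behaviour in 5), assuming the index metadata stores the base
table LMT captured at build start. rebuild_index and its parameters are hypothetical and
only illustrate the control flow.

```python
def rebuild_index(recorded_base_lmt, read_base_lmt, build, force=False):
    """Return (rebuilt, base_lmt_to_record).

    recorded_base_lmt: base table LMT stored in the index metadata (None if never built).
    read_base_lmt:     callable returning the current base table LMT.
    build:             callable that performs the actual index build.
    force:             rebuild even if the metadata says the index is consistent.
    """
    # Capture the base LMT before building, so a concurrent overwrite is detected later.
    base_lmt_at_start = read_base_lmt()
    if not force and recorded_base_lmt == base_lmt_at_start:
        return False, recorded_base_lmt   # index already consistent: nothing to do
    build()
    return True, base_lmt_at_start


builds = []
skipped = rebuild_index(3, lambda: 3, lambda: builds.append("built"))
forced = rebuild_index(3, lambda: 3, lambda: builds.append("built"), force=True)
stale = rebuild_index(0, lambda: 3, lambda: builds.append("built"))
```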

> Implement Indexing in Hive
> --------------------------
>                 Key: HIVE-417
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>    Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
>            Reporter: Prasad Chakka
>            Assignee: He Yongqiang
>         Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, hive-indexing-8-thrift-metastore-remodel.patch, hive-indexing.3.patch, hive-indexing.5.thrift.patch, idx2.png, indexing_with_ql_rewrites_trunk_953221.patch
> Implement indexing on Hive so that lookup and range queries are efficient.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
