hive-dev mailing list archives

From "John Sichi (JIRA)" <>
Subject [jira] [Commented] (HIVE-1644) use filter pushdown for automatically accessing indexes
Date Fri, 29 Apr 2011 02:30:03 GMT


John Sichi commented on HIVE-1644:

OK, I dug into this and found out what's going on.

As you mentioned in the conf call, the order of operations in SemanticAnalyzer.genMapRedTasks
is such that physical optimization happens after GenMRTableScan1.  So the code in GenMRTableScan1
is totally irrelevant and can be removed.

You are setting the input format and intermediate file on the correct work object already
inside of IndexWhereProcessor.

What's going wrong is that the test is using MapRedTask instead of its superclass ExecDriver,
and MapRedTask is missing the code to propagate the attributes from the work into the job
conf.  So we need to extract this code from ExecDriver into a helper method setInputAttributes:

    if (work.getInputformat() != null) {
      HiveConf.setVar(job, HiveConf.ConfVars.HIVEINPUTFORMAT, work.getInputformat());
    }
    if (work.getIndexIntermediateFile() != null) {
      job.set("hive.index.compact.file", work.getIndexIntermediateFile());
    }

and then invoke setInputAttributes from within MapRedTask.execute, just before the "// enable
assertion" comment.
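
For illustration only, here is a minimal self-contained sketch of the shape such a helper
could take. It is not the actual Hive code: the Work class and the plain Map standing in for
the job conf are hypothetical stand-ins so the example runs without Hive on the classpath,
and the key string "hive.input.format" is assumed to be the property behind
HiveConf.ConfVars.HIVEINPUTFORMAT. Only "hive.index.compact.file" and the two getter names
come from the snippet above.

```java
import java.util.Map;
import java.util.TreeMap;

final class SetInputAttributesSketch {

    /** Hypothetical stand-in for the work object; exposes only the two getters quoted above. */
    static final class Work {
        private final String inputformat;
        private final String indexIntermediateFile;

        Work(String inputformat, String indexIntermediateFile) {
            this.inputformat = inputformat;
            this.indexIntermediateFile = indexIntermediateFile;
        }

        String getInputformat() { return inputformat; }
        String getIndexIntermediateFile() { return indexIntermediateFile; }
    }

    /**
     * Copies the optional input attributes from the work into the job conf.
     * A Map is used here in place of the real job conf (assumption for the sketch).
     */
    static void setInputAttributes(Map<String, String> job, Work work) {
        if (work.getInputformat() != null) {
            // Assumed property name for HiveConf.ConfVars.HIVEINPUTFORMAT.
            job.put("hive.input.format", work.getInputformat());
        }
        if (work.getIndexIntermediateFile() != null) {
            job.put("hive.index.compact.file", work.getIndexIntermediateFile());
        }
    }

    public static void main(String[] args) {
        Map<String, String> job = new TreeMap<>();
        // Both values below are made-up placeholders.
        setInputAttributes(job, new Work("org.example.SomeInputFormat", "/tmp/some_index_result"));
        System.out.println(job);
    }
}
```

With the logic in one helper, both ExecDriver and MapRedTask.execute could call it, so the two
code paths cannot drift apart again.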

When I do this, I can see the correct input format and intermediate file being set on
the spawned job.  (Speaking of the intermediate file, can we get rid of /tmp/index_banana?)

The test passes with or without this change, indicating there could still be some other problem
(since the point of the test is to demonstrate different behavior when the index is being
used).  However, I'm not sure about the test itself since it is now using a range condition
where before it was using an equality condition, and block-level indexing means a block could
contain the extra values as long as a single value (47 in this case) is hit by the index.
But you're using text files for some reason, and I still don't know exactly how the "blocks"
work there.
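
To make the block-level concern concrete, here is a toy model. It is purely illustrative and
not how Hive lays out text files or builds its index. The block contents and the class name
are invented. It shows that if the reader scans every block the index points at and nothing
re-applies the filter, then a hit on 47 also returns the other values stored in the same block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

final class BlockIndexSketch {

    // Toy "file": each inner list is one block of key values (made-up data).
    static final List<List<Integer>> BLOCKS = Arrays.asList(
        Arrays.asList(45, 46, 47),
        Arrays.asList(48, 49, 50),
        Arrays.asList(51, 52, 53));

    /** Builds a toy block-level index: key -> numbers of the blocks containing that key. */
    static Map<Integer, SortedSet<Integer>> buildIndex() {
        Map<Integer, SortedSet<Integer>> index = new TreeMap<>();
        for (int b = 0; b < BLOCKS.size(); b++) {
            for (int key : BLOCKS.get(b)) {
                index.computeIfAbsent(key, k -> new TreeSet<Integer>()).add(b);
            }
        }
        return index;
    }

    /** Returns every row in the blocks the index reports for the key, with no re-filtering. */
    static List<Integer> rowsFromHitBlocks(int key) {
        List<Integer> rows = new ArrayList<>();
        SortedSet<Integer> hits = buildIndex().getOrDefault(key, new TreeSet<Integer>());
        for (int b : hits) {
            rows.addAll(BLOCKS.get(b));
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(rowsFromHitBlocks(47)); // prints [45, 46, 47]
    }
}
```

In this model a range condition that happens to cover 45 through 47 gives the same output
whether or not the index was used, which is why such a test would not tell the two cases apart.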

> use filter pushdown for automatically accessing indexes
> -------------------------------------------------------
>                 Key: HIVE-1644
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Indexing
>    Affects Versions: 0.8.0
>            Reporter: John Sichi
>            Assignee: Russell Melick
>         Attachments: HIVE-1644.1.patch, HIVE-1644.10.patch, HIVE-1644.11.patch, HIVE-1644.12.patch,
> HIVE-1644.13.patch, HIVE-1644.14.patch, HIVE-1644.15.patch, HIVE-1644.16.patch, HIVE-1644.17.patch,
> HIVE-1644.2.patch, HIVE-1644.3.patch, HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch,
> HIVE-1644.7.patch, HIVE-1644.8.patch, HIVE-1644.9.patch
>
> HIVE-1226 provides utilities for analyzing filters which have been pushed down to a table
> scan.  The next step is to use these for selecting available indexes and generating access
> plans for those indexes.

This message is automatically generated by JIRA.