hive-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7796) Provide subquery pushdown facility for storage handlers
Date Mon, 15 Dec 2014 02:23:13 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246244#comment-14246244 ]

Ashutosh Chauhan commented on HIVE-7796:
----------------------------------------

Does this work as follows?
* The Phoenix JDBC handler implements {{HiveStorageSubQueryHandler}}.
* Using the source AST, the TokenRewriteStream, and QBParseInfo, it tries to recreate the SQL text.
* The Phoenix JDBC handler then sends this query to Phoenix, which parses and plans the SQL.
* The Phoenix JDBC handler then constructs Hive's {{TableScanOperator}}, which it returns via this interface.
* This TSOp is hooked into the Hive pipeline.
* All the data from HBase flows through the Phoenix client to Hive.
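
To make the inferred flow concrete, here is a minimal, self-contained Java sketch. It is an assumption-laden toy and not the code in the patch: {{SubQueryHandlerSketch}}, {{ScanSketch}}, and {{JdbcLikeHandlerSketch}} are hypothetical names, the real signature of {{HiveStorageSubQueryHandler}} is not shown in this thread, and the "backend" function merely stands in for a Phoenix JDBC client.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

/** Hypothetical stand-in for the handler interface; not the patch's real signature. */
interface SubQueryHandlerSketch {
    /** Returns a scan for the subquery text, or null if the storage cannot run it. */
    ScanSketch pushDown(String subQuerySql);
}

/** Hypothetical stand-in for the table scan that is handed back to Hive. */
final class ScanSketch {
    final String pushedSql;
    private final Iterator<String[]> rows;

    ScanSketch(String pushedSql, Iterator<String[]> rows) {
        this.pushedSql = pushedSql;
        this.rows = rows;
    }

    /** Pulls every remaining row through this single scan object. */
    List<String[]> drain() {
        List<String[]> out = new ArrayList<>();
        while (rows.hasNext()) {
            out.add(rows.next());
        }
        return out;
    }
}

/** Toy handler: the backend function plays the role of the storage's SQL client. */
final class JdbcLikeHandlerSketch implements SubQueryHandlerSketch {
    private final Function<String, List<String[]>> backend;

    JdbcLikeHandlerSketch(Function<String, List<String[]>> backend) {
        this.backend = backend;
    }

    @Override
    public ScanSketch pushDown(String subQuerySql) {
        // Assumption: only SELECT text is eligible for pushdown in this toy model.
        if (subQuerySql == null
                || !subQuerySql.trim().toUpperCase(Locale.ROOT).startsWith("SELECT")) {
            return null;
        }
        // The storage parses, plans, and executes the recreated SQL text.
        List<String[]> result = backend.apply(subQuerySql);
        // Hive then sees the whole subquery as a plain scan over the result rows.
        return new ScanSketch(subQuerySql, result.iterator());
    }
}
```

Note that in this model every row passes through the one {{ScanSketch}} iterator, which mirrors the last bullet.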

Am I even remotely close on the design? : ) It would help immensely to write up a design
doc for this, stating the goal you are trying to achieve.

I am interested in this work and want to understand it better. If the design inferred above
is remotely close to what you have implemented, then one area of concern is the last bullet. This
design makes the Phoenix client a bottleneck. It would be much more scalable if we could pull
data directly from the RegionServers instead of through the Phoenix client.
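
A rough way to see the bottleneck concern is a back-of-the-envelope throughput model. The sketch below is purely illustrative: the class and method names are hypothetical, the bandwidth figures a caller passes in are made up, and it assumes one direct reader per RegionServer with no other limits.

```java
/** Toy throughput model; all names and numbers are illustrative assumptions. */
final class ScanThroughputSketch {
    private ScanThroughputSketch() {}

    /** All rows funnel through one client, so its bandwidth caps the total. */
    static double viaSingleClient(double clientMBps, double perServerMBps, int servers) {
        return Math.min(clientMBps, perServerMBps * servers);
    }

    /** One reader per server: the total grows with the number of servers. */
    static double viaDirectReaders(double perReaderMBps, double perServerMBps, int servers) {
        return Math.min(perReaderMBps, perServerMBps) * servers;
    }
}
```

Under this model, adding RegionServers stops helping the single-client path once the client's own bandwidth is saturated, while the direct-reader path keeps scaling with the server count.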

> Provide subquery pushdown facility for storage handlers
> -------------------------------------------------------
>
>                 Key: HIVE-7796
>                 URL: https://issues.apache.org/jira/browse/HIVE-7796
>             Project: Hive
>          Issue Type: Improvement
>          Components: StorageHandler
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-7796.1.patch.txt
>
>
> If the underlying storage can handle basic filtering or aggregation, Hive can delegate execution of the whole subquery to the storage and handle it as a simple scan operation.
> Experimentally implemented on the JDBC / Phoenix handler, and it seems to work well. I hope to open the code for those too, but I am not allowed to yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
