phoenix-dev mailing list archives

From "James Taylor (JIRA)" <>
Subject [jira] [Commented] (PHOENIX-3536) Remove creating unnecessary phoenix connections in MR Tasks of Hive
Date Wed, 22 Feb 2017 07:02:44 GMT


James Taylor commented on PHOENIX-3536:

Agreed, [~sergey.soldatov]. Good catch. I'm also concerned with the copy/paste from ScanPlan.
I think the idea of the patch is good, but the implementation can be improved. Perhaps QueryPlan
(or ScanPlan) can be made serializable. Also, why is the overhead so high for recompiling
the plan? Is it the creation of the HConnection? I think we need more data on that.

> Remove creating unnecessary phoenix connections in MR Tasks of Hive
> -------------------------------------------------------------------
>                 Key: PHOENIX-3536
>                 URL:
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Jeongdae Kim
>            Assignee: Jeongdae Kim
>              Labels: HivePhoenix
>         Attachments: PHOENIX-3536.1.patch
> PhoenixStorageHandler creates Phoenix connections to build a QueryPlan in both the getSplit
> phase (MR job preparation) and the getRecordReader phase (map) while running an MR job.
> In Phoenix, creating the first connection (QueryServices) for a specific URL takes a long
> time, because it checks and loads Phoenix schema information.
> I found that the query plan does not have to be created again in the map phase
> (getRecordReader()): the QueryPlan created by the input format can be serialized and
> passed to the record reader.
> This approach improves scan performance by avoiding an unnecessary connection attempt
> in the map phase.
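The hand-off described above can be sketched in plain Java. This is a minimal illustration, not the attached patch: PlanDescriptor and its fields are hypothetical stand-ins for whatever parts of a QueryPlan the record reader actually needs, and java.io serialization is an assumed mechanism. The planning side serializes the descriptor once, and the map side rebuilds it from bytes without opening a connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PlanHandoffSketch {

    /** Hypothetical stand-in for the plan state a record reader would need. */
    static final class PlanDescriptor implements Serializable {
        private static final long serialVersionUID = 1L;
        final String tableName;
        final ArrayList<String> columns;
        final byte[] startRow;
        final byte[] stopRow;

        PlanDescriptor(String tableName, List<String> columns, byte[] startRow, byte[] stopRow) {
            this.tableName = tableName;
            this.columns = new ArrayList<>(columns); // defensive, serializable copy
            this.startRow = startRow.clone();
            this.stopRow = stopRow.clone();
        }
    }

    /** Planning side: turn the descriptor into bytes that can travel with a split. */
    static byte[] serialize(PlanDescriptor plan) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(plan);
        }
        return buffer.toByteArray();
    }

    /** Map side: rebuild the descriptor from the bytes, with no connection involved. */
    static PlanDescriptor deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (PlanDescriptor) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        PlanDescriptor planned = new PlanDescriptor(
                "T", Arrays.asList("ID", "NAME"), new byte[] {0}, new byte[] {9});
        byte[] payload = serialize(planned);            // done once, while planning splits
        PlanDescriptor restored = deserialize(payload); // done in each map task
        System.out.println(restored.tableName + " " + restored.columns);
    }
}
```

In a real input format the byte array would have to be carried by the split or the job configuration; which of the two is appropriate, and how much of a QueryPlan can be serialized at all, is left open here.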

This message was sent by Atlassian JIRA
