phoenix-dev mailing list archives

From "Jeongdae Kim (JIRA)" <>
Subject [jira] [Commented] (PHOENIX-3536) Remove creating unnecessary phoenix connections in MR Tasks of Hive
Date Tue, 21 Feb 2017 10:41:44 GMT


Jeongdae Kim commented on PHOENIX-3536:

[~jamestaylor] The intention of this patch is to reduce the unnecessary work that happens
while initializing the Phoenix connection in Hive map tasks. The Phoenix storage handler (PhoenixInputFormat)
creates input splits through a Phoenix JDBC connection when the Hive MR job is submitted, and every
map task of that job then creates a Phoenix record reader that opens its own Phoenix connection.

Although all the information needed to execute the query (the query plan) is already obtained during job submission,
every map task builds the query plan again. That takes quite a long time, because establishing the initial
Phoenix connection loads all Phoenix metadata from the system table in the client
process (2~3 seconds in my test cases). With this patch we save that time in every
map task, because the map tasks skip the connection initialization by re-using
the query plan created in the earlier step (job submission).
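
To illustrate the idea only (this is not the code in the patch), here is a minimal, self-contained sketch of "build the plan once at job submission, serialize it, and let each map task deserialize it instead of rebuilding it". All names here (PlanReuseSketch, Plan, compilePlan, connectionsOpened) are hypothetical stand-ins and not Phoenix or Hive APIs; the real QueryPlan would need its own serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

public class PlanReuseSketch {

    // Hypothetical stand-in for a query plan; NOT the Phoenix QueryPlan class.
    static final class Plan implements Serializable {
        private static final long serialVersionUID = 1L;
        final String sql;
        final List<String> scanRanges;

        Plan(String sql, List<String> scanRanges) {
            this.sql = sql;
            this.scanRanges = new ArrayList<>(scanRanges);
        }
    }

    // Counts how often the "expensive" step runs in this sketch.
    static int connectionsOpened = 0;

    // Stands in for opening a connection and compiling the plan (assumed costly).
    static Plan compilePlan(String sql) {
        connectionsOpened++;
        return new Plan(sql, Arrays.asList("[a,m)", "[m,z)"));
    }

    // Encode the plan as a string so it could travel in a job configuration.
    static String serialize(Plan plan) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(plan);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Rebuild the plan on the task side without calling compilePlan again.
    static Plan deserialize(String encoded) throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (Plan) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Job submission: the plan is compiled exactly once.
        String carried = serialize(compilePlan("SELECT * FROM T"));

        // Map tasks: each one reuses the carried plan.
        for (int task = 0; task < 3; task++) {
            Plan reused = deserialize(carried);
            System.out.println("task " + task + " scans " + reused.scanRanges);
        }
        System.out.println("plan compilations: " + connectionsOpened);
    }
}
```

In this sketch the compile step runs once no matter how many tasks run, which is the saving described above.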

> Remove creating unnecessary phoenix connections in MR Tasks of Hive
> -------------------------------------------------------------------
>                 Key: PHOENIX-3536
>                 URL:
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Jeongdae Kim
>            Assignee: Jeongdae Kim
>              Labels: HivePhoenix
>         Attachments: PHOENIX-3536.1.patch
> PhoenixStorageHandler creates Phoenix connections to make a QueryPlan in both the getSplit phase (preparing
> the MR job) and the getRecordReader phase (map) while running an MR job.
> In Phoenix, it takes too much time to create the first Phoenix connection (QueryServices)
> for a specific URL (checking and loading Phoenix schema information).
> I found it is possible to avoid creating the query plan again in the map phase (getRecordReader())
> by serializing the QueryPlan created by the input format and passing this plan to the record reader.
> This approach improves scan performance by removing the unnecessary connection attempt
> in the map phase.

This message was sent by Atlassian JIRA
