hive-dev mailing list archives

From "Zhichun Wu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-4997) HCatalog doesn't allow multiple input tables
Date Sat, 16 Aug 2014 17:22:18 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099716#comment-14099716 ]

Zhichun Wu commented on HIVE-4997:
----------------------------------

@ [~dintskirveli] :

Your approach attaches each InputInfo to its InputSplit in HCatDelegatingInputFormat#getSplits,
and then generates the InputJobInfo in HCatDelegatingInputFormat#createRecordReader from the
attached InputInfo. Every map task therefore has to query the Hive metastore service to generate
its InputJobInfo, so I think this may put a heavy load on the metastore service when the number
of map tasks is large. Also, on a secure Hadoop cluster, each map task has to acquire a delegation
token in order to access the metastore service. The current patch hasn't taken this into consideration.

Instead, I think we can generate each InputJobInfo at the time a table is added, then serialize
the resulting Array<InputJobInfo> and attach it to the job conf. getSplits and createRecordReader
can then fetch each InputJobInfo from the job conf, which avoids querying the metastore service
in the map phase. I've changed the usage for adding multiple input tables as below:
{code}
 HCatMultipleInputs.init(job);
 HCatMultipleInputs.addInput(test_table1, "default", null, SequenceMapper.class);
 HCatMultipleInputs.addInput(test_table2, null, "part='1'", TextMapper1.class);
 HCatMultipleInputs.addInput(test_table2, null, "part='2'", TextMapper2.class);
 HCatMultipleInputs.build();
{code}
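
As a rough illustration of attaching the serialized infos to the job conf (a sketch only, not code taken from HIVE-4997.4.patch): here a plain Map<String, String> stands in for the Hadoop job conf, and TableInput, CONF_KEY, store and load are hypothetical names, with TableInput a simplified stand-in for InputJobInfo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiInputConfSketch {

    // Hypothetical, simplified stand-in for InputJobInfo: per-table metadata
    // that is resolved once on the client when the table is added.
    public static final class TableInput implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String database;
        public final String table;
        public final String filter;      // partition filter, may be null
        public final String mapperClass; // mapper to run for this input

        public TableInput(String database, String table, String filter, String mapperClass) {
            this.database = database;
            this.table = table;
            this.filter = filter;
            this.mapperClass = mapperClass;
        }
    }

    // Hypothetical configuration key; not an actual HCatalog constant.
    static final String CONF_KEY = "sketch.hcat.multi.input.infos";

    // Client side: serialize the whole list into one string-valued property.
    public static void store(Map<String, String> conf, List<TableInput> inputs) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(inputs));
        }
        conf.put(CONF_KEY, Base64.getEncoder().encodeToString(bytes.toByteArray()));
    }

    // Task side (getSplits / createRecordReader): read the list back from the
    // conf, so no metastore call is needed in the map phase.
    @SuppressWarnings("unchecked")
    public static List<TableInput> load(Map<String, String> conf)
            throws IOException, ClassNotFoundException {
        String encoded = conf.get(CONF_KEY);
        if (encoded == null) {
            return new ArrayList<>();
        }
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (List<TableInput>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = new HashMap<>();
        List<TableInput> inputs = new ArrayList<>();
        inputs.add(new TableInput("default", "test_table1", null, "SequenceMapper"));
        inputs.add(new TableInput("default", "test_table2", "part='1'", "TextMapper1"));
        store(conf, inputs);

        List<TableInput> restored = load(conf);
        System.out.println(restored.size() + " inputs restored from conf");
    }
}
```

With this shape, the metastore is contacted only while the job is being set up on the client, and each task pays only the cost of decoding one conf property.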

I've uploaded HIVE-4997.4.patch, which is based on HIVE-4997.3.patch. It works on our secure Hadoop
2.2.0 cluster. I'm uploading it only to demonstrate the idea; I haven't put much
thought into the code quality or the design of this new feature.

 

> HCatalog doesn't allow multiple input tables
> --------------------------------------------
>
>                 Key: HIVE-4997
>                 URL: https://issues.apache.org/jira/browse/HIVE-4997
>             Project: Hive
>          Issue Type: Improvement
>          Components: HCatalog
>    Affects Versions: 0.13.0
>            Reporter: Daniel Intskirveli
>             Fix For: 0.14.0
>
>         Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch, HIVE-4997.4.patch
>
>
> HCatInputFormat does not allow reading from multiple hive tables in the same MapReduce job.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
