hive-issues mailing list archives

From "BELUGA BEHR (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16758) Better Select Number of Replications
Date Mon, 07 Aug 2017 01:28:02 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115993#comment-16115993 ]

BELUGA BEHR commented on HIVE-16758:
------------------------------------

[~csun] I created two patches.  One has a default replication of 10 and the other of 1.  Now
that I look at it again, we should probably stick with the value of 10 since this will match
the default setting of {{mapreduce.client.submit.file.replication}} in mapred-default.xml.

I guess the issue is that we're overloading the meaning of this configuration.  One might
be surprised to discover that it is being used in this manner, but the two use cases are very
similar, so it's probably better to extend the use of this configuration than to introduce
yet another one.  That's my two cents.
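
To make the idea concrete, here is a minimal sketch of how the operator could reuse that property while still respecting {{dfs.replication.max}}.  The {{SUBMIT_REPLICATION}} constant and {{replication}} field are hypothetical names for illustration, not taken from the attached patches.

{code}
// Hypothetical sketch only -- names are illustrative, not from a committed patch.
public static final String SUBMIT_REPLICATION = "mapreduce.client.submit.file.replication";
public static final String DFS_REPLICATION_MAX = "dfs.replication.max";

private int replication;

@Override
protected void initializeOp(Configuration hconf) throws HiveException {
  // Reuse the MapReduce submit-file replication; the default of 10 matches mapred-default.xml.
  int requested = hconf.getInt(SUBMIT_REPLICATION, 10);
  // Never request more replicas than dfs.replication.max allows.
  int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, requested);
  replication = Math.min(requested, dfsMaxReplication);
}
{code}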

> Better Select Number of Replications
> ------------------------------------
>
>                 Key: HIVE-16758
>                 URL: https://issues.apache.org/jira/browse/HIVE-16758
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Minor
>         Attachments: HIVE-16758.1.patch, HIVE-16758.2.patch, HIVE-16758.3.patch
>
>
> {{org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java}}
> We should be smarter about how we pick a replication number.  We should add a new configuration
> equivalent to {{mapreduce.client.submit.file.replication}}.  This value should be around the
> square root of the number of nodes and not hard-coded in the code.
> {code}
> public static final String DFS_REPLICATION_MAX = "dfs.replication.max";
> private int minReplication = 10;
>   @Override
>   protected void initializeOp(Configuration hconf) throws HiveException {
> ...
>     int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication);
>     // minReplication value should not cross the value of dfs.replication.max
>     minReplication = Math.min(minReplication, dfsMaxReplication);
>   }
> {code}
> https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
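
For context, the square-root suggestion in the description above could be expressed roughly as follows.  This is a hypothetical helper, not part of any attached patch.

{code}
// Hypothetical helper: pick a replication factor near sqrt(number of nodes),
// clamped to the range [1, dfs.replication.max].
static int chooseReplication(int numNodes, int dfsMaxReplication) {
  int target = (int) Math.ceil(Math.sqrt(numNodes));
  return Math.max(1, Math.min(target, dfsMaxReplication));
}
{code}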



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
