hive-issues mailing list archives

From "BELUGA BEHR (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16758) Better Select Number of Replications
Date Mon, 05 Jun 2017 16:32:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037179#comment-16037179 ]

BELUGA BEHR commented on HIVE-16758:
------------------------------------

Patch:

# Set the default number of replications to 1 to support single-node test clusters.
# Determine the number of replications from {{mapreduce.client.submit.file.replication}} instead of {{dfs.replication.max}}.
# Remove the logic that raised the Hash Table Sink file replication to the target directory's default replication instead of the configured amount.  That behavior is confusing because it silently overrides a user setting without explaining why the configuration was changed.  Additionally, this replication exists to give Executor tasks reasonable data locality, not to protect data; the HDFS default replication serves a very different goal and therefore should not be linked to it.
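The first two items above can be sketched as follows.  This is an illustrative reading of the patch, not the patch itself: the class and method names are hypothetical, and a plain {{Map}} stands in for the Hadoop {{Configuration}} object that the real {{SparkHashTableSinkOperator}} code would use.

```java
import java.util.Map;

// Illustrative sketch of the replication selection described in the patch
// summary (HIVE-16758). Names here are hypothetical, not from the patch.
public class ReplicationSketch {
    static final String SUBMIT_FILE_REPLICATION =
        "mapreduce.client.submit.file.replication";

    // Item 1: default to 1 so single-node test clusters work out of the box.
    static final int DEFAULT_REPLICATION = 1;

    // Item 2: read the submit-file replication setting instead of
    // dfs.replication.max, falling back to the single-node-friendly default.
    static int chooseReplication(Map<String, String> conf) {
        String value = conf.get(SUBMIT_FILE_REPLICATION);
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_REPLICATION;
        }
        // Never return a replication below 1.
        return Math.max(1, Integer.parseInt(value.trim()));
    }
}
```

With no setting present this yields 1, and an explicit {{mapreduce.client.submit.file.replication=10}} yields 10, so the user's configured value is honored rather than overridden.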

> Better Select Number of Replications
> ------------------------------------
>
>                 Key: HIVE-16758
>                 URL: https://issues.apache.org/jira/browse/HIVE-16758
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: BELUGA BEHR
>            Priority: Minor
>         Attachments: HIVE-16758.1.patch
>
>
> {{org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java}}
> We should be smarter about how we pick a replication number.  We should add a new configuration
> equivalent to {{mapreduce.client.submit.file.replication}}.  This value should be roughly the
> square root of the number of nodes, not hard-coded.
> {code}
> public static final String DFS_REPLICATION_MAX = "dfs.replication.max";
> private int minReplication = 10;
>   @Override
>   protected void initializeOp(Configuration hconf) throws HiveException {
> ...
>     int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication);
>     // minReplication value should not cross the value of dfs.replication.max
>     minReplication = Math.min(minReplication, dfsMaxReplication);
>   }
> {code}
> https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
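The square-root guidance quoted above can be sketched as a one-liner.  The node count in the test below is an assumed example, and the method name is hypothetical; this only illustrates the rule of thumb, not any code in the patch.

```java
// Hedged sketch: derive a sensible submit-file replication from the cluster
// size, following the square-root rule of thumb cited in the issue.
public class SqrtReplication {
    static int suggestedReplication(int numNodes) {
        // Round sqrt(nodes) and never go below 1 (single-node clusters).
        return Math.max(1, (int) Math.round(Math.sqrt(numNodes)));
    }
}
```

For a 100-node cluster this suggests a replication of 10, which matches the shipped default of {{mapreduce.client.submit.file.replication}}.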



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
