hive-dev mailing list archives

From "BELUGA BEHR (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-16758) Better Select Number of Replications
Date Thu, 25 May 2017 15:05:04 GMT
BELUGA BEHR created HIVE-16758:
----------------------------------

             Summary: Better Select Number of Replications
                 Key: HIVE-16758
                 URL: https://issues.apache.org/jira/browse/HIVE-16758
             Project: Hive
          Issue Type: Improvement
            Reporter: BELUGA BEHR
            Priority: Minor


{{org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java}}

We should be smarter about how we pick the replication factor.  We should add a new configuration
property, equivalent to {{mapreduce.client.submit.file.replication}}, whose value should be around
the square root of the number of nodes rather than hard-coded.

{code}
public static final String DFS_REPLICATION_MAX = "dfs.replication.max";
private int minReplication = 10;

  @Override
  protected void initializeOp(Configuration hconf) throws HiveException {
...
    int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication);
    // minReplication value should not cross the value of dfs.replication.max
    minReplication = Math.min(minReplication, dfsMaxReplication);
  }
{code}

https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
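
For illustration only, here is a minimal sketch of how such a setting might be resolved. It is not the Hive implementation: the property name {{hive.exec.hashtable.sink.replication}} and the class and method names are hypothetical, and {{java.util.Properties}} stands in for the Hadoop {{Configuration}} object so the example runs on its own. The idea is to take the configured value if present, otherwise fall back to roughly the square root of the node count, and in either case never exceed {{dfs.replication.max}}.

```java
import java.util.Properties;

class ReplicationChooser {
    // Existing HDFS key, as quoted in the snippet above.
    static final String DFS_REPLICATION_MAX = "dfs.replication.max";
    // Hypothetical key for the proposed setting; not an existing Hive property.
    static final String SINK_REPLICATION = "hive.exec.hashtable.sink.replication";

    // Default suggested in the issue: about sqrt(number of nodes), at least 1.
    static int defaultReplication(int numNodes) {
        return Math.max(1, (int) Math.ceil(Math.sqrt(numNodes)));
    }

    // Resolve the replication factor: explicit setting if present, otherwise
    // the square-root default, and never above dfs.replication.max.
    static int chooseReplication(Properties conf, int numNodes) {
        int requested = getInt(conf, SINK_REPLICATION, defaultReplication(numNodes));
        int dfsMax = getInt(conf, DFS_REPLICATION_MAX, requested);
        return Math.max(1, Math.min(requested, dfsMax));
    }

    // Small helper mimicking Configuration.getInt(key, defaultValue).
    static int getInt(Properties conf, String key, int defaultValue) {
        String value = conf.getProperty(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(DFS_REPLICATION_MAX, "512");
        // No explicit setting, 100 nodes: falls back to sqrt(100).
        System.out.println(chooseReplication(conf, 100)); // prints 10
    }
}
```

How the operator would learn the number of nodes is left open here; the sketch simply takes it as a parameter.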



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
