hive-issues mailing list archives

From "liyunzhang_intel (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-17010) Fix the overflow problem of Long type in SetSparkReducerParallelism
Date Mon, 03 Jul 2017 03:07:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-17010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyunzhang_intel updated HIVE-17010:
------------------------------------
    Description: 
 We use [numberOfBytes|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L129]
to accumulate the numberOfBytes of the siblings of a given RS. It is a Long, and the addition overflows
when the data is too big. When that happens, the parallelism is decided by [sparkMemoryAndCores.getSecond()|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L184].
If spark.dynamicAllocation.enabled is true, sparkMemoryAndCores.getSecond() is a dynamic value
decided by the Spark runtime; for example, it may be 5 or 15 at random, and it may even be 1.
The main problem here is the overflow of the Long addition. You can reproduce the overflow with the
following code:
{code}
import java.math.BigInteger;

public class LongOverflowDemo {           // wrapper class added so the snippet compiles as-is
    public static void main(String[] args) {
        long a1 = 9223372036854775807L;   // Long.MAX_VALUE
        long a2 = 1022672L;

        long res = a1 + a2;               // long addition silently wraps around
        System.out.println(res);          // -9223372036853753137

        BigInteger b1 = BigInteger.valueOf(a1);
        BigInteger b2 = BigInteger.valueOf(a2);
        BigInteger bigRes = b1.add(b2);   // exact result, no overflow
        System.out.println(bigRes);       // 9223372036855798479
    }
}
{code}
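
For reference, one way to avoid the wrap-around is to check each addition for overflow and clamp the running total at Long.MAX_VALUE instead of letting it go negative. This is only a sketch of that idea, not the patch for this issue; the helper class and method names below are made up for illustration:

{code}
// Hypothetical helper, not part of the current SetSparkReducerParallelism code.
public final class SaturatingAdd {
    private SaturatingAdd() {}

    // Adds two byte counts; on overflow, clamps to Long.MAX_VALUE instead of wrapping negative.
    public static long add(long total, long increment) {
        try {
            return Math.addExact(total, increment);   // throws ArithmeticException on overflow
        } catch (ArithmeticException e) {
            return Long.MAX_VALUE;
        }
    }

    public static void main(String[] args) {
        long numberOfBytes = Long.MAX_VALUE;
        numberOfBytes = add(numberOfBytes, 1022672L);
        System.out.println(numberOfBytes);            // 9223372036854775807, stays non-negative
    }
}
{code}

With a guard like this, the accumulated size stays a large positive number, so the parallelism estimate would not be left to the dynamic sparkMemoryAndCores value as described above.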

> Fix the overflow problem of Long type in SetSparkReducerParallelism
> -------------------------------------------------------------------
>
>                 Key: HIVE-17010
>                 URL: https://issues.apache.org/jira/browse/HIVE-17010
>             Project: Hive
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
