hive-issues mailing list archives

From "liyunzhang_intel (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-17010) Fix the overflow problem of Long type in SetSparkReducerParallelism
Date Tue, 04 Jul 2017 02:40:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-17010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyunzhang_intel updated HIVE-17010:
------------------------------------
    Attachment: HIVE-17010.1.patch

[~Ferd]: can you help review HIVE-17010.1.patch?
We can replace the Long type with double to avoid the overflow problem:
{code}
public class OverflowDemo {
    public static void main(String[] args) {
        // Long.MAX_VALUE = 9223372036854775807
        long a1 = 9223372036854775807L;
        long a2 = 1022672;

        // long addition wraps around silently on overflow
        long res = a1 + a2;
        System.out.println(res);   // -9223372036853753137

        // Double.MAX_VALUE = 1.7976931348623157E308, so the sum fits;
        // low-order precision is lost but the magnitude is preserved
        double d1 = 9223372036854775807L;
        double d2 = 1022672;

        double dres = d1 + d2;
        System.out.println(dres);  // 9.223372036855798E18
    }
}
{code}
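
For illustration, here is a minimal sketch of how the accumulation itself could switch to double; the class and variable names below are hypothetical and are not taken from the actual patch:

{code}
// Hypothetical sketch: accumulate sibling byte counts in a double
// instead of a long, so very large sums lose precision rather than wrap.
public class SiblingBytesSketch {
    public static void main(String[] args) {
        long[] siblingBytes = {9223372036854775807L, 1022672L};

        long longSum = 0;
        double doubleSum = 0;
        for (long bytes : siblingBytes) {
            longSum += bytes;    // wraps around silently
            doubleSum += bytes;  // keeps the magnitude, loses low-order bits
        }

        System.out.println(longSum);    // -9223372036853753137
        System.out.println(doubleSum);  // 9.223372036855798E18
    }
}
{code}

For the parallelism estimate only the order of magnitude of the byte count matters, so the precision loss from double should be harmless here.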


> Fix the overflow problem of Long type in SetSparkReducerParallelism
> -------------------------------------------------------------------
>
>                 Key: HIVE-17010
>                 URL: https://issues.apache.org/jira/browse/HIVE-17010
>             Project: Hive
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-17010.1.patch
>
>
> We use [numberOfBytes|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L129]
> to collect the numberOfBytes of the siblings of the specified RS. We use the Long type, and it
> overflows when the data is too big. When this happens, the parallelism is decided by [sparkMemoryAndCores.getSecond()|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L184].
> If spark.dynamicAllocation.enabled is true, sparkMemoryAndCores.getSecond() is a dynamic value
> decided by the Spark runtime; for example, its value may be 5 or 15 at random, and it may even
> be 1. The main problem here is the overflow of Long addition. You can reproduce the overflow
> with the following code:
> {code}
> import java.math.BigInteger;
>
> public class BigIntegerDemo {
>     public static void main(String[] args) {
>       long a1 = 9223372036854775807L;  // Long.MAX_VALUE
>       long a2 = 1022672;
>       long res = a1 + a2;              // wraps around
>       System.out.println(res);         // -9223372036853753137
>
>       BigInteger b1 = BigInteger.valueOf(a1);
>       BigInteger b2 = BigInteger.valueOf(a2);
>       BigInteger bigRes = b1.add(b2);  // exact arithmetic, no overflow
>       System.out.println(bigRes);      // 9223372036855798479
>     }
> }
> {code}
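
As a side note (not part of this patch), Java 8's Math.addExact makes the wraparound explicit by throwing instead of silently overflowing; a minimal sketch:

{code}
// Math.addExact throws ArithmeticException on long overflow instead
// of wrapping around, which makes the failure visible.
public class AddExactDemo {
    public static void main(String[] args) {
        long a1 = 9223372036854775807L;  // Long.MAX_VALUE
        long a2 = 1022672;
        try {
            long res = Math.addExact(a1, a2);
            System.out.println(res);
        } catch (ArithmeticException e) {
            System.out.println("long overflow: " + e.getMessage());
        }
    }
}
{code}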



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
