pig-dev mailing list archives

From "liyunzhang_intel (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-4265) AlgebraicDoubleMathBase has "Java double precision problems"
Date Wed, 05 Nov 2014 07:25:34 GMT

     [ https://issues.apache.org/jira/browse/PIG-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyunzhang_intel updated PIG-4265:
----------------------------------
    Description: 
$PIG_HOME/bin/pig -x local RubyUDFs_10.pig
#RubyUDFs_10.pig

a = load 'studenttab10k' using PigStorage() as (name, age:int, gpa:double);
b = group a by name;
c = foreach b generate group, SUM(a.age), SUM(a.gpa);
d = foreach c generate $0, $1, (double)((int)$2*100)/100;
store d into 'local.output/RubyUDFs_10_benchmark.out';

The result in RubyUDFs_10.out/part-r-00000:
#grep "david s" RubyUDFs_10.out/part-r-00000 
david steinbeck	266	15.0

#grep "david s" studenttab10k
david steinbeck	21	2.44
david steinbeck	33	1.17
david steinbeck	42	1.94
david steinbeck	42	1.35
david steinbeck	31	2.77
david steinbeck	40	2.42
david steinbeck	57	3.91


When running RubyUDFs_10.pig in Spark mode, SUM(a.gpa) is 16.0, so (double)((int)$2*100)/100
evaluates to 16.0 and the output row is "david steinbeck	266	16.0".
When running RubyUDFs_10.pig in MapReduce mode, SUM(a.gpa) is 15.999999999999998; the (int) cast
truncates it to 15, so the output row is "david steinbeck	266	15.0".
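
The effect of the truncating cast on the two sums above can be reproduced in plain Java. This is a minimal sketch, not Pig code: the class name TruncationDemo and both helper methods are hypothetical, and it assumes that Pig's (int) cast on a double truncates toward zero the way Java's does (which is consistent with the 15.0 seen in the output). The roundTwoPlaces helper only illustrates a rounding alternative; it is not taken from PIG-4265.patch.

```java
public class TruncationDemo {
    // Java equivalent of the Pig expression (double)((int)$2*100)/100.
    // The (int) cast binds to the sum first, so the fraction is dropped
    // before the multiplication by 100.
    static double truncateLikeScript(double sum) {
        return (double) ((int) sum * 100) / 100;
    }

    // Hypothetical alternative: scale first, then round to the nearest integer.
    static double roundTwoPlaces(double sum) {
        return Math.round(sum * 100) / 100.0;
    }

    public static void main(String[] args) {
        double sparkSum = 16.0;                    // value reported for Spark mode
        double mapReduceSum = 15.999999999999998;  // value reported for MapReduce mode

        System.out.println(truncateLikeScript(sparkSum));      // 16.0
        System.out.println(truncateLikeScript(mapReduceSum));  // 15.0
        System.out.println(roundTwoPlaces(mapReduceSum));      // 16.0
    }
}
```

A difference of one unit in the last place of the sum therefore becomes a difference of 1.0 in the stored result, because the cast discards the whole fractional part.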



  was:
$PIG_HOME/bin/pig -x local RubyUDFs_10.pig
#RubyUDFs_10.pig

a = load 'studenttab10k' using PigStorage() as (name, age:int, gpa:double);
b = group a by name;
c = foreach b generate group, SUM(a.age), SUM(a.gpa);
d = foreach c generate $0, $1, (double)((int)$2*100)/100;
store d into 'local.output/RubyUDFs_10_benchmark.out';

the result in RubyUDFs_10.out/part
#grep "david s" RubyUDFs_10.out/part-r-00000 
david steinbeck	266	15.0

#grep "david s" studenttab10k
david steinbeck	21	2.44
david steinbeck	33	1.17
david steinbeck	42	1.94
david steinbeck	42	1.35
david steinbeck	31	2.77
david steinbeck	40	2.42
david steinbeck	57	3.91

When you sum all the gpa values of "david steinbeck" in the file "studenttab10k", the result is 16,
while the result in RubyUDFs_10.out/part-r-00000 is 15. The cause is a double precision
problem in AlgebraicDoubleMathBase.java.
It sums the gpa values to approximately 15.999999, and (double)((int)15.999999*100)/100 = 15.0.

{code}
AlgebraicDoubleMathBase.java
    private static Double doWork(Double arg1, Double arg2, KNOWN_OP op) {
        if (arg1 == null) {
            return arg2;
        } else if (arg2 == null) {
            return arg1;
        } else {
            switch (op) {
            case MAX: return Math.max(arg1, arg2);
            case MIN: return Math.min(arg1, arg2);
            case SUM: return arg1+arg2;  //this line has the "Java double precision problem"
            default: return null;
            }
        }
    }
{code}
For details on the Java double precision problem, see https://community.oracle.com/thread/2448849?tstart=0
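
To see why the representation of the additions matters, the seven gpa values can be summed both as double and as BigDecimal. This is a minimal sketch that assumes the values are parsed from their decimal text; the class name GpaSumDemo and its methods are hypothetical, and BigDecimal is used here only as an illustration, not necessarily as what the attached patch does.

```java
import java.math.BigDecimal;

public class GpaSumDemo {
    // gpa values for "david steinbeck", copied from studenttab10k above.
    static final String[] GPAS = {"2.44", "1.17", "1.94", "1.35", "2.77", "2.42", "3.91"};

    // Plain double accumulation, like the SUM case in doWork: each addition
    // can introduce a rounding error, so the result may depend on the order.
    static double doubleSum(String[] values) {
        double sum = 0.0;
        for (String v : values) {
            sum += Double.parseDouble(v);
        }
        return sum;
    }

    // Exact decimal accumulation: a BigDecimal built from text has no binary rounding.
    static BigDecimal exactSum(String[] values) {
        BigDecimal sum = BigDecimal.ZERO;
        for (String v : values) {
            sum = sum.add(new BigDecimal(v));
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("double sum: " + doubleSum(GPAS));
        System.out.println("exact sum:  " + exactSum(GPAS)); // 16.00
    }
}
```

The exact sum is always 16.00, while the double sum is only guaranteed to be close to 16 and can differ in its last digits between execution engines that add the values in a different order.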




> AlgebraicDoubleMathBase has "Java double precision problems"
> ------------------------------------------------------------
>
>                 Key: PIG-4265
>                 URL: https://issues.apache.org/jira/browse/PIG-4265
>             Project: Pig
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: PIG-4265.patch
>
>
> $PIG_HOME/bin/pig -x local RubyUDFs_10.pig
> #RubyUDFs_10.pig
> a = load 'studenttab10k' using PigStorage() as (name, age:int, gpa:double);
> b = group a by name;
> c = foreach b generate group, SUM(a.age), SUM(a.gpa);
> d = foreach c generate $0, $1, (double)((int)$2*100)/100;
> store d into 'local.output/RubyUDFs_10_benchmark.out';
> The result in RubyUDFs_10.out/part-r-00000:
> #grep "david s" RubyUDFs_10.out/part-r-00000 
> david steinbeck	266	15.0
> #grep "david s" studenttab10k
> david steinbeck	21	2.44
> david steinbeck	33	1.17
> david steinbeck	42	1.94
> david steinbeck	42	1.35
> david steinbeck	31	2.77
> david steinbeck	40	2.42
> david steinbeck	57	3.91
> When running RubyUDFs_10.pig in Spark mode, SUM(a.gpa) is 16.0, so (double)((int)$2*100)/100 evaluates to 16.0 and the output row is "david steinbeck	266	16.0".
> When running RubyUDFs_10.pig in MapReduce mode, SUM(a.gpa) is 15.999999999999998; the (int) cast truncates it to 15, so the output row is "david steinbeck	266	15.0".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
