pig-dev mailing list archives

From "liyunzhang_intel (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-4265) AlgebraicDoubleMathBase has "Java double precision problems"
Date Tue, 04 Nov 2014 02:37:33 GMT

     [ https://issues.apache.org/jira/browse/PIG-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyunzhang_intel updated PIG-4265:
----------------------------------
    Attachment: PIG-4265.patch

This patch changes the SUM case in doWork to add the two operands with BigDecimal instead of plain double addition:
{code}
// AlgebraicDoubleMathBase.java (needs: import java.math.BigDecimal;)
    private static Double doWork(Double arg1, Double arg2, KNOWN_OP op) {
        if (arg1 == null) {
            return arg2;
        } else if (arg2 == null) {
            return arg1;
        } else {
            switch (op) {
            case MAX: return Math.max(arg1, arg2);
            case MIN: return Math.min(arg1, arg2);
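            // Add via BigDecimal (built from each operand's decimal string form) to avoid binary rounding drift
            case SUM: return BigDecimal.valueOf(arg1).add(BigDecimal.valueOf(arg2)).doubleValue();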
            default: return null;
            }
        }
    }
{code}
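
To see the difference outside Pig, here is a minimal standalone sketch that sums the "david steinbeck" gpa values from the issue both ways. The class and method names (DoubleSumSketch, sumWithDouble, sumWithBigDecimal) are made up for illustration and are not part of the patch. The order in which Pig actually combines partial sums may differ, so the plain double result here need not match the value seen in the job output.

```java
import java.math.BigDecimal;

// Hypothetical demo class, not part of Pig or of PIG-4265.patch.
public class DoubleSumSketch {

    // Plain double addition, as in the original SUM case (arg1 + arg2).
    static double sumWithDouble(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total = total + v;
        }
        return total;
    }

    // Pairwise BigDecimal addition, mirroring the patched SUM case:
    // each step converts both operands with BigDecimal.valueOf, adds them
    // exactly in decimal, and converts the result back to double.
    static double sumWithBigDecimal(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total = BigDecimal.valueOf(total).add(BigDecimal.valueOf(v)).doubleValue();
        }
        return total;
    }

    public static void main(String[] args) {
        // gpa values for "david steinbeck" taken from studenttab10k in the issue
        double[] gpas = {2.44, 1.17, 1.94, 1.35, 2.77, 2.42, 3.91};
        System.out.println("double sum:     " + sumWithDouble(gpas));
        System.out.println("BigDecimal sum: " + sumWithBigDecimal(gpas));
    }
}
```

BigDecimal.valueOf(double) goes through Double.toString, so 2.44 becomes the decimal 2.44 rather than its exact binary expansion; that is why the pairwise decimal addition does not accumulate the binary rounding error.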

> AlgebraicDoubleMathBase has "Java double precision problems"
> ------------------------------------------------------------
>
>                 Key: PIG-4265
>                 URL: https://issues.apache.org/jira/browse/PIG-4265
>             Project: Pig
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>         Attachments: PIG-4265.patch
>
>
> $PIG_HOME/bin/pig -x local RubyUDFs_10.pig
> #RubyUDFs_10.pig
> register '/home/zly/prj/oss/pig/bin/libexec/ruby/scriptingudfs.rb' using jruby as myfuncs;
> a = load 'studenttab10k' using PigStorage() as (name, age:int, gpa:double);
> b = group a by name;
> c = foreach b generate group, myfuncs.Sum(a.age), myfuncs.Sum(a.gpa);
> d = foreach c generate $0, $1, (double)((int)$2*100)/100;
> store d into 'RubyUDFs_10.out';
> The result in RubyUDFs_10.out/part-r-00000:
> #grep "david s" RubyUDFs_10.out/part-r-00000 
> david steinbeck	266	15.0
> #grep "david s" studenttab10k
> david steinbeck	21	2.44
> david steinbeck	33	1.17
> david steinbeck	42	1.94
> david steinbeck	42	1.35
> david steinbeck	31	2.77
> david steinbeck	40	2.42
> david steinbeck	57	3.91
> Summing all the gpa values of "david steinbeck" in the file "studenttab10k" gives 16.00, but RubyUDFs_10.out/part-r-00000 shows 15.0. The cause is the double precision problem in AlgebraicDoubleMathBase.java.
> In double arithmetic the gpa values sum to 15.999999, and (double)((int)15.999999*100)/100 = 15.0, because the (int) cast truncates the sum to 15 before the multiplication by 100.
> {code}
> // AlgebraicDoubleMathBase.java
>     private static Double doWork(Double arg1, Double arg2, KNOWN_OP op) {
>         if (arg1 == null) {
>             return arg2;
>         } else if (arg2 == null) {
>             return arg1;
>         } else {
>             switch (op) {
>             case MAX: return Math.max(arg1, arg2);
>             case MIN: return Math.min(arg1, arg2);
>             case SUM: return arg1+arg2;  // this line has the "Java double precision problem"
>             default: return null;
>             }
>         }
>     }
> {code}
> For details on the Java double precision problem, see https://community.oracle.com/thread/2448849?tstart=0
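
The truncation step in the script can be reproduced in plain Java. This is only a sketch: it assumes Pig applies the (int) cast before the multiplication, as Java does, which is consistent with the 15.0 reported above. The class and method names are hypothetical.

```java
// Hypothetical demo class, not part of Pig.
public class TruncationSketch {

    // Java equivalent of the Pig expression (double)((int)$2*100)/100.
    // The (int) cast binds to the sum alone, so the fractional part is
    // dropped before the multiplication by 100.
    static double truncateLikeScript(double sum) {
        return (double) ((int) sum * 100) / 100;
    }

    public static void main(String[] args) {
        System.out.println(truncateLikeScript(15.999999)); // 15.0
        System.out.println(truncateLikeScript(16.0));      // 16.0
    }
}
```

A sum that falls even slightly below 16 therefore shows up as 15.0 in the output, which is what makes the small double rounding error visible.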



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)