hive-dev mailing list archives

From "Tom Temple (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-7177) percentile_approx very inaccurate with high multiplicities in the data
Date Wed, 04 Jun 2014 16:49:01 GMT
Tom Temple created HIVE-7177:
--------------------------------

             Summary: percentile_approx very inaccurate with high multiplicities in the data
                 Key: HIVE-7177
                 URL: https://issues.apache.org/jira/browse/HIVE-7177
             Project: Hive
          Issue Type: Bug
          Components: UDF
    Affects Versions: 0.12.0
         Environment: Redhat 5.10 running Cloudera 5.0.0
            Reporter: Tom Temple


To reproduce:
1) Create a table with a single integer column.
2) Populate it with the values 1,000,000, 2,000,000, 3,000,000, and 4,000,000, each repeated
250,000 times (1,000,000 rows in total).
3) Evaluate percentile_approx(cast(col_0 as double), array(0.33,0.34),1000000) over that column.

Expected results: [2000000.0,2000000.0]

Actual results: [1280000.0,1320000.0] (I might be off by 40000 here)
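
The expected results above can be checked with an exact computation outside Hive. The following is a minimal Python sketch, not Hive's implementation: the function name exact_percentile and the nearest-rank definition are assumptions made for illustration. Rows 250,001 through 500,000 of the sorted data all hold 2,000,000, so the 33rd and 34th percentiles fall well inside that run, and any reasonable interpolation between neighbouring ranks should give the same answer.

```python
# Exact (non-approximate) check of the expected percentiles for the data
# described in the reproduction steps. Standard library only.
from bisect import bisect_left
from itertools import accumulate

# Distinct values and how many times each one occurs (from step 2).
values = [1_000_000, 2_000_000, 3_000_000, 4_000_000]
counts = [250_000] * 4


def exact_percentile(p):
    """Return the smallest value whose cumulative row fraction is >= p.

    Assumption: nearest-rank definition, chosen for simplicity. This is
    not taken from Hive's source code.
    """
    total = sum(counts)
    cumulative = list(accumulate(counts))  # running row counts per value
    rank = p * total                       # number of rows at or below p
    return float(values[bisect_left(cumulative, rank)])


print([exact_percentile(p) for p in (0.33, 0.34)])
```

Both percentiles come out as 2000000.0, which matches the expected results and not the values returned by percentile_approx.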



--
This message was sent by Atlassian JIRA
(v6.2#6252)