hive-dev mailing list archives

From "Gabi Kazav (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
Date Fri, 14 Jun 2013 14:02:20 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683364#comment-13683364 ]

Gabi Kazav commented on HIVE-4730:
----------------------------------

After patching and compiling, the same join now fails when I run it:

......
2013-06-14 16:47:14,924 INFO ExecReducer: ExecReducer: processing 2149000000 rows: used memory = 45018992
2013-06-14 16:47:16,042 FATAL org.apache.hadoop.mapred.TaskTracker: Error running child :
java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.exec.persistence.AbstractRowContainer.size()I
        at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:802)
        at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:263)
        at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:301)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:473)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

2013-06-14 16:47:19,051 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=CLEANUP, sessionId=
2013-06-14 16:47:19,305 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
2013-06-14 16:47:19,305 INFO org.apache.hadoop.mapred.TaskRunner: Task:attempt_201306121727_0032_r_000004_0 is done. And is in the process of commiting
2013-06-14 16:47:19,311 INFO org.apache.hadoop.mapred.TaskRunner: Task 'attempt_201306121727_0032_r_000004_0' done.
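
A note on reading the error above: the trailing `()I` in `AbstractRowContainer.size()I` is a JVM method descriptor meaning "no arguments, returns int". A NoSuchMethodError of this kind generally means the calling class was compiled against a method signature that the class loaded at runtime does not have. One possible explanation, not confirmed anywhere in this message, is that the reducer picked up a mix of patched and unpatched classes. The sketch below only shows how descriptors encode the return type; `DescriptorDemo` and `noArgDescriptor` are hypothetical names, not Hive code.

```java
import java.lang.invoke.MethodType;

class DescriptorDemo {
    // Build the JVM descriptor string for a no-argument method
    // with the given return type.
    static String noArgDescriptor(Class<?> returnType) {
        return MethodType.methodType(returnType).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // A caller compiled against "int size()" looks up this descriptor.
        System.out.println("int size()  -> size" + noArgDescriptor(int.class));
        // A class that declares "long size()" exposes this one instead
        // (long is used here purely as an example of a different return type).
        System.out.println("long size() -> size" + noArgDescriptor(long.class));
    }
}
```

Because the JVM links calls by name plus descriptor, two methods that differ only in return type are different methods at link time, so a stale caller or a stale jar on the task classpath would fail in the way shown in the log.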


                
> Join on more than 2^31 records on single reducer failed (wrong results)
> -----------------------------------------------------------------------
>
>                 Key: HIVE-4730
>                 URL: https://issues.apache.org/jira/browse/HIVE-4730
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.1, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0
>            Reporter: Gabi Kazav
>            Assignee: Navis
>            Priority: Critical
>         Attachments: HIVE-4730.D11283.1.patch
>
>
> Joining more than 2^31 rows on a single reducer produces wrong results. For example:
> Create table small_table (p1 string) ROW FORMAT DELIMITED    LINES TERMINATED BY  '\n';
> Create table big_table (p1 string) ROW FORMAT DELIMITED    LINES TERMINATED BY  '\n';
> Load 1 row into small_table (the value 1).
> Load 2149580800 rows into big_table, all with the same value (1 in this case).
> create table output as select a.p1 from  big_table a join small_table b on (a.p1=b.p1);
> select count(*) from output; reports only 1 row.
> the reducer syslog:
> ...
> 2013-06-13 17:20:59,254 INFO ExecReducer: ExecReducer: processing 2147000000 rows: used memory = 32925960
> 2013-06-13 17:21:00,745 INFO ExecReducer: ExecReducer: processing 2148000000 rows: used memory = 12815184
> 2013-06-13 17:21:02,205 INFO ExecReducer: ExecReducer: processing 2149000000 rows: used memory = 26684552   <-- looks like wrong value..
> ...
> 2013-06-13 17:21:04,062 INFO ExecReducer: ExecReducer: processed 2149580801 rows: used memory = 17715896
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 finished. closing...
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarded 1 rows
> 2013-06-13 17:21:05,791 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 finished. closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarded 1 rows
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 finished. closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 forwarded 0 rows
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 Close done
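
For readers unfamiliar with the 2^31 limit named in the issue title: a Java `int` is a 32-bit signed value, so its maximum is 2^31 - 1 = 2147483647, and the 2149580800 rows in the example do not fit. The following is a minimal sketch of what happens when such a count is held in an `int`. It is an illustration only; `RowCountOverflowDemo` and `narrowToInt` are hypothetical names and this is not the Hive code path.

```java
class RowCountOverflowDemo {
    // Narrow a 64-bit row count to 32 bits, as happens when a count
    // is stored in or returned as an int.
    static int narrowToInt(long rows) {
        return (int) rows;
    }

    public static void main(String[] args) {
        long rows = 2149580800L; // row count from the example in the issue

        // The value exceeds Integer.MAX_VALUE, so narrowing wraps it negative.
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("as long           = " + rows);
        System.out.println("as int            = " + narrowToInt(rows));

        // Incrementing an int past its maximum wraps silently as well.
        int counter = Integer.MAX_VALUE;
        counter++;
        System.out.println("MAX_VALUE + 1     = " + counter);
    }
}
```

Java integer arithmetic wraps without raising an exception, which is consistent with the job in the description completing normally while reporting a wrong row count.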

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
