hive-dev mailing list archives

From "Phabricator (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
Date Mon, 15 Jul 2013 17:38:49 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708691#comment-13708691 ]

Phabricator commented on HIVE-4730:
-----------------------------------

brock has commented on the revision "HIVE-4730 [jira] Join on more than 2^31 records on single reducer failed (wrong results)".

  Hi Navis,

  Thanks for the patch! I noted a few style nits. Just curious, how long did the query take to complete? My guess is it took far too long to have a q-file test for this.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java:286 Is it possible to move this up near the rest of the member variable definitions?

  Ideally it'd be nice to change the declared type on the LHS to List, but it's possible that something in the class requires ArrayList.
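  For readers unfamiliar with the "declare it as List" nit, a minimal sketch of the convention follows. The class and field names are hypothetical and this is not RowContainer's actual code: fields are grouped at the top of the class, and the field's declared type is the List interface while the object is still created as an ArrayList.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class illustrating the review nit; not Hive's RowContainer.
public class FieldDeclarationStyle {
    // Member variables grouped together near the top of the class.
    private final int blockSize;
    // Declared type is the List interface; ArrayList is only the implementation choice.
    private final List<String> currentBlock;

    public FieldDeclarationStyle(int blockSize) {
        this.blockSize = blockSize;
        this.currentBlock = new ArrayList<>(blockSize);
    }

    // Only List methods are used, so the field does not need the ArrayList type.
    public void add(String row) {
        currentBlock.add(row);
    }

    public int blockSize() {
        return blockSize;
    }

    public int buffered() {
        return currentBlock.size();
    }

    public static void main(String[] args) {
        FieldDeclarationStyle buf = new FieldDeclarationStyle(4);
        buf.add("a");
        buf.add("b");
        System.out.println(buf.buffered() + " of " + buf.blockSize()); // 2 of 4
    }
}
```

  The caveat in the comment applies if the class calls ArrayList-only methods such as ensureCapacity or trimToSize; those are not on the List interface, so such a field would have to keep the ArrayList type.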

REVISION DETAIL
  https://reviews.facebook.net/D11283

To: JIRA, navis
Cc: brock

                
> Join on more than 2^31 records on single reducer failed (wrong results)
> -----------------------------------------------------------------------
>
>                 Key: HIVE-4730
>                 URL: https://issues.apache.org/jira/browse/HIVE-4730
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.1, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0
>            Reporter: Gabi Kazav
>            Assignee: Navis
>            Priority: Blocker
>         Attachments: HIVE-4730.D11283.1.patch
>
>
> A join on more than 2^31 rows produces wrong results. For example:
> Create table small_table (p1 string) ROW FORMAT DELIMITED    LINES TERMINATED BY  '\n';
> Create table big_table (p1 string) ROW FORMAT DELIMITED    LINES TERMINATED BY  '\n';
> Load 1 row into small_table (the value 1).
> Load 2149580800 rows into big_table, all with the same value (1 in this case).
> create table output as select a.p1 from  big_table a join small_table b on (a.p1=b.p1);
> select count(*) from output; reports only 1 row (expected 2149580800).
> The reducer syslog:
> ...
> 2013-06-13 17:20:59,254 INFO ExecReducer: ExecReducer: processing 2147000000 rows: used memory = 32925960
> 2013-06-13 17:21:00,745 INFO ExecReducer: ExecReducer: processing 2148000000 rows: used memory = 12815184
> 2013-06-13 17:21:02,205 INFO ExecReducer: ExecReducer: processing 2149000000 rows: used memory = 26684552   <-- looks like wrong value..
> ...
> 2013-06-13 17:21:04,062 INFO ExecReducer: ExecReducer: processed 2149580801 rows: used memory = 17715896
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 finished. closing...
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarded 1 rows
> 2013-06-13 17:21:05,791 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 finished. closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarded 1 rows
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 finished. closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 forwarded 0 rows
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 Close done
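The numbers in the report sit just past the signed 32-bit limit: 2^31 = 2147483648, while big_table holds 2149580800 rows that all share one join key and therefore reach a single reducer. The reducer's own log prints the full count (2149580801), yet JoinOperator forwards only 1 row. The Java sketch below works through that arithmetic and shows how an int counter silently wraps where a long does not. That an int row count in the join path is the actual cause is an assumption suggested by the issue title and the RowContainer review comment, not something this thread confirms; the class and method names are hypothetical.

```java
// Hypothetical illustration of 32-bit overflow; not Hive code.
public class RowCountOverflow {
    // Rows loaded into big_table in the report (all with p1 = '1').
    static final long BIG_TABLE_ROWS = 2_149_580_800L;

    // How far the row count exceeds Integer.MAX_VALUE (2^31 - 1).
    static long excessOverIntMax(long rows) {
        return rows - Integer.MAX_VALUE;
    }

    // Narrowing to int keeps only the low 32 bits (two's-complement wrap).
    static int asInt(long rows) {
        return (int) rows;
    }

    // Hypothetical int counter: wraps silently once it passes Integer.MAX_VALUE.
    static int incrementInt(int count) {
        return count + 1;
    }

    // Same counter with overflow checking: throws ArithmeticException instead of wrapping.
    static int incrementIntChecked(int count) {
        return Math.incrementExact(count);
    }

    // A long counter has room for roughly 9.2e18 rows.
    static long incrementLong(long count) {
        return count + 1;
    }

    public static void main(String[] args) {
        System.out.println(1L << 31);                          // 2147483648
        System.out.println(excessOverIntMax(BIG_TABLE_ROWS));  // 2097153
        System.out.println(asInt(BIG_TABLE_ROWS));             // -2145386496
        System.out.println(incrementInt(Integer.MAX_VALUE));   // -2147483648
        System.out.println(incrementLong(Integer.MAX_VALUE));  // 2147483648
        try {
            incrementIntChecked(Integer.MAX_VALUE);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

Once an int count has wrapped negative, a hypothetical loop such as `for (int i = 0; i < size; i++)` or a `size > 0` check would skip almost all buffered rows without any error. Widening the counter to long avoids the wrap, and Math.incrementExact at least makes it fail loudly instead of returning wrong results.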

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
