Date: Sun, 16 Jun 2013 19:49:20 +0000 (UTC)
From: "Gabi Kazav (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

    [ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684759#comment-13684759 ]

Gabi Kazav commented on HIVE-4730:
----------------------------------

Looks good, thanks!

> Join on more than 2^31 records on single reducer failed (wrong results)
> -----------------------------------------------------------------------
>
>                 Key: HIVE-4730
>                 URL: https://issues.apache.org/jira/browse/HIVE-4730
>             Project: Hive
>          Issue Type: Bug
>   Affects Versions: 0.7.1, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0
>            Reporter: Gabi Kazav
>            Assignee: Navis
>            Priority: Critical
>         Attachments: HIVE-4730.D11283.1.patch
>
>
> A join on more than 2^31 rows leads to wrong results. For example:
> Create table small_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
> Create table big_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
> Loading 1 row to small_table (the value 1).
> Loading 2149580800 rows to big_table with the same value (1 in this case).
> create table output as select a.p1 from big_table a join small_table b on (a.p1=b.p1);
> select count(*) from output ; will return only 1 row...
> the reducer syslog:
> ...
> 2013-06-13 17:20:59,254 INFO ExecReducer: ExecReducer: processing 2147000000 rows: used memory = 32925960
> 2013-06-13 17:21:00,745 INFO ExecReducer: ExecReducer: processing 2148000000 rows: used memory = 12815184
> 2013-06-13 17:21:02,205 INFO ExecReducer: ExecReducer: processing 2149000000 rows: used memory = 26684552 <-- looks like wrong value..
> ...
> 2013-06-13 17:21:04,062 INFO ExecReducer: ExecReducer: processed 2149580801 rows: used memory = 17715896
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 finished. closing...
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarded 1 rows
> 2013-06-13 17:21:05,791 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 finished. closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarded 1 rows
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 finished. closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 forwarded 0 rows
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 Close done

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
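
Note on the 2^31 threshold in the quoted report: the message does not state the root cause, but the boundary is consistent with a signed 32-bit counter wrapping around. The Java sketch below only illustrates that general behavior, under that assumption. It is not Hive code; the class and method names (CounterOverflowSketch, bumpInt, bumpLong, intViewOf) are hypothetical and chosen for illustration.

```java
public class CounterOverflowSketch {

    // Hypothetical helper: increment a 32-bit counter `steps` times.
    // Java int arithmetic wraps silently past Integer.MAX_VALUE.
    static int bumpInt(int start, int steps) {
        int counter = start;
        for (int i = 0; i < steps; i++) {
            counter++;
        }
        return counter;
    }

    // Same loop with a 64-bit counter, which does not wrap at 2^31.
    static long bumpLong(long start, int steps) {
        long counter = start;
        for (int i = 0; i < steps; i++) {
            counter++;
        }
        return counter;
    }

    // Value an int counter would hold after `rows` increments from zero.
    // The narrowing cast keeps the low 32 bits, which matches wraparound.
    static int intViewOf(long rows) {
        return (int) rows;
    }

    public static void main(String[] args) {
        long bigTableRows = 2149580800L; // row count taken from the report

        // A 32-bit view of the row count is negative: -2145386496.
        System.out.println("int view:  " + intViewOf(bigTableRows));
        System.out.println("long view: " + bigTableRows);

        // Crossing the boundary step by step shows the same wrap.
        System.out.println("int near max + 5:  " + bumpInt(Integer.MAX_VALUE - 2, 5));
        System.out.println("long near max + 5: " + bumpLong(Integer.MAX_VALUE - 2L, 5));
    }
}
```

Any logic that compares such a wrapped counter against a size or threshold would see a negative value once the row count passes 2^31 - 1, whereas a 64-bit counter keeps the true count. Whether this is what the attached patch addresses is not stated in this message.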