hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gabi Kazav <>
Subject Hive - max rows limit (int limit = 2^31). need Help (looks liek a bug)
Date Wed, 29 May 2013 18:35:16 GMT

We are working on hive DB with our Hadoop cluster.
We now facing an issue about joining a big partition with more than 2^31 rows.

When the partition has more than 2147483648 rows (even 2147483649) the output of the join
is a single row.
When the partition has less than 2147483648 rows (event 2147483647) the output is correct.
Our test case:

create a table with 2147483649 rows in a partition with the value : "1" , join this table
to another table with a single row,single column with the value "1" on the partition_key.
later delete 2 rows and run the same join.
1st : only a single row is created
2nd : 2147483647 rows
the query we run for test the case is:

create table output_rows_over as
select a.s1
from  max_sint_rows a join small_table b
on (a.p1=b.p1);

on more than 2^31 rows we got the following on reducer log:

2013-05-27 21:51:14,186 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1
On less than 2^31 rows we got the following reducer log:

2013-05-27 23:43:14,681 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:2147483647

Anyone faced this issue?
Does hive has workaround for that?
I have huge partitions I need to work on and I cannot use hive for that..


Gabi Kazav
Infrastructure Team Leader,

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message