hive-user mailing list archives

From John Morrison <John.Morri...@catalinamarketing.com>
Subject RE: Hive JOINs not working as expected (returns 0 rows)
Date Tue, 20 Nov 2012 18:14:06 GMT
I did get this to work on Cloudera (CDH4).

Also got both self-joins to work using MapR (both return 1 as expected).
	select count(*) from ext_date_1 a join ext_date_1 b on (a.cal_dt = b.cal_dt) ;
	select count(*) from ext_order_1 a join ext_order_1 b on (a.cal_dt = b.cal_dt) ;
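For reference, the count these self-joins produce is just the equi-join cardinality of the two key columns. A minimal Python sketch of that semantics (illustrative only, not Hive code):

```python
from collections import Counter

def equijoin_count(left_keys, right_keys):
    """Count the rows an inner equi-join produces over two key lists."""
    right_counts = Counter(right_keys)
    # Each left key matches every equal right key (Counter returns 0 for misses).
    return sum(right_counts[key] for key in left_keys)

# One matching row on each side, as in the self-joins above:
print(equijoin_count([20081203], [20081203]))  # 1
```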

I guess something must be wrong in the MapR config somewhere?

Thanks to those who replied to this post.


-----Original Message-----
From: Edward Capriolo [mailto:edlinuxguru@gmail.com] 
Sent: Tuesday, November 20, 2012 11:34 AM
To: user@hive.apache.org
Subject: Re: Hive JOINs not working as expected (returns 0 rows)

Based on your BYTES_READ counters, it does not look like any data was found/processed for
either table.

On Tue, Nov 20, 2012 at 11:22 AM, Dean Wampler <dean.wampler@thinkbiganalytics.com>
wrote:
> Did you install v0.9.0 on your MapR cluster? I suspect there's a bug 
> in the interaction between the two. I'm not sure which version of Hive 
> is the latest shipped with MapR.
>
> I just tried your example in an older MapR M5 virtual machine instance 
> that came with Hive v0.7.1. It worked correctly. I then repeated the 
> experiment in Hive v0.9.0 on a generic Apache Hadoop setup I have in 
> another VM. It also worked correctly.  (I don't have Hive v0.9.0 
> available in the MapR VM.)
>
> Hope this helps.
>
> dean
>
> On Tue, Nov 20, 2012 at 9:43 AM, John Morrison 
> <John.Morrison@catalinamarketing.com> wrote:
>>
>> Hi,
>>
>> I'm new to Hive (just the last few weeks) and could use some help with JOINs.
>>
>> Using MapR with Hive 0.9.0:  /usr/bin/hive -> /opt/mapr/hive/hive-0.9.0/bin/hive
>>
>> I have two tables I want to join by date (order_t and date_t).  DDL at bottom.
>>
>> I have reduced each table to 1 column and 1 row and still can't get the join to work.
>>
>> Any help will be appreciated.
>>
>> -John
>>
>> Details:
>>
>> # data in each table
>> hive> select * from ext_order_1 ;
>> OK
>> 20081203
>> Time taken: 0.076 seconds
>>
>> hive> select * from ext_date_1 ;
>> OK
>> 20081203
>> Time taken: 0.068 seconds
>>
>>
>> # Trying to join returns 0 rows (see the final count at the end of the output below)
>> #
>> hive> select count(*) from ext_order_1 a join ext_date_1 b on (a.cal_dt = b.cal_dt) ;
>>
>> Total MapReduce jobs = 2
>> Launching Job 1 out of 2
>> Number of reduce tasks not specified. Estimated from input data size: 1
>> In order to change the average load for a reducer (in bytes):
>>   set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>>   set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>>   set mapred.reduce.tasks=<number>
>> Starting Job = job_201211050921_0232, Tracking URL = xxxx
>> Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201211050921_0232
>> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
>> 2012-11-20 10:08:48,797 Stage-1 map = 0%,  reduce = 0%
>> 2012-11-20 10:08:53,823 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.25 sec
>> 2012-11-20 10:08:54,829 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.25 sec
>> 2012-11-20 10:08:55,835 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.25 sec
>> 2012-11-20 10:08:56,842 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.25 sec
>> 2012-11-20 10:08:57,848 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.25 sec
>> 2012-11-20 10:08:58,854 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.25 sec
>> 2012-11-20 10:08:59,860 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.25 sec
>> 2012-11-20 10:09:00,866 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.23 sec
>> MapReduce Total cumulative CPU time: 3 seconds 230 msec
>> Ended Job = job_201211050921_0232
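The "Estimated from input data size" line above comes from Hive's reducer heuristic, which the `set` hints in the log let you tune. A rough Python sketch of the estimate (an approximation of the usual formula, not the actual Hive source: input size divided by hive.exec.reducers.bytes.per.reducer, clamped to [1, hive.exec.reducers.max]):

```python
import math

def estimate_reducers(total_input_bytes, bytes_per_reducer, max_reducers):
    """Approximate Hive's reducer estimate:
    ceil(input / bytes_per_reducer), clamped to [1, max_reducers]."""
    estimate = math.ceil(total_input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, estimate))

# Tiny inputs like the one-row tables in this thread always estimate 1 reducer:
print(estimate_reducers(396, 1_000_000_000, 999))  # 1
```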
>>
>> Launching Job 2 out of 2
>> Number of reduce tasks determined at compile time: 1
>> In order to change the average load for a reducer (in bytes):
>>   set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>>   set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>>   set mapred.reduce.tasks=<number>
>> Starting Job = job_201211050921_0233, Tracking URL = xxxx
>> Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201211050921_0233
>> Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
>> 2012-11-20 10:09:02,058 Stage-2 map = 0%,  reduce = 0%
>> 2012-11-20 10:09:07,084 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 0.89 sec
>> 2012-11-20 10:09:08,091 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 0.89 sec
>> 2012-11-20 10:09:09,101 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 0.89 sec
>> 2012-11-20 10:09:10,106 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 0.89 sec
>> 2012-11-20 10:09:11,112 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 0.89 sec
>> 2012-11-20 10:09:12,119 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 0.89 sec
>> 2012-11-20 10:09:13,125 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 1.5 sec
>> MapReduce Total cumulative CPU time: 1 seconds 500 msec
>> Ended Job = job_201211050921_0233
>>
>> MapReduce Jobs Launched:
>> Job 0: Map: 1  Reduce: 1   Cumulative CPU: 3.23 sec   MAPRFS Read: 396  MAPRFS Write: 222  SUCCESS
>> Job 1: Map: 1  Reduce: 1   Cumulative CPU: 1.5 sec   MAPRFS Read: 628  MAPRFS Write: 72  SUCCESS
>> Total MapReduce CPU Time Spent: 4 seconds 730 msec
>> OK
>> 0
>> Time taken: 25.727 seconds
>>
>> # Hive DDL
>> #
>> $> cat date_t_1_row.ddl
>>
>> CREATE EXTERNAL TABLE ext_date_1
>> (
>>   cal_dt  INT
>> )
>> ROW FORMAT DELIMITED
>>   FIELDS TERMINATED BY '~'
>>   LINES TERMINATED BY '\n'
>> STORED AS TEXTFILE
>> LOCATION '/Work/Hive/tables/date_1' ;
>>
>> $> cat order_1.ddl
>>
>> CREATE EXTERNAL TABLE ext_order_1
>> (
>>   cal_dt  INT
>> )
>> ROW FORMAT DELIMITED
>>   FIELDS TERMINATED BY '~'
>>   LINES TERMINATED BY '\n'
>> STORED AS TEXTFILE
>> LOCATION '/Work/Hive/tables/order_1' ;
>>
>>
>>
>>
>>
>>
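One thing worth double-checking with `~`-delimited TEXTFILE tables like the ones above (not confirmed as the cause in this thread): any stray character in a field, such as a trailing space or a Windows `\r` line ending, makes the INT column parse to NULL, and NULL join keys match nothing in SQL, so `select *` can look clean while every join returns 0 rows. A rough Python sketch of that failure mode (the strict parser here approximates Hive's lenient SerDe turning malformed INT fields into NULL; it is not Hive's actual code):

```python
import re

def parse_int_field(raw):
    """Strict INT parse: any stray character (even whitespace) yields NULL (None)."""
    return int(raw) if re.fullmatch(r"-?\d+", raw) else None

def equijoin_count(left_keys, right_keys):
    """SQL inner-join semantics: NULL keys never match anything."""
    return sum(1 for a in left_keys for b in right_keys
               if a is not None and b is not None and a == b)

clean = [parse_int_field("20081203")]        # -> [20081203]
dirty = [parse_int_field("20081203\r")]      # invisible \r -> [None]
print(equijoin_count(clean, clean))  # 1
print(equijoin_count(clean, dirty))  # 0
```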
>
>
>
>
> --
> Dean Wampler, Ph.D.
> thinkbiganalytics.com
> +1-312-339-1330
>
>


