hive-user mailing list archives

From Bejoy Ks <bejoy...@yahoo.com>
Subject Re: Issue on using hive Dynamic Partitions on larger tables
Date Tue, 21 Jun 2011 12:26:55 GMT
Hey Guys
        I was able to resolve this by grouping and distributing records to 
reducers using DISTRIBUTE BY. My modified query is as follows:

FROM parameter_def p
INSERT OVERWRITE TABLE parameter_part PARTITION(location)
SELECT p.seq_id,p.lead_id,p.arr_datetime,p.computed_value,p.del_date,p.location
DISTRIBUTE BY location;

With this query the entire job worked like a charm. If there are any better 
implementations for similar scenarios, please do share.
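For reference, a consolidated sketch of the working session (using the table and column names from this thread; the setting values are the ones I used, not recommendations) would look like:

```sql
-- Session settings for the dynamic-partition insert
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.created.files=150000;

-- DISTRIBUTE BY routes all rows with the same location value to the same
-- reducer, so each reducer writes files for only its own partitions instead
-- of every mapper opening a file per partition it encounters.
FROM parameter_def p
INSERT OVERWRITE TABLE parameter_part PARTITION(location)
SELECT p.seq_id, p.lead_id, p.arr_datetime, p.computed_value, p.del_date, p.location
DISTRIBUTE BY location;
```

Note that the dynamic partition column (location) must come last in the SELECT list.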

Thank You

Regards
Bejoy.KS


From: Bejoy Ks <bejoy_ks@yahoo.com>
To: user@hive.apache.org
Sent: Mon, June 20, 2011 8:27:16 PM
Subject: Re: Issue on using hive Dynamic Partitions on larger tables


Thanks Steven. That bug is now out of my way, but another one pops up when I try 
dynamic partitions on larger tables. I implemented the same approach mentioned 
below on smaller tables successfully, but somehow it fails for larger tables.

My larger source table (parameter_def) contains 5 billion rows, which I SQOOPed 
into Hive from a DWH. When I try implementing the dynamic partition on it with 
the query
INSERT OVERWRITE TABLE parameter_part PARTITION(location) 
SELECT p.seq_id,p.lead_id,p.arr_datetime,p.computed_value,
p.del_date,p.location FROM parameter_def p;
two map reduce jobs are triggered, and the first one now runs to completion 
after setting

hive.exec.max.created.files=150000;

But the second job fails straight away, without even really running. Given below 
is the error log.
From putty console
2011-06-20 10:40:13,348 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201106061630_0937
Ended Job = 1659539584, job is filtered out (removed at runtime).
Launching Job 2 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job  = job_201106061630_0938, Tracking URL = 
http://********.com:50030/jobdetails.jsp?jobid=job_201106061630_0938
Kill Command = /usr/lib/hadoop/bin/hadoop job  
-Dmapred.job.tracker=********.com:8021 -kill job_201106061630_0938
2011-06-20 10:42:51,914 Stage-3 map = 100%,  reduce = 100%
Ended Job = job_201106061630_0938 with errors
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask

From hive log file
2011-06-20 10:41:02,293 WARN   mapred.JobClient 
(JobClient.java:copyAndConfigureFiles(649)) - Use GenericOptionsParser for 
parsing the arguments. Applications should implement Tool for the same.
2011-06-20 10:42:51,917 ERROR exec.MapRedTask 
(SessionState.java:printError(343)) - Ended Job = job_201106061630_0938 with 
errors
2011-06-20 10:42:51,938 ERROR ql.Driver (SessionState.java:printError(343)) - 
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask


The hadoop and hive version I'm using are as follows
Hadoop Version - Hadoop 0.20.2-cdh3u0
Hive Version - Hive 0.7(lib/hive-hwi-0.7.0-cdh3u0.war)

Please help me out in figuring out what is going wrong with my implementation. 

Thank You

Regards
Bejoy.K.S





________________________________
From: Steven Wong <swong@netflix.com>
To: "user@hive.apache.org" <user@hive.apache.org>
Sent: Sat, June 18, 2011 6:54:34 AM
Subject: RE: Issue on using hive Dynamic Partitions on larger tables


The name of the parameter is actually hive.exec.max.created.files. The wiki has 
a typo, which I’ll fix.
 
 
From: Bejoy Ks [mailto:bejoy_ks@yahoo.com] 
Sent: Thursday, June 16, 2011 9:35 AM
To: hive user group
Subject: Issue on using hive Dynamic Partitions on larger tables
 
Hi Hive Experts
    I'm facing an issue while using Hive dynamic partitions on larger tables. I 
tried out dynamic partitions on smaller tables and it worked fine, but 
unfortunately when I tried the same on a larger table the map reduce job 
terminated with the error:

2011-06-16 12:14:28,592 Stage-1 map = 74%,  reduce = 0%
[Fatal Error] total number of created files exceeds 100000. Killing the job.
Ended Job = job_201106061630_0536 with errors
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask

I tried setting the parameter hive.max.created.files to a larger value, but the 
same error persists:
hive> set hive.max.created.files=500000;
The same error, 'total number of created files exceeds 100000', was thrown even 
after I changed the value to 500000. I suspect either the value I set for the 
config parameter is not taking effect, or I'm setting the wrong parameter to 
solve this issue. Please advise.
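One quick sanity check (this should work in the Hive CLI, as far as I know): issuing `set` with just a parameter name echoes its current value, so you can confirm whether an assignment actually took effect. As Steven's reply notes, the file-count limit lives under hive.exec.*, not hive.*:

```sql
-- 'set <name>;' with no value should print the current setting,
-- e.g. hive.exec.max.created.files=100000 if nothing was changed.
set hive.exec.max.created.files;

-- Raise the limit, then confirm it stuck.
set hive.exec.max.created.files=500000;
set hive.exec.max.created.files;
```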

The other parameters I did set on hive CLI for dynamic partitions are
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions.pernode=300;
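Beyond the three settings above, two related limits often matter for large dynamic-partition loads. The defaults below are what I recall for Hive 0.7, so treat them as assumptions:

```sql
-- Per-task cap on distinct partitions each mapper/reducer may create
-- (default 100).
set hive.exec.max.dynamic.partitions.pernode=300;

-- Global cap on partitions across the whole job (default 1000).
set hive.exec.max.dynamic.partitions=1000;

-- Cap on total files the job may create (default 100000) -- this is the
-- limit behind the 'total number of created files exceeds 100000' error.
set hive.exec.max.created.files=150000;
```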

The Hive QL query I used for the dynamic partition is
INSERT OVERWRITE TABLE parameter_part PARTITION(location) 
SELECT  p.seq_id,p.lead_id,p.arr_datetime,p.computed_value,
p.del_date,p.location FROM parameter_def p;

Please help me out in resolving the same

Thank You.

Regards
Bejoy.K.S