hadoop-hdfs-user mailing list archives

From "Khaleel Khalid" <khale...@suntecgroup.com>
Subject RE: issue with DBInputFormat
Date Fri, 07 Mar 2014 11:22:03 GMT
Hi,
 
We faced the same issue with DBInputFormat. Switching to DataDrivenDBInputFormat fixed it.
 
 
Regards
 
Khaleel

________________________________

From: Manoj Babu [mailto:manoj444@gmail.com]
Sent: Fri 3/7/2014 4:36 PM
To: user@hadoop.apache.org
Subject: issue with DBInputFormat


Hi,

When using DBInputFormat to unload data from a table to HDFS, I configured 6 map tasks
to execute, but map task 0 alone unloads the whole table, while the remaining 5
tasks run properly. Please find my observations from debugging below.

Chunk size=855565

Input Splits:

For split0 the start=0 and the end=855565 and the length=855565
For split1 the start=855565 and the end=1711130 and the length=855565
For split2 the start=1711130 and the end=2566695 and the length=855565
For split3 the start=2566695 and the end=3422260 and the length=855565
For split4 the start=3422260 and the end=4277825 and the length=855565
For split5 the start=4277825 and the end=5133394 and the length=855569
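
The split boundaries above are consistent with an integer chunk size of (row count / number of splits), with the last split absorbing the remainder. The following is a minimal sketch of that arithmetic only, not the Hadoop source; the class name SplitSketch and the method computeSplits are hypothetical, and the row count 5133394 is taken from the end of split5.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: reproduces the split arithmetic
// that matches the numbers reported above.
class SplitSketch {

    /** Returns {start, end} pairs; the last split takes the leftover rows. */
    static List<long[]> computeSplits(long rowCount, int numSplits) {
        long chunkSize = rowCount / numSplits; // integer division
        List<long[]> splits = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            long start = i * chunkSize;
            // The final split ends at the row count, not at start + chunkSize.
            long end = (i + 1 == numSplits) ? rowCount : start + chunkSize;
            splits.add(new long[] {start, end});
        }
        return splits;
    }

    public static void main(String[] args) {
        for (long[] s : computeSplits(5133394L, 6)) {
            System.out.println("start=" + s[0] + " end=" + s[1]
                    + " length=" + (s[1] - s[0]));
        }
    }
}
```

Note that the first split always starts at 0, which is the case the condition in getSelectQuery() discussed below does not cover.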

Queries fired from individual map tasks based on the splits created:

Map task 0: Select query: select * from emp
Map task 1: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( select * from emp
) a WHERE rownum <= 4277825 + 855569 ) WHERE dbif_rno >= 4277825
Map task 2: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( select * from emp
) a WHERE rownum <= 855565 + 855565 ) WHERE dbif_rno >= 855565
Map task 3: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( select * from emp
) a WHERE rownum <= 1711130 + 855565 ) WHERE dbif_rno >= 1711130
Map task 4: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( select * from emp
) a WHERE rownum <= 2566695 + 855565 ) WHERE dbif_rno >= 2566695
Map task 5: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( select * from emp
) a WHERE rownum <= 3422260 + 855565 ) WHERE dbif_rno >= 3422260

The query executed by map task 0 is the cause of the problem: it has no row limits, so that
task queries all the rows in the table.

The following condition in org.apache.hadoop.mapreduce.lib.db.OracleDBRecordReader.getSelectQuery()

if (split.getLength() > 0 && split.getStart() > 0) {

...
...}

should be:
if (split.getLength() > 0 && split.getStart() >= 0) {

...
...}
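
To illustrate the effect of the proposed condition, here is a minimal standalone sketch that builds the same query shape as the map task queries listed above. It does not use the Hadoop classes; the class name OracleQuerySketch and the method pagedQuery are hypothetical, and the query text is copied from the logged queries rather than from the Hadoop source.

```java
// Hypothetical standalone sketch, not the Hadoop implementation.
class OracleQuerySketch {

    /**
     * Wraps baseQuery in the ROWNUM pagination shown in the logged queries.
     * Uses start >= 0 (the proposed condition), so the split starting at 0
     * is limited as well.
     */
    static String pagedQuery(String baseQuery, long start, long length) {
        if (length > 0 && start >= 0) {
            return "SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( "
                    + baseQuery
                    + " ) a WHERE rownum <= " + start + " + " + length
                    + " ) WHERE dbif_rno >= " + start;
        }
        // No usable split information: fall back to the unbounded query.
        return baseQuery;
    }

    public static void main(String[] args) {
        // Split 0 from the list above: start=0, length=855565.
        System.out.println(pagedQuery("select * from emp", 0L, 855565L));
    }
}
```

With the original condition (start > 0), the call for split 0 would fall through to the unbounded "select * from emp", which matches the query observed for map task 0.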


By overriding getSelectQuery() I was able to work around the issue. Has anybody faced a similar
issue?


Cheers! 
Manoj.