hadoop-common-user mailing list archives

From Stephen Boesch <java...@gmail.com>
Subject Help with DataDrivenDBInputFormat: splits are created properly but zero records are sent to the mappers
Date Thu, 24 Jan 2013 20:39:23 GMT
I have made an attempt to implement a job using DataDrivenDBInputFormat.
The result is that the input splits are created successfully with 45K
records apiece, but zero records are then actually sent to the mappers.

If anyone can point to working example(s) of using DataDrivenDBInputFormat
it would be much appreciated.


Here are further details of my attempt:


    DBConfiguration.configureDB(job.getConfiguration(), props.getDriver(),
        props.getUrl(), props.getUser(), props.getPassword());
    // Note: I also include code here to verify that a java.sql.Connection
    // can be obtained using the above props.

    DataDrivenDBInputFormat.setInput(job,
        DBLongWritable.class,
        "select id,status from app_detail_active_crawl_queue_v where "
            + DataDrivenDBInputFormat.SUBSTITUTE_TOKEN,
        "SELECT MIN(id),MAX(id) FROM app_detail_active_crawl_queue_v ");
    // I verified by stepping through with the debugger that the input queries
    // were successfully applied by DataDrivenDBInputFormat to create two
    // splits of 45K records each.

.. <snip>  ..
// Register a custom DBLongWritable class
  static {
    WritableComparator.define(DBLongWritable.class,
        new DBLongWritable.DBLongKeyComparator());
    int x  = 1;
  }


Here is the job output. No rows were processed, even though 90K rows were
identified in the InputSplits phase and divided into two 45K splits. So why
were the input splits not processed?

[Thu Jan 24 12:19:59] Successfully connected to driver=com.mysql.jdbc.Driver url=jdbc:mysql://localhost:3306/classint user=stephenb
[Thu Jan 24 12:19:59] select id,status from app_detail_active_crawl_queue_v where $CONDITIONS
13/01/24 12:20:03 INFO mapred.JobClient: Running job: job_201301102125_0069
13/01/24 12:20:05 INFO mapred.JobClient:  map 0% reduce 0%
13/01/24 12:20:22 INFO mapred.JobClient:  map 50% reduce 0%
13/01/24 12:20:25 INFO mapred.JobClient:  map 100% reduce 0%
13/01/24 12:20:30 INFO mapred.JobClient: Job complete: job_201301102125_0069
13/01/24 12:20:30 INFO mapred.JobClient: Counters: 17
13/01/24 12:20:30 INFO mapred.JobClient:   Job Counters
13/01/24 12:20:30 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=21181
13/01/24 12:20:30 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/01/24 12:20:30 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/01/24 12:20:30 INFO mapred.JobClient:     Launched map tasks=2
13/01/24 12:20:30 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/01/24 12:20:30 INFO mapred.JobClient:   File Output Format Counters
13/01/24 12:20:30 INFO mapred.JobClient:     Bytes Written=0
13/01/24 12:20:30 INFO mapred.JobClient:   FileSystemCounters
13/01/24 12:20:30 INFO mapred.JobClient:     HDFS_BYTES_READ=215
13/01/24 12:20:30 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=44010
13/01/24 12:20:30 INFO mapred.JobClient:   File Input Format Counters
13/01/24 12:20:30 INFO mapred.JobClient:     Bytes Read=0
13/01/24 12:20:30 INFO mapred.JobClient:   Map-Reduce Framework
13/01/24 12:20:30 INFO mapred.JobClient:     Map input records=0
13/01/24 12:20:30 INFO mapred.JobClient:     Physical memory (bytes) snapshot=200056832
13/01/24 12:20:30 INFO mapred.JobClient:     Spilled Records=0
13/01/24 12:20:30 INFO mapred.JobClient:     CPU time spent (ms)=2960
13/01/24 12:20:30 INFO mapred.JobClient:     Total committed heap usage (bytes)=247201792
13/01/24 12:20:30 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=4457689088
13/01/24 12:20:30 INFO mapred.JobClient:     Map output records=0
13/01/24 12:20:30 INFO mapred.JobClient:     SPLIT_RAW_BYTES=215
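
One way to narrow this down might be to reconstruct the SQL that a single split would run and execute it by hand in the mysql client. The sketch below is stand-alone (no Hadoop dependency) and only imitates the $CONDITIONS substitution. The class name SplitQueryPreview, the helper buildSplitQuery, the example bounds, and the exact shape of the generated clause are all assumptions for illustration, not taken from the job above.

```java
// Stand-alone sketch: previews the SQL that one input split might execute.
final class SplitQueryPreview {
    // Same literal that appears in the logged query ("where $CONDITIONS").
    static final String SUBSTITUTE_TOKEN = "$CONDITIONS";

    // Assumption: the split's lower and upper clauses are each wrapped in
    // parentheses and joined with AND before replacing the token.
    static String buildSplitQuery(String inputQuery, String lowerClause, String upperClause) {
        if (!inputQuery.contains(SUBSTITUTE_TOKEN)) {
            throw new IllegalArgumentException("input query lacks " + SUBSTITUTE_TOKEN);
        }
        String conditions = "( " + lowerClause + " ) AND ( " + upperClause + " )";
        // String.replace is a literal replacement, so the '$' needs no escaping.
        return inputQuery.replace(SUBSTITUTE_TOKEN, conditions);
    }

    public static void main(String[] args) {
        String inputQuery =
            "select id,status from app_detail_active_crawl_queue_v where " + SUBSTITUTE_TOKEN;
        // Hypothetical bounds; the real ones come from the split seen in the debugger.
        System.out.println(buildSplitQuery(inputQuery, "id >= 1", "id < 45001"));
    }
}
```

Substituting the lower and upper clauses observed in the debugger for the hypothetical ones and running the printed statement directly against MySQL would show whether the per-split predicate matches any rows. It may also be worth checking which split column name ends up in those clauses, since this form of setInput takes a query rather than an explicit column; that is a guess at where to look, not something the output above confirms.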
