hadoop-hdfs-user mailing list archives

From Stephen Boesch <java...@gmail.com>
Subject Re: Help with DataDrivenDBInputFormat: splits are created properly but zero records are sent to the mappers
Date Fri, 25 Jan 2013 01:42:25 GMT
It turns out to be an apparent problem in one of the two overloads
of  DataDrivenDBInputFormat.setInput().   The query-based version I used does
not work as shown: it needs a primary-key (split) column to be set somehow, but
I could not find any information / documentation on how to set that column.  So
I converted to using the other setInput() overload, as follows:

    // table/view name, conditions (null = all rows), split-by column, field name(s)
    DataDrivenDBInputFormat.setInput(job, DBTextWritable.class,
          APP_DETAILS_CRAWL_QUEUE_V, null, "id", "id");

Now this is working.
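
For anyone searching the archives later, here is a minimal, self-contained
sketch of a driver built around that table-name overload. The class names
(CrawlQueueJob, QueueRow, QueueMapper), the credentials, and the output path
are placeholders I made up for illustration; they are not the actual classes
from my job.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
    import org.apache.hadoop.mapreduce.lib.db.DBWritable;
    import org.apache.hadoop.mapreduce.lib.db.DataDrivenDBInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CrawlQueueJob {

      // Minimal record type: the value class handed to setInput must
      // implement DBWritable (and Writable so it can be serialized).
      public static class QueueRow implements Writable, DBWritable {
        private long id;
        private String status;

        public void readFields(ResultSet rs) throws SQLException {
          id = rs.getLong("id");
          status = rs.getString("status");
        }
        public void write(PreparedStatement ps) throws SQLException {
          ps.setLong(1, id);
          ps.setString(2, status);
        }
        public void readFields(DataInput in) throws IOException {
          id = in.readLong();
          status = Text.readString(in);
        }
        public void write(DataOutput out) throws IOException {
          out.writeLong(id);
          Text.writeString(out, status);
        }
        public long getId() { return id; }
        public String getStatus() { return status; }
      }

      // Keys are the record offsets supplied by the DB record reader;
      // values are the rows read from the table/view.
      public static class QueueMapper
          extends Mapper<LongWritable, QueueRow, Text, NullWritable> {
        protected void map(LongWritable key, QueueRow row, Context ctx)
            throws IOException, InterruptedException {
          ctx.write(new Text(row.getId() + "\t" + row.getStatus()),
              NullWritable.get());
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "crawl-queue-scan");  // old (1.x-era) Job API
        job.setJarByClass(CrawlQueueJob.class);

        // Placeholder credentials; must be called before setInput()
        DBConfiguration.configureDB(job.getConfiguration(),
            "com.mysql.jdbc.Driver",
            "jdbc:mysql://localhost:3306/classint",
            "dbuser", "dbpass");

        // Table-name overload: table/view, conditions (null = all rows),
        // split-by column, then the columns to select.
        DataDrivenDBInputFormat.setInput(job, QueueRow.class,
            "app_detail_active_crawl_queue_v", null, "id", "id", "status");

        // Explicit, in case setInput didn't already register the input format
        job.setInputFormatClass(DataDrivenDBInputFormat.class);
        job.setMapperClass(QueueMapper.class);
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }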




2013/1/24 Stephen Boesch <javadba@gmail.com>

>
> I have made an attempt to implement a job using DataDrivenDBInputFormat.
> The result is that the input splits are created successfully with 45K
> records apiece, but zero records are then actually sent to the mappers.
>
> If anyone can point to working example(s) of using DataDrivenDBInputFormat
> it would be much appreciated.
>
>
> Here are further details of my attempt:
>
>
>     DBConfiguration.configureDB(job.getConfiguration(), props.getDriver(),
>         props.getUrl(), props.getUser(), props.getPassword());
>     // Note: I also include code here to verify that I can obtain a
>     // java.sql.Connection using the above props.
>
>     DataDrivenDBInputFormat.setInput(job,
>         DBLongWritable.class,
>         "select id,status from app_detail_active_crawl_queue_v where " +
>             DataDrivenDBInputFormat.SUBSTITUTE_TOKEN,
>         "SELECT MIN(id),MAX(id) FROM app_detail_active_crawl_queue_v");
>     // I verified by stepping with the debugger that the input queries were
>     // successfully applied by DataDrivenDBInputFormat to create two splits
>     // of 40K records each.
>
> .. <snip>  ..
> // Register a custom comparator for the DBLongWritable key class
>   static {
>     WritableComparator.define(DBLongWritable.class,
>         new DBLongWritable.DBLongKeyComparator());
>     int x = 1;
>   }
>
>
> Here is the job output. No rows were processed, even though 90K rows were
> identified in the InputSplit phase and divided into two 45K splits. So why
> were the input splits not processed?
>
> [Thu Jan 24 12:19:59] Successfully connected to
> driver=com.mysql.jdbc.Driver url=jdbc:mysql://localhost:3306/classint
> user=stephenb
> [Thu Jan 24 12:19:59] select id,status from
> app_detail_active_crawl_queue_v where $CONDITIONS
> 13/01/24 12:20:03 INFO mapred.JobClient: Running job: job_201301102125_0069
> 13/01/24 12:20:05 INFO mapred.JobClient:  map 0% reduce 0%
> 13/01/24 12:20:22 INFO mapred.JobClient:  map 50% reduce 0%
> 13/01/24 12:20:25 INFO mapred.JobClient:  map 100% reduce 0%
> 13/01/24 12:20:30 INFO mapred.JobClient: Job complete:
> job_201301102125_0069
> 13/01/24 12:20:30 INFO mapred.JobClient: Counters: 17
> 13/01/24 12:20:30 INFO mapred.JobClient:   Job Counters
> 13/01/24 12:20:30 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=21181
> 13/01/24 12:20:30 INFO mapred.JobClient:     Total time spent by all
> reduces waiting after reserving slots (ms)=0
> 13/01/24 12:20:30 INFO mapred.JobClient:     Total time spent by all maps
> waiting after reserving slots (ms)=0
> 13/01/24 12:20:30 INFO mapred.JobClient:     Launched map tasks=2
> 13/01/24 12:20:30 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
> 13/01/24 12:20:30 INFO mapred.JobClient:   File Output Format Counters
> 13/01/24 12:20:30 INFO mapred.JobClient:     Bytes Written=0
> 13/01/24 12:20:30 INFO mapred.JobClient:   FileSystemCounters
> 13/01/24 12:20:30 INFO mapred.JobClient:     HDFS_BYTES_READ=215
> 13/01/24 12:20:30 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=44010
> 13/01/24 12:20:30 INFO mapred.JobClient:   File Input Format Counters
> 13/01/24 12:20:30 INFO mapred.JobClient:     Bytes Read=0
> 13/01/24 12:20:30 INFO mapred.JobClient:   Map-Reduce Framework
> 13/01/24 12:20:30 INFO mapred.JobClient:     Map input records=0
> 13/01/24 12:20:30 INFO mapred.JobClient:     Physical memory (bytes)
> snapshot=200056832
> 13/01/24 12:20:30 INFO mapred.JobClient:     Spilled Records=0
> 13/01/24 12:20:30 INFO mapred.JobClient:     CPU time spent (ms)=2960
> 13/01/24 12:20:30 INFO mapred.JobClient:     Total committed heap usage
> (bytes)=247201792
> 13/01/24 12:20:30 INFO mapred.JobClient:     Virtual memory (bytes)
> snapshot=4457689088
> 13/01/24 12:20:30 INFO mapred.JobClient:     Map output records=0
> 13/01/24 12:20:30 INFO mapred.JobClient:     SPLIT_RAW_BYTES=215
>
>
