Hi,
I want to reduce the number of input splits in my job, because I think it is
generating far too many.
While my job is running I can see:
*INFO mapreduce.Job: map ∞% reduce 0%*
I'm using DataDrivenDBInputFormat:
*setInput*
public static void setInput(Job job,
                            Class<? extends DBWritable> inputClass,
                            String tableName,
                            String conditions,
                            String splitBy,
                            String... fieldNames)
*Note that the "orderBy" column is called the "splitBy" in this version. We
reuse the same field, but it's not strictly ordering it -- just partitioning
the results.*
So I read all the data from myTable and try to split by a date column. The
query returns millions of rows, and I suppose that DataDrivenDBInputFormat
generates many splits. I don't know how to reduce the number of splits, or how
to tell DataDrivenDBInputFormat to split by my date column (the splitBy
argument).
The main goal is to improve performance, so I want my map tasks to run faster.
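
To make clear what I mean by splitting on the date column, here is a minimal
standalone sketch. It is not Hadoop's actual code: the class
DateRangeSplitter, its Range type and the split method are hypothetical names
I made up for illustration. It cuts the range between the smallest and largest
date into a fixed number of contiguous intervals, which is the kind of
behaviour I would like to control. I assume each interval would then map to
one split (one range condition on the date column).

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Hadoop.
public class DateRangeSplitter {

    /** Half-open interval [start, end) over the date column. */
    public static final class Range {
        public final LocalDate start;
        public final LocalDate end;

        Range(LocalDate start, LocalDate end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /**
     * Divides the inclusive date range min..max into at most numSplits
     * contiguous half-open intervals of (nearly) equal length in days.
     */
    public static List<Range> split(LocalDate min, LocalDate max, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        if (max.isBefore(min)) {
            throw new IllegalArgumentException("max must not be before min");
        }
        // Number of distinct days covered, counting both endpoints.
        long totalDays = ChronoUnit.DAYS.between(min, max) + 1;
        // Never create more intervals than there are days.
        int n = (int) Math.min(numSplits, totalDays);

        List<Range> ranges = new ArrayList<>();
        LocalDate start = min;
        for (int i = 1; i <= n; i++) {
            // Integer arithmetic spreads the remainder over the intervals.
            LocalDate end = min.plusDays(totalDays * i / n);
            ranges.add(new Range(start, end));
            start = end;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Example: one year of data cut into 4 intervals.
        for (Range r : split(LocalDate.of(2011, 1, 1), LocalDate.of(2011, 12, 31), 4)) {
            System.out.println(r);
        }
    }
}
```

With a small numSplits this gives a handful of wide date ranges instead of
many narrow ones, which is what I am trying to get from my job.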
Can someone help me?
Thanks
Joan