hadoop-mapreduce-user mailing list archives

From Joan <joan.monp...@gmail.com>
Subject How to reduce number of splits in DataDrivenDBInputFormat?
Date Wed, 19 Jan 2011 16:02:57 GMT
Hi,

I want to reduce the number of splits, because I think my job generates
too many of them.
While my job is running I can see:

*INFO mapreduce.Job:  map ∞% reduce 0%*

I'm using DataDrivenDBInputFormat:
*setInput*

*public static void setInput(Job job,
                            Class<? extends DBWritable> inputClass,
                            String tableName,
                            String conditions,
                            String splitBy,
                            String... fieldNames)*

*Note that the "orderBy" column is called the "splitBy" in this version. We
reuse the same field, but it's not strictly ordering it -- just partitioning
the results.*

So I read all the data from myTable and try to split by a date column. The
query returns millions of rows, and I suppose that DataDrivenDBInputFormat
generates many splits. I don't know how to reduce the number of splits, or
how to tell DataDrivenDBInputFormat to split on my date column (the splitBy
argument).
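To illustrate what I mean by splitting on the date column: as I understand
it, the range between the smallest and the largest date is cut into a fixed
number of contiguous intervals, one per map task. Below is a minimal
plain-Java sketch of that idea. It is not Hadoop's actual code; the class
DateRangeSplits, the Range type and the numSplits parameter are hypothetical
names I made up for this example, and numSplits stands for the value I would
like to control.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Hadoop): cuts a timestamp range into contiguous intervals. */
final class DateRangeSplits {

    /** One interval of the split column, in milliseconds since the epoch. */
    static final class Range {
        final long lowerMillis; // inclusive lower bound
        final long upperMillis; // exclusive upper bound, except for the last range (inclusive)

        Range(long lowerMillis, long upperMillis) {
            this.lowerMillis = lowerMillis;
            this.upperMillis = upperMillis;
        }

        @Override
        public String toString() {
            return "[" + lowerMillis + ", " + upperMillis + ")";
        }
    }

    /**
     * Divides [minMillis, maxMillis] into at most numSplits ranges of nearly equal width.
     * Empty ranges are skipped, so fewer ranges come back when the span is very small.
     */
    static List<Range> split(long minMillis, long maxMillis, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        if (maxMillis < minMillis) {
            throw new IllegalArgumentException("maxMillis must be >= minMillis");
        }
        List<Range> ranges = new ArrayList<>();
        long span = maxMillis - minMillis;
        long lower = minMillis;
        for (int i = 1; i <= numSplits; i++) {
            // Each boundary is derived from the start of the range, so rounding does not accumulate.
            long upper = (i == numSplits)
                    ? maxMillis
                    : minMillis + (span / numSplits) * i + (span % numSplits) * i / numSplits;
            if (upper == lower && i < numSplits) {
                continue; // skip an empty interval
            }
            ranges.add(new Range(lower, upper));
            lower = upper;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Example: one day of data (in milliseconds) cut into 4 intervals.
        for (Range r : split(0L, 86_400_000L, 4)) {
            System.out.println(r);
        }
    }
}
```

With a small numSplits, each map task would read one large date interval
instead of many small ones, which is the behaviour I am trying to get.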

The main goal is to improve performance, so I want my map tasks to run faster.


Can someone help me?

Thanks

Joan
