hadoop-mapreduce-user mailing list archives

From Joan <joan.monp...@gmail.com>
Subject Re: How to reduce number of splits in DataDrivenDBInputFormat?
Date Thu, 20 Jan 2011 08:38:14 GMT
Hi Sonal,

I'm using hadoop 0.21.0

2011/1/20 Sonal Goyal <sonalgoyal4@gmail.com>

> Which hadoop version are you on?
>
> You can alternatively try using hiho from
> https://github.com/sonalgoyal/hiho  to get your data from the db. Please
> write to me directly if you need any help there.
>
>
> Thanks and Regards,
> Sonal
> Connect Hadoop with databases,
> Salesforce, FTP servers and others <https://github.com/sonalgoyal/hiho>
> Nube Technologies <http://www.nubetech.co>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
>
>
>
>
> On Thu, Jan 20, 2011 at 1:03 PM, Joan <joan.monplet@gmail.com> wrote:
>
>> Hi Sonal,
>>
>> I put both configurations:
>>
>>         job.getConfiguration().set("mapreduce.job.maps","4");
>>         job.getConfiguration().set("mapreduce.map.tasks","4");
>>
>> But neither configuration has any effect. I also tried setting
>> "mapred.map.task", but that does not work either.
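>>
>> A small sanity check might help here: a minimal sketch of reading the values
>> back from the job configuration just before submission, to confirm they were
>> actually picked up (the property names are just the two already mentioned in
>> this thread):
>>
>>         int newName = job.getConfiguration().getInt("mapreduce.job.maps", -1);
>>         int oldName = job.getConfiguration().getInt("mapred.map.tasks", -1);
>>         System.out.println("mapreduce.job.maps=" + newName
>>             + ", mapred.map.tasks=" + oldName);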
>>
>> Joan
>>
>> 2011/1/20 Sonal Goyal <sonalgoyal4@gmail.com>
>>
>> Joan,
>>>
>>> You should be able to set the mapred.map.tasks property to the maximum
>>> number of mappers you want. This can control parallelism.
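>>>
>>> A minimal sketch of what that might look like, assuming the new
>>> (org.apache.hadoop.mapreduce) API in 0.21 -- the value 4 is just an example:
>>>
>>>     import org.apache.hadoop.conf.Configuration;
>>>     import org.apache.hadoop.mapreduce.Job;
>>>
>>>     Configuration conf = new Configuration();
>>>     // Hint for the number of map tasks / splits; in 0.21 the newer
>>>     // name "mapreduce.job.maps" refers to the same setting.
>>>     conf.setInt("mapred.map.tasks", 4);
>>>     Job job = new Job(conf, "db-import");
>>>     // ... configure input format, mapper, output, then submit the job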
>>>
>>> Thanks and Regards,
>>> Sonal
>>> Connect Hadoop with databases,
>>> Salesforce, FTP servers and others <https://github.com/sonalgoyal/hiho>
>>> Nube Technologies <http://www.nubetech.co>
>>>
>>> <http://in.linkedin.com/in/sonalgoyal>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Jan 19, 2011 at 9:32 PM, Joan <joan.monplet@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to reduce the number of splits, because my job seems to be
>>>> generating far too many of them.
>>>> While my job is running I can see:
>>>>
>>>> INFO mapreduce.Job:  map ∞% reduce 0%
>>>>
>>>> I'm using DataDrivenDBInputFormat:
>>>>
>>>> setInput:
>>>>
>>>> public static void setInput(Job job,
>>>>                             Class<? extends DBWritable> inputClass,
>>>>                             String tableName,
>>>>                             String conditions,
>>>>                             String splitBy,
>>>>                             String... fieldNames)
>>>>
>>>> Note that the "orderBy" column is called the "splitBy" in this
>>>> version. We reuse the same field, but it's not strictly ordering it -- just
>>>> partitioning the results.
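>>>>
>>>> For what it's worth, a minimal sketch of how this call might be wired up.
>>>> The driver, connection string, table and column names, and the MyRecord
>>>> class (which would implement DBWritable) are all made-up placeholders for
>>>> your own setup; the map-count property is the same hint discussed above:
>>>>
>>>>     import org.apache.hadoop.conf.Configuration;
>>>>     import org.apache.hadoop.mapreduce.Job;
>>>>     import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
>>>>     import org.apache.hadoop.mapreduce.lib.db.DataDrivenDBInputFormat;
>>>>
>>>>     Configuration conf = new Configuration();
>>>>     conf.setInt("mapreduce.job.maps", 4);   // target number of splits/mappers
>>>>     DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
>>>>         "jdbc:mysql://dbhost/mydb", "user", "password");
>>>>
>>>>     Job job = new Job(conf, "db-import");
>>>>     // MyRecord is a placeholder class implementing DBWritable.
>>>>     DataDrivenDBInputFormat.setInput(job, MyRecord.class,
>>>>         "myTable", null /* conditions */,
>>>>         "date_col" /* splitBy */, "id", "date_col", "value");
>>>>     // setInput should already select DataDrivenDBInputFormat as the
>>>>     // input format, but it does not hurt to be explicit:
>>>>     job.setInputFormatClass(DataDrivenDBInputFormat.class);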
>>>>
>>>> So I get all the data from myTable and I try to split it by a date
>>>> column. I obtain millions of rows, and I suppose DataDrivenDBInputFormat
>>>> generates many splits. I don't know how to reduce the number of splits,
>>>> or how to tell DataDrivenDBInputFormat to split on my date column (which
>>>> corresponds to splitBy).
>>>>
>>>> The main goal is to improve performance, so I want my maps to run faster.
>>>>
>>>>
>>>> Can someone help me?
>>>>
>>>> Thanks
>>>>
>>>> Joan
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
