hadoop-mapreduce-user mailing list archives

From Mahsa Mofidpoor <mofidp...@gmail.com>
Subject Re: running a job on single-node setup takes less time than running on a cluster
Date Mon, 20 Aug 2012 18:31:42 GMT
Thanks, Saurabh.

On Mon, Aug 20, 2012 at 12:15 PM, Saurabh bhutyani <s4saurabh@gmail.com> wrote:

> Dear Mahsa,
>
> You need to increase the data size to benefit from Hadoop. Basically,
> Hadoop creates input splits based on the configured split size, which
> defaults to the 64MB block size. So if an input file is smaller than
> 64MB, it produces only one split and therefore runs only one map task.
>
> Thanks & Regards,
> Saurabh Bhutyani
>
> Call  : 9820083104
> Gtalk: s4saurabh@gmail.com
>
>
>
> On Mon, Aug 20, 2012 at 6:33 PM, Mahsa Mofidpoor <mofidpoor@gmail.com> wrote:
>
>> Hello,
>>
>> I ran a simple join (select col_list from table1 join table2 on
>> (join_condition)) on both a single-node and a multi-node setup. The table
>> sizes are 1.7 MB and 4.2 MB, respectively. The query takes longer to
>> execute on the cluster than on the single-node Hadoop setup.
>> I checked the map logs and saw that both map tasks ran on the master
>> node.
>> Do I need to increase the data size to benefit from the multi-node
>> capacity?
>> How can I make sure that my data is distributed across all the nodes?
>>
>> Thank you in advance for your assistance.
>>
>> Regards,
>> Mahsa
>>
>
>

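Saurabh's point about split sizes can be illustrated with a small Python sketch. It approximates the split calculation in Hadoop's FileInputFormat (split size = max(minSize, min(maxSize, blockSize)), with a 1.1 "slop" factor so a small tail is folded into the last split), assuming uncompressed, splittable input files and the 64MB default block size. The function names and the 1 MB cap in the usage lines are hypothetical, chosen only for illustration. Because splits never span files, each of the two small tables yields exactly one split, i.e. one map task each, which matches the two map tasks seen in the logs.

```python
# Minimal sketch (assumption: plain FileInputFormat behaviour on
# uncompressed, splittable files); not Hadoop's actual code.
SPLIT_SLOP = 1.1          # tail up to 10% larger than a split stays in one split
MB = 1024 * 1024


def compute_split_size(block_size, min_size, max_size):
    """Split size = max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, block_size=64 * MB, min_size=1, max_size=2**63 - 1):
    """Return the number of input splits (hence map tasks) for one file."""
    if file_size <= 0:
        raise ValueError("this sketch only handles non-empty files")
    split_size = compute_split_size(block_size, min_size, max_size)
    splits = 0
    remaining = file_size
    # Keep cutting full-size splits while the remainder is "clearly" larger
    # than one split.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the final split.
    if remaining != 0:
        splits += 1
    return splits


if __name__ == "__main__":
    tables = {"table1": int(1.7 * MB), "table2": int(4.2 * MB)}
    # Default 64 MB split size: each small table gives a single split.
    print({name: count_splits(size) for name, size in tables.items()})
    # prints {'table1': 1, 'table2': 1}
    # Hypothetical 1 MB maximum split size: more, smaller splits.
    print({name: count_splits(size, max_size=1 * MB) for name, size in tables.items()})
    # prints {'table1': 2, 'table2': 5}
```

Lowering the maximum split size (for example via mapred.max.split.size in Hadoop 1.x) forces more map tasks, but for inputs this small the per-task startup overhead is likely to outweigh any parallelism gained, which is why a larger data set is the more realistic way to see the cluster pay off.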