hadoop-user mailing list archives

From Saurabh bhutyani <s4saur...@gmail.com>
Subject Re: running a job on single-node setup takes less time than running on a cluster
Date Mon, 20 Aug 2012 16:15:10 GMT
Dear Mahsa,

You need to increase the data size to benefit from Hadoop. Hadoop creates
input splits based on the configured split size, which defaults to the
HDFS block size of 64MB. A file smaller than 64MB therefore yields a
single split, i.e. a single map task, so each of your two small tables
is processed by only one map task.
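To make the split rule concrete, here is a small standalone Java sketch modeled on the split calculation in Hadoop's FileInputFormat as we understand it: the split size is max(minSize, min(maxSize, blockSize)), and a file is cut into split-sized pieces, with the last piece allowed to be up to 10% larger. It is a simplification under stated assumptions: it ignores block locations, compression and non-splittable formats, and the class and method names (SplitEstimate, splitLengths) are hypothetical, not Hadoop's API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of how a FileInputFormat-style splitter
 * divides one input file into splits (one map task per split).
 * Ignores block locations, compression and non-splittable formats.
 */
public class SplitEstimate {
    static final long MB = 1024L * 1024L;
    // Assumption: mirrors FileInputFormat's 10% "slop" for the last split.
    static final double SPLIT_SLOP = 1.1;

    /** Split size = max(minSize, min(maxSize, blockSize)). */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Returns the length of each split for a file of the given size. */
    static List<Long> splitLengths(long fileLength, long splitSize) {
        List<Long> splits = new ArrayList<>();
        long remaining = fileLength;
        // Cut full-size splits while more than 1.1 splits' worth remains.
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            remaining -= splitSize;
        }
        // Whatever is left (up to 1.1 * splitSize) becomes the last split.
        if (remaining != 0) {
            splits.add(remaining);
        }
        return splits;
    }

    public static void main(String[] args) {
        long defaultSplit = computeSplitSize(64 * MB, 1L, Long.MAX_VALUE);

        // Table sizes from the question: each file fits in a single split.
        System.out.println(splitLengths((long) (1.7 * MB), defaultSplit).size()); // 1
        System.out.println(splitLengths((long) (4.2 * MB), defaultSplit).size()); // 1

        // A hypothetical 1 GB file yields 16 splits of 64 MB each.
        System.out.println(splitLengths(1024 * MB, defaultSplit).size()); // 16

        // Hypothetical: capping the maximum split size at 1 MB
        // cuts the 4.2 MB table into 5 splits instead of 1.
        long smallSplit = computeSplitSize(64 * MB, 1L, 1 * MB);
        System.out.println(splitLengths((long) (4.2 * MB), smallSplit).size()); // 5
    }
}
```

In this model, more map tasks come either from more data or from a smaller maximum split size; for inputs of a few megabytes, the per-task startup overhead is likely to outweigh any gain from spreading the work across nodes.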

Thanks & Regards,
Saurabh Bhutyani

Call  : 9820083104
Gtalk: s4saurabh@gmail.com

On Mon, Aug 20, 2012 at 6:33 PM, Mahsa Mofidpoor <mofidpoor@gmail.com> wrote:

> Hello,
> I run a simple join (select col_list from table1 join table2 on
> (join_condition)) on both a single-node and a multi-node setup. The table
> sizes are 1.7 MB and 4.2 MB respectively. It takes more time to execute
> the query on the cluster than to run it on a single-node Hadoop setup.
> I checked the map logs and saw that both map tasks ran on the master
> node.
> Do I need to increase the data in order to benefit from the multi-node
> capacity?
> How can I make sure that my data is distributed on all the nodes?
> Thank you in advance for your assistance.
> Regards,
> Mahsa
