hadoop-user mailing list archives

From Mahsa Mofidpoor <mofidp...@gmail.com>
Subject running a job on single-node setup takes less time than running on a cluster
Date Mon, 20 Aug 2012 13:03:31 GMT
Hello,

I ran a simple join (select col_list from table1 join table2 on
(join_condition)) on both a single-node and a multi-node setup. The table
sizes are 1.7 MB and 4.2 MB, respectively. The query takes longer to execute
on the cluster than on the single-node Hadoop setup.
I checked the map logs and saw that both map tasks ran on the master
node.
Do I need to increase the data size in order to benefit from the multi-node
capacity?
How can I make sure that my data is distributed across all the nodes?
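A rough sketch of why only two map tasks appear, under stated assumptions: it assumes the Hadoop 1.x default HDFS block size of 64 MB (dfs.block.size), no custom min/max split settings, and that each table is a single file split the way FileInputFormat does it (split size = max(minSize, min(maxSize, blockSize)), with a 10% "slop" so a slightly oversized tail is not split off). The helper names below are hypothetical, and Hive's actual input format may group small files differently.

```python
# Assumed defaults (Hadoop 1.x era); not values read from this cluster.
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB          # assumed dfs.block.size
MIN_SPLIT = 1                 # assumed minimum split size in bytes
MAX_SPLIT = 2**63 - 1         # assumed: no maximum split size cap
SPLIT_SLOP = 1.1              # tail up to 10% larger than a split stays in one split


def split_size(block_size=BLOCK_SIZE, min_split=MIN_SPLIT, max_split=MAX_SPLIT):
    """Split size formula used by FileInputFormat.computeSplitSize."""
    return max(min_split, min(max_split, block_size))


def num_splits(file_bytes, size):
    """Approximate number of input splits (roughly, map tasks) for one non-empty file."""
    splits = 0
    remaining = file_bytes
    # Carve off full-size splits while the remainder is more than 1.1 splits long.
    while remaining / size > SPLIT_SLOP:
        splits += 1
        remaining -= size
    # Whatever is left becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    size = split_size()
    tables = {"table1": int(1.7 * MB), "table2": int(4.2 * MB)}
    total = sum(num_splits(b, size) for b in tables.values())
    print("total map tasks:", total)
```

Under these assumptions each table fits in one block and yields a single split, so the join gets two map tasks in total no matter how many nodes the cluster has; a 200 MB file, by contrast, would yield four splits.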

Thank you in advance for your assistance.

Regards,
Mahsa
