hadoop-common-user mailing list archives

From "James Graham (Greywolf)" <greyw...@searchme.com>
Subject performance not great, or did I miss something?
Date Fri, 08 Aug 2008 20:25:46 GMT

I'm very very new to this (as you could probably tell from my other postings).

I have 20 nodes available as a cluster, less one as the namenode and one as
the jobtracker (unless I can use them too).  Specs are:

226GB of available disk space each
4 cores (2 x dual-core processors) each
8GB of RAM each

The RandomWriter takes just over 17 minutes to complete;
the Sorter takes three to four hours or more to complete
on only about half a terabyte of data.

This is certainly not the speed or power I had been led to expect from
Hadoop, so I am guessing I have some things tuned wrong (actually, I'm
certain some are tuned wrong, as I'm seeing processes die from lack of
memory during the reduce phase...).

Given the above hardware specs, what should I expect as a theoretical maximum
throughput?  Machines 3-10 are on one 1GbE switch, machines 11-20 are on a
second 1GbE switch, and the two are connected by a shared 1GbE upstream
(another switch).
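
As a rough back-of-envelope sketch of what this topology implies for the sort,
the snippet below estimates how long the shuffle traffic alone would need to
cross the 1GbE link between the two switches. Everything beyond the figures
quoted above is an assumption: 18 worker nodes (8 on one switch, 10 on the
other), map output and reducers spread evenly across them, a full-duplex link
running at its nominal 125 MB/s, and 0.5 TB taken as 0.5e12 bytes. It ignores
disk speed, HDFS replication traffic, and protocol overhead, so it is only a
lower bound on one component, not a prediction of total job time.

```python
def cross_switch_shuffle_seconds(total_bytes, nodes_a, nodes_b, link_bytes_per_s):
    """Lower bound on the time for shuffle data to cross one full-duplex link.

    Assumes map output and reducers are spread uniformly over all nodes, so
    the share of data that starts on switch A and must end on switch B is
    (nodes_a / n) * (nodes_b / n). The B-to-A share is the same size and
    travels in the opposite direction of the full-duplex link.
    """
    n = nodes_a + nodes_b
    one_way_fraction = (nodes_a / n) * (nodes_b / n)
    one_way_bytes = total_bytes * one_way_fraction
    return one_way_bytes / link_bytes_per_s


TOTAL_BYTES = 0.5e12          # about half a terabyte of data (assumed decimal)
LINK_BYTES_PER_S = 1e9 / 8    # nominal 1GbE, 125 MB/s (assumed, no overhead)

# Assumed split: machines 3-10 (8 nodes) and machines 11-20 (10 nodes).
secs = cross_switch_shuffle_seconds(TOTAL_BYTES, 8, 10, LINK_BYTES_PER_S)
print(f"cross-switch shuffle lower bound: {secs / 60:.1f} minutes")
```

Under these assumptions the inter-switch link accounts for something on the
order of a quarter of an hour, which suggests a three-to-four-hour sort is not
explained by that link alone.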

James Graham (Greywolf)
650.930.1138|925.768.4053
greywolf@searchme.com