hadoop-common-user mailing list archives

From Owen O'Malley <omal...@apache.org>
Subject Re: Hadoop Cluster Survey
Date Mon, 13 Jul 2009 16:32:20 GMT
On Jul 12, 2009, at 12:21 PM, Jon Miller wrote:

> I realize what I'm asking is highly subjective to the hardware, number
> of nodes, how large the data set is, etc and not to mention the
> particular calculation being performed. Realizing this, I know it's
> nearly impossible to provide a prediction to the readers but perhaps
> if I can survey this mailing list with a few questions, then maybe I
> can develop a nice heuristic formula which can be used?

I'd start with the Hadoop Powered By page:

Of course the numbers run 3-6 months stale, but they are a good first  

I'd also suggest you look through my petabyte sort blog entry:


I'd also suggest you watch the opening talk of the Hadoop Summit '09
where Eric and I cover a lot of this stuff with respect to Yahoo:


-- Owen
