hadoop-common-dev mailing list archives

From Jim Kellerman <...@powerset.com>
Subject Re: [Lucene-hadoop Wiki] Update of "FAQ" by DevarajDas
Date Sat, 08 Sep 2007 15:07:48 GMT
On Fri, 2007-09-07 at 23:14 -0700, Eric Baldeschwieler wrote:
> I think we should also add an available RAM variable and then do a
> reasonable job of deriving a bunch of the other variables in these
> settings from that (we may need one for task trackers, one for
> namenodes, and so on).

+1
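
To make the proposal concrete, here is a minimal sketch of deriving other
settings from a single available-RAM value. Everything in it is an
assumption for illustration: the class name, the field names, the
one-quarter reserve for the OS and daemons, and the half-of-heap sort
buffer are hypothetical and are not taken from Hadoop.

```java
// Hypothetical sketch: none of these names or ratios come from Hadoop.
final class RamDerivedSettings {
    final int taskHeapMb;    // heap given to each task (assumed setting)
    final int sortBufferMb;  // per-task sort buffer (assumed setting)

    private RamDerivedSettings(int taskHeapMb, int sortBufferMb) {
        this.taskHeapMb = taskHeapMb;
        this.sortBufferMb = sortBufferMb;
    }

    // Derive per-task memory settings from one "available RAM" value.
    // Assumed policy: keep a quarter of RAM for the OS and daemons,
    // split the rest evenly across task slots, and give half of each
    // task's heap to its sort buffer.
    static RamDerivedSettings derive(int availableRamMb, int taskSlots) {
        if (availableRamMb <= 0 || taskSlots <= 0) {
            throw new IllegalArgumentException("RAM and slots must be positive");
        }
        int usableMb = availableRamMb - availableRamMb / 4;
        int taskHeapMb = usableMb / taskSlots;
        int sortBufferMb = taskHeapMb / 2;
        return new RamDerivedSettings(taskHeapMb, sortBufferMb);
    }

    public static void main(String[] args) {
        RamDerivedSettings s = derive(2048, 2);
        System.out.println(s.taskHeapMb + " MB heap, " + s.sortBufferMb + " MB sort buffer");
    }
}
```

With a 2GB box and two task slots this policy yields 768 MB of heap per
task; on a 1GB box it yields 384 MB, so one variable moves the rest.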

> A lot of the memory related default settings make no sense on the  
> boxes we use.
> 
> What RAM size should we assume is a reasonable default?
> 2GB? 1GB?

If you are using EC2, I think all you get is 1GB.

Our current machines are 8-core with 16GB, but we are 'zenifying' them
so each instance will have 1 core with 2GB. The exception will be the
name node, especially as our cluster grows, but I am not sure how that
will be configured (maybe 4 cores and 8GB?).

> We are currently standardizing on 8.
> 
> On Sep 7, 2007, at 7:41 AM, Enis Soztutar wrote:
> 
> > Hadoop has been used in quite varying cluster sizes (in the range
> > 1-2000), so I am strongly in favor of as much automatic configuration as
> > possible.
> >
> > Doug Cutting wrote:
> > > Raghu Angadi wrote:
> > >> Right now Namenode does not know about the cluster size before
> > >> starting IPC server.
> > >
> > > Sounds like perhaps we should make the handler count, queue size, etc.
> > > dynamically adjustable, e.g., by adding Server methods for
> > > setHandlerCount(), setQueueSize(), etc.  There's been talk of trying
> > > to automatically adjust these within Server.java, based on load, and
> > > that would be better yet, but short of that, we might adjust them
> > > heuristically based on cluster size.
> > >
> > > The urgent thing, since we expect the best settings for large clusters
> > > to change, is to make it so that folks don't need to adjust these
> > > manually, even if the automation is an ill-understood heuristic.  I
> > > think we can easily get some workable heuristics into 0.15, but we
> > > might not be able to implement async responses or figure out how
> > > to adjust it automatically in Server.java or whatever in that
> > > timeframe.  Perhaps we should just change the defaults to be big
> > > enough for 2000 nodes, but that seems like too big of a hammer.
> > >
> > > Doug
> > >
> >
> 
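
The cluster-size heuristic Doug describes above could look something like
the sketch below. It is only an illustration under stated assumptions: the
class, the constants, and the logarithmic growth rule are hypothetical, and
it merely computes values that the proposed setHandlerCount() and
setQueueSize() methods might be given. It does not call any existing
Hadoop API.

```java
// Hypothetical sketch: the constants and the growth rule are assumptions.
final class IpcSizingHeuristic {
    static final int MIN_HANDLERS = 10;             // assumed floor
    static final int MAX_HANDLERS = 200;            // assumed ceiling
    static final int QUEUE_SLOTS_PER_HANDLER = 100; // assumed ratio

    // Grow the handler count with floor(log2(clusterNodes)), so that a
    // 2000-node cluster gets more handlers than a 1-node cluster without
    // sizing every deployment for the largest case.
    static int handlerCount(int clusterNodes) {
        if (clusterNodes < 1) {
            throw new IllegalArgumentException("clusterNodes must be >= 1");
        }
        int log2 = 31 - Integer.numberOfLeadingZeros(clusterNodes);
        int scaled = MIN_HANDLERS * (1 + log2);
        return Math.min(MAX_HANDLERS, scaled);
    }

    // Size the call queue in proportion to the handler count.
    static int queueSize(int clusterNodes) {
        return handlerCount(clusterNodes) * QUEUE_SLOTS_PER_HANDLER;
    }

    public static void main(String[] args) {
        for (int nodes : new int[] {1, 16, 2000}) {
            System.out.println(nodes + " nodes: " + handlerCount(nodes)
                + " handlers, queue size " + queueSize(nodes));
        }
    }
}
```

Under these assumed constants, 1 node gets 10 handlers and 2000 nodes get
110, which avoids shipping 2000-node defaults to everyone.
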
-- 
Jim Kellerman, Senior Engineer; Powerset
jim@powerset.com
