hbase-user mailing list archives

From Jonathan Gray <jg...@facebook.com>
Subject RE: Memory Consumption and Processing questions
Date Mon, 02 Aug 2010 04:08:56 GMT
One reason not to extrapolate that is that leaving plenty of memory free for the Linux buffer
cache is a good way to improve the overall performance of typically I/O-bound applications
like Hadoop and HBase.

Also, I'm unsure that "most people use ~8 for hdfs/mr".  DataNodes require very little memory
(though they are generally run with a 1GB heap); their performance will improve with more
free memory left for the OS buffer cache.  As for MR, this completely depends on the tasks
being run.  The TaskTrackers also don't require significant memory, so this completely
depends on the number of tasks per node and the memory requirements of those tasks.
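
For concreteness, here is a rough sketch of how that per-node budget often gets expressed.
The file names are real (hadoop-env.sh, mapred-site.xml), but the values are illustrative
assumptions, not recommendations:

    # hadoop-env.sh -- the DataNode and TaskTracker daemons themselves need little heap
    export HADOOP_DATANODE_OPTS="-Xmx1g $HADOOP_DATANODE_OPTS"
    export HADOOP_TASKTRACKER_OPTS="-Xmx1g $HADOOP_TASKTRACKER_OPTS"

    # The real MR footprint is (task slots x child heap), set in mapred-site.xml, e.g.:
    #   mapred.tasktracker.map.tasks.maximum    = 4
    #   mapred.tasktracker.reduce.tasks.maximum = 2
    #   mapred.child.java.opts                  = -Xmx1g   (=> ~6GB for task JVMs)
    # Whatever is left over goes to the RegionServer heap and the OS buffer cache.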

Unfortunately, you can't generalize the requirements very far, especially in MR.

JG

> -----Original Message-----
> From: Jacques [mailto:whshub@gmail.com]
> Sent: Sunday, August 01, 2010 5:30 PM
> To: user@hbase.apache.org
> Subject: Re: Memory Consumption and Processing questions
> 
> Thanks, that was very helpful.
> 
> Regarding 24gb -- I saw people using servers with 32gb of memory (a
> recent thread here and on hstack.org).  I extrapolated the use since
> it seems most people use ~8 for hdfs/mr.
> 
> -Jacques
> 
> 
> On Sun, Aug 1, 2010 at 11:39 AM, Jonathan Gray <jgray@facebook.com> wrote:
> 
> >
> >
> > > -----Original Message-----
> > > From: Jacques [mailto:whshub@gmail.com]
> > > Sent: Friday, July 30, 2010 1:16 PM
> > > To: user@hbase.apache.org
> > > Subject: Memory Consumption and Processing questions
> > >
> > > Hello all,
> > >
> > > I'm planning an hbase implementation and had some questions I was
> > > hoping
> > > someone could help with.
> > >
> > > 1. Can someone give me a basic overview of how memory is used in
> > > HBase?  In various places on the web people state that 16-24gb is
> > > the minimum for region servers if they also operate as hdfs/mr
> > > nodes.  Assuming that hdfs/mr nodes consume ~8gb, that leaves a
> > > "minimum" of 8-16gb for HBase.  It seems like lots of people
> > > suggest using even 24gb+ for HBase.  Why so much?  Is it simply to
> > > avoid gc problems?  To have data in memory for fast random reads?
> > > Or?
> >
> > Where exactly are you reading this from?  I'm not actually aware of
> > people using 24GB+ heaps for HBase.
> >
> > I would not recommend using less than 4GB for RegionServers.  Beyond
> > that, it very much depends on your application.  8GB is often
> > sufficient but I've seen as much as 16GB used in production.
> >
> > You need at least 4GB because of GC.  General experience has been
> > that below that the CMS GC does not work well.
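
As a concrete starting point (values here are illustrative assumptions, not tuned
recommendations), the heap and CMS settings usually live in hbase-env.sh:

    # hbase-env.sh
    export HBASE_HEAPSIZE=8000   # in MB; at least 4GB per the advice above
    export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
    export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

Adjust per your workload and watch the GC logs.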
> >
> > Memory is used primarily for the MemStores (write cache) and the
> > Block Cache (read cache).  In addition, memory is allocated as part
> > of normal operations to store in-memory state and in processing
> > reads.
> >
> > > 2. What types of things put more/less pressure on memory?  I saw
> > > it insinuated that insert speed can create substantial memory
> > > pressure.  What type of relative memory pressure do scanners,
> > > random reads, random writes, region quantity and compactions
> > > cause?
> >
> > Writes are buffered and flushed to disk when the write buffer gets
> > to a local or global limit.  The local limit (per region) defaults
> > to 64MB.  The global limit is based on the total amount of heap
> > available (default, I think, is 40%).  So there is interplay between
> > how much heap you have and how many regions are actively written to.
> > If you have too many regions and not enough memory to allow them to
> > hit the local/region limit, you end up flushing undersized files.
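
For illustration, the knobs for those two limits live in hbase-site.xml.  The property names
below are from roughly that era's hbase-default.xml (double-check them against your release);
the values shown are simply the defaults being described above:

    <property>
      <name>hbase.hregion.memstore.flush.size</name>
      <value>67108864</value>  <!-- per-region ("local") limit: 64MB -->
    </property>
    <property>
      <name>hbase.regionserver.global.memstore.upperLimit</name>
      <value>0.4</value>       <!-- global limit, as a fraction of the RS heap -->
    </property>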
> >
> > Scanning/random reading will utilize the block cache, if configured
> > to.  The more room for the block cache, the more data you can keep
> > in memory.  Reads from the block cache are significantly faster than
> > non-cached reads, obviously.
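
For illustration, the corresponding setting (again hbase-site.xml; the value shown is the
common default of that era, not a recommendation):

    <property>
      <name>hfile.block.cache.size</name>
      <value>0.2</value>  <!-- fraction of the RS heap given to the block cache -->
    </property>

Raising it helps read-heavy workloads, at the cost of heap available for the MemStores.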
> >
> > Compactions are not generally an issue.
> >
> > > 3. How cpu intensive are the region servers?  It seems like most
> > > of their performance is based on i/o.  (I've noted the caution
> > > about starving region servers of cycles -- which seems primarily
> > > focused on avoiding zk timeout -> region reassignment problems.)
> > > Does anyone suggest or recommend against dedicating only one or
> > > two cores to a region server?  Do individual compactions benefit
> > > from multiple cores, or are they single-threaded?
> >
> > I would dedicate at least one core to a region server, but as we add
> > more and more concurrency, it may become important to have two cores
> > available.  Many things, like compactions, are only single-threaded
> > today, but there's a very good chance you will be able to configure
> > multiple threads in the next major release.
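
For what it's worth, the compaction-thread knobs that later releases exposed look roughly
like the following in hbase-site.xml.  Treat the names as something to verify against
whatever release you end up running, since they postdate this thread:

    <property>
      <name>hbase.regionserver.thread.compaction.small</name>
      <value>1</value>  <!-- threads for small/minor compactions -->
    </property>
    <property>
      <name>hbase.regionserver.thread.compaction.large</name>
      <value>1</value>  <!-- threads for large/major compactions -->
    </property>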
> >
> > > 4. What are the memory and cpu resource demands of the master
> > > server?  It seems like more and more of that load is moving to zk.
> >
> > Not too much.  I'm putting a change into TRUNK right now that keeps
> > all region assignments in the master, so there is some memory usage,
> > but not much.  I would think a 2GB heap and 1-2 cores is sufficient.
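
A minimal sketch of how that sizing might be expressed, assuming an hbase-env.sh that
supports per-daemon options (otherwise set the heap on the master host directly):

    # hbase-env.sh on the master -- illustrative, matching the ~2GB figure above
    export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xmx2g"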
> >
> > > 5. General HDFS question -- when the namenode dies, what happens
> > > to the datanodes, and how does that relate to HBase?  E.g., can
> > > HBase continue to operate in a read-only mode (assuming no
> > > datanode/regionserver failures post namenode failure)?
> >
> > Today, HBase will probably die ungracefully once it does start to
> > hit the NN.  There are some open JIRAs about HBase behavior under
> > different HDFS faults and trying to be as graceful as possible when
> > they happen, including HBASE-2183 about riding over an HDFS restart.
> >
> > >
> > > Thanks for your help,
> > > Jacques
> >
