hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Hbase pausing problems
Date Tue, 09 Feb 2010 17:46:37 GMT
> I read somewhere that it is wise to place a datanode and regionserver
> together per server.  Is this wise? 

That is the typical deployment strategy. In this way, you can scale the
HBase region carrying capacity and the capacity of the underlying HDFS
volume at the same time. 

Hadoop TaskTrackers and user MapReduce tasks are often colocated as 
well. This is moving computation to data instead of the other way
around. 

However, once all of these services are colocated you may find that loading
is uneven when the cluster is below a certain size, where that threshold
depends on your application. What I mean here is that you may find you
need to add servers to spread the load of one subsystem because it is
affecting all of the others. 

You may want to search the hbase-user@ mailing list archive using the
search term "storage engineering". 

Hope that helps,

   - Andy




----- Original Message ----
> From: Seraph Imalia <seraph@eisp.co.za>
> To: hbase-user@hadoop.apache.org
> Sent: Tue, February 9, 2010 7:30:42 AM
> Subject: Re: Hbase pausing problems
> 
> Hi Jean-Daniel,
> 
> Thank you for your input - I'll make these changes and try it tonight.  I
> think it is probably also a good idea for me to enable compression now
> whilst the load is off the servers.
> 
> We have a physical space issue in our server cabinet which will get resolved
> sometime in March and we are planning to add an additional 3 servers to the
> setup + maybe an additional one for the namenode and master hBase server.  I
> read somewhere that it is wise to place a datanode and regionserver together
> per server.  Is this wise?  Or is there a better way to configure this?
> 
> Regards,
> Seraph 
> 
> 
> > From: Jean-Daniel Cryans 
> > Reply-To: 
> > Date: Mon, 8 Feb 2010 10:11:36 -0800
> > To: 
> > Subject: Re: Hbase pausing problems
> > 
> > The "too many store files" is due to this
> > 
> >   <property>
> >     <name>hbase.hstore.blockingStoreFiles</name>
> >     <value>7</value>
> >     <description>
> >      If more than this number of StoreFiles in any one Store
> >      (one StoreFile is written per flush of MemStore) then updates are
> >      blocked for this HRegion until a compaction is completed, or
> >      until hbase.hstore.blockingWaitTime has been exceeded.
> >     </description>
> >   </property>
> > 
> > This block is there in order to not overrun the system with uncompacted
> > files. In the past I saw an import driving the number of store files to more
> > than 100 and it was just impossible to compact. The default setting is
> > especially low since the default heap size is 1GB; with 3GB you could set it
> > to 13-15.
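> > 
> > In hbase-site.xml that would look something like this (14 is just a pick
> > from the middle of that range for a 3GB heap; tune it for your own load):
> > 
> >   <property>
> >     <name>hbase.hstore.blockingStoreFiles</name>
> >     <value>14</value>
> >     <!-- raised from the default of 7 to suit a 3GB heap -->
> >   </property>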
> > 
> > Since you have a high number of regions, consider tweaking this:
> > 
> >   <property>
> >     <name>hbase.regions.percheckin</name>
> >     <value>10</value>
> >     <description>Maximum number of regions that can be assigned in a single go
> >      to a region server.
> >     </description>
> >   </property>
> > 
> > Since you have such a low number of nodes, a value of 100 would make a lot
> > of sense.
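> > 
> > For example, the same property bumped to 100 in hbase-site.xml:
> > 
> >   <property>
> >     <name>hbase.regions.percheckin</name>
> >     <value>100</value>
> >     <!-- assign regions in batches of 100 instead of the default 10 -->
> >   </property>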
> > 
> > On a general note, it seems that your machines are unable to keep up with
> > the size of data that's coming in and lots of compaction (and flushes) need
> > to happen. The fact that only 3 machines are doing the work exacerbates the
> > problem. Using the configurations I just told you about will lessen the
> > problem, but you should really consider using LZO or even GZ since all you
> > care about is storing a lot of data and only reading a few rows per day.
> > Enabling GZ won't require any new software on these nodes and there's no
> > chance of losing data.
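> > 
> > As a rough sketch, compression is set per column family, e.g. from the hbase
> > shell (the table and family names below are just placeholders, and the exact
> > syntax can vary a bit between versions):
> > 
> >   hbase> disable 'mytable'
> >   hbase> alter 'mytable', {NAME => 'myfamily', COMPRESSION => 'GZ'}
> >   hbase> enable 'mytable'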
> > 
> > J-D
> > 
> > On Mon, Feb 8, 2010 at 5:28 AM, Seraph Imalia wrote:
> > 
> >> Hi Guys,
> >> 
> >> I am having another problem with hBase that is probably related to the
> >> problems I was emailing you about earlier this year.
> >> 
> >> I have finally had a chance to at least try one of the suggestions you had
> >> to help resolve our problems.  I increased the heap size per server to 3Gig
> >> and added the following to the hbase-site.xml files on each server last
> >> night (I have not enabled compression yet for fear of losing data - I need
> >> to wait until I have a long period of time where hBase can be offline, in
> >> case there are problems I need to resolve) ...
> >> 
> >> <property>
> >>    <name>hbase.regionserver.global.memstore.upperLimit</name>
> >>    <value>0.5</value>
> >>    <description>Maximum size of all memstores in a region server before new
> >>      updates are blocked and flushes are forced. Defaults to 40% of heap
> >>    </description>
> >> </property>
> >> 
> >> <property>
> >>    <name>hbase.regionserver.global.memstore.lowerLimit</name>
> >>    <value>0.48</value>
> >>    <description>When memstores are being forced to flush to make room in
> >>      memory, keep flushing until we hit this mark. Defaults to 30% of heap.
> >>      This value equal to hbase.regionserver.global.memstore.upperLimit causes
> >>      the minimum possible flushing to occur when updates are blocked due to
> >>      memstore limiting.
> >>    </description>
> >> </property>
> >> 
> >> ...and then restarted hbase
> >> bin/stop-hbase.sh
> >> bin/start-hbase.sh
> >> 
> >> Hbase spent about 30 minutes assigning regions to each of the region
> >> servers (we now have 2595 regions).  When it had finished (which is usually
> >> when our client apps are able to start adding rows), client apps were only
> >> able to add rows at an incredibly slow rate (about 1 every second), which was
> >> not even enough to cope with the minuscule load we have at 3AM.
> >> 
> >> I left hBase for about 30 minutes after region assignment had completed and
> >> the situation did not improve.  I then tried changing the lowerLimit to 0.38
> >> and restarting again, which also did not improve the situation.  I then removed
> >> the above lines by commenting them out (<!-- -->) and restarted hBase again.
> >>  Again, 30 minutes after it had finished assigning regions, it was no
> >> different.
> >> 
> >> I therefore assumed that the problem was not caused by the addition of the
> >> properties but rather just by the fact that it had been restarted.  I
> >> checked the log files very closely and I noticed that when I disable the
> >> client apps, the regionservers are frantically requesting major compactions
> >> and complaining about too many store files for a region.
> >> 
> >> I then assumed that the system is under strain performing housekeeping and
> >> there is nothing I can do with my limited knowledge to improve it without
> >> contacting you guys about it first.  It was 4AM this morning and I had no
> >> choice but to do whatever I could to get our client apps up and running
> >> before morning, so I wrote some quick coldfusion and java code to get the
> >> data inserted into local mysql servers so that hBase could have time to do
> >> whatever it was doing.
> >> 
> >> It is still compacting and it is now 9 hours after the last restart, with 0
> >> load from client apps.
> >> 
> >> Please can you assist by shedding some light on what is actually happening?
> >> - Is my thinking correct?
> >> - Is it related to the "hBase pausing problems" we are still having?
> >> - What do I do to fix it or make it hurry up?
> >> 
> >> Regards,
> >> Seraph
> >> 
> >> 



      

