hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Hbase pausing problems
Date Tue, 09 Feb 2010 17:45:42 GMT
You want namenode+jobtracker+hbase-master on one node.
Then you want all slaves to be datanode+tasktracker+regionserver.

Since HDFS writes the first replica on the local node if possible, that
improves data locality when the DN and the RS are together.
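
A minimal sketch with made-up hostnames: if the Hadoop conf/slaves file and
the HBase conf/regionservers file list the same machines, every regionserver
sits on a datanode and gets that local first replica:

  # ${HADOOP_HOME}/conf/slaves       (datanode + tasktracker hosts)
  # ${HBASE_HOME}/conf/regionservers (regionserver hosts) -- same list
  slave1.example.com
  slave2.example.com
  slave3.example.com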

Don't spend time on getting 2 namenode nodes; that will involve an
incredible amount of work and will possibly fail. Just make sure your master
node is reliable (mirrored disks, 2 PSUs, etc.) and that should almost never
be a problem. In 2 years of using Hadoop I never had a NN failure. Also, with
such a small cluster you lose too much processing power, unless that node is
also configured as a slave.

J-D

On Tue, Feb 9, 2010 at 7:30 AM, Seraph Imalia <seraph@eisp.co.za> wrote:

> Hi Jean-Daniel,
>
> Thank you for your input - I'll make these changes and try it tonight.  I
> think it is probably also a good idea for me to enable compression now
> whilst the load is off the servers.
>
> We have a physical space issue in our server cabinet which will get resolved
> sometime in March, and we are planning to add an additional 3 servers to the
> setup + maybe an additional one for the namenode and master hBase server.  I
> read somewhere that it is wise to place a datanode and regionserver together
> per server.  Is this wise?  Or is there a better way to configure this?
>
> Regards,
> Seraph
>
>
> > From: Jean-Daniel Cryans <jdcryans@apache.org>
> > Reply-To: <hbase-user@hadoop.apache.org>
> > Date: Mon, 8 Feb 2010 10:11:36 -0800
> > To: <hbase-user@hadoop.apache.org>
> > Subject: Re: Hbase pausing problems
> >
> > The "too many store files" is due to this
> >
> >   <property>
> >     <name>hbase.hstore.blockingStoreFiles</name>
> >     <value>7</value>
> >     <description>
> >     If more than this number of StoreFiles in any one Store
> >     (one StoreFile is written per flush of MemStore) then updates are
> >     blocked for this HRegion until a compaction is completed, or
> >     until hbase.hstore.blockingWaitTime has been exceeded.
> >     </description>
> >   </property>
> >
> > This block is there in order to not overrun the system with uncompacted
> > files. In the past I saw an import driving the number of store files to more
> > than 100 and it was just impossible to compact. The default setting is
> > especially low since the default heap size is 1GB; with 3GB you could set it
> > to 13-15.
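> >
> > For example (a sketch, picking 14 out of that 13-15 range), the override
> > in hbase-site.xml would look like:
> >
> >   <property>
> >     <name>hbase.hstore.blockingStoreFiles</name>
> >     <value>14</value>
> >   </property>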
> >
> > Since you have a high number of regions, consider tweaking this:
> >
> >   <property>
> >     <name>hbase.regions.percheckin</name>
> >     <value>10</value>
> >     <description>Maximum number of regions that can be assigned in a single
> >     go to a region server.
> >     </description>
> >   </property>
> >
> > Since you have such a low number of nodes, a value of 100 would make a lot
> > of sense.
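> >
> > A sketch of the corresponding override in hbase-site.xml:
> >
> >   <property>
> >     <name>hbase.regions.percheckin</name>
> >     <value>100</value>
> >   </property>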
> >
> > On a general note, it seems that your machines are unable to keep up with
> > the size of data that's coming in, and lots of compactions (and flushes)
> > need to happen. The fact that only 3 machines are doing the work exacerbates
> > the problem. Using the configurations I just told you about will lessen the
> > problem, but you should really consider using LZO or even GZ, since all you
> > care about is storing a lot of data and only reading a few rows per day.
> > Enabling GZ won't require any new software on these nodes and there's no
> > chance of losing data.
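> >
> > Enabling GZ is a per-column-family setting done from the HBase shell,
> > along these lines (table and family names here are placeholders; the
> > table has to be disabled first):
> >
> >   disable 'mytable'
> >   alter 'mytable', {NAME => 'mycf', COMPRESSION => 'GZ'}
> >   enable 'mytable'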
> >
> > J-D
> >
> > On Mon, Feb 8, 2010 at 5:28 AM, Seraph Imalia <seraph@eisp.co.za> wrote:
> >
> >> Hi Guys,
> >>
> >> I am having another problem with hBase that is probably related to the
> >> problems I was emailing you about earlier this year.
> >>
> >> I have finally had a chance to at least try one of the suggestions you had
> >> to help resolve our problems.  I increased the heap size per server to 3Gig
> >> and added the following to the hbase-site.xml files on each server last
> >> night (I have not enabled compression yet for fear of losing data - I need
> >> to wait until I have a long period of time where hBase can be offline, in
> >> case there are problems I need to resolve) ...
> >>
> >> <property>
> >>    <name>hbase.regionserver.global.memstore.upperLimit</name>
> >>    <value>0.5</value>
> >>    <description>Maximum size of all memstores in a region server before new
> >>      updates are blocked and flushes are forced. Defaults to 40% of heap
> >>    </description>
> >> </property>
> >> <property>
> >>    <name>hbase.regionserver.global.memstore.lowerLimit</name>
> >>    <value>0.48</value>
> >>    <description>When memstores are being forced to flush to make room in
> >>      memory, keep flushing until we hit this mark. Defaults to 30% of heap.
> >>      This value equal to hbase.regionserver.global.memstore.upperLimit
> >>      causes the minimum possible flushing to occur when updates are blocked
> >>      due to memstore limiting.
> >>    </description>
> >> </property>
> >>
> >> ...and then restarted hbase
> >> bin/stop-hbase.sh
> >> bin/start-hbase.sh
> >>
> >> Hbase spent about 30 minutes assigning regions to each of the region
> >> servers (we now have 2595 regions).  When it had finished (which is usually
> >> when our client apps are able to start adding rows), client apps were only
> >> able to add rows at an incredibly slow rate (about 1 every second), which
> >> was not even able to cope with the minuscule load we have at 3AM.
> >>
> >> I left hBase for about 30 minutes after region assignment had completed and
> >> the situation did not improve.  I then tried changing the lowerLimit to
> >> 0.38 and restarting again, which also did not improve the situation.  I
> >> then removed the above lines by commenting them out (<!-- -->) and
> >> restarted hBase again.  Again, 30 minutes after it had finished assigning
> >> regions, it was no different.
> >>
> >> I therefore assumed that the problem was not caused by the addition of the
> >> properties but rather just by the fact that it had been restarted.  I
> >> checked the log files very closely and I noticed that when I disable the
> >> client apps, the regionservers are frantically requesting major compactions
> >> and complaining about too many store files for a region.
> >>
> >> I then assumed that the system was under strain performing housekeeping and
> >> that there was nothing I could do with my limited knowledge to improve it
> >> without contacting you guys about it first.  It was 4AM this morning and I
> >> had no choice but to do whatever I could to get our client apps up and
> >> running before morning, so I wrote some quick ColdFusion and Java code to
> >> get the data inserted into local MySQL servers so that hBase could have
> >> time to do whatever it was doing.
> >>
> >> It is still compacting, and it is now 9 hours after the last restart, with
> >> 0 load from client apps.
> >>
> >> Please can you assist by shedding some light on what is actually happening?
> >> - Is my thinking correct?
> >> - Is it related to the "hBase pausing problems" we are still having?
> >> - What do I do to fix it or make it hurry up?
> >>
> >> Regards,
> >> Seraph
> >>
> >>
