hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Issues running a large MapReduce job over a complete HBase table
Date Tue, 07 Dec 2010 16:46:34 GMT
On Mon, Dec 6, 2010 at 11:15 PM, Gabriel Reid <gabriel.reid@gmail.com> wrote:
> Hi St.Ack,
> The cluster is a set of 5 machines, each with 3GB of RAM and 1TB of
> storage. One machine is doing duty as Namenode, HBase Master, HBase
> Regionserver, Datanode, Job Tracker, and Task Tracker, while the other
> four are all Datanodes, Regionservers, and Task Trackers.

OK.  Yeah, 3GB of RAM is a little anemic for carrying 1k-plus regions
per server.  You'll have to tread carefully (make sure you are not
swapping).
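As a quick sketch of the kind of check meant here (Linux-specific paths; adapt to your nodes):

```shell
# Rough check that a node is not dipping into swap.
# SwapFree close to SwapTotal means little or no swap in use.
grep -E 'SwapTotal|SwapFree' /proc/meminfo

# Or watch the si/so (swap-in/swap-out) columns live; under a healthy
# HBase load they should stay at 0:
#   vmstat 5
```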

> The OOME that I was getting on the datanodes when I set the
> dfs.datanode.max.xcievers to 8192 was as follows:
> 2010-12-03 01:33:45,748 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(,
> storageID=DS-89185505-,
> infoPort=50075, ipcPort=50020):DataXceiveServer: Exiting due
> to:java.lang.OutOfMemoryError: unable to create new native thread
>        at java.lang.Thread.start0(Native Method)
>        at java.lang.Thread.start(Thread.java:640)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:132)
>        at java.lang.Thread.run(Thread.java:662)

Yeah, the above should be fixable by messing with the thread stack
size -- and yes, it would seem to come of the sheer number of open
files.  Google the error message; there are worked examples out there.
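For instance, something along these lines in hadoop-env.sh (the -Xss value is an illustrative guess, not a tested recommendation -- validate it against your own workload):

```shell
# hadoop-env.sh: shrink the per-thread stack so the same process memory
# can hold more DataXceiver threads.  The JVM default is often 512k-1m;
# 256k is a commonly cited lower value for thread-heavy daemons.
export HADOOP_OPTS="$HADOOP_OPTS -Xss256k"
```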

> While I'm certainly willing to try shifting memory allotment around
> and/or changing the thread stack size, I'd also like to get a better
> idea of whether this is due to a huge number of regions (just over ten
> thousand regions) and if so, can it be remedied (or could it have
> been remedied) by reducing the number of regions? Or is this just
> something that can happen when you have a large quantity of data and
> underpowered machines?

Yeah, you are up against the limits of your current
deploy/configuration.  Do any of the following to get out of the bind:

+ Up your region sizes so you have bigger files and fewer of them open
at any one time.
+ Mess with configuration that affects RAM -- i.e. thread stack sizes
or, depending on what your query path looks like, shrink the size
given over to the block cache (this will slow your reads though).
+ Add machines.
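The first bullet would be an hbase-site.xml change along these lines (1 GB here is illustrative; pick a size that suits your data -- and note existing regions are not merged automatically):

```xml
<!-- hbase-site.xml: larger regions mean fewer regions and fewer store
     files open at any one time.  Applies to regions created/split after
     the change; see the Merge tool for existing regions. -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>1073741824</value> <!-- 1 GB, up from the 256 MB default -->
</property>
```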

> Also, any insight as to how 10240 xceivers in total (2048 per machine
> * five machines) are being used up while scanning over two column
> families of a single table while writing two column families to
> another table with 5 task trackers would also be appreciated.

Well, you have 10k regions.  Each region is made of 0 to N store
files.  In your case you can probably be sure there is at least one
store file and often more than one (to have only one store file, run a
major compaction -- though a major compaction on your cluster will be
taxing given it's up against the wall already).  Each store file
consumes one thread (maybe two -- a thread dump on a datanode would
tell for sure) over on the Datanode.  So, assuming on average you have
2 files per region and 2 threads per file, that's (10k * 2 * 2) / 5
datanodes = 8k threads per datanode.  You said you had xceivers set at
8k?  That might not be enough... and be aware that fixing the OOME
might just take you to the next issue, not enough xceivers.
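The back-of-envelope arithmetic can be checked directly; the region count and the five datanodes come from the cluster description earlier in the thread, while the files-per-region and threads-per-file averages are assumptions:

```shell
# Estimate DataXceiver threads each datanode must carry.
regions=10000        # "just over ten thousand regions"
files_per_region=2   # assumed average store files per region
threads_per_file=2   # assumed datanode threads per open store file
datanodes=5          # the cluster description lists five datanodes

echo $(( regions * files_per_region * threads_per_file / datanodes ))
# 8000 -- right at an 8k xceiver limit, with nothing to spare
```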

> It makes
> me feel like I might be doing something wrong (ie leaking open
> scanners, etc), as I don't understand how so many xceivers are being
> used at the same time. On the other hand, my understanding of the
> underlying workings is certainly limited, so that could also explain
> my lack of understanding of the situation.

You might be leaking scanners, but that should have no effect on the
number of open store files.  On deploy of a region, we open its store
files and hold them open, and we do not open others -- not unless new
store files are created (flushes and compactions write new ones).
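A rough way to see the effect of all those held-open files is to count the descriptors the datanode process holds (the pgrep pattern is an assumption about how the process is named; the snippet falls back to the current shell so it runs anywhere):

```shell
# Count open file descriptors for a process; point it at the DataNode pid.
pid=$(pgrep -f 'DataNode' | head -1)
pid=${pid:-$$}   # fall back to this shell if no DataNode is running
ls "/proc/$pid/fd" | wc -l
```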

Hope this helps,

> Thanks,
> Gabriel
> On Mon, Dec 6, 2010 at 7:21 PM, Stack <stack@duboce.net> wrote:
>> Tell us more about your cluster Gabriel.  Can you take 1M from hbase
>> and give it to HDFS?  Does that make a difference?  What kinda OOME is
>> it?  Whats the message?  You might tune the thread stack size and that
>> might give you headroom you need.  How many nodes in your cluster and
>> how much RAM they have?
>> Thanks,
>> St.Ack
>> P.S. Yes, bigger files could help but OOME in DN is a little unusual.
>> On Mon, Dec 6, 2010 at 4:30 AM, Gabriel Reid <gabriel.reid@gmail.com> wrote:
>>> Hi Lars,
>>> All of the max heap sizes are left at their default values (i.e. 1000MB).
>>> The OOMEs that I encountered in the datanodes happened only when I put
>>> dfs.datanode.max.xcievers unrealistically high (8192) in an effort to
>>> escape the "xceiverCount X exceeds the limit of concurrent xcievers"
>>> errors.  The datanodes weren't really having hard crashes, but they
>>> were getting OOMEs and becoming unusable until a restart.
>>> - Gabriel
>>> On Mon, Dec 6, 2010 at 12:33 PM, Lars George <lars.george@gmail.com> wrote:
>>>> Hi Gabriel,
>>>> What max heap do you give the various daemons? It is really odd that
>>>> you see OOMEs; I would like to know what has consumed it. You are
>>>> saying the Hadoop DataNodes actually crash with the OOME?
>>>> Lars
>>>> On Mon, Dec 6, 2010 at 9:02 AM, Gabriel Reid <gabriel.reid@gmail.com> wrote:
>>>>> Hi,
>>>>> We're currently running into issues with running a MapReduce job over
>>>>> a complete HBase table - we can't seem to find a balance between
>>>>> having dfs.datanode.max.xcievers set too low (and getting
>>>>> "xceiverCount X exceeds the limit of concurrent xcievers") and getting
>>>>> OutOfMemoryErrors on datanodes.
>>>>> When trying to run a MapReduce job on the complete table we inevitably
>>>>> get one of the two above errors eventually -- using a more restrictive
>>>>> Scan with a startRow and stopRow for the job runs without problems.
>>>>> An important note is that the table that is being scanned has a large
>>>>> disparity in the size of the values being stored -- one column family
>>>>> contains values that are all generally around 256 kB in size, while
>>>>> the other column families in the table contain values that are closer
>>>>> to 256 bytes. The hbase.hregion.max.filesize setting is still at the
>>>>> default (256 MB), meaning that we have HFiles for the big column that
>>>>> are around 256 MB, and HFiles for the other columns that are around
>>>>> 256 kB. The dfs.datanode.max.xcievers setting is currently at 2048,
>>>>> and this is running a 5-node cluster.
>>>>> The table in question has about 7 million rows, and we're using
>>>>> Cloudera CDH3 (HBase 0.89.20100924 and Hadoop 0.20.2).
>>>>> As far as I have been able to discover, the correct thing to do (or to
>>>>> have done) is to set the hbase.hregion.max.filesize to a larger value
>>>>> to have a smaller number of regions, which as I understand would probably
>>>>> solve the issue here.
>>>>> My questions are:
>>>>> 1. Is my analysis about having a larger hbase.hregion.max.filesize correct?
>>>>> 2. Is there something else that we can do to resolve this?
>>>>> 3. Am I correct in assuming that the best way to resolve this now is
>>>>> to make the hbase.hregion.max.filesize setting larger, and then use
>>>>> the org.apache.hadoop.hbase.util.Merge tool as discussed at
>>>>> http://osdir.com/ml/general/2010-12/msg00534.html ?
>>>>> Any help on this would be greatly appreciated.
>>>>> Thanks,
>>>>> Gabriel
