hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Issues running a large MapReduce job over a complete HBase table
Date Mon, 06 Dec 2010 18:21:27 GMT
Tell us more about your cluster, Gabriel.  Can you take 1M from hbase
and give it to HDFS?  Does that make a difference?  What kind of OOME
is it?  What's the message?  You might tune the thread stack size; that
might give you the headroom you need.  How many nodes are in your
cluster, and how much RAM do they have?

Thanks,
St.Ack
P.S. Yes, bigger files could help, but an OOME in the DN is a little unusual.

On Mon, Dec 6, 2010 at 4:30 AM, Gabriel Reid <gabriel.reid@gmail.com> wrote:
> Hi Lars,
>
> All of the max heap sizes are left at their default values (i.e. 1000 MB).
>
> The OOMEs that I encountered in the data nodes only occurred when I set
> dfs.datanode.max.xcievers unrealistically high (8192) in an effort to
> escape the "xceiverCount X exceeds the limit of concurrent xcievers"
> errors. The datanodes weren't having hard crashes, but they were
> getting OOMEs and becoming unusable until a restart.
>
>
> - Gabriel
>
> On Mon, Dec 6, 2010 at 12:33 PM, Lars George <lars.george@gmail.com> wrote:
>> Hi Gabriel,
>>
>> What max heap do you give the various daemons? It is really odd that
>> you see OOMEs; I would like to know what has consumed the memory. Are
>> you saying the Hadoop DataNodes actually crash with the OOME?
>>
>> Lars
>>
>> On Mon, Dec 6, 2010 at 9:02 AM, Gabriel Reid <gabriel.reid@gmail.com> wrote:
>>> Hi,
>>>
>>> We're currently running into issues with running a MapReduce job over
>>> a complete HBase table - we can't seem to find a balance between
>>> having dfs.datanode.max.xcievers set too low (and getting
>>> "xceiverCount X exceeds the limit of concurrent xcievers") and getting
>>> OutOfMemoryErrors on datanodes.
>>>
>>> When trying to run a MapReduce job over the complete table we
>>> inevitably hit one of the two errors above, while using a more
>>> restrictive Scan with a startRow and stopRow lets the job run
>>> without problems.
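
For reference, a bounded scan of this kind can be wired up with the
0.89 mapreduce API roughly as follows; the table name, row-key bounds
and mapper body are placeholders rather than details from this thread.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

public class BoundedScanJob {

  // Placeholder mapper: sees one row per call, emits nothing.
  static class RowMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      // per-row work goes here
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "bounded-table-scan");
    job.setJarByClass(BoundedScanJob.class);

    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes("row-000000"));  // placeholder start key (inclusive)
    scan.setStopRow(Bytes.toBytes("row-100000"));   // placeholder stop key (exclusive)
    scan.setCaching(500);        // batch rows per RPC
    scan.setCacheBlocks(false);  // avoid churning the block cache during the MR scan

    TableMapReduceUtil.initTableMapperJob(
        "mytable", scan, RowMapper.class,
        NullWritable.class, NullWritable.class, job);
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Dropping the setStartRow/setStopRow calls turns this into the
full-table scan that triggers the errors described above.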
>>>
>>> An important note is that the table that is being scanned has a large
>>> disparity in the size of the values being stored -- one column family
>>> contains values that are all generally around 256 kB in size, while
>>> the other column families in the table contain values that are closer
>>> to 256 bytes. The hbase.hregion.max.filesize setting is still at the
>>> default (256 MB), meaning that we have HFiles for the big column
>>> family that are around 256 MB, and HFiles for the other column
>>> families that are around 256 kB. The dfs.datanode.max.xcievers
>>> setting is currently at 2048,
>>> and this is running a 5-node cluster.
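
As a rough illustration of why the full scan runs into the xceiver
limit while a bounded scan does not, the figures above can be plugged
into a back-of-the-envelope estimate like the one below. The
store-files-per-region figure is a guess, and the model (one datanode
xceiver thread per open store file, spread evenly over the datanodes)
is deliberately crude.

// Rough estimate of datanode xceiver load during a full-table scan.
// Values marked "from the thread" come from the figures quoted above;
// everything else is an assumption made purely for illustration.
public class XceiverEstimate {
  public static void main(String[] args) {
    long rows = 7000000L;                        // from the thread
    long bigValueBytes = 256L * 1024;            // ~256 kB values, from the thread
    long regionMaxBytes = 256L * 1024 * 1024;    // hbase.hregion.max.filesize, from the thread
    int datanodes = 5;                           // from the thread
    int xceiverLimit = 2048;                     // from the thread
    int storeFilesPerRegion = 3;                 // guess: roughly one HFile per column family

    long bigFamilyBytes = rows * bigValueBytes;           // ~1.7 TB in the large family
    long regions = bigFamilyBytes / regionMaxBytes;       // ~6800 regions
    long openStoreFiles = regions * storeFilesPerRegion;  // ~20000 open HFiles
    long perDatanode = openStoreFiles / datanodes;        // ~4000, well over the limit

    System.out.println("regions           ~ " + regions);
    System.out.println("open store files  ~ " + openStoreFiles);
    System.out.println("xceivers per node ~ " + perDatanode + " (limit " + xceiverLimit + ")");
  }
}

Under the same assumptions, 1 GB regions would land at roughly a
quarter of that per-node figure, which is the intuition behind raising
hbase.hregion.max.filesize.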
>>>
>>> The table in question has about 7 million rows, and we're using
>>> Cloudera CDH3 (HBase 0.89.20100924 and Hadoop 0.20.2).
>>>
>>> As far as I have been able to discover, the correct thing to do (or to
>>> have done) is to set hbase.hregion.max.filesize to a larger value so
>>> that there are fewer, larger regions, which as I understand it would
>>> probably solve the issue here.
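
For what it's worth, besides changing hbase.hregion.max.filesize
cluster-wide in hbase-site.xml, the same limit can be raised for a
single table through the client API, roughly as sketched below. The
table name and the 1 GB figure are only examples, and the modifyTable
call should be checked against the exact 0.89 client in use.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class RaiseMaxFileSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    byte[] tableName = Bytes.toBytes("mytable");   // example table name

    admin.disableTable(tableName);
    HTableDescriptor desc = admin.getTableDescriptor(tableName);
    desc.setMaxFileSize(1024L * 1024 * 1024);      // 1 GB regions instead of the 256 MB default
    admin.modifyTable(tableName, desc);            // verify this call exists in your 0.89 client
    admin.enableTable(tableName);
  }
}

Note that a larger limit only affects future splits; the regions that
already exist still have to be consolidated, e.g. with the Merge step
mentioned in question 3 below.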
>>>
>>> My questions are:
>>> 1. Is my analysis about having a larger hbase.hregion.max.filesize correct?
>>> 2. Is there something else that we can do to resolve this?
>>> 3. Am I correct in assuming that the best way to resolve this now is
>>> to make the hbase.hregion.max.filesize setting larger, and then use
>>> the org.apache.hadoop.hbase.util.Merge tool as discussed at
>>> http://osdir.com/ml/general/2010-12/msg00534.html ?
>>>
>>> Any help on this would be greatly appreciated.
>>>
>>> Thanks,
>>>
>>> Gabriel
>>>
>>
>
