hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: HTable.put hangs on bulk loading
Date Fri, 13 May 2011 19:34:07 GMT
On Fri, May 13, 2011 at 7:44 AM, Stan Barton <bartx007@gmail.com> wrote:
> stack-3 wrote:
>>
>> On Thu, Apr 28, 2011 at 6:54 AM, Stan Barton <bartx007@gmail.com> wrote:
>> Are you swapping, Stan?  You are close to the edge with your RAM
>> allocations.  What do you have swappiness set to?  Is it the default?
>>
>> Writing, you don't need that much memory usually, but you do have a lot
>> of regions, so you could be flushing a bunch of small files.
>>
>
> Due to various problems with swap, swap was turned off and memory
> overcommitment was turned on.
>

Sorry.  How do you enable overcommitment of memory, or do you mean to
say that your processes add up to more than the RAM you have?


> stack-3 wrote:
>> These are old IA stock machines?  Do they have ECC RAM?  (IIRC, they
>> used to not have ECC RAM).
>>
>
> Strangely, on these machines and the Debian installed on them, only this
> (star *) approach works.

OK.  New to me, but hey, what do I know!


> Originally, I was running the DB on the same cluster where the
> processing took place - mostly mapreduce jobs reading the data and doing
> some analysis. But when I started using nutchwax on the same cluster I
> started running out of memory (on the mapreduce side), and since the
> machines are so sensitive (no swap, overcommitment on) that became a
> nightmare. So right now nutch is being run on a separate cluster - I have
> tweaked nutchwax to work with recent Hadoop APIs and also to take the
> HBase-stored content as the input (instead of ARC files).
>

Good stuff

> The machines are somewhat renovated old red boxes (I don't know what
> configuration they were originally). The RAM is not ECC as far as I know,
> because the chipset on the motherboards does not support that technology.
>

OK.  Are you seeing any issues arising because of checksum errors?  (BTW,
IIRC, these non-ECC red boxes are the reason HDFS is a checksummed
filesystem.)


> stack-3 wrote:
>>
>>> hadoop/hdfs-site.xml http://pastebin.ca/2051529
>>>
>>
>> Did you change the dfs block size?  Looks like it's 256M rather than the
>> usual 64M.  Any reason for that?  Would suggest going w/ defaults at
>> first.
>>
>> Remove dfs.datanode.socket.write.timeout == 0.  That's an old config
>> recommendation that should no longer be necessary and is likely
>> corrosive.
>>
>
> I changed the block size to diminish the overall number of blocks. I was
> following some advice on managing that large an amount of data in HDFS
> that I found in the forums.
>

Yeah, I suppose bigger blocksizes would make it so you need less RAM
in your namenode.  Do you have lots of files on here?  On the other hand,
bigger blocks are harder for hbase to sling.


> As for dfs.datanode.socket.write.timeout, that was set because I was quite
> often observing timeouts on the DFS sockets. Digging around, I found out
> that for some reason the internal Java clocks of the connecting machines
> were not aligned (even though the hardware clocks were); I think there was
> a JIRA for that.
>

Not sure what this one is about.  The
dfs.datanode.socket.write.timeout=0 is old lore by this stage I think
you'll find.
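To be concrete, the stanza I mean is something like the below (guessing at
how it appears in your hdfs-site.xml; with the override gone the datanode
just falls back to its built-in write timeout):

  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>0</value>  <!-- 0 disables the write timeout; drop this whole block -->
  </property>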

>

> Again, raising the block size was motivated by the assumption that it
> would lower the overall number of blocks. If it imposes stress on the RAM
> it makes sense to leave it at the defaults. I guess the default also helps
> parallelization.
>

Yeah, would suggest you run w/ default sizes.
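In hdfs-site.xml terms that means dropping your dfs.block.size override, or
setting it back to 64M explicitly; something like the below (property name
as in 0.20-era Hadoop, so double-check it matches your version):

  <property>
    <name>dfs.block.size</name>
    <value>67108864</value>  <!-- 64MB stock default, instead of 268435456 (256MB) -->
  </property>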


> stack-3 wrote:
>>
>>
>>
>>> hbase/hbase-env.sh http://pastebin.ca/2051535
>>>
>>
>> Remove this:
>>
>> -XX:+HeapDumpOnOutOfMemoryError
>>
>> Means it will dump the heap if the JVM hits an OutOfMemoryError.  This is
>> probably of no interest to you and could actually cause you pain if you
>> have a small root filesystem that the heap dump fills.
>>
>> The -XX:CMSInitiatingOccupancyFraction=90 is probably near useless
>> (default is 92% or 88% -- I don't remember which).  Set it down to 80%
>> or 75% if you want it to actually make a difference.
>>
>> Are you having issues w/ GC'ing?  I see you have mslab enabled.
>>
>>
> On version 0.20.6 I saw long pauses during the importing phase and also
> when querying. I was measuring how many queries were processed per second
> and could see pauses in the throughput. The only culprit I could find was
> the GC, but I still could not figure out why it pauses the whole DB.
> Therefore I gave mslab a shot with 0.90, but I still see those pauses in
> the throughput.
>

Importing, yeah, you are probably running into the 'gate' that a
regionserver puts up when it has filled its memstore while waiting on
flush to complete.  Check regionserver logs at about this time.  You
should see 'blocking' messages followed soon after by unblocking after
the flush runs.
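The knobs behind that gate live in hbase-site.xml.  Roughly (these are
0.90-era names and defaults as I remember them, so check them against your
install): updates to a region are blocked once its memstore reaches the
flush size times the block multiplier, e.g. 64MB x 2 = 128MB, and they are
let through again once the flush brings it back under.

  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>67108864</value>  <!-- flush a region's memstore once it hits 64MB -->
  </property>
  <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>2</value>  <!-- block updates at flush.size * 2 until the flush catches up -->
  </property>
  <property>
    <name>hbase.hregion.memstore.mslab.enabled</name>
    <value>true</value>  <!-- the mslab you already have on; helps CMS fragmentation, not this blocking -->
  </property>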

St.Ack
