hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Hanging regionservers
Date Fri, 16 Jul 2010 22:32:43 GMT
So it seems like you are bypassing the issue by having no timeout on
the socket.  I would definitely be interested, though, if you still see
the issue on cdh3b2.  Most folks will not be running with no socket
timeout.

Thanks Luke.
St.Ack


On Fri, Jul 16, 2010 at 3:01 PM, Luke Forehand
<luke.forehand@networkedinsights.com> wrote:
> Using Ryan Rawson's suggested config tweaks, we have just completed a successful job
> run with a 15GB sequence file, no hang.  I'm setting up to have multiple files process this
> weekend with the new settings.  :-)  I believe the dfs socket write timeout being indefinite
> was the trick.
>
> I'll post my results on Monday.  Thanks for the support thus far!
>
> -Luke
>
> On 7/15/10 10:17 PM, "Ryan Rawson" <ryanobjc@gmail.com> wrote:
>
> I'm not seeing anything in that logfile. You are seeing compactions
> for various regions, but I'm not seeing flushes (typical during insert
> loads) and nothing else. One thing we look for is the log message
> "Blocking updates", which indicates that a particular region has
> decided to hold off on updates to avoid taking too many inserts.
>
> Like I said, you could be seeing this on a different regionserver: if
> all the clients are blocked on 1 regionserver and can't get to the
> others, then most will look idle and only one will actually show
> anything interesting in the log.
>
> Can you check for this behaviour?
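One way to do that check is to count "Blocking updates" lines in each regionserver's log and compare. The sketch below is a hypothetical helper (the class name `BlockingUpdatesScan` and the idea of passing one copied log file per regionserver on the command line are assumptions); only the "Blocking updates" message text comes from the thread, and nothing else about the log format is assumed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper: counts "Blocking updates" lines per regionserver log file. */
public class BlockingUpdatesScan {
    // Message text Ryan suggests looking for; the rest of the log format is not assumed.
    static final String MARKER = "Blocking updates";

    /** Returns the lines of one log that contain the marker. */
    static List<String> blockingLines(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains(MARKER))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Assumption: one log file per regionserver, e.g. copied off each node.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String arg : args) {
            Path log = Paths.get(arg);
            counts.put(arg, blockingLines(Files.readAllLines(log)).size());
        }
        // One server with many hits while the others show 0 would match the
        // "all clients stuck on one regionserver" pattern described above.
        counts.forEach((file, n) -> System.out.println(file + ": " + n));
    }
}
```

A plain `grep -c "Blocking updates"` on each node gives the same answer; the point is to compare servers rather than read a single log in isolation.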
>
> Also, if you want to tweak the config, the values I pasted should help.
>
> On Thu, Jul 15, 2010 at 7:25 PM, Luke Forehand
> <luke.forehand@networkedinsights.com> wrote:
>> It looks like we are going straight from the default config, no explicit setting of
>> anything.
>>
>> On 7/15/10 9:03 PM, "Ryan Rawson" <ryanobjc@gmail.com> wrote:
>>
>> In this case the regionserver isn't actually doing anything: all the
>> IPC handler threads are waiting in their queue handoff thingy (how
>> they get socket/work to do).
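To picture what "waiting in their queue handoff" looks like, here is a toy model of an RPC handler pool. It is not HBase's actual server code; `IdleHandlers`, `startHandlers` and `idleStates` are hypothetical names. Each handler thread parks in `BlockingQueue.take()` until a call is handed to it, so with no incoming work every handler reports `WAITING`, roughly the picture an idle regionserver's handlers give in a thread dump.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Toy model of an RPC handler pool (not HBase's classes): handlers block in take(). */
public class IdleHandlers {
    static List<Thread> startHandlers(BlockingQueue<Runnable> callQueue, int count) {
        List<Thread> handlers = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        callQueue.take().run(); // parks here until work is handed off
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // shut down quietly
                }
            }, "handler-" + i);
            t.setDaemon(true);
            t.start();
            handlers.add(t);
        }
        return handlers;
    }

    /** Starts handlers on an empty queue, waits briefly, and reports their states. */
    static List<Thread.State> idleStates(int count, long settleMillis) {
        BlockingQueue<Runnable> callQueue = new LinkedBlockingQueue<>();
        List<Thread> handlers = startHandlers(callQueue, count);
        try {
            Thread.sleep(settleMillis); // let every handler reach take()
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        List<Thread.State> states = new ArrayList<>();
        for (Thread t : handlers) {
            states.add(t.getState()); // capture state before stopping the thread
            t.interrupt();
        }
        return states;
    }

    public static void main(String[] args) {
        List<Thread.State> states = idleStates(3, 200);
        for (int i = 0; i < states.size(); i++) {
            System.out.println("handler-" + i + " " + states.get(i));
        }
    }
}
```

Seeing every handler idle like this on a "hung" server is why Ryan suggests looking elsewhere: the server is waiting for work, not stuck doing it.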
>>
>> Something elsewhere perhaps?  Check the logs of your jobs, there might
>> be something interesting there.
>>
>> One thing that frequently happens is you overrun 1 regionserver with
>> edits and it isn't flushing fast enough, so it pauses updates and all
>> clients end up stuck on it.
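The "pauses updates and all clients end up stuck" behaviour can be sketched as a simple gate. `ToyRegion` below is a hypothetical illustration, not HBase's `HRegion`: writers are admitted while the in-memory buffer is under a blocking limit, then wait until a flush empties it. If flushes are slow, every writer aimed at this region queues up behind the gate.

```java
/**
 * Toy region (not HBase's HRegion): stops accepting edits once its
 * in-memory buffer reaches a blocking limit, until a flush drains it.
 */
public class ToyRegion {
    private final long blockingLimitBytes;
    private long memstoreBytes = 0;

    ToyRegion(long blockingLimitBytes) {
        this.blockingLimitBytes = blockingLimitBytes;
    }

    /** Non-blocking variant: returns false instead of waiting when updates are blocked. */
    synchronized boolean offer(long bytes) {
        if (memstoreBytes >= blockingLimitBytes) {
            return false;
        }
        memstoreBytes += bytes;
        return true;
    }

    /** Blocking variant: the caller parks here, like a client stuck on a busy region. */
    synchronized void put(long bytes) throws InterruptedException {
        while (memstoreBytes >= blockingLimitBytes) {
            wait(); // updates blocked until a flush makes room
        }
        memstoreBytes += bytes;
    }

    /** Simulated flush: empties the buffer and wakes any blocked writers. */
    synchronized void flush() {
        memstoreBytes = 0;
        notifyAll();
    }

    public static void main(String[] args) throws InterruptedException {
        ToyRegion region = new ToyRegion(100);
        region.put(60);
        region.put(60);                        // 120 bytes buffered, over the limit
        System.out.println(region.offer(10));  // false: updates are blocked
        Thread writer = new Thread(() -> {
            try {
                region.put(10);                // parks until the flush below
                System.out.println("writer resumed after flush");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
        Thread.sleep(100);                     // writer is now waiting on the gate
        region.flush();                        // a slow flusher is what keeps clients stuck
        writer.join();
    }
}
```

Raising the limit (as the multiplier setting below does) makes the gate close later, at the cost of more memory held per region.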
>>
>> What was that config again?  I use these settings:
>>
>> <property>
>>  <name>hbase.hstore.blockingStoreFiles</name>
>>  <value>15</value>
>> </property>
>>
>> <property>
>>  <name>dfs.datanode.socket.write.timeout</name>
>>  <value>0</value>
>> </property>
>>
>> <property>
>>  <name>hbase.hregion.memstore.block.multiplier</name>
>>  <value>8</value>
>> </property>
>>
>> perhaps try these ones?
>>
>> -ryan
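A back-of-the-envelope look at what these three settings change. As understood here, a region blocks updates once its memstore reaches `hbase.hregion.memstore.block.multiplier` times `hbase.hregion.memstore.flush.size`, and a store with more than `hbase.hstore.blockingStoreFiles` store files blocks updates until a compaction runs; `dfs.datanode.socket.write.timeout` of 0 disables the datanode write timeout. The "default" figures used (64 MB flush size, multiplier 2, 7 store files, 480000 ms write timeout) are assumptions about the releases of this era, so check `hbase-default.xml` and `hdfs-default.xml` for your version. `BlockingThresholds` and its methods are hypothetical names.

```java
/**
 * Rough limits implied by the settings above. The "default" values are
 * assumptions about the HBase/HDFS releases of this era, not authoritative.
 */
public class BlockingThresholds {
    static final long MB = 1024L * 1024L;

    /** Memstore size at which a region blocks updates: multiplier x flush size. */
    static long memstoreBlockingBytes(long flushSizeBytes, int multiplier) {
        return flushSizeBytes * multiplier;
    }

    /** Updates block when a store has more store files than the limit. */
    static boolean blocksOnStoreFiles(int storeFiles, int blockingStoreFiles) {
        return storeFiles > blockingStoreFiles;
    }

    public static void main(String[] args) {
        long flushSize = 64 * MB; // assumed hbase.hregion.memstore.flush.size

        // hbase.hregion.memstore.block.multiplier: assumed default 2 vs. suggested 8
        System.out.println("memstore blocks at (default 2): "
                + memstoreBlockingBytes(flushSize, 2) / MB + " MB");   // 128 MB
        System.out.println("memstore blocks at (suggested 8): "
                + memstoreBlockingBytes(flushSize, 8) / MB + " MB");   // 512 MB

        // hbase.hstore.blockingStoreFiles: assumed default 7 vs. suggested 15
        System.out.println("10 store files block with 7? " + blocksOnStoreFiles(10, 7));   // true
        System.out.println("10 store files block with 15? " + blocksOnStoreFiles(10, 15)); // false

        // dfs.datanode.socket.write.timeout in ms; 0 means no timeout at all.
        long assumedDefaultWriteTimeoutMs = 8 * 60 * 1000; // 480000 ms
        System.out.println("datanode write timeout: "
                + assumedDefaultWriteTimeoutMs + " ms -> 0 (disabled)");
    }
}
```

The first two settings give a busy region more headroom before it stops taking writes; the third removes a timeout rather than fixing its cause, which is the trade-off Stack points out at the top of the thread.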
>
