hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: SocketTimeoutException caused by GC?
Date Fri, 28 Jan 2011 00:24:29 GMT
Not as far as I know, unless you disabled splits from the beginning,
like some people do.

J-D
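
For reference, the "disabled splits from the beginning" approach usually
means raising the split threshold so high that it never triggers, either
cluster-wide through hbase.hregion.max.filesize in hbase-site.xml or per
table at creation time. Below is a minimal Java sketch of the per-table
variant; it assumes a later classic client API than the 0.90 release
discussed here, and the table name and column family are made up for
illustration.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateNoSplitTable {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());

    // Hypothetical table: one family, with the split threshold pushed far
    // above any realistic region size so the table effectively never splits.
    // The same idea applies cluster-wide via hbase.hregion.max.filesize.
    HTableDescriptor desc = new HTableDescriptor("mytable");
    desc.addFamily(new HColumnDescriptor("f"));
    desc.setMaxFileSize(100L * 1024 * 1024 * 1024); // 100 GB

    admin.createTable(desc);
    admin.close();
  }
}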

On Thu, Jan 27, 2011 at 4:22 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> Is there a way to disable splitting (on a particular region server)?
>
> On Thu, Jan 27, 2011 at 4:20 PM, Jean-Daniel Cryans <jdcryans@apache.org>wrote:
>
>> Mmm yes, for the sake of not having a single region move in the
>> meantime, but it wouldn't be so bad... it just means that those regions
>> will be closed when the RS closes.
>>
>> Also, it's possible to have splits during that time; again, it's not
>> dramatic as long as the script doesn't freak out because a region is
>> gone.
>>
>> J-D
>>
>> On Thu, Jan 27, 2011 at 4:13 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> > Should steps 1 and 2 below be exchanged?
>> >
>> > Regards
>> >
>> > On Thu, Jan 27, 2011 at 3:53 PM, Jean-Daniel Cryans <jdcryans@apache.org
>> >wrote:
>> >
>> >> To mitigate heap fragmentation, you could consider adding more nodes
>> >> to the cluster :)
>> >>
>> >> Regarding rolling restarts, currently there's one major issue:
>> >> https://issues.apache.org/jira/browse/HBASE-3441
>> >>
>> >> How it currently works is a bit dumb: when you cleanly close a region
>> >> server, it will first close all incoming connections and then proceed
>> >> to close the regions, and it's not until it's fully done that it
>> >> reports to the master. What that means for your clients is that a
>> >> portion of the regions will be unavailable for some time, until the
>> >> region server is done shutting down. How long, you ask? Well, it
>> >> depends on 1) how many regions you have, but mostly on 2) how much
>> >> data needs to be flushed from the MemStores. On one of our clusters,
>> >> shutting down HBase takes a few minutes since our write pattern is
>> >> almost perfectly distributed, meaning that the memstore space is
>> >> always full across all the regions (luckily it's a cluster that
>> >> serves only mapreduce jobs).
>> >>
>> >> Writing this gives me an idea... I think one "easy" way we could
>> >> solve this region-draining problem is with a jruby script that:
>> >>
>> >> 1- Retrieves the list of regions served by a RS
>> >> 2- Disables master balancing
>> >> 3- Moves every region out of the RS, one by one, assigning them to
>> >> the other RSs in a round-robin fashion
>> >> 4- Shuts down the RS
>> >> 5- Reenables master balancing
>> >>
>> >> I wonder if it would work... At least it's a process that you could
>> >> stop at any time without breaking everything.
>> >>
>> >> J-D
>> >>
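
A rough Java sketch of the drain process described above, with the
balancer switched off before the region list is taken, per the follow-up
exchange in this thread. It assumes a later HBase client API than 0.90:
HBaseAdmin#balanceSwitch, #getOnlineRegions and #move are taken as given
here, and their exact names vary across versions.

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class DrainRegionServer {
  public static void main(String[] args) throws Exception {
    // Full name of the RS to drain, e.g. "rs1.example.com,60020,1296168000000"
    String toDrain = args[0];

    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    try {
      // 2- Disable master balancing first, so the region list stays accurate
      //    and the master doesn't undo the manual moves.
      admin.balanceSwitch(false);

      // 1- Retrieve the RS to drain and the other live RSs.
      Collection<ServerName> live = admin.getClusterStatus().getServers();
      ServerName source = null;
      List<ServerName> others = new ArrayList<ServerName>();
      for (ServerName sn : live) {
        if (sn.getServerName().equals(toDrain)) {
          source = sn;
        } else {
          others.add(sn);
        }
      }
      if (source == null || others.isEmpty()) {
        throw new IllegalStateException("RS not found or nowhere to move to");
      }

      // 3- Move every region off the RS, round-robin across the others.
      List<HRegionInfo> regions = admin.getOnlineRegions(source);
      int i = 0;
      for (HRegionInfo region : regions) {
        ServerName dest = others.get(i++ % others.size());
        admin.move(region.getEncodedNameAsBytes(),
                   Bytes.toBytes(dest.getServerName()));
      }

      // 4- Shut down the now-empty RS out of band, e.g.:
      //      hbase-daemon.sh stop regionserver
    } finally {
      // 5- Re-enable master balancing.
      admin.balanceSwitch(true);
      admin.close();
    }
  }
}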
>> >> On Thu, Jan 27, 2011 at 11:38 AM, Wayne <wav100@gmail.com> wrote:
>> >> > I assumed GC was *trying* to roll. It shows the last 30 min of logs
>> >> > with control characters at the end.
>> >> >
>> >> > We are not all writes. In terms of writes we can wait and the
>> >> > zookeeper timeout can go way up, but we also need to support
>> >> > real-time reads (end-user based) and that is why the zookeeper
>> >> > timeout is not our first choice to increase (we would rather
>> >> > decrease it). The funny part is that 0.90 seems faster for us and
>> >> > churns through writes at a faster clip, thereby probably becoming
>> >> > less stable sooner due to the JVM not being able to handle it.
>> >> > Should we schedule a rolling restart every 24 hours? How do
>> >> > production systems accept volume writes through the front door
>> >> > without melting the JVM due to fragmentation? We can possibly
>> >> > switch to bulk writes, but performance is not our problem...
>> >> > stability is. We are pushing 40k writes/node/sec sustained with
>> >> > well-balanced regions, hour after hour, day after day (until a
>> >> > zookeeper tear down).
>> >> >
>> >> > Great to hear it is actively being looked at. I will keep an eye on
>> >> > #3455.
>> >> >
>> >> > Below are our GC options, many of which are from work with the
>> >> > other Java database. Should I go back to the default settings?
>> >> > Should I use those referenced in the Jira #3455
>> >> > (-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=65
>> >> > -Xms8g -Xmx8g)? We are also using Java 6u23.
>> >> >
>> >> >
>> >> > export HBASE_HEAPSIZE=8192
>> >> > export HBASE_OPTS="-XX:+UseCMSInitiatingOccupancyOnly
>> >> > -XX:CMSInitiatingOccupancyFraction=60 -XX:+CMSParallelRemarkEnabled
>> >> > -XX:SurvivorRatio=8 -XX:NewRatio=3 -XX:MaxTenuringThreshold=1
>> >> > -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
>> >> > -XX:+CMSIncrementalMode"
>> >> > export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails
>> >> > -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
>> >> >
>> >> >
>> >> > Thanks for your help!
>> >> >
>> >> >
>> >>
>> >
>>
>
