hbase-user mailing list archives

From charan kumar <charan.ku...@gmail.com>
Subject Re: Region Servers Crashing during Random Reads
Date Fri, 04 Feb 2011 06:26:05 GMT
Here you go..

HBase Performance tuning page
http://wiki.apache.org/hadoop/Hbase/FAQ#A7 refers to the following
Hadoop URL:

http://wiki.apache.org/hadoop/PerformanceTuning
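For anyone finding this thread from those pages: the settings Todd recommends further down would typically go into conf/hbase-env.sh along the following lines. This is only a sketch; the 4G heap matches the one mentioned later in this thread, and the exact sizes are starting points, not fixed recommendations.

```shell
# conf/hbase-env.sh -- GC settings per Todd's advice in this thread.
# -Xmn256m gives ParNew a realistically sized young generation, so
# short-lived objects die there instead of being promoted to the old gen.
export HBASE_HEAPSIZE=4000
export HBASE_OPTS="$HBASE_OPTS -XX:+HeapDumpOnOutOfMemoryError \
    -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Xmn256m \
    -XX:CMSInitiatingOccupancyFraction=70"
```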

Thanks,
Charan


On Thu, Feb 3, 2011 at 10:22 PM, Todd Lipcon <todd@cloudera.com> wrote:

> Does the wiki really recommend that? Got a link handy?
>
> On Thu, Feb 3, 2011 at 10:20 PM, charan kumar <charan.kumar@gmail.com>
> wrote:
>
> > Todd,
> >
> >  That did the trick. I think the wiki should be updated as well; no
> > point in recommending ParNew 6M, is there?
> >
> > Thanks,
> > Charan.
> >
> > On Thu, Feb 3, 2011 at 2:06 PM, Charan K <charan.kumar@gmail.com> wrote:
> >
> > > Thanks Todd.. I will try it out ..
> > >
> > >
> > > On Feb 3, 2011, at 1:43 PM, Todd Lipcon <todd@cloudera.com> wrote:
> > >
> > > > Hi Charan,
> > > >
> > > > Your GC settings are way off - a 6m newsize will promote way too much
> > > > to the oldgen.
> > > >
> > > > Try this:
> > > >
> > > > -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Xmn256m
> > > > -XX:CMSInitiatingOccupancyFraction=70
> > > >
> > > > -Todd
> > > >
> > > > On Thu, Feb 3, 2011 at 12:28 PM, charan kumar <charan.kumar@gmail.com>
> > > > wrote:
> > > >
> > > >> Hi Jonathan,
> > > >>
> > > >> Thanks for your quick reply.
> > > >>
> > > >> Heap is set to 4G.
> > > >>
> > > >> Following are the JVM opts.
> > > >> export HBASE_OPTS="$HBASE_OPTS -XX:+HeapDumpOnOutOfMemoryError
> > > >> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:NewSize=6m
> > > >> -XX:MaxNewSize=6m"
> > > >>
> > > >> Are there any other options apart from increasing the RAM?
> > > >>
> > > >> I am adding some more info about the app.
> > > >>
> > > >>> We are storing web page data in HBase.
> > > >>> The row key is the hashed URL, for random distribution, since we
> > > >> don't plan to do scans.
> > > >>> We have LZO compression set on this column family.
> > > >>> We were noticing 1500 reads when reading the page content.
> > > >>> We have a column family which stores just the metadata of the page
> > > >> ("title" etc.). When reading this, the performance is a whopping
> > > >> 12000 TPS.
> > > >>
> > > >> We thought the issue could be the network bandwidth used between
> > > >> HBase and the clients. So we disabled LZO compression on the column
> > > >> family and started compressing the raw page on the client,
> > > >> decompressing it (LZO) when reading.
> > > >>
> > > >>> With this, my write performance jumped from 2000 to 5000 at peak.
> > > >>> With this approach, the servers are crashing... Not sure why, only
> > > >> after turning off LZO and doing the same from the client.
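The client-side scheme described above (compress before the Put, decompress after the Get) can be sketched roughly as below. GZIP from java.util.zip stands in for the LZO codec here, since the LZO API depends on which library is in use; the class and method names are illustrative, not from the original setup.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Client-side compression helper: compress page content before writing
// the cell value to HBase, decompress after reading it back. GZIP is a
// stand-in; an LZO codec would be swapped into these same two methods.
public class PageCodec {
    public static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    public static byte[] decompress(byte[] packed) throws IOException {
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(packed))) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            return bos.toByteArray();
        }
    }
}
```

The trade-off, as this thread illustrates, is that the region servers no longer spend CPU on compression but now handle larger write bursts, which shifts pressure onto the heap and GC.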
> > > >>
> > > >>
> > > >>
> > > >> On Thu, Feb 3, 2011 at 12:13 PM, Jonathan Gray <jgray@fb.com> wrote:
> > > >>
> > > >>> How much heap are you running on your RegionServers?
> > > >>>
> > > >>> 6GB of total RAM is on the low end.  For high-throughput
> > > >>> applications, I would recommend at least 6-8GB of heap (so 8+ GB
> > > >>> of RAM).
> > > >>>
> > > >>>> -----Original Message-----
> > > >>>> From: charan kumar [mailto:charan.kumar@gmail.com]
> > > >>>> Sent: Thursday, February 03, 2011 11:47 AM
> > > >>>> To: user@hbase.apache.org
> > > >>>> Subject: Region Servers Crashing during Random Reads
> > > >>>>
> > > >>>> Hello,
> > > >>>>
> > > >>>> I am using hbase 0.90.0 with hadoop-append. h/w (Dell 1950, 2 CPU,
> > > >>>> 6 GB RAM)
> > > >>>>
> > > >>>> I had 9 region servers crash (out of 30) in a span of 30 minutes
> > > >>>> during heavy reads. It looks like a GC / ZooKeeper connection
> > > >>>> timeout thingy to me. I did all the recommended configuration from
> > > >>>> the HBase wiki... Any other suggestions?
> > > >>>>
> > > >>>>
> > > >>>> 2011-02-03T09:43:07.890-0800: 70693.632: [GC 70693.632: [ParNew
> > > >>>> (promotion failed): 5555K->5540K(5568K), 0.0280950 secs]70693.660:
> > > >>>> [CMS2011-02-03T09:43:16.864-0800: 70702.606: [CMS-concurrent-mark:
> > > >>>> 12.549/69.323 secs] [Times: user=11.90 sys=1.26, real=69.31 secs]
> > > >>>>
> > > >>>> 2011-02-03T09:53:35.165-0800: 71320.785: [GC 71320.785: [ParNew
> > > >>>> (promotion failed): 5568K->5568K(5568K), 0.4384530 secs]71321.224:
> > > >>>> [CMS2011-02-03T09:53:45.111-0800: 71330.731: [CMS-concurrent-mark:
> > > >>>> 17.511/51.564 secs] [Times: user=38.72 sys=5.67, real=51.60 secs]
> > > >>>>
> > > >>>>
> > > >>>> The following are the log entries in the region server:
> > > >>>>
> > > >>>> 2011-02-03 10:37:43,946 INFO org.apache.zookeeper.ClientCnxn:
> > > >>>> Client session timed out, have not heard from server in 47172ms
> > > >>>> for sessionid 0x12db9f722421ce3, closing socket connection and
> > > >>>> attempting reconnect
> > > >>>> 2011-02-03 10:37:43,947 INFO org.apache.zookeeper.ClientCnxn:
> > > >>>> Client session timed out, have not heard from server in 48159ms
> > > >>>> for sessionid 0x22db9f722501d93, closing socket connection and
> > > >>>> attempting reconnect
> > > >>>> 2011-02-03 10:37:44,401 INFO org.apache.zookeeper.ClientCnxn:
> > > >>>> Opening socket connection to server XXXXXXXXXXXXXXXX
> > > >>>> 2011-02-03 10:37:44,402 INFO org.apache.zookeeper.ClientCnxn:
> > > >>>> Socket connection established to XXXXXXXXX, initiating session
> > > >>>> 2011-02-03 10:37:44,709 INFO org.apache.zookeeper.ClientCnxn:
> > > >>>> Opening socket connection to server XXXXXXXXXXXXXXX
> > > >>>> 2011-02-03 10:37:44,709 INFO org.apache.zookeeper.ClientCnxn:
> > > >>>> Socket connection established to XXXXXXXXXXXXXXXXXXXXX,
> > > >>>> initiating session
> > > >>>> 2011-02-03 10:37:44,767 DEBUG
> > > >>>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU
> > > >>>> eviction started; Attempting to free 81.93 MB of total=696.25 MB
> > > >>>> 2011-02-03 10:37:44,784 DEBUG
> > > >>>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU
> > > >>>> eviction completed; freed=81.94 MB, total=614.81 MB,
> > > >>>> single=379.98 MB, multi=309.77 MB, memory=0 KB
> > > >>>> 2011-02-03 10:37:45,205 INFO org.apache.zookeeper.ClientCnxn:
> > > >>>> Unable to reconnect to ZooKeeper service, session
> > > >>>> 0x22db9f722501d93 has expired, closing socket connection
> > > >>>> 2011-02-03 10:37:45,206 INFO
> > > >>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> > > >>>> This client just lost it's session with ZooKeeper, trying to
> > > >>>> reconnect.
> > > >>>> 2011-02-03 10:37:45,453 INFO
> > > >>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> > > >>>> Trying to reconnect to zookeeper
> > > >>>> 2011-02-03 10:37:45,206 INFO org.apache.zookeeper.ClientCnxn:
> > > >>>> Unable to reconnect to ZooKeeper service, session
> > > >>>> 0x12db9f722421ce3 has expired, closing socket connection
> > > >>>> regionserver:60020-0x22db9f722501d93
> > > >>>> regionserver:60020-0x22db9f722501d93
> > > >>>> received expired from ZooKeeper, aborting
> > > >>>> org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > >>>> KeeperErrorCode = Session expired
> > > >>>>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:328)
> > > >>>>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:246)
> > > >>>>        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
> > > >>>>        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> > > >>>> handled exception: org.apache.hadoop.hbase.YouAreDeadException:
> > > >>>> Server REPORT rejected; currently processing
> > > >>>> XXXXXXXXXXXX,60020,1296684296172 as dead server
> > > >>>> org.apache.hadoop.hbase.YouAreDeadException:
> > > >>>> org.apache.hadoop.hbase.YouAreDeadException: Server REPORT
> > > >>>> rejected; currently processing XXXXXXXXXXXX,60020,1296684296172
> > > >>>> as dead server
> > > >>>>        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> > > >>>>        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> > > >>>>        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> > > >>>>        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> > > >>>>        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:96)
> > > >>>>        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:80)
> > > >>>>        at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:729)
> > > >>>>        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:586)
> > > >>>>        at java.lang.Thread.run(Thread.java:619)
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Charan
> > > >>>
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Todd Lipcon
> > > > Software Engineer, Cloudera
> > >
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
