hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: HBase Large Load Issue
Date Wed, 04 Dec 2013 02:41:59 GMT
As Vladimir says: do you really need to store the files themselves in
HBase? 20 MB is pretty big. Can you not just store each file in HDFS and
keep only the path of the file in HBase?
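
To make the suggestion concrete, here is a minimal sketch of the
store-by-reference pattern. It is not HBase or HDFS client code: a local
directory stands in for HDFS and an in-memory map stands in for the HBase
table, and the names ReferenceStore, store and load are made up for
illustration. It also assumes row keys are safe to use as file names. In a
real job the file write would go through the Hadoop FileSystem API and the
map update would be a Put on the table.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: keep large payloads outside the table and
// store only a small reference (the path) per row.
final class ReferenceStore {
    // Stand-in for an HDFS directory (assumption: local filesystem here).
    private final Path blobRoot;
    // Stand-in for the HBase table: row key -> path of the stored file.
    private final Map<String, String> index = new HashMap<>();

    ReferenceStore(Path blobRoot) {
        this.blobRoot = blobRoot;
    }

    /** Writes the payload as a file, then records only its path under rowKey. */
    String store(String rowKey, byte[] payload) throws IOException {
        Files.createDirectories(blobRoot);
        Path target = blobRoot.resolve(rowKey + ".bin");
        Files.write(target, payload);
        String ref = target.toString();
        index.put(rowKey, ref); // the "table" only ever sees this short string
        return ref;
    }

    /** Looks up the reference for rowKey and reads the payload back from the file. */
    byte[] load(String rowKey) throws IOException {
        String ref = index.get(rowKey);
        if (ref == null) {
            throw new IOException("no reference stored for row " + rowKey);
        }
        return Files.readAllBytes(Paths.get(ref));
    }
}
```

The point of the design is that the table cell holds a path of a few dozen
bytes instead of a 3 MB-20 MB value, so flushes, splits and compactions no
longer have to move the file contents around.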

Do you have the logs from when the servers died? Any GC pauses?
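
One quick way to check for long pauses, given that the JVM options quoted
below include -XX:+PrintGCDetails, is to scan the GC log for large "real="
values in the "[Times: user=... sys=..., real=... secs]" stanza. The sketch
below is only that, a sketch: the class name GcPauseScan and the threshold
parameter are made up, and with CMS the concurrent phases also print a
"real=" time without stopping the application, so the hits are candidates
to read by hand rather than confirmed pauses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: flags GC log lines with a long wall-clock time.
final class GcPauseScan {
    // Wall-clock field of the "[Times: user=... sys=..., real=... secs]" stanza.
    private static final Pattern REAL =
            Pattern.compile("real=(\\d+(?:\\.\\d+)?) secs");

    /** Returns the lines whose "real" time is at least thresholdSecs. */
    static List<String> longPauses(Iterable<String> lines, double thresholdSecs) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            Matcher m = REAL.matcher(line);
            while (m.find()) {
                if (Double.parseDouble(m.group(1)) >= thresholdSecs) {
                    hits.add(line); // keep the whole line for manual inspection
                    break;
                }
            }
        }
        return hits;
    }
}
```

It could be run over the log named in the -Xloggc option, for example
GcPauseScan.longPauses(Files.readAllLines(Paths.get("/logs/gc.log")), 10.0),
with the threshold picked relative to the ZooKeeper session timeout in use.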


2013/12/3 Vladimir Rodionov <vrodionov@carrieriq.com>

> >>Any advice is appreciated.
> Do not store your files in HBase, store only references.
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
> ________________________________________
> From: Bill Sanchez [bill.sanchez2487@gmail.com]
> Sent: Tuesday, December 03, 2013 3:45 PM
> To: user@hbase.apache.org
> Subject: HBase Large Load Issue
> Hello,
> I am seeking some advice on my HBase issue.  I am trying to configure a
> system that will eventually load and store approximately 50GB-80GB of data
> daily.  This data consists of files that are roughly 3MB-5MB each with some
> reaching 20MB and some as small as 1MB.  The load job does roughly 20,000
> puts to the same table spread across an initial set of 20 pre-split regions
> on 20 region servers.  During the first load I see some splitting (ending
> with around 50 regions) and in subsequent loads the number of regions will
> go much higher.
> After running similarly sized loads about 4 or 5 times I start to see the
> following behavior that I cannot explain.  The table in question has
> VERSIONS=1 and some of these test loads use the same data, but not all.
> Below is a summary of the behavior along with a few of the configuration
> settings I have tried so far.
> Environment:
> HBase 0.94.13-security with Kerberos enabled
> Zookeeper 3.4.5
> Hadoop 1.0.4
> Symptoms:
> 1.  Requests per second fall to 0 for all region servers
> 2.  Log files show socket timeout exceptions after waiting for scans of
> 3.  Region servers sometimes eventually show up as dead
> 4.  Once HBase reaches a broken state, some regions remain in a
> transition state indefinitely
> 5.  All of these issues seem to happen around the time of major compaction
> events
> This issue seems to be sensitive to hbase.rpc.timeout, which I increased
> significantly, but that only lengthened the time until I see
> socket timeout exceptions.
> A few notes:
> 1.  I don't see massive GC in the gc log.
> 2.  Originally Snappy compression was enabled, but as a test I turned it
> off and it doesn't seem to make any difference in the testing.
> 3.  The WAL is disabled for the table involved in the load
> 4.  TeraSort appears to run normally in HDFS
> 5.  The HBase randomWrite and randomRead tests appear to run normally on
> this cluster (although randomWrite does not write anywhere close to
> 3MB-5MB)
> 6.  Ganglia is available in my environment
> Settings already altered:
> 1.  hbase.rpc.timeout=900000 (I realize this may be too high)
> 2.  hbase.regionserver.handler.count=100
> 3.  ipc.server.max.callqueue.size=10737418240
> 4.  hbase.regionserver.lease.period=900000
> 5.  hbase.hregion.majorcompaction=0 (I have been manually compacting
> between loads with no difference in behavior)
> 6.  hbase.hregion.memstore.flush.size=268435456
> 7.  dfs.datanode.max.xcievers=131072
> 8.  dfs.datanode.handler.count=100
> 9.  ipc.server.listen.queue.size=256
> 10.  -Xmx16384m -XX:+UseConcMarkSweepGC -XX:+UseMembar -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/logs/gc.log -Xms16384m
> -XX:PrintFLSStatistics=1 -XX:+CMSParallelRemarkEnabled
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseParNewGC
> 11. I have tried other GC settings but they don't seem to have any real
> impact on GC performance in this case
> Any advice is appreciated.
> Thanks
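
For readers comparing the quoted settings, the raw byte and millisecond
values are easier to judge once converted. The sketch below only does that
arithmetic; the class and method names are made up, and the input values
are the ones listed in the quoted message.

```java
import java.util.Locale;

// Hypothetical helper: converts raw setting values into readable units.
final class SettingUnits {
    /** Formats a byte count using binary units (1 KiB = 1024 B). */
    static String bytes(long n) {
        String[] units = {"B", "KiB", "MiB", "GiB", "TiB"};
        double v = n;
        int i = 0;
        while (v >= 1024 && i < units.length - 1) {
            v /= 1024;
            i++;
        }
        return String.format(Locale.ROOT, "%.1f %s", v, units[i]);
    }

    /** Formats a millisecond duration in minutes. */
    static String minutes(long ms) {
        return String.format(Locale.ROOT, "%.1f min", ms / 60000.0);
    }

    public static void main(String[] args) {
        System.out.println("ipc.server.max.callqueue.size     = " + bytes(10737418240L));
        System.out.println("hbase.hregion.memstore.flush.size = " + bytes(268435456L));
        System.out.println("-Xmx16384m                        = " + bytes(16384L * 1024 * 1024));
        System.out.println("hbase.rpc.timeout                 = " + minutes(900000L));
        System.out.println("hbase.regionserver.lease.period   = " + minutes(900000L));
    }
}
```

Taken at face value, that is a 10 GiB call queue limit and a 256 MiB
memstore flush size against a 16 GiB heap, with both timeouts at 15
minutes.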
