hbase-user mailing list archives

From Dan Washusen <...@reactive.org>
Subject Re: Failures after a few hours of heavy load
Date Sat, 20 Feb 2010 20:58:48 GMT
Sorry I can't be more helpful, but just to double-check that it's not a file
limits issue, could you run the following on each of the hosts:

$ ulimit -a
$ lsof | wc -l

The first command will show you (among other things) the open file limit,
which should be above the default of 1024.  The second will tell you how many
files are currently open...
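
If it's easier, something like the loop below will gather both numbers from
every box in one go.  It's just a sketch -- the host names are placeholders,
and it assumes passwordless ssh to each machine:

# Sketch only: substitute your real host names; assumes passwordless ssh.
# "ulimit -n" prints just the open-files limit; the lsof count is
# approximate unless run with enough privileges to see all processes.
for h in host1 host2 host3; do
  echo "== $h =="
  ssh "$h" 'ulimit -n; lsof 2>/dev/null | wc -l'
done

One caveat: the limit your shell reports can differ from the limit the
region server JVM was actually started with.  On newer Linux kernels you
can confirm what the live process has with "cat /proc/<pid>/limits".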

Cheers,
Dan

On 21 February 2010 03:14, Rod Cope <rod.cope@openlogic.com> wrote:

> I've been loading some large data sets over the last week or so, but keep
> running into failures between 4 and 15 hours into the process.  I've wiped
> HBase and/or HDFS a few times hoping that would help, but it hasn't.
>
> I've implemented all the recommendations for increasing file limits and the
> like on the troubleshooting wiki page.  There's plenty of free disk space
> and memory with no swap being used on any of the 9 machines in the cluster.
> All 9 boxes run a managed ZK, regionserver, datanode, and MR jobs loading
> data from HDFS and NFS-mounted disk into HBase.  Doing a zk_dump shows an
> average of 1 for all machines with the highest max being 621.  The
> regionserver having trouble varies from load to load, so the problem
> doesn't appear to be machine-specific.
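>
> (For reference, the file-limit changes were along the lines of what the
> wiki suggests, i.e. entries like these in /etc/security/limits.conf for
> the user running the daemons -- the user name and value here are only
> illustrative, not the exact ones used:
>
> hadoop  soft  nofile  32768
> hadoop  hard  nofile  32768
>
> followed by a fresh login so the new limits take effect.)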
>
> You can see in the logs below that a compaction is started which leads to a
> LeaseExpiredException: File does not exist (I've done a "hadoop fs -get" and
> it's really not there).  Then an Error Recovery for a block, compaction/split
> fail, "Premeture EOF from inputStream", "No live nodes contain current
> block", and finally "Cannot open filename".  At this point, there's a
> meltdown where the vast majority of the rest of the log is filled with
> exceptions like these back to back.  The regionserver doesn't go down,
> however.
>
> I'm on the released HBase 0.20.3 with Hadoop 0.20.2 as of yesterday (RC4).
> I upgraded Hadoop from 0.20.1 hoping that would help some of the problems
> I've been having, but it only seemed to change the details of the exceptions
> and not the results.  Once I upgraded to Hadoop 0.20.2, I replaced HBase's
> hadoop-0.20.1-hdfs127-core.jar in lib with the new hadoop-0.20.2-core.jar.
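>
> (The swap itself was just a jar replacement under HBase's lib directory,
> roughly as below -- $HBASE_HOME and $HADOOP_HOME stand in for the real
> install paths:
>
> $ rm $HBASE_HOME/lib/hadoop-0.20.1-hdfs127-core.jar
> $ cp $HADOOP_HOME/hadoop-0.20.2-core.jar $HBASE_HOME/lib/
>
> with HBase restarted on every node afterwards so the new jar is picked up.)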
>
> Any ideas?  I'm really under the gun to get this data loaded, so any
> workarounds or other recommendations are much appreciated.
>
> Thanks,
> Rod
>
> ----
>
> Here's a link to the logs below in case they're not easy to read:
> http://pastebin.com/d7907bca
>
>
> 2010-02-19 21:59:24,950 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region files,nerdpass\x7Chttp://nerdpass.googlecode.com/svn/trunk\x7Csrc/svn/n/ne/nerdpass/application/library/Zend/Server/Reflection/Method.php,1266641963606/25429292 because: Region has references on open
> 2010-02-19 21:59:24,950 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region files,nerdpass\x7Chttp://nerdpass.googlecode.com/svn/trunk\x7Csrc/svn/n/ne/nerdpass/application/library/Zend/Server/Reflection/Method.php,1266641963606
> 2010-02-19 21:59:24,953 DEBUG org.apache.hadoop.hbase.regionserver.Store: Started compaction of 4 file(s), hasReferences=true, into /hbase/files/compaction.dir/25429292, seqid=2811972
> 2010-02-19 21:59:27,992 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase/files/compaction.dir/25429292/2021896477663224037 File does not exist. [Lease.  Holder: DFSClient_-1386101021, pendingcreates: 1]
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1332)
>      (...rest of stack trace...)
> 2010-02-19 21:59:27,992 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_2006633705539782284_253567 bad datanode[0] nodes == null
> 2010-02-19 21:59:27,992 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/hbase/files/compaction.dir/25429292/2021896477663224037" - Aborting...
> 2010-02-19 21:59:27,997 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split failed for region files,nerdpass\x7Chttp://nerdpass.googlecode.com/svn/trunk\x7Csrc/svn/n/ne/nerdpass/application/library/Zend/Server/Reflection/Method.php,1266641963606
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase/files/compaction.dir/25429292/2021896477663224037 File does not exist. [Lease.  Holder: DFSClient_-1386101021, pendingcreates: 1]
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1332)
>      (...rest of stack trace...)
> 2010-02-19 22:00:23,627 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=624.38275MB (654712760), Free=172.29224MB (180661512), Max=796.675MB (835374272), Counts: Blocks=9977, Access=3726192, Hit=2782447, Miss=943745, Evictions=67, Evicted=85131, Ratios: Hit Ratio=74.67266917228699%, Miss Ratio=25.327330827713013%, Evicted/Run=1270.6119384765625
> 2010-02-19 22:00:41,978 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-5162944092610390422_253522 from any node: java.io.IOException: No live nodes contain current block
> 2010-02-19 22:00:44,990 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-5162944092610390422_253522 from any node: java.io.IOException: No live nodes contain current block
> 2010-02-19 22:00:47,994 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read: java.io.IOException: Cannot open filename /hbase/files/929080390/metadata/6217150884710004337
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1497)
>      (...rest of stack trace...)
> 2010-02-19 22:00:47,994 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException: Premeture EOF from inputStream
>        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)
>      (...rest of stack trace...)
> 2010-02-19 22:00:47,995 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 76 on 60020, call get([B@3a73f53, row=netbeans|https://olex.openlogic.com/packages/netbeans|src/archive/n/ne/netbeans/5.0/netbeans-5.0-src/apisupport/l10n.list, maxVersions=1, timeRange=[0,9223372036854775807), families={(family=metadata, columns={updated_at}}) from 192.168.60.106:45445: error: java.io.IOException: Premeture EOF from inputStream
> java.io.IOException: Premeture EOF from inputStream
>      (...rest of stack trace...)
> 2010-02-19 22:00:49,009 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-5162944092610390422_253522 from any node: java.io.IOException: No live nodes contain current block
> 2010-02-19 22:00:52,019 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-5162944092610390422_253522 from any node: java.io.IOException: No live nodes contain current block
> 2010-02-19 22:00:54,514 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on files,python\x7Chttps://olex.openlogic.com/packages/python\x7Csrc/archive/p/py/python/2.4.6/python-2.4.6-src/Modules/_csv.c,1266641716429
> 2010-02-19 22:00:54,520 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for region files,python\x7Chttps://olex.openlogic.com/packages/python\x7Csrc/archive/p/py/python/2.4.6/python-2.4.6-src/Modules/_csv.c,1266641716429. Current region memstore size 64.1m
> 2010-02-19 22:00:54,911 DEBUG org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dd01:54310/hbase/files/1086732894/content/9096973985255757264, entries=4486, sequenceid=2812095, memsize=29.5m, filesize=10.8m to files,python\x7Chttps://olex.openlogic.com/packages/python\x7Csrc/archive/p/py/python/2.4.6/python-2.4.6-src/Modules/_csv.c,1266641716429
> 2010-02-19 22:00:54,987 DEBUG org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dd01:54310/hbase/files/1086732894/metadata/3183633054937023200, entries=28453, sequenceid=2812095, memsize=8.2m, filesize=638.5k to files,python\x7Chttps://olex.openlogic.com/packages/python\x7Csrc/archive/p/py/python/2.4.6/python-2.4.6-src/Modules/_csv.c,1266641716429
> 2010-02-19 22:00:55,022 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read: java.io.IOException: Cannot open filename /hbase/files/929080390/metadata/6217150884710004337
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1497)
>      (...rest of stack trace...)
>
