From: Slava Gorelik
To: hbase-user@hadoop.apache.org
Date: Thu, 29 Jan 2009 00:23:15 +0200
Subject: Re: java.lang.NegativeArraySizeException

Hi.

After some investigation I found:

1) There were a couple of hardware problems (on 2 of the 7 machines) that made the cluster unstable.
2) I don't have any cron jobs cleaning pid files out of the /tmp folder; moreover, my Hadoop storage is also in /tmp.

So it looks very strange that after some time the scripts can't see the .pid files.

Now the question: from my understanding of Hadoop, it should be a very reliable file system, and a hardware malfunction on 2 of 7 computers shouldn't leave my cluster this badly corrupted. (I mean that after the hardware problems were fixed, Hadoop's safe mode was still stuck at a 0.449 ratio, with only one datanode reporting 49 blocks and the other 6 datanodes stuck at 0 blocks reported.) To recover I had to reformat Hadoop (by the way, a very annoying bug: reformatting alone isn't enough, I first have to remove the whole Hadoop storage folder, otherwise I get a "wrong index" error).
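For reference, the stop scripts find a running daemon through a pid file under HADOOP_PID_DIR, which defaults to /tmp, so anything that prunes /tmp (a tmp cleaner, a reboot) silently breaks them. Roughly this logic, sketched in Python rather than Hadoop's actual shell script, with a made-up user name for illustration:

```python
import os
import tempfile

# Sketch (not bin/hadoop-daemon.sh itself) of how the stop scripts locate a
# daemon: they read $HADOOP_PID_DIR/hadoop-<user>-<daemon>.pid. HADOOP_PID_DIR
# defaults to /tmp, so once /tmp is cleaned the pid file is gone and the
# script reports "no datanode to stop" even though the JVM is still running.
def find_daemon_pid(daemon, user, pid_dir="/tmp"):
    pid_file = os.path.join(pid_dir, "hadoop-%s-%s.pid" % (user, daemon))
    if not os.path.isfile(pid_file):
        return None  # this is where the stop script gives up
    with open(pid_file) as f:
        return int(f.read().strip())

# Demo with a temporary directory standing in for HADOOP_PID_DIR:
pid_dir = tempfile.mkdtemp()
path = os.path.join(pid_dir, "hadoop-slava-datanode.pid")
with open(path, "w") as f:
    f.write("4242\n")
print(find_daemon_pid("datanode", "slava", pid_dir))  # 4242
os.remove(path)  # what a /tmp cleaner would do
print(find_daemon_pid("datanode", "slava", pid_dir))  # None
```

Setting HADOOP_PID_DIR in hadoop-env.sh to a directory outside /tmp avoids this class of problem.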
So how could it be that the failure of 2 computers out of 7 (with replication 3) caused such strange behavior in my cluster? On a cluster of commodity computers, such failures could occur twice a day or even more often; does that mean I need to reformat the cluster every time? It looks very strange.

Thank You for your assistance and Best Regards.

On Wed, Jan 28, 2009 at 8:03 AM, Slava Gorelik wrote:
> Thank You.
>
> On Wed, Jan 28, 2009 at 12:56 AM, Jonathan Gray wrote:
>
>> Oops, hit send before I finished my msg.
>>
>> https://issues.apache.org/jira/browse/HADOOP/fixforversion/12313473
>>
>> Only 2 blockers and 1 critical left, so it could be any day.
>>
>> JG
>>
>> > -----Original Message-----
>> > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
>> > Sent: Tuesday, January 27, 2009 1:56 PM
>> > To: hbase-user@hadoop.apache.org
>> > Subject: Re: java.lang.NegativeArraySizeException
>> >
>> > Thanks, sure I'll upgrade to 0.19.0. Any estimate of when Hadoop 0.19.1
>> > will be out?
>> >
>> > Best Regards.
>> >
>> > On Tue, Jan 27, 2009 at 11:50 PM, Jonathan Gray wrote:
>> >
>> > > Upgrade to HBase 0.19.x.
>> > >
>> > > But you should probably wait until Hadoop 0.19.1 is released, as it
>> > > resolves some known issues that can lead to problems with HBase and
>> > > HDFS.
>> > >
>> > > > -----Original Message-----
>> > > > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
>> > > > Sent: Tuesday, January 27, 2009 1:32 PM
>> > > > To: hbase-user@hadoop.apache.org
>> > > > Subject: Re: java.lang.NegativeArraySizeException
>> > > >
>> > > > Hi. Yes, I'm using the /tmp folder, but it's never cleaned; anyway,
>> > > > I'll check this tomorrow.
>> > > >
>> > > > Thank you for the advice. Any idea how I can fix / avoid the data
>> > > > corruption?
>> > > >
>> > > > Best Regards.
>> > > >
>> > > > On Tue, Jan 27, 2009 at 11:13 PM, Jonathan Gray wrote:
>> > > >
>> > > > > Slava,
>> > > > >
>> > > > > Are you using /tmp as the location for HDFS?
>> > > > >
>> > > > > It seems that you were missing the pid files in /tmp; that's why
>> > > > > the scripts didn't properly shut down the DNs/NN.
>> > > > >
>> > > > > You might have cron jobs that are cleaning /tmp, so your HDFS
>> > > > > block files were all deleted.
>> > > > >
>> > > > > JG
>> > > > >
>> > > > > > -----Original Message-----
>> > > > > > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
>> > > > > > Sent: Tuesday, January 27, 2009 11:04 AM
>> > > > > > To: hbase-user@hadoop.apache.org
>> > > > > > Subject: Re: java.lang.NegativeArraySizeException
>> > > > > >
>> > > > > > This is still 0.18.0. I tried to run fsck, but it didn't tell me
>> > > > > > anything. Even more, when I tried to stop the Hadoop cluster it
>> > > > > > told me that no datanodes and no namenodes were alive, but the
>> > > > > > processes were alive and responded to every Hadoop request.
>> > > > > > After I killed all the processes on all machines, Hadoop started
>> > > > > > but got stuck in safe mode.
>> > > > > > My Hadoop cluster is 7 datanodes, one of which is also the
>> > > > > > namenode. Only the last machine in the cluster reported some
>> > > > > > blocks; the others are stuck at 0 blocks reported.
>> > > > > >
>> > > > > > Very strange behavior.
>> > > > > >
>> > > > > > Thank You.
>> > > > > >
>> > > > > > On Tue, Jan 27, 2009 at 8:46 PM, stack wrote:
>> > > > > >
>> > > > > > > Is this hbase 0.19.x Slava? The issue looks a little like
>> > > > > > > HBASE-1135, only the cause seems to be bubbling up from HDFS.
>> > > > > > > Is your HDFS healthy? (What does hadoop fsck say?)
>> > > > > > >
>> > > > > > > Anything happen on this cluster preceding the below ugliness?
>> > > > > > >
>> > > > > > > St.Ack
>> > > > > > >
>> > > > > > > Slava Gorelik wrote:
>> > > > > > >
>> > > > > > >> Hi guys.
>> > > > > > >> After some not very intensive, regular work I have some
>> > > > > > >> problems in HBase that look like data corruption.
>> > > > > > >> I started to get this exception on the HMaster:
>> > > > > > >> 2009-01-27 14:30:16,555 FATAL org.apache.hadoop.hbase.master.HMaster: Not starting HMaster because:
>> > > > > > >> org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NegativeArraySizeException
>> > > > > > >>   at org.apache.hadoop.dfs.FSNamesystem.getBlockLocationsInternal(FSNamesystem.java:780)
>> > > > > > >>   at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:727)
>> > > > > > >>   at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:703)
>> > > > > > >>   at org.apache.hadoop.dfs.NameNode.getBlockLocations(NameNode.java:257)
>> > > > > > >>   at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>> > > > > > >>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > > > > > >>   at java.lang.reflect.Method.invoke(Method.java:597)
>> > > > > > >>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
>> > > > > > >>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
>> > > > > > >>
>> > > > > > >> On a Region Server I have this:
>> > > > > > >> 2009-01-27 14:55:26,638 INFO org.apache.hadoop.hbase.regionserver.HStore: HSTORE_LOGINFOFILE 1028785192/info/5593692610357375495 does not contain a sequence number - ignoring
>> > > > > > >> 2009-01-27 14:55:26,662 INFO org.apache.hadoop.hbase.regionserver.HStore: HSTORE_LOGINFOFILE 1028785192/info/7096195127965906654 does not contain a sequence number - ignoring
>> > > > > > >> 2009-01-27 14:55:26,682 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: error opening region .META.,,1
>> > > > > > >> org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 1
>> > > > > > >>   at org.apache.hadoop.dfs.FSNamesystem.getBlockLocationsInternal(FSNamesystem.java:789)
>> > > > > > >>   at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:727)
>> > > > > > >>   at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:703)
>> > > > > > >>   at org.apache.hadoop.dfs.NameNode.getBlockLocations(NameNode.java:257)
>> > > > > > >>   at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>> > > > > > >>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > > > > > >>   at java.lang.reflect.Method.invoke(Method.java:597)
>> > > > > > >>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
>> > > > > > >>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
>> > > > > > >>
>> > > > > > >>   at org.apache.hadoop.ipc.Client.call(Client.java:715)
>> > > > > > >>   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>> > > > > > >>   at org.apache.hadoop.dfs.$Proxy1.getBlockLocations(Unknown Source)
>> > > > > > >>   at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>> > > > > > >>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > > > > > >>   at java.lang.reflect.Method.invoke(Method.java:597)
>> > > > > > >>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>> > > > > > >>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>> > > > > > >>   at org.apache.hadoop.dfs.$Proxy1.getBlockLocations(Unknown Source)
>> > > > > > >>   at org.apache.hadoop.dfs.DFSClient.callGetBlockLocations(DFSClient.java:297)
>> > > > > > >>
>> > > > > > >> Or
>> > > > > > >>
>> > > > > > >> 2009-01-27 14:45:50,760 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 60020, call openScanner([B@62fa5ff3, [[B@23b17d49, [B@599855ed, 9223372036854775807, null) from 10.26.237.136:36936: error:
>> > > > > > >> org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
>> > > > > > >> org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
>> > > > > > >>   at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:1560)
>> > > > > > >>   at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1210)
>> > > > > > >>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > > > > > >>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > > > > > >>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > > > > > >>   at java.lang.reflect.Method.invoke(Method.java:597)
>> > > > > > >>   at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:554)
>> > > > > > >>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
>> > > > > > >> 2009-01-27 14:56:49,208 INFO org.apache.hadoop.hbase.regionserver.LogRoller: Rolling hlog. Number of entries: 0
>> > > > > > >>
>> > > > > > >> It's very annoying :-(
>> > > > > > >>
>> > > > > > >> Any help?
>> > > > > > >>
>> > > > > > >> Thank You and Best Regards.