From: Slava Gorelik
To: hbase-user@hadoop.apache.org
Date: Thu, 29 Jan 2009 00:23:15 +0200
Subject: Re: java.lang.NegativeArraySizeException

Hi.

After some investigation I found:

1) There were a couple of hardware problems (on 2 of the 7 machines) that made the cluster unstable.
2) I don't have any cron jobs cleaning pid files out of the /tmp folder; moreover, my Hadoop storage is also in /tmp.

So it looks very strange that after some time the scripts can't see the .pid files.

Now the question: from my understanding of Hadoop, it should be a very reliable file system, and a hardware malfunction on 2 of 7 computers shouldn't leave my cluster this badly corrupted. (I mean that after the hardware problems were fixed, Hadoop's safe mode was still stuck at a 0.449 ratio, with only one datanode reporting 49 blocks and the other 6 datanodes stuck at 0 blocks reported.) To recover I had to reformat Hadoop (by the way, a very annoying bug: reformatting alone isn't enough, I first have to remove the whole Hadoop storage folder, otherwise I get a "wrong index" error).
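For reference, the stop scripts find a running daemon through a pid file under HADOOP_PID_DIR, which defaults to /tmp, so anything that prunes /tmp (a tmp cleaner, a reboot) silently breaks them. Roughly this logic, sketched in Python rather than Hadoop's actual shell script, with a made-up user name for illustration:

```python
import os
import tempfile

# Sketch (not bin/hadoop-daemon.sh itself) of how the stop scripts locate a
# daemon: they read $HADOOP_PID_DIR/hadoop-<user>-<daemon>.pid. HADOOP_PID_DIR
# defaults to /tmp, so once /tmp is cleaned the pid file is gone and the
# script reports "no datanode to stop" even though the JVM is still running.
def find_daemon_pid(daemon, user, pid_dir="/tmp"):
    pid_file = os.path.join(pid_dir, "hadoop-%s-%s.pid" % (user, daemon))
    if not os.path.isfile(pid_file):
        return None  # this is where the stop script gives up
    with open(pid_file) as f:
        return int(f.read().strip())

# Demo with a temporary directory standing in for HADOOP_PID_DIR:
pid_dir = tempfile.mkdtemp()
path = os.path.join(pid_dir, "hadoop-slava-datanode.pid")
with open(path, "w") as f:
    f.write("4242\n")
print(find_daemon_pid("datanode", "slava", pid_dir))  # 4242
os.remove(path)  # what a /tmp cleaner would do
print(find_daemon_pid("datanode", "slava", pid_dir))  # None
```

Setting HADOOP_PID_DIR in hadoop-env.sh to a directory outside /tmp avoids this class of problem.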
So how could it be that the failure of 2 computers out of 7 (with replication 3) caused such strange behavior in my cluster? On a cluster of commodity computers, such failures could occur twice a day or even more often; does that mean I need to reformat the cluster every time? It looks very strange.

Thank You for your assistance and Best Regards.

On Wed, Jan 28, 2009 at 8:03 AM, Slava Gorelik wrote:
> Thank You.
>
> On Wed, Jan 28, 2009 at 12:56 AM, Jonathan Gray wrote:
>
>> Oops, hit send before I finished my msg.
>>
>> https://issues.apache.org/jira/browse/HADOOP/fixforversion/12313473
>>
>> Only 2 blockers and 1 critical left, so it could be any day.
>>
>> JG
>>
>> > -----Original Message-----
>> > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
>> > Sent: Tuesday, January 27, 2009 1:56 PM
>> > To: hbase-user@hadoop.apache.org
>> > Subject: Re: java.lang.NegativeArraySizeException
>> >
>> > Thanks, sure I'll upgrade to 0.19.0. Any estimate of when Hadoop 0.19.1
>> > will be out?
>> >
>> > Best Regards.
>> >
>> > On Tue, Jan 27, 2009 at 11:50 PM, Jonathan Gray wrote:
>> >
>> > > Upgrade to HBase 0.19.x.
>> > >
>> > > But you should probably wait until Hadoop 0.19.1 is released, as it
>> > > resolves some known issues that can lead to problems with HBase and
>> > > HDFS.
>> > >
>> > > > -----Original Message-----
>> > > > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
>> > > > Sent: Tuesday, January 27, 2009 1:32 PM
>> > > > To: hbase-user@hadoop.apache.org
>> > > > Subject: Re: java.lang.NegativeArraySizeException
>> > > >
>> > > > Hi. Yes, I'm using the /tmp folder, but it's never cleaned; anyway,
>> > > > I'll check this tomorrow.
>> > > >
>> > > > Thank you for the advice. Any idea how I can fix / avoid the data
>> > > > corruption?
>> > > >
>> > > > Best Regards.
>> > > >
>> > > > On Tue, Jan 27, 2009 at 11:13 PM, Jonathan Gray wrote:
>> > > >
>> > > > > Slava,
>> > > > >
>> > > > > Are you using /tmp as the location for HDFS?
>> > > > >
>> > > > > It seems that you were missing the pid files in /tmp; that's why
>> > > > > the scripts didn't properly shut down the DNs/NN.
>> > > > >
>> > > > > You might have cron jobs that are cleaning /tmp, so your HDFS
>> > > > > block files were all deleted.
>> > > > >
>> > > > > JG
>> > > > >
>> > > > > > -----Original Message-----
>> > > > > > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
>> > > > > > Sent: Tuesday, January 27, 2009 11:04 AM
>> > > > > > To: hbase-user@hadoop.apache.org
>> > > > > > Subject: Re: java.lang.NegativeArraySizeException
>> > > > > >
>> > > > > > This is still 0.18.0. I tried to run fsck, but it didn't tell me
>> > > > > > anything. Even more, when I tried to stop the Hadoop cluster it
>> > > > > > told me that no datanodes and no namenodes were alive, but the
>> > > > > > processes were alive and responded to every Hadoop request.
>> > > > > > After I killed all the processes on all machines, Hadoop started
>> > > > > > but got stuck in safe mode.
>> > > > > > My Hadoop cluster is 7 datanodes, one of which is also the
>> > > > > > namenode. Only the last machine in the cluster reported some
>> > > > > > blocks; the others are stuck at 0 blocks reported.
>> > > > > >
>> > > > > > Very strange behavior.
>> > > > > >
>> > > > > > Thank You.
>> > > > > >
>> > > > > > On Tue, Jan 27, 2009 at 8:46 PM, stack wrote:
>> > > > > >
>> > > > > > > Is this hbase 0.19.x Slava? The issue looks a little like
>> > > > > > > HBASE-1135, only the cause seems to be bubbling up from HDFS.
>> > > > > > > Is your HDFS healthy? (What does hadoop fsck say?)
>> > > > > > >
>> > > > > > > Anything happen on this cluster preceding the below ugliness?
>> > > > > > >
>> > > > > > > St.Ack
>> > > > > > >
>> > > > > > > Slava Gorelik wrote:
>> > > > > > >
>> > > > > > >> Hi guys.
>> > > > > > >> After some not very intensive, regular work I have some
>> > > > > > >> problems in HBase that look like data corruption.
>> > > > > > >> I started to get this exception on the HMaster:
>> > > > > > >> 2009-01-27 14:30:16,555 FATAL org.apache.hadoop.hbase.master.HMaster: Not starting HMaster because:
>> > > > > > >> org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NegativeArraySizeException
>> > > > > > >>   at org.apache.hadoop.dfs.FSNamesystem.getBlockLocationsInternal(FSNamesystem.java:780)
>> > > > > > >>   at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:727)
>> > > > > > >>   at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:703)
>> > > > > > >>   at org.apache.hadoop.dfs.NameNode.getBlockLocations(NameNode.java:257)
>> > > > > > >>   at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>> > > > > > >>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > > > > > >>   at java.lang.reflect.Method.invoke(Method.java:597)
>> > > > > > >>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
>> > > > > > >>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
>> > > > > > >>
>> > > > > > >> On a Region Server I have this:
>> > > > > > >> 2009-01-27 14:55:26,638 INFO org.apache.hadoop.hbase.regionserver.HStore: HSTORE_LOGINFOFILE 1028785192/info/5593692610357375495 does not contain a sequence number - ignoring
>> > > > > > >> 2009-01-27 14:55:26,662 INFO org.apache.hadoop.hbase.regionserver.HStore: HSTORE_LOGINFOFILE 1028785192/info/7096195127965906654 does not contain a sequence number - ignoring
>> > > > > > >> 2009-01-27 14:55:26,682 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: error opening region .META.,,1
>> > > > > > >> org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 1
>> > > > > > >>   at org.apache.hadoop.dfs.FSNamesystem.getBlockLocationsInternal(FSNamesystem.java:789)
>> > > > > > >>   at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:727)
>> > > > > > >>   at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:703)
>> > > > > > >>   at org.apache.hadoop.dfs.NameNode.getBlockLocations(NameNode.java:257)
>> > > > > > >>   at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>> > > > > > >>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > > > > > >>   at java.lang.reflect.Method.invoke(Method.java:597)
>> > > > > > >>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
>> > > > > > >>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
>> > > > > > >>
>> > > > > > >>   at org.apache.hadoop.ipc.Client.call(Client.java:715)
>> > > > > > >>   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>> > > > > > >>   at org.apache.hadoop.dfs.$Proxy1.getBlockLocations(Unknown Source)
>> > > > > > >>   at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>> > > > > > >>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > > > > > >>   at java.lang.reflect.Method.invoke(Method.java:597)
>> > > > > > >>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>> > > > > > >>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>> > > > > > >>   at org.apache.hadoop.dfs.$Proxy1.getBlockLocations(Unknown Source)
>> > > > > > >>   at org.apache.hadoop.dfs.DFSClient.callGetBlockLocations(DFSClient.java:297)
>> > > > > > >>
>> > > > > > >> Or
>> > > > > > >>
>> > > > > > >> 2009-01-27 14:45:50,760 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 60020, call openScanner([B@62fa5ff3, [[B@23b17d49, [B@599855ed, 9223372036854775807, null) from 10.26.237.136:36936: error:
>> > > > > > >> org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
>> > > > > > >> org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
>> > > > > > >>   at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:1560)
>> > > > > > >>   at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1210)
>> > > > > > >>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > > > > > >>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > > > > > >>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > > > > > >>   at java.lang.reflect.Method.invoke(Method.java:597)
>> > > > > > >>   at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:554)
>> > > > > > >>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
>> > > > > > >> 2009-01-27 14:56:49,208 INFO org.apache.hadoop.hbase.regionserver.LogRoller: Rolling hlog. Number of entries: 0
>> > > > > > >>
>> > > > > > >> It's very annoying :-(
>> > > > > > >>
>> > > > > > >> Any help?
>> > > > > > >>
>> > > > > > >> Thank You and Best Regards.