hbase-user mailing list archives

From Slava Gorelik <slava.gore...@gmail.com>
Subject Re: java.lang.NegativeArraySizeException
Date Wed, 28 Jan 2009 22:23:15 GMT
Hi.
After some investigation I found:

1) There were a couple of hardware problems (on 2 of the 7 machines) that
made the cluster unstable.

2) I don't have any cron jobs cleaning pid files from the /tmp folder;
moreover, my Hadoop storage is also in the /tmp folder.
    So it looks very strange that after some time the scripts can't see the
.pid files.
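For what it's worth, here's a minimal sketch of the settings for keeping pid files and HDFS storage out of /tmp; the paths below are just example locations, not taken from this cluster:

```shell
# hadoop-env.sh: put pid files somewhere a tmp cleaner or reboot won't touch
# (by default they end up under /tmp)
export HADOOP_PID_DIR=/var/hadoop/pids

# hadoop-site.xml: the properties that move HDFS storage itself.
# hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}, and
# dfs.name.dir / dfs.data.dir default to directories under it.
#   hadoop.tmp.dir -> /var/hadoop/tmp
#   dfs.name.dir   -> /var/hadoop/dfs/name
#   dfs.data.dir   -> /var/hadoop/dfs/data
```

With the data and pid directories outside /tmp, an OS cleanup or reboot can't silently delete block files or pid files.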

Now the question: from my understanding of Hadoop, it should be a very
reliable file system, and a hardware malfunction on 2 of 7 computers
shouldn't leave my cluster this badly corrupted. (I mean that after the
hardware problems were fixed, Hadoop's safe mode was still stuck at a 0.449
ratio, with only one datanode reporting about 49 blocks and the other 6
datanodes stuck at 0 blocks reported.) To solve this I had to reformat
Hadoop (by the way, a very annoying bug: it's not enough to reformat, I
first have to remove the whole Hadoop storage folder, otherwise I get a
wrong-index error). So how can a failure of 2 computers out of 7 (with
replication 3) cause such strange behavior in my cluster? On a cluster of
commodity computers such a problem will occur twice a day or even more
often; does that mean I need to reformat the cluster each time? It looks
very strange.
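For anyone hitting the same stuck safe mode, a sketch of the commands I'd use to inspect it (standard hadoop CLI of the 0.18/0.19 era; they need a live cluster, so output will differ):

```shell
# Is the namenode still in safe mode, and has the block ratio moved?
hadoop dfsadmin -safemode get

# Per-datanode report: which nodes have checked in and how much they store
hadoop dfsadmin -report

# File-system health: missing, corrupt, and under-replicated blocks
hadoop fsck /

# If the missing blocks are really gone, the safe-mode threshold will never
# be reached on its own; it can be left manually (any file whose blocks are
# lost stays lost)
hadoop dfsadmin -safemode leave
```

Leaving safe mode manually and deleting the unrecoverable files is usually less drastic than reformatting the whole namenode.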


Thank You for your assistance and Best Regards.

On Wed, Jan 28, 2009 at 8:03 AM, Slava Gorelik <slava.gorelik@gmail.com> wrote:

> Thank You.
>
>
> On Wed, Jan 28, 2009 at 12:56 AM, Jonathan Gray <jlist@streamy.com> wrote:
>
>> Oops, hit send before I finished my msg.
>>
>> https://issues.apache.org/jira/browse/HADOOP/fixforversion/12313473
>>
>> Only 2 blockers and 1 critical left, so could be any day.
>>
>> JG
>>
>> > -----Original Message-----
>> > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
>> > Sent: Tuesday, January 27, 2009 1:56 PM
>> > To: hbase-user@hadoop.apache.org
>> > Subject: Re: java.lang.NegativeArraySizeException
>> >
>> > Thanks, sure I'll upgrade to 0.19.0. Any estimate of when Hadoop 0.19.1
>> > will be out?
>> >
>> > Best Regards.
>> >
>> >
>> >
>> > On Tue, Jan 27, 2009 at 11:50 PM, Jonathan Gray <jlist@streamy.com>
>> > wrote:
>> >
>> > > Upgrade to HBase 0.19.x.
>> > >
>> > > But you should probably wait until Hadoop 0.19.1 is released as this
>> > > resolves some known issues that can lead to problems with HBase and
>> > HDFS.
>> > >
>> > > > -----Original Message-----
>> > > > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
>> > > > Sent: Tuesday, January 27, 2009 1:32 PM
>> > > > To: hbase-user@hadoop.apache.org
>> > > > Subject: Re: java.lang.NegativeArraySizeException
>> > > >
>> > > > Hi. Yes, I'm using the /tmp folder, but it's never cleaned; anyway,
>> > > > I'll check this tomorrow.
>> > > >
>> > > > Thank you for the advice; any idea how I can fix / avoid the data
>> > > > corruption?
>> > > >
>> > > > Best Regards.
>> > > >
>> > > >
>> > > > On Tue, Jan 27, 2009 at 11:13 PM, Jonathan Gray <jlist@streamy.com>
>> > > > wrote:
>> > > >
>> > > > > Slava,
>> > > > >
>> > > > > Are you using /tmp as location for hdfs?
>> > > > >
>> > > > > It seems that you were missing the pid files in /tmp; that's why
>> > > > > the scripts didn't properly shut down the DNs/NN.
>> > > > >
>> > > > > You might have cron jobs that are cleaning /tmp, so your HDFS
>> > > > > block files were all deleted.
>> > > > >
>> > > > > JG
>> > > > >
>> > > > > > -----Original Message-----
>> > > > > > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
>> > > > > > Sent: Tuesday, January 27, 2009 11:04 AM
>> > > > > > To: hbase-user@hadoop.apache.org
>> > > > > > Subject: Re: java.lang.NegativeArraySizeException
>> > > > > >
>> > > > > > This is still 0.18.0. I tried to run fsck, but it didn't tell
>> > > > > > me anything. Even more, when I tried to stop the Hadoop cluster
>> > > > > > it told me that no datanodes and no namenodes were alive, yet
>> > > > > > the processes were alive and responded to every Hadoop request.
>> > > > > > After I killed all the processes on all the machines, Hadoop
>> > > > > > started but got stuck in safe mode.
>> > > > > > My Hadoop cluster has 7 datanodes, one of which is also the
>> > > > > > namenode. Only the last machine in the cluster reported some
>> > > > > > blocks; the others are stuck reporting 0 blocks.
>> > > > > >
>> > > > > > Very strange behavior.
>> > > > > >
>> > > > > >
>> > > > > > Thank You.
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Tue, Jan 27, 2009 at 8:46 PM, stack <stack@duboce.net>
>> > wrote:
>> > > > > >
>> > > > > > > Is this hbase 0.19.x, Slava?  The issue looks a little like
>> > > > > > > HBASE-1135, only the cause seems to be bubbling up from HDFS.
>> > > > > > > Is your HDFS healthy (what does hadoop fsck say)?
>> > > > > > >
>> > > > > > > Anything happen on this cluster preceding the below ugliness?
>> > > > > > >
>> > > > > > > St.Ack
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > Slava Gorelik wrote:
>> > > > > > >
>> > > > > > >> Hi guys.
>> > > > > > >> After some light, regular work I have some problems in HBase
>> > > > > > >> that look like data corruption.
>> > > > > > >> I started to get this exception on the HMaster:
>> > > > > > >> 2009-01-27 14:30:16,555 FATAL org.apache.hadoop.hbase.master.HMaster:
>> > > > > > >> Not starting HMaster because:
>> > > > > > >> org.apache.hadoop.ipc.RemoteException: java.io.IOException:
>> > > > > > >> java.lang.NegativeArraySizeException
>> > > > > > >> at org.apache.hadoop.dfs.FSNamesystem.getBlockLocationsInternal(FSNamesystem.java:780)
>> > > > > > >> at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:727)
>> > > > > > >> at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:703)
>> > > > > > >> at org.apache.hadoop.dfs.NameNode.getBlockLocations(NameNode.java:257)
>> > > > > > >> at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>> > > > > > >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > > > > > >> at java.lang.reflect.Method.invoke(Method.java:597)
>> > > > > > >> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
>> > > > > > >> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
>> > > > > > >>
>> > > > > > >>
>> > > > > > >> On the Region Server I have this:
>> > > > > > >> 2009-01-27 14:55:26,638 INFO org.apache.hadoop.hbase.regionserver.HStore:
>> > > > > > >> HSTORE_LOGINFOFILE 1028785192/info/5593692610357375495 does not contain a
>> > > > > > >> sequence number - ignoring
>> > > > > > >> 2009-01-27 14:55:26,662 INFO org.apache.hadoop.hbase.regionserver.HStore:
>> > > > > > >> HSTORE_LOGINFOFILE 1028785192/info/7096195127965906654 does not contain a
>> > > > > > >> sequence number - ignoring
>> > > > > > >> 2009-01-27 14:55:26,682 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
>> > > > > > >> error opening region .META.,,1
>> > > > > > >> org.apache.hadoop.ipc.RemoteException: java.io.IOException:
>> > > > > > >> java.lang.ArrayIndexOutOfBoundsException: 1
>> > > > > > >>        at org.apache.hadoop.dfs.FSNamesystem.getBlockLocationsInternal(FSNamesystem.java:789)
>> > > > > > >>        at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:727)
>> > > > > > >>        at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:703)
>> > > > > > >>        at org.apache.hadoop.dfs.NameNode.getBlockLocations(NameNode.java:257)
>> > > > > > >>        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>> > > > > > >>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > > > > > >>        at java.lang.reflect.Method.invoke(Method.java:597)
>> > > > > > >>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
>> > > > > > >>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
>> > > > > > >>
>> > > > > > >>        at org.apache.hadoop.ipc.Client.call(Client.java:715)
>> > > > > > >>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>> > > > > > >>        at org.apache.hadoop.dfs.$Proxy1.getBlockLocations(Unknown Source)
>> > > > > > >>        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>> > > > > > >>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > > > > > >>        at java.lang.reflect.Method.invoke(Method.java:597)
>> > > > > > >>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>> > > > > > >>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>> > > > > > >>        at org.apache.hadoop.dfs.$Proxy1.getBlockLocations(Unknown Source)
>> > > > > > >>        at org.apache.hadoop.dfs.DFSClient.callGetBlockLocations(DFSClient.java:297)
>> > > > > > >>
>> > > > > > >> Or
>> > > > > > >>
>> > > > > > >>
>> > > > > > >> 2009-01-27 14:45:50,760 INFO org.apache.hadoop.ipc.Server: IPC Server
>> > > > > > >> handler 1 on 60020, call openScanner([B@62fa5ff3, [[B@23b17d49, [B@599855ed,
>> > > > > > >> 9223372036854775807, null) from 10.26.237.136:36936: error:
>> > > > > > >> org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
>> > > > > > >> org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
>> > > > > > >>        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:1560)
>> > > > > > >>        at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1210)
>> > > > > > >>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > > > > > >>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > > > > > >>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > > > > > >>        at java.lang.reflect.Method.invoke(Method.java:597)
>> > > > > > >>        at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:554)
>> > > > > > >>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
>> > > > > > >> 2009-01-27 14:56:49,208 INFO org.apache.hadoop.hbase.regionserver.LogRoller:
>> > > > > > >> Rolling hlog. Number of entries: 0
>> > > > > > >>
>> > > > > > >> It's very annoying :-(
>> > > > > > >>
>> > > > > > >> Any help?
>> > > > > > >>
>> > > > > > >> Thank You and Best Regards.
>> > > > > > >>
>> > > > > > >>
>> > > > > > >>
>> > > > > > >
>> > > > > > >
>> > > > >
>> > > > >
>> > >
>> > >
>>
>>
>
