hbase-user mailing list archives

From Daniel Lamberger <daniel.lamber...@gigya-inc.com>
Subject ArrayIndexOutOfBoundsException in 0.90.7-SNAPSHOT
Date Tue, 27 Mar 2012 16:07:17 GMT
Hello,

We recently migrated to 0.90.7-SNAPSHOT and are now hitting the exception
named in the subject line, which seems to make various HBase operations fail.

How it came to be:

* We upgraded from 0.90.4 to 0.90.7, but not all slaves were restarted,
i.e. we ran slaves on different versions for a couple of days.

* We tried to disable a table and the operation hung, with the
following errors recurring in the log file:

2012-03-27 09:11:54,402 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Region has been
PENDING_CLOSE for too long, running forced unassign again on region=...
2012-03-27 09:12:04,404 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition
timed out: ... state=PENDING_CLOSE, ts=1332853083660

* We restarted the cluster, and the table we had previously tried to disable
was now marked as disabled. When we tried to re-enable it, the operation
failed. The log:

hbase.master.handler.EnableTableHandler: Attemping to enable the table
api_status
hbase.master.handler.EnableTableHandler: Table has 7 regions of which 7 are
online.
hbase.zookeeper.ZKAssign: master:60000-0x23004a31d9083df Creating (or
updating) unassigned node for 1572f94d627fe784eb5653d6f32378c8 with OFFLINE
state
hbase.zookeeper.ZKAssign: master:60000-0x23004a31d9083df Creating (or
updating) unassigned node for 2ebf932e9bf7c438db3144b892918d08 with OFFLINE
state
hbase.zookeeper.ZKAssign: master:60000-0x23004a31d9083df Creating (or
updating) unassigned node for 499064a3f5de2b6b11144c3f5d4c8060 with OFFLINE
state
hbase.zookeeper.ZKAssign: master:60000-0x23004a31d9083df Creating (or
updating) unassigned node for a9dea3db85a4219057ae71a79ad92c8c with OFFLINE
state
hbase.zookeeper.ZKAssign: master:60000-0x23004a31d9083df Creating (or
updating) unassigned node for 8373f4db3d61a8e2ea209b2fdebd4c33 with OFFLINE
state
hbase.zookeeper.ZKAssign: master:60000-0x23004a31d9083df Creating (or
updating) unassigned node for 50d12796747cdc5c188589b6ed47d485 with OFFLINE
state
hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE,
server=hadoop1-hbm1.XXX, region=1572f94d627fe784eb5653d6f32378c8
hbase.master.AssignmentManager: No previous transition plan was found (or
we are ignoring an existing plan) for
api_status,,1332438613488.1572f94d627fe784eb5653d6f32378c8. so generated a
random one;
hri=api_status,,1332438613488.1572f94d627fe784eb5653d6f32378c8., src=,
dest=hadoop1-s02.XXX,60020,1332856612196; 10 (online=10, exclude=null)
available servers

hbase.master.AssignmentManager: Assigning region
api_status,,1332438613488.1572f94d627fe784eb5653d6f32378c8. to
hadoop1-s02.XXX,60020,1332856612196
hbase.master.AssignmentManager: No previous transition plan was found (or
we are ignoring an existing plan) for
api_status,XXX,1332440559822.2ebf932e9bf7c438db3144b892918d08. so generated
a random one;
hri=api_status,XXX,1332440559822.2ebf932e9bf7c438db3144b892918d08., src=,
dest=hadoop1-s02.XXX,60020,1332856612196; 10 (online=10, exclude=null)
available servers

(these lines are repeated for the rest of the slaves)

hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING,
server=hadoop1-s05.XXX,60020,1332857174731,
region=499064a3f5de2b6b11144c3f5d4c8060
hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING,
server=hadoop1-s05.XXX,60020,1332857174731,
region=a9dea3db85a4219057ae71a79ad92c8c
hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING,
server=hadoop1-s02.XXX,60020,1332856612196,
region=1572f94d627fe784eb5653d6f32378c8
hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING,
server=hadoop1-s05.XXX,60020,1332857174731,
region=499064a3f5de2b6b11144c3f5d4c8060
hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED,
server=hadoop1-s02.XXX,60020,1332856612196,
region=2ebf932e9bf7c438db3144b892918d08
hbase.master.handler.OpenedRegionHandler: Handling OPENED event for
2ebf932e9bf7c438db3144b892918d08; deleting unassigned node
hbase.zookeeper.ZKAssign: master:60000-0x23004a31d9083df Deleting existing
unassigned node for 2ebf932e9bf7c438db3144b892918d08 that is in expected
state RS_ZK_REGION_OPENED
hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING,
server=hadoop1-s05.XXX,60020,1332857174731,
region=a9dea3db85a4219057ae71a79ad92c8c
hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING,
server=hadoop1-s01.XXX,60020,1332856427347,
region=8373f4db3d61a8e2ea209b2fdebd4c33
hbase.zookeeper.ZKAssign: master:60000-0x23004a31d9083df Successfully
deleted unassigned node for region 2ebf932e9bf7c438db3144b892918d08 in
expected state RS_ZK_REGION_OPENED
hbase.master.handler.OpenedRegionHandler: Opened region
api_status,XXX,1332440559822.2ebf932e9bf7c438db3144b892918d08. on
hadoop1-s02.XXX,60020,1332856612196
hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING,
server=hadoop1-s06.XXX,60020,1332857364660,
region=50d12796747cdc5c188589b6ed47d485
hbase.master.AssignmentManager: Regions in transition timed out:
 gs_users,5351402|fWTtXMEa2WXHOo01esutJA==,1330321577716.01f70fbfd1a6b6582c4c4c2c814fb3ed.
state=OPENING, ts=1332857787876
hbase.master.AssignmentManager: Region has been OPENING for too long,
reassigning
region=gs_users,5351402|fWTtXMEa2WXHOo01esutJA==,1330321577716.01f70fbfd1a6b6582c4c4c2c814fb3ed.
ERROR org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor:
Caught exception
java.lang.ArrayIndexOutOfBoundsException: 20
        at org.apache.hadoop.hbase.executor.RegionTransitionData.readFields(RegionTransitionData.java:148)
        at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:105)
        at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
        at org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198)
        at org.apache.hadoop.hbase.zookeeper.ZKAssign.getDataNoWatch(ZKAssign.java:755)
        at org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor.chore(AssignmentManager.java:1743)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
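
A possible reading of this trace, stated as an assumption rather than a
confirmed diagnosis: readFields may map a serialized number to an enum
constant by array index, and a process running a different version wrote a
value (20) that the reading side does not know. This has not been checked
against the 0.90.7 source. The sketch below only illustrates that general
failure mode. The class, both enums, and all method names are hypothetical
and are not HBase code.

```java
import java.nio.ByteBuffer;

public class OrdinalMismatchSketch {
    // Hypothetical event list as an older build might know it.
    enum OldEventType { CLOSING, CLOSED, OPENING }

    // Hypothetical event list of a newer build, with one extra constant.
    enum NewEventType { CLOSING, CLOSED, OPENING, FAILED_OPEN }

    // Serialize an enum constant as its ordinal in a 2-byte short.
    static byte[] writeOrdinal(Enum<?> event) {
        return ByteBuffer.allocate(2).putShort((short) event.ordinal()).array();
    }

    // Index straight into values(): throws ArrayIndexOutOfBoundsException
    // when the writer used an ordinal this reader does not have.
    static OldEventType readUnchecked(byte[] data) {
        return OldEventType.values()[ByteBuffer.wrap(data).getShort()];
    }

    // Same read with a bounds check, so the failure names the likely cause.
    static OldEventType readChecked(byte[] data) {
        int ordinal = ByteBuffer.wrap(data).getShort();
        OldEventType[] known = OldEventType.values();
        if (ordinal < 0 || ordinal >= known.length) {
            throw new IllegalStateException("Unknown event type ordinal "
                + ordinal + "; the writer may be running a different version");
        }
        return known[ordinal];
    }

    public static void main(String[] args) {
        // Written by the "newer" side, read by the "older" side.
        byte[] data = writeOrdinal(NewEventType.FAILED_OPEN);
        try {
            readUnchecked(data);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unchecked read failed: " + e);
        }
        try {
            readChecked(data);
        } catch (IllegalStateException e) {
            System.out.println("checked read failed: " + e.getMessage());
        }
    }
}
```

If this is what is happening, it would be consistent with having run mixed
versions for a couple of days, but that link is a guess.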


Any insights would be appreciated.

Thank you.
