hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Yu <yuzhih...@gmail.com>
Subject Re: Errors after major compaction
Date Sun, 03 Jul 2011 17:09:47 GMT
HBASE-3789 should have sped up region assignment.
The patch for 0.90 is attached to that JIRA.

You may prudently apply that patch.

Regards

On Sun, Jul 3, 2011 at 10:01 AM, Eran Kutner <eran@gigya.com> wrote:

> Thanks Ted, but, as stated before, I'm already using 0.90.3, so either it's
> not fixed or it's not the same thing.
>
> -eran
>
>
>
> On Sun, Jul 3, 2011 at 17:27, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > Eran:
> > I was thinking of this:
> > HBASE-3789  Cleanup the locking contention in the master
> >
> > though it doesn't directly handle 'PENDING_OPEN for too long' case.
> >
> > https://issues.apache.org/jira/browse/HBASE-3741 is in 0.90.3 and
> actually
> > close to the symptom you described.
> >
> > On Sun, Jul 3, 2011 at 12:00 AM, Eran Kutner <eran@gigya.com> wrote:
> >
> > > It does seem that both servers opened the same region around the same
> > time.
> > > The region was offline because I disabled the table so I can change its
> > > TTL.
> > >
> > > Here is the log from haddop1-s05 :
> > > 2011-06-29 16:37:12,576 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to
> > > open
> > > region:
> > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > 2011-06-29 16:37:12,680 DEBUG
> > > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler:
> > Processing
> > > open of
> > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > 2011-06-29 16:37:12,680 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > regionserver:60020-0x33004a38816050b Attempting to transition node
> > > 584dac5cc70d8682f71c4675a843c309 from M_ZK_REGION_OFFLINE to
> > > RS_ZK_REGION_OPENING
> > > 2011-06-29 16:37:12,711 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > regionserver:60020-0x33004a38816050b Successfully transitioned node
> > > 584dac5cc70d8682f71c4675a843c309 from M_ZK_REGION_OFFLINE to
> > > RS_ZK_REGION_OPENING
> > > 2011-06-29 16:37:12,711 DEBUG
> > org.apache.hadoop.hbase.regionserver.HRegion:
> > > Opening region: REGION => {NAME =>
> > >
> > >
> >
> 'gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.',
> > > STARTKEY => 'GSLoad_1308518553_168_WEB204', ENDKEY =>
> > > 'GSLoad_1308518810_1249_WEB204', ENCODED =>
> > > 584dac5cc70d8682f71c4675a843c309, TABLE => {{NAME => 'gs_raw_events',
> > > FAMILIES => [{NAME => 'events', BLOOMFILTER => 'NONE',
> REPLICATION_SCOPE
> > =>
> > > '1', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '604800', BLOCKSIZE
> =>
> > > '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
> > > 2011-06-29 16:37:12,711 DEBUG
> > org.apache.hadoop.hbase.regionserver.HRegion:
> > > Instantiated
> > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > 2011-06-29 16:37:12,847 DEBUG
> org.apache.hadoop.hbase.regionserver.Store:
> > > loaded
> > >
> > >
> >
> hdfs://hadoop1-m1:8020/hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/events/1971818821800304360,
> > > isReference=false, isBulkLoadResult=false, seqid=1162228062,
> > > majorCompaction=false
> > > 2011-06-29 16:37:12,848 INFO
> > org.apache.hadoop.hbase.regionserver.HRegion:
> > > Onlined
> > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.;
> > > next sequenceid=1162228063
> > > 2011-06-29 16:37:12,849 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > regionserver:60020-0x33004a38816050b Attempting to transition node
> > > 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to
> > > RS_ZK_REGION_OPENING
> > > 2011-06-29 16:37:12,875 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > regionserver:60020-0x33004a38816050b Successfully transitioned node
> > > 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to
> > > RS_ZK_REGION_OPENING
> > > 2011-06-29 16:37:12,951 INFO
> org.apache.hadoop.hbase.catalog.MetaEditor:
> > > Updated row
> > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > in region .META.,,1 with server=hadoop1-s05.farm-ny.gigya.com:60020,
> > > startcode=1307349217076
> > > 2011-06-29 16:37:12,951 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > regionserver:60020-0x33004a38816050b Attempting to transition node
> > > 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to
> > > RS_ZK_REGION_OPENED
> > > 2011-06-29 16:37:12,964 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > regionserver:60020-0x33004a38816050b Successfully transitioned node
> > > 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to
> > > RS_ZK_REGION_OPENED
> > > 2011-06-29 16:37:12,964 DEBUG
> > > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened
> > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > 2011-06-29 16:40:00,878 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to
> > > open
> > > region:
> > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > 2011-06-29 16:40:00,878 DEBUG
> > > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler:
> > Processing
> > > open of
> > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > 2011-06-29 16:40:01,079 WARN
> > > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler:
> Attempted
> > > open of
> > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > but already online on this server
> > > 2011-06-29 16:43:50,395 DEBUG
> > > org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
> > (major)
> > > requested for
> > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > because User-triggered major compaction; priority=1, compaction queue
> > > size=1248
> > > 2011-06-29 20:19:49,906 INFO
> > org.apache.hadoop.hbase.regionserver.HRegion:
> > > Starting major compaction on region
> > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > 2011-06-29 20:19:49,906 INFO
> org.apache.hadoop.hbase.regionserver.Store:
> > > Started compaction of 1 file(s) in cf=events  into
> > >
> > >
> >
> hdfs://hadoop1-m1:8020/hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/.tmp,
> > > seqid=1162228062, totalSize=98.3m
> > > 2011-06-29 20:19:49,906 DEBUG
> org.apache.hadoop.hbase.regionserver.Store:
> > > Compacting
> > >
> > >
> >
> hdfs://hadoop1-m1:8020/hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/events/1971818821800304360,
> > > keycount=6882816, bloomtype=NONE, size=98.3m
> > > 2011-06-29 20:19:59,920 INFO
> > org.apache.hadoop.hbase.regionserver.HRegion:
> > > completed compaction on region
> > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > after 10sec
> > >
> > > And here is the one from hadoop1-s02:
> > > 2011-06-29 16:43:57,935 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to
> > > open
> > > region:
> > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > 2011-06-29 16:43:58,990 DEBUG
> > > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler:
> > Processing
> > > open of
> > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > 2011-06-29 16:43:58,990 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > regionserver:60020-0x23004a31d8904de Attempting to transition node
> > > 584dac5cc70d8682f71c4675a843c309 from M_ZK_REGION_OFFLINE to
> > > RS_ZK_REGION_OPENING
> > > 2011-06-29 16:43:59,002 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > regionserver:60020-0x23004a31d8904de Successfully transitioned node
> > > 584dac5cc70d8682f71c4675a843c309 from M_ZK_REGION_OFFLINE to
> > > RS_ZK_REGION_OPENING
> > > 2011-06-29 16:43:59,002 DEBUG
> > org.apache.hadoop.hbase.regionserver.HRegion:
> > > Opening region: REGION => {NAME =>
> > >
> > >
> >
> 'gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.',
> > > STARTKEY => 'GSLoad_1308518553_168_WEB204', ENDKEY =>
> > > 'GSLoad_1308518810_1249_WEB204', ENCODED =>
> > > 584dac5cc70d8682f71c4675a843c309, TABLE => {{NAME => 'gs_raw_events',
> > > FAMILIES => [{NAME => 'events', BLOOMFILTER => 'NONE',
> REPLICATION_SCOPE
> > =>
> > > '1', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '604800', BLOCKSIZE
> =>
> > > '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
> > > 2011-06-29 16:43:59,003 DEBUG
> > org.apache.hadoop.hbase.regionserver.HRegion:
> > > Instantiated
> > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > 2011-06-29 16:43:59,204 DEBUG
> org.apache.hadoop.hbase.regionserver.Store:
> > > loaded
> > >
> > >
> >
> hdfs://hadoop1-m1:8020/hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/events/1971818821800304360,
> > > isReference=false, isBulkLoadResult=false, seqid=1162228062,
> > > majorCompaction=false
> > > 2011-06-29 16:43:59,205 INFO
> > org.apache.hadoop.hbase.regionserver.HRegion:
> > > Onlined
> > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.;
> > > next sequenceid=1162228063
> > > 2011-06-29 16:43:59,205 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > regionserver:60020-0x23004a31d8904de Attempting to transition node
> > > 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to
> > > RS_ZK_REGION_OPENING
> > > 2011-06-29 16:43:59,212 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > regionserver:60020-0x23004a31d8904de Successfully transitioned node
> > > 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to
> > > RS_ZK_REGION_OPENING
> > > 2011-06-29 16:43:59,214 INFO
> org.apache.hadoop.hbase.catalog.MetaEditor:
> > > Updated row
> > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > in region .META.,,1 with server=hadoop1-s02.farm-ny.gigya.com:60020,
> > > startcode=1306919627544
> > > 2011-06-29 16:43:59,214 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > regionserver:60020-0x23004a31d8904de Attempting to transition node
> > > 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to
> > > RS_ZK_REGION_OPENED
> > > 2011-06-29 16:43:59,224 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > regionserver:60020-0x23004a31d8904de Successfully transitioned node
> > > 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to
> > > RS_ZK_REGION_OPENED
> > > 2011-06-29 16:43:59,224 DEBUG
> > > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened
> > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > java.io.IOException: Got error in response to OP_READ_BLOCK self=/
> > > 10.1.104.2:33356, remote=/10.1.104.2:50010 for file
> > >
> > >
> >
> /hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/events/1971818821800304360
> > > for block 3674866614142268536_674205
> > >
> > >
> > > Ted, can you please point me to J-D's bug fix you mentioned? Are you
> > > positive it's the same scenario - data loss is a very serious problem
> for
> > a
> > > DB.
> > > I'd really like to apply that patch ASAP, because when I run hbck I get
> > > over
> > > 400 regions which are multiply assigned.
> > > Last question, I understand the region's data is lost but is there a
> way
> > to
> > > at least make the table consistent again by some how removing the lost
> > > region?
> > >
> > > Thanks.
> > >
> > > -eran
> > >
> > >
> > >
> > > On Sat, Jul 2, 2011 at 01:46, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > > >> 2011-06-29 16:43:57,880 INFO
> > > > org.apache.hadoop.hbase.
> > > > master.AssignmentManager: Region has been
> > > > PENDING_OPEN for too long, reassigning
> > > >
> > > >
> > >
> >
> region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > >
> > > > The double assignment should have been fixed by J-D's recent checkin.
> > > >
> > > > On Fri, Jul 1, 2011 at 3:14 PM, Stack <stack@duboce.net> wrote:
> > > >
> > > > > Is
> > > > >
> > > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > > > the region that was having the issue?  If so, if you looked in
> > > > > hadoop1-s05's logs, was this region opened around 2011-06-29
> > 16:43:57?
> > > > >  Was it also opened hadoop1-s02 not long after?  Did you say what
> > > > > version of hbase you are on?
> > > > >
> > > > > St.Ack
> > > > >
> > > > > On Fri, Jul 1, 2011 at 5:08 AM, Eran Kutner <eran@gigya.com>
> wrote:
> > > > > > Hi Stack,
> > > > > > I'm not sure what the log means. I do see references to two
> > different
> > > > > > servers, but that would probably happen if there was normal
> > > transition
> > > > I
> > > > > > assume. I'm using version 0.90.3
> > > > > > Here are the relevant lines from the master logs:
> > > > > >
> > > > > > 2011-06-19 21:39:37,164 INFO
> > > > > org.apache.hadoop.hbase.master.ServerManager:
> > > > > > Received REGION_SPLIT:
> > > > > >
> > > > >
> > > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533691659.9000a5d8df9502efc90d2c23567e4658.:
> > > > > > Daughters;
> > > > > >
> > > > >
> > > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.,
> > > > > >
> > > > >
> > > >
> > >
> >
> gs_raw_events,GSLoad_1308518810_1249_WEB204,1308533970928.46f876a4e97be04edb35eb8f8959d482.
> > > > > > from hadoop1-s05.farm-ny.gigya.com,60020,1307349217076
> > > > > > 2011-06-19 21:43:12,983 INFO
> > > > org.apache.hadoop.hbase.catalog.MetaEditor:
> > > > > > Deleted daughter reference
> > > > > >
> > > > >
> > > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.,
> > > > > > qualifier=splitA, from parent
> > > > > >
> > > > >
> > > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533691659.9000a5d8df9502efc90d2c23567e4658.
> > > > > > 2011-06-29 16:29:36,143 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Starting
> > > unassignment
> > > > > of
> > > > > > region
> > > > > >
> > > > >
> > > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > > > > (offlining)
> > > > > > 2011-06-29 16:29:36,146 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE
to
> > > > > serverName=
> > > > > > hadoop1-s05.farm-ny.gigya.com,60020,1307349217076,
> > load=(requests=0,
> > > > > > regions=1654, usedHeap=1870, maxHeap=12483) for region
> > > > > >
> > > > >
> > > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > > > > 2011-06-29 16:29:38,327 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Handling new
> > > > unassigned
> > > > > > node: /hbase/unassigned/584dac5cc70d8682f71c4675a843c309
> > > > > >
> > > > >
> > > >
> > >
> >
> (region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.,
> > > > > > server=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076,
> > > > > > state=RS_ZK_REGION_CLOSED)
> > > > > > 2011-06-29 16:29:38,327 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
> > > > > > transition=RS_ZK_REGION_CLOSED,
> > > > > > server=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076,
> > > > > > region=584dac5cc70d8682f71c4675a843c309
> > > > > > 2011-06-29 16:30:53,742 DEBUG
> > > > > > org.apache.hadoop.hbase.master.handler.ClosedRegionHandler:
> > Handling
> > > > > CLOSED
> > > > > > event for 584dac5cc70d8682f71c4675a843c309
> > > > > > 2011-06-29 16:30:53,742 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Table being
> > > disabled
> > > > so
> > > > > > deleting ZK node and removing from regions in transition,
> skipping
> > > > > > assignment of region
> > > > > >
> > > > >
> > > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > > > > 2011-06-29 16:30:53,742 DEBUG
> > > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > > master:60000-0x13004a31d7804c4 Deleting existing unassigned
node
> > for
> > > > > > 584dac5cc70d8682f71c4675a843c309 that is in expected state
> > > > > > RS_ZK_REGION_CLOSED
> > > > > > 2011-06-29 16:30:53,801 DEBUG
> > > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > > master:60000-0x13004a31d7804c4 Successfully deleted unassigned
> node
> > > for
> > > > > > region 584dac5cc70d8682f71c4675a843c309 in expected state
> > > > > > RS_ZK_REGION_CLOSED
> > > > > > 2011-06-29 16:34:01,453 INFO
> > > > org.apache.hadoop.hbase.catalog.MetaEditor:
> > > > > > Updated region
> > > > > >
> > > > >
> > > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > > > > in META
> > > > > > 2011-06-29 16:37:12,247 DEBUG
> > > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > > master:60000-0x13004a31d7804c4 Creating (or updating) unassigned
> > node
> > > > for
> > > > > > 584dac5cc70d8682f71c4675a843c309 with OFFLINE state
> > > > > > 2011-06-29 16:37:12,576 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: No previous
> > > > transition
> > > > > > plan was found (or we are ignoring an existing plan) for
> > > > > >
> > > > >
> > > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > > > > so generated a random one;
> > > > > >
> > > > >
> > > >
> > >
> >
> hri=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.,
> > > > > > src=, dest=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076;
5
> > > > > (online=5,
> > > > > > exclude=null) available servers
> > > > > > 2011-06-29 16:37:12,576 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning
> region
> > > > > >
> > > > >
> > > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > > > > to hadoop1-s05.farm-ny.gigya.com,60020,1307349217076
> > > > > > 2011-06-29 16:37:13,102 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
> > > > > > transition=RS_ZK_REGION_OPENED,
> > > > > > server=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076,
> > > > > > region=584dac5cc70d8682f71c4675a843c309
> > > > > > 2011-06-29 16:39:54,075 DEBUG
> > > > > > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler:
> > Handling
> > > > > OPENED
> > > > > > event for 584dac5cc70d8682f71c4675a843c309; deleting unassigned
> > node
> > > > > > 2011-06-29 16:39:54,075 DEBUG
> > > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > > master:60000-0x13004a31d7804c4 Deleting existing unassigned
node
> > for
> > > > > > 584dac5cc70d8682f71c4675a843c309 that is in expected state
> > > > > > RS_ZK_REGION_OPENED
> > > > > > 2011-06-29 16:39:54,192 DEBUG
> > > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > > master:60000-0x13004a31d7804c4 Successfully deleted unassigned
> node
> > > for
> > > > > > region 584dac5cc70d8682f71c4675a843c309 in expected state
> > > > > > RS_ZK_REGION_OPENED
> > > > > > 2011-06-29 16:39:54,326 DEBUG
> > > > > > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler:
> Opened
> > > > region
> > > > > >
> > > > >
> > > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > > > > on hadoop1-s05.farm-ny.gigya.com,60020,1307349217076
> > > > > > 2011-06-29 16:40:00,598 DEBUG
> > > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > > master:60000-0x13004a31d7804c4 Creating (or updating) unassigned
> > node
> > > > for
> > > > > > 584dac5cc70d8682f71c4675a843c309 with OFFLINE state
> > > > > > 2011-06-29 16:40:00,877 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: No previous
> > > > transition
> > > > > > plan was found (or we are ignoring an existing plan) for
> > > > > >
> > > > >
> > > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > > > > so generated a random one;
> > > > > >
> > > > >
> > > >
> > >
> >
> hri=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.,
> > > > > > src=, dest=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076;
5
> > > > > (online=5,
> > > > > > exclude=null) available servers
> > > > > > 2011-06-29 16:40:00,877 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning
> region
> > > > > >
> > > > >
> > > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > > > > to hadoop1-s05.farm-ny.gigya.com,60020,1307349217076
> > > > > > 2011-06-29 16:43:57,879 INFO
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Regions in
> > > transition
> > > > > > timed out:
> > > > > >
> > > > >
> > > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > > > > state=PENDING_OPEN, ts=1309380052723
> > > > > > 2011-06-29 16:43:57,880 INFO
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Region has
been
> > > > > > PENDING_OPEN for too long, reassigning
> > > > > >
> > > > >
> > > >
> > >
> >
> region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > > > > 2011-06-29 16:43:57,936 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Forcing
> OFFLINE;
> > > > > >
> > > > >
> > > >
> > >
> >
> was=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > > > > state=PENDING_OPEN, ts=1309380052723
> > > > > > 2011-06-29 16:43:57,936 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: No previous
> > > > transition
> > > > > > plan was found (or we are ignoring an existing plan) for
> > > > > >
> > > > >
> > > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > > > > so generated a random one;
> > > > > >
> > > > >
> > > >
> > >
> >
> hri=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.,
> > > > > > src=, dest=hadoop1-s02.farm-ny.gigya.com,60020,1306919627544;
5
> > > > > (online=5,
> > > > > > exclude=null) available servers
> > > > > > 2011-06-29 16:43:57,936 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning
> region
> > > > > >
> > > > >
> > > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > > > > to hadoop1-s02.farm-ny.gigya.com,60020,1306919627544
> > > > > > 2011-06-29 16:43:59,022 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
> > > > > > transition=RS_ZK_REGION_OPENING,
> > > > > > server=hadoop1-s02.farm-ny.gigya.com,60020,1306919627544,
> > > > > > region=584dac5cc70d8682f71c4675a843c309
> > > > > > 2011-06-29 16:43:59,221 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
> > > > > > transition=RS_ZK_REGION_OPENING,
> > > > > > server=hadoop1-s02.farm-ny.gigya.com,60020,1306919627544,
> > > > > > region=584dac5cc70d8682f71c4675a843c309
> > > > > > 2011-06-29 16:43:59,226 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
> > > > > > transition=RS_ZK_REGION_OPENED,
> > > > > > server=hadoop1-s02.farm-ny.gigya.com,60020,1306919627544,
> > > > > > region=584dac5cc70d8682f71c4675a843c309
> > > > > > 2011-06-29 16:43:59,274 DEBUG
> > > > > > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler:
> > Handling
> > > > > OPENED
> > > > > > event for 584dac5cc70d8682f71c4675a843c309; deleting unassigned
> > node
> > > > > > 2011-06-29 16:43:59,274 DEBUG
> > > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > > master:60000-0x13004a31d7804c4 Deleting existing unassigned
node
> > for
> > > > > > 584dac5cc70d8682f71c4675a843c309 that is in expected state
> > > > > > RS_ZK_REGION_OPENED
> > > > > > 2011-06-29 16:43:59,296 DEBUG
> > > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > > master:60000-0x13004a31d7804c4 Successfully deleted unassigned
> node
> > > for
> > > > > > region 584dac5cc70d8682f71c4675a843c309 in expected state
> > > > > > RS_ZK_REGION_OPENED
> > > > > > 2011-06-29 16:43:59,375 WARN
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Overwriting
> > > > > > 584dac5cc70d8682f71c4675a843c309 on
> > > > > > serverName=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076,
> > > > > > load=(requests=0, regions=1273, usedHeap=2676, maxHeap=12483)
> > > > > > 2011-06-29 16:43:59,375 DEBUG
> > > > > > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler:
> Opened
> > > > region
> > > > > >
> > > > >
> > > >
> > >
> >
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > > > > on hadoop1-s02.farm-ny.gigya.com,60020,1306919627544
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > -eran
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 1, 2011 at 09:05, Stack <stack@duboce.net>
wrote:
> > > > > >
> > > > > >> So, Eran, it seems as though two RegionServers were carrying
the
> > > > > >> region?  One deleted a file (compaction on its side)?  Can
you
> > > figure
> > > > > >> if indeed two servers had same region?  (Check master logs
for
> > this
> > > > > >> regions assignments).
> > > > > >>
> > > > > >> What version of hbase?
> > > > > >>
> > > > > >> St.Ack
> > > > > >>
> > > > > >>
> > > > > >> On Thu, Jun 30, 2011 at 3:58 AM, Eran Kutner <eran@gigya.com>
> > > wrote:
> > > > > >> > Hi,
> > > > > >> > I have a cluster of 5 nodes with one large table that
> currently
> > > has
> > > > > >> around
> > > > > >> > 12000 regions. Everything was working fine for relatively
long
> > > time,
> > > > > >> until
> > > > > >> > now.
> > > > > >> > Yesterday I significantly reduced the TTL on the table
and
> > > initiated
> > > > > >> major
> > > > > >> > compaction. This should have reduced the table size
to about
> 20%
> > > of
> > > > > its
> > > > > >> > original size.
> > > > > >> > Today, I'm getting errors of inaccessible files on
HDFS, for
> > > > example:
> > > > > >> > java.io.IOException: Got error in response to OP_READ_BLOCK
> > self=/
> > > > > >> > 10.1.104.2:58047, remote=/10.1.104.2:50010 for file
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> /hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/events/1971818821800304360
> > > > > >> > for block 3674866614142268536_674205
> > > > > >> >        at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1487)
> > > > > >> >        at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1811)
> > > > > >> >        at
> > > > > >> >
> > > > >
> > >
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1948)
> > > > > >> >        at
> java.io.DataInputStream.read(DataInputStream.java:132)
> > > > > >> >        at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:105)
> > > > > >> >        at
> > > > > java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> > > > > >> >        at
> > > > > java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> > > > > >> >        at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.io.compress.BlockDecompressorStream.rawReadInt(BlockDecompressorStream.java:128)
> > > > > >> >        at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:68)
> > > > > >> >        at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75)
> > > > > >> >        at
> > > > > java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
> > > > > >> >        at
> > > > > java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> > > > > >> >        at
> > org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)
> > > > > >> >        at
> > > > > >> >
> > > > >
> > >
> org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1094)
> > > > > >> >        at
> > > > > >> >
> > > > >
> > >
> org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:1036)
> > > > > >> >        at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.seekTo(HFile.java:1433)
> > > > > >> >        at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:139)
> > > > > >> >        at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:96)
> > > > > >> >        at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:77)
> > > > > >> >        at
> > > > > >> >
> > > >
> org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1341)
> > > > > >> >        at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.<init>(HRegion.java:2269)
> > > > > >> >        at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateInternalScanner(HRegion.java:1126)
> > > > > >> >        at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1118)
> > > > > >> >        at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1102)
> > > > > >> >        at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1781)
> > > > > >> >        at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown
> > > > Source)
> > > > > >> >        at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > > > >> >        at java.lang.reflect.Method.invoke(Method.java:597)
> > > > > >> >        at
> > > > > >> >
> > > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
> > > > > >> >        at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> > > > > >> >
> > > > > >> > I checked and the file, indeed doesn't exist on HDFS,
here is
> > the
> > > > name
> > > > > >> node
> > > > > >> > logs for this block, apparently because it was deleted:
> > > > > >> > 2011-06-19 21:39:36,651 INFO
> org.apache.hadoop.hdfs.StateChange:
> > > > > BLOCK*
> > > > > >> > NameSystem.allocateBlock:
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> /hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/.tmp/2096863423111131624.
> > > > > >> > blk_3674866614142268536_674205
> > > > > >> > 2011-06-19 21:40:11,954 INFO
> org.apache.hadoop.hdfs.StateChange:
> > > > > BLOCK*
> > > > > >> > NameSystem.addStoredBlock: blockMap updated: 10.1.104.2:50010
> is
> > > > > added
> > > > > >> to
> > > > > >> > blk_3674866614142268536_674205 size 67108864
> > > > > >> > 2011-06-19 21:40:11,954 INFO
> org.apache.hadoop.hdfs.StateChange:
> > > > > BLOCK*
> > > > > >> > NameSystem.addStoredBlock: blockMap updated: 10.1.104.3:50010
> is
> > > > > added
> > > > > >> to
> > > > > >> > blk_3674866614142268536_674205 size 67108864
> > > > > >> > 2011-06-19 21:40:11,955 INFO
> org.apache.hadoop.hdfs.StateChange:
> > > > > BLOCK*
> > > > > >> > NameSystem.addStoredBlock: blockMap updated: 10.1.104.5:50010
> is
> > > > > added
> > > > > >> to
> > > > > >> > blk_3674866614142268536_674205 size 67108864
> > > > > >> > 2011-06-29 20:20:01,662 INFO
> org.apache.hadoop.hdfs.StateChange:
> > > > > BLOCK*
> > > > > >> ask
> > > > > >> > 10.1.104.2:50010 to delete  blk_3674866614142268536_674205
> > > > > >> > 2011-06-29 20:20:13,671 INFO
> org.apache.hadoop.hdfs.StateChange:
> > > > > BLOCK*
> > > > > >> ask
> > > > > >> > 10.1.104.5:50010 to delete  blk_-4056387895369608597_675174
> > > > > >> > blk_-5017882805850873821_672281 blk_702373987100607684_672288
> > > > > >> > blk_-5357157478043290010_668506 blk_7118175133735412789_674903
> > > > > >> > blk_-3569812563715986384_675231 blk_8296855057240604851_669285
> > > > > >> > blk_-6483679172530609101_674268 blk_8738539715363739108_673682
> > > > > >> > blk_1744841904626813502_675238 blk_-6035315106100051103_674266
> > > > > >> > blk_-1789501623010070237_674908 blk_1944054629336265129_673689
> > > > > >> > blk_3674866614142268536_674205 blk_7930425446738143892_647410
> > > > > >> > blk_-3007186753042268449_669296
> blk_-5482302621772778061_647416
> > > > > >> > blk_-3765735404924932181_672004 blk_7476090998956811081_675169
> > > > > >> > blk_7862291659285127712_646890 blk_-2666244746343584727_672013
> > > > > >> > blk_6039172613960915602_674206 blk_-8470884397893086564_646899
> > > > > >> > blk_4558230221166712802_668510
> > > > > >> > 2011-06-29 20:20:46,698 INFO
> org.apache.hadoop.hdfs.StateChange:
> > > > > BLOCK*
> > > > > >> ask
> > > > > >> > 10.1.104.3:50010 to delete  blk_-7851606440036350812_671552
> > > > > >> > blk_9214649160203453845_647566 blk_702373987100607684_672288
> > > > > >> > blk_5958099369749234073_668143 blk_-5172218034084903173_673109
> > > > > >> > blk_-2934555181472719276_646476
> blk_-1409986679370073931_672552
> > > > > >> > blk_-2786034325506235869_669086 blk_3674866614142268536_674205
> > > > > >> > blk_510158930393283118_673225 blk_916244738216205237_677068
> > > > > >> > blk_-4317027806407316617_670379 blk_8555705688850972639_673485
> > > > > >> > blk_-3765735404924932181_672004
> blk_-5482302621772778061_647416
> > > > > >> > blk_-2461801145731752623_674605
> blk_-8737702908048998927_672549
> > > > > >> > blk_-8470884397893086564_646899 blk_4558230221166712802_668510
> > > > > >> > blk_-4056387895369608597_675174
> blk_-8675430610673886073_647695
> > > > > >> > blk_-6642870230256028318_668211
> blk_-3890408516362176771_677483
> > > > > >> > blk_-3569812563715986384_675231
> blk_-5007142629771321873_674548
> > > > > >> > blk_-3345355191863431669_667066 blk_8296855057240604851_669285
> > > > > >> > blk_-6595462308187757470_672420
> blk_-2583945228783203947_674607
> > > > > >> > blk_-346988625120916345_677063 blk_4449525876338684218_674496
> > > > > >> > blk_2617172363857549730_668201 blk_8738539715363739108_673682
> > > > > >> > blk_-208904675456598428_679286 blk_-497549341281882641_646477
> > > > > >> > blk_-6035315106100051103_674266
> blk_-2356539038067297411_672388
> > > > > >> > blk_-3881703084497103249_668137 blk_2214397881104950315_646643
> > > > > >> > blk_-5907671443455357710_673223
> blk_-2431880309956605679_669204
> > > > > >> > blk_6039172613960915602_674206 blk_5053643911633142711_669194
> > > > > >> > blk_-2636977729205236686_674664
> > > > > >> >
> > > > > >> > I assume the file loss is somehow related to this change
and
> the
> > > > major
> > > > > >> > compaction that followed because the same scan that
is failing
> > now
> > > > was
> > > > > >> > working fine yesterday and that is the only changed
that
> > happened
> > > on
> > > > > the
> > > > > >> > cluster. Any suggestions what to do now?
> > > > > >> >
> > > > > >> > Thanks.
> > > > > >> >
> > > > > >> > -eran
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message