hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Errors after major compaction
Date Thu, 07 Jul 2011 05:35:35 GMT
On Sun, Jul 3, 2011 at 12:00 AM, Eran Kutner <eran@gigya.com> wrote:
> It does seem that both servers opened the same region around the same time.
> The region was offline because I disabled the table so I can change its TTL.
.....
> 2011-06-29 16:37:12,964 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> regionserver:60020-0x33004a38816050b Successfully transitioned node
> 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to
> RS_ZK_REGION_OPENED


So, we just successfully OPENED the region on this server.


> 2011-06-29 16:40:00,878 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open
> region:
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.


Then we are asked to open it again a few minutes later (it would be
interesting to see what's up on the master at this time -- why it
decided to open the region again).



> 2011-06-29 16:40:01,079 WARN
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted
> open of
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> but already online on this server

We refuse to open it again.




> And here is the one from hadoop1-s02:
> 2011-06-29 16:43:57,935 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open
> region:
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.


This is after we were asked to open it on the other server.  My guess
is that the refusal of the above server to reopen a region it already
had showed up as an exception opening the region, and the master then
assigned it here (why it decided to reassign in the first place is
still of interest).


....

> 2011-06-29 16:43:59,224 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> regionserver:60020-0x23004a31d8904de Successfully transitioned node
> 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to
> RS_ZK_REGION_OPENED

This looks like an almost successful open on this new regionserver.


> 2011-06-29 16:43:59,224 DEBUG
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> java.io.IOException: Got error in response to OP_READ_BLOCK self=/
> 10.1.104.2:33356, remote=/10.1.104.2:50010 for file
> /hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/events/1971818821800304360
> for block 3674866614142268536_674205
>

But by then the compaction had run on the first server and changed the
list of files that made up the region's contents -- hence the failed
read.


> Ted, can you please point me to J-D's bug fix you mentioned?

J-D has been working on a few issues hereabouts.  There is
HBASE-3789, which Ted mentions, and then there is HBASE-4015, which
still needs a patch.

> Are you
> positive it's the same scenario - data loss is a very serious problem for a
> DB.
> I'd really like to apply that patch ASAP, because when I run hbck I get over
> 400 regions which are multiply assigned.

How many regions on your cluster?  Is this on startup?

If you see the double assignment happen because the timeout monitor
cuts in, you could raise your hbase.master.assignment.timeoutmonitor.timeout
period in your hbase-site.xml for now, until you pick up the other
fixes that address this issue properly.
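For reference, a hbase-site.xml snippet along these lines would raise that
timeout; the 1800000 ms value below is purely illustrative -- pick something
comfortably longer than your slowest region opens:

  <property>
    <name>hbase.master.assignment.timeoutmonitor.timeout</name>
    <!-- Illustrative value only (30 minutes, in milliseconds): how long the
         master waits on a region stuck in transition (e.g. PENDING_OPEN)
         before it gives up and reassigns it elsewhere. -->
    <value>1800000</value>
  </property>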

St.Ack

> Last question, I understand the region's data is lost but is there a way to
> at least make the table consistent again by some how removing the lost
> region?
>
> Thanks.
>
> -eran
>
>
>
> On Sat, Jul 2, 2011 at 01:46, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> >> 2011-06-29 16:43:57,880 INFO
>> org.apache.hadoop.hbase.
>> master.AssignmentManager: Region has been
>> PENDING_OPEN for too long, reassigning
>>
>> region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
>>
>> The double assignment should have been fixed by J-D's recent checkin.
>>
>> On Fri, Jul 1, 2011 at 3:14 PM, Stack <stack@duboce.net> wrote:
>>
>> > Is
>> >
>> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
>> > the region that was having the issue?  If so, if you looked in
>> > hadoop1-s05's logs, was this region opened around 2011-06-29 16:43:57?
>> >  Was it also opened hadoop1-s02 not long after?  Did you say what
>> > version of hbase you are on?
>> >
>> > St.Ack
>> >
>> > On Fri, Jul 1, 2011 at 5:08 AM, Eran Kutner <eran@gigya.com> wrote:
>> > > Hi Stack,
>> > > I'm not sure what the log means. I do see references to two different
>> > > servers, but that would probably happen if there was normal transition
>> I
>> > > assume. I'm using version 0.90.3
>> > > Here are the relevant lines from the master logs:
>> > >
>> > > 2011-06-19 21:39:37,164 INFO
>> > org.apache.hadoop.hbase.master.ServerManager:
>> > > Received REGION_SPLIT:
>> > >
>> >
>> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533691659.9000a5d8df9502efc90d2c23567e4658.:
>> > > Daughters;
>> > >
>> >
>> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.,
>> > >
>> >
>> gs_raw_events,GSLoad_1308518810_1249_WEB204,1308533970928.46f876a4e97be04edb35eb8f8959d482.
>> > > from hadoop1-s05.farm-ny.gigya.com,60020,1307349217076
>> > > 2011-06-19 21:43:12,983 INFO
>> org.apache.hadoop.hbase.catalog.MetaEditor:
>> > > Deleted daughter reference
>> > >
>> >
>> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.,
>> > > qualifier=splitA, from parent
>> > >
>> >
>> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533691659.9000a5d8df9502efc90d2c23567e4658.
>> > > 2011-06-29 16:29:36,143 DEBUG
>> > > org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment
>> > of
>> > > region
>> > >
>> >
>> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
>> > > (offlining)
>> > > 2011-06-29 16:29:36,146 DEBUG
>> > > org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to
>> > serverName=
>> > > hadoop1-s05.farm-ny.gigya.com,60020,1307349217076, load=(requests=0,
>> > > regions=1654, usedHeap=1870, maxHeap=12483) for region
>> > >
>> >
>> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
>> > > 2011-06-29 16:29:38,327 DEBUG
>> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling new
>> unassigned
>> > > node: /hbase/unassigned/584dac5cc70d8682f71c4675a843c309
>> > >
>> >
>> (region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.,
>> > > server=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076,
>> > > state=RS_ZK_REGION_CLOSED)
>> > > 2011-06-29 16:29:38,327 DEBUG
>> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
>> > > transition=RS_ZK_REGION_CLOSED,
>> > > server=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076,
>> > > region=584dac5cc70d8682f71c4675a843c309
>> > > 2011-06-29 16:30:53,742 DEBUG
>> > > org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling
>> > CLOSED
>> > > event for 584dac5cc70d8682f71c4675a843c309
>> > > 2011-06-29 16:30:53,742 DEBUG
>> > > org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled
>> so
>> > > deleting ZK node and removing from regions in transition, skipping
>> > > assignment of region
>> > >
>> >
>> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
>> > > 2011-06-29 16:30:53,742 DEBUG
>> org.apache.hadoop.hbase.zookeeper.ZKAssign:
>> > > master:60000-0x13004a31d7804c4 Deleting existing unassigned node for
>> > > 584dac5cc70d8682f71c4675a843c309 that is in expected state
>> > > RS_ZK_REGION_CLOSED
>> > > 2011-06-29 16:30:53,801 DEBUG
>> org.apache.hadoop.hbase.zookeeper.ZKAssign:
>> > > master:60000-0x13004a31d7804c4 Successfully deleted unassigned node for
>> > > region 584dac5cc70d8682f71c4675a843c309 in expected state
>> > > RS_ZK_REGION_CLOSED
>> > > 2011-06-29 16:34:01,453 INFO
>> org.apache.hadoop.hbase.catalog.MetaEditor:
>> > > Updated region
>> > >
>> >
>> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
>> > > in META
>> > > 2011-06-29 16:37:12,247 DEBUG
>> org.apache.hadoop.hbase.zookeeper.ZKAssign:
>> > > master:60000-0x13004a31d7804c4 Creating (or updating) unassigned node
>> for
>> > > 584dac5cc70d8682f71c4675a843c309 with OFFLINE state
>> > > 2011-06-29 16:37:12,576 DEBUG
>> > > org.apache.hadoop.hbase.master.AssignmentManager: No previous
>> transition
>> > > plan was found (or we are ignoring an existing plan) for
>> > >
>> >
>> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
>> > > so generated a random one;
>> > >
>> >
>> hri=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.,
>> > > src=, dest=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076; 5
>> > (online=5,
>> > > exclude=null) available servers
>> > > 2011-06-29 16:37:12,576 DEBUG
>> > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
>> > >
>> >
>> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
>> > > to hadoop1-s05.farm-ny.gigya.com,60020,1307349217076
>> > > 2011-06-29 16:37:13,102 DEBUG
>> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
>> > > transition=RS_ZK_REGION_OPENED,
>> > > server=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076,
>> > > region=584dac5cc70d8682f71c4675a843c309
>> > > 2011-06-29 16:39:54,075 DEBUG
>> > > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling
>> > OPENED
>> > > event for 584dac5cc70d8682f71c4675a843c309; deleting unassigned node
>> > > 2011-06-29 16:39:54,075 DEBUG
>> org.apache.hadoop.hbase.zookeeper.ZKAssign:
>> > > master:60000-0x13004a31d7804c4 Deleting existing unassigned node for
>> > > 584dac5cc70d8682f71c4675a843c309 that is in expected state
>> > > RS_ZK_REGION_OPENED
>> > > 2011-06-29 16:39:54,192 DEBUG
>> org.apache.hadoop.hbase.zookeeper.ZKAssign:
>> > > master:60000-0x13004a31d7804c4 Successfully deleted unassigned node for
>> > > region 584dac5cc70d8682f71c4675a843c309 in expected state
>> > > RS_ZK_REGION_OPENED
>> > > 2011-06-29 16:39:54,326 DEBUG
>> > > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened
>> region
>> > >
>> >
>> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
>> > > on hadoop1-s05.farm-ny.gigya.com,60020,1307349217076
>> > > 2011-06-29 16:40:00,598 DEBUG
>> org.apache.hadoop.hbase.zookeeper.ZKAssign:
>> > > master:60000-0x13004a31d7804c4 Creating (or updating) unassigned node
>> for
>> > > 584dac5cc70d8682f71c4675a843c309 with OFFLINE state
>> > > 2011-06-29 16:40:00,877 DEBUG
>> > > org.apache.hadoop.hbase.master.AssignmentManager: No previous
>> transition
>> > > plan was found (or we are ignoring an existing plan) for
>> > >
>> >
>> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
>> > > so generated a random one;
>> > >
>> >
>> hri=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.,
>> > > src=, dest=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076; 5
>> > (online=5,
>> > > exclude=null) available servers
>> > > 2011-06-29 16:40:00,877 DEBUG
>> > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
>> > >
>> >
>> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
>> > > to hadoop1-s05.farm-ny.gigya.com,60020,1307349217076
>> > > 2011-06-29 16:43:57,879 INFO
>> > > org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition
>> > > timed out:
>> > >
>> >
>> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
>> > > state=PENDING_OPEN, ts=1309380052723
>> > > 2011-06-29 16:43:57,880 INFO
>> > > org.apache.hadoop.hbase.master.AssignmentManager: Region has been
>> > > PENDING_OPEN for too long, reassigning
>> > >
>> >
>> region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
>> > > 2011-06-29 16:43:57,936 DEBUG
>> > > org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE;
>> > >
>> >
>> was=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
>> > > state=PENDING_OPEN, ts=1309380052723
>> > > 2011-06-29 16:43:57,936 DEBUG
>> > > org.apache.hadoop.hbase.master.AssignmentManager: No previous
>> transition
>> > > plan was found (or we are ignoring an existing plan) for
>> > >
>> >
>> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
>> > > so generated a random one;
>> > >
>> >
>> hri=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.,
>> > > src=, dest=hadoop1-s02.farm-ny.gigya.com,60020,1306919627544; 5
>> > (online=5,
>> > > exclude=null) available servers
>> > > 2011-06-29 16:43:57,936 DEBUG
>> > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
>> > >
>> >
>> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
>> > > to hadoop1-s02.farm-ny.gigya.com,60020,1306919627544
>> > > 2011-06-29 16:43:59,022 DEBUG
>> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
>> > > transition=RS_ZK_REGION_OPENING,
>> > > server=hadoop1-s02.farm-ny.gigya.com,60020,1306919627544,
>> > > region=584dac5cc70d8682f71c4675a843c309
>> > > 2011-06-29 16:43:59,221 DEBUG
>> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
>> > > transition=RS_ZK_REGION_OPENING,
>> > > server=hadoop1-s02.farm-ny.gigya.com,60020,1306919627544,
>> > > region=584dac5cc70d8682f71c4675a843c309
>> > > 2011-06-29 16:43:59,226 DEBUG
>> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
>> > > transition=RS_ZK_REGION_OPENED,
>> > > server=hadoop1-s02.farm-ny.gigya.com,60020,1306919627544,
>> > > region=584dac5cc70d8682f71c4675a843c309
>> > > 2011-06-29 16:43:59,274 DEBUG
>> > > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling
>> > OPENED
>> > > event for 584dac5cc70d8682f71c4675a843c309; deleting unassigned node
>> > > 2011-06-29 16:43:59,274 DEBUG
>> org.apache.hadoop.hbase.zookeeper.ZKAssign:
>> > > master:60000-0x13004a31d7804c4 Deleting existing unassigned node for
>> > > 584dac5cc70d8682f71c4675a843c309 that is in expected state
>> > > RS_ZK_REGION_OPENED
>> > > 2011-06-29 16:43:59,296 DEBUG
>> org.apache.hadoop.hbase.zookeeper.ZKAssign:
>> > > master:60000-0x13004a31d7804c4 Successfully deleted unassigned node for
>> > > region 584dac5cc70d8682f71c4675a843c309 in expected state
>> > > RS_ZK_REGION_OPENED
>> > > 2011-06-29 16:43:59,375 WARN
>> > > org.apache.hadoop.hbase.master.AssignmentManager: Overwriting
>> > > 584dac5cc70d8682f71c4675a843c309 on
>> > > serverName=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076,
>> > > load=(requests=0, regions=1273, usedHeap=2676, maxHeap=12483)
>> > > 2011-06-29 16:43:59,375 DEBUG
>> > > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened
>> region
>> > >
>> >
>> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
>> > > on hadoop1-s02.farm-ny.gigya.com,60020,1306919627544
>> > >
>> > > Thanks.
>> > >
>> > > -eran
>> > >
>> > >
>> > >
>> > > On Fri, Jul 1, 2011 at 09:05, Stack <stack@duboce.net> wrote:
>> > >
>> > >> So, Eran, it seems as though two RegionServers were carrying the
>> > >> region?  One deleted a file (compaction on its side)?  Can you figure
>> > >> if indeed two servers had same region?  (Check master logs for this
>> > >> regions assignments).
>> > >>
>> > >> What version of hbase?
>> > >>
>> > >> St.Ack
>> > >>
>> > >>
>> > >> On Thu, Jun 30, 2011 at 3:58 AM, Eran Kutner <eran@gigya.com>
wrote:
>> > >> > Hi,
>> > >> > I have a cluster of 5 nodes with one large table that currently
has
>> > >> around
>> > >> > 12000 regions. Everything was working fine for relatively long
time,
>> > >> until
>> > >> > now.
>> > >> > Yesterday I significantly reduced the TTL on the table and initiated
>> > >> major
>> > >> > compaction. This should have reduced the table size to about 20%
of
>> > its
>> > >> > original size.
>> > >> > Today, I'm getting errors of inaccessible files on HDFS, for
>> example:
>> > >> > java.io.IOException: Got error in response to OP_READ_BLOCK self=/
>> > >> > 10.1.104.2:58047, remote=/10.1.104.2:50010 for file
>> > >> >
>> > >>
>> >
>> /hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/events/1971818821800304360
>> > >> > for block 3674866614142268536_674205
>> > >> >        at
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1487)
>> > >> >        at
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1811)
>> > >> >        at
>> > >> >
>> > org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1948)
>> > >> >        at java.io.DataInputStream.read(DataInputStream.java:132)
>> > >> >        at
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:105)
>> > >> >        at
>> > java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>> > >> >        at
>> > java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>> > >> >        at
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.io.compress.BlockDecompressorStream.rawReadInt(BlockDecompressorStream.java:128)
>> > >> >        at
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:68)
>> > >> >        at
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75)
>> > >> >        at
>> > java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>> > >> >        at
>> > java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>> > >> >        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)
>> > >> >        at
>> > >> >
>> > org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1094)
>> > >> >        at
>> > >> >
>> > org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:1036)
>> > >> >        at
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.seekTo(HFile.java:1433)
>> > >> >        at
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:139)
>> > >> >        at
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:96)
>> > >> >        at
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:77)
>> > >> >        at
>> > >> >
>> org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1341)
>> > >> >        at
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.<init>(HRegion.java:2269)
>> > >> >        at
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hbase.regionserver.HRegion.instantiateInternalScanner(HRegion.java:1126)
>> > >> >        at
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1118)
>> > >> >        at
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1102)
>> > >> >        at
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1781)
>> > >> >        at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown
>> Source)
>> > >> >        at
>> > >> >
>> > >>
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > >> >        at java.lang.reflect.Method.invoke(Method.java:597)
>> > >> >        at
>> > >> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>> > >> >        at
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
>> > >> >
>> > >> > I checked and the file, indeed doesn't exist on HDFS, here is
the
>> name
>> > >> node
>> > >> > logs for this block, apparently because it was deleted:
>> > >> > 2011-06-19 21:39:36,651 INFO org.apache.hadoop.hdfs.StateChange:
>> > BLOCK*
>> > >> > NameSystem.allocateBlock:
>> > >> >
>> > >>
>> >
>> /hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/.tmp/2096863423111131624.
>> > >> > blk_3674866614142268536_674205
>> > >> > 2011-06-19 21:40:11,954 INFO org.apache.hadoop.hdfs.StateChange:
>> > BLOCK*
>> > >> > NameSystem.addStoredBlock: blockMap updated: 10.1.104.2:50010
is
>> > added
>> > >> to
>> > >> > blk_3674866614142268536_674205 size 67108864
>> > >> > 2011-06-19 21:40:11,954 INFO org.apache.hadoop.hdfs.StateChange:
>> > BLOCK*
>> > >> > NameSystem.addStoredBlock: blockMap updated: 10.1.104.3:50010
is
>> > added
>> > >> to
>> > >> > blk_3674866614142268536_674205 size 67108864
>> > >> > 2011-06-19 21:40:11,955 INFO org.apache.hadoop.hdfs.StateChange:
>> > BLOCK*
>> > >> > NameSystem.addStoredBlock: blockMap updated: 10.1.104.5:50010
is
>> > added
>> > >> to
>> > >> > blk_3674866614142268536_674205 size 67108864
>> > >> > 2011-06-29 20:20:01,662 INFO org.apache.hadoop.hdfs.StateChange:
>> > BLOCK*
>> > >> ask
>> > >> > 10.1.104.2:50010 to delete  blk_3674866614142268536_674205
>> > >> > 2011-06-29 20:20:13,671 INFO org.apache.hadoop.hdfs.StateChange:
>> > BLOCK*
>> > >> ask
>> > >> > 10.1.104.5:50010 to delete  blk_-4056387895369608597_675174
>> > >> > blk_-5017882805850873821_672281 blk_702373987100607684_672288
>> > >> > blk_-5357157478043290010_668506 blk_7118175133735412789_674903
>> > >> > blk_-3569812563715986384_675231 blk_8296855057240604851_669285
>> > >> > blk_-6483679172530609101_674268 blk_8738539715363739108_673682
>> > >> > blk_1744841904626813502_675238 blk_-6035315106100051103_674266
>> > >> > blk_-1789501623010070237_674908 blk_1944054629336265129_673689
>> > >> > blk_3674866614142268536_674205 blk_7930425446738143892_647410
>> > >> > blk_-3007186753042268449_669296 blk_-5482302621772778061_647416
>> > >> > blk_-3765735404924932181_672004 blk_7476090998956811081_675169
>> > >> > blk_7862291659285127712_646890 blk_-2666244746343584727_672013
>> > >> > blk_6039172613960915602_674206 blk_-8470884397893086564_646899
>> > >> > blk_4558230221166712802_668510
>> > >> > 2011-06-29 20:20:46,698 INFO org.apache.hadoop.hdfs.StateChange:
>> > BLOCK*
>> > >> ask
>> > >> > 10.1.104.3:50010 to delete  blk_-7851606440036350812_671552
>> > >> > blk_9214649160203453845_647566 blk_702373987100607684_672288
>> > >> > blk_5958099369749234073_668143 blk_-5172218034084903173_673109
>> > >> > blk_-2934555181472719276_646476 blk_-1409986679370073931_672552
>> > >> > blk_-2786034325506235869_669086 blk_3674866614142268536_674205
>> > >> > blk_510158930393283118_673225 blk_916244738216205237_677068
>> > >> > blk_-4317027806407316617_670379 blk_8555705688850972639_673485
>> > >> > blk_-3765735404924932181_672004 blk_-5482302621772778061_647416
>> > >> > blk_-2461801145731752623_674605 blk_-8737702908048998927_672549
>> > >> > blk_-8470884397893086564_646899 blk_4558230221166712802_668510
>> > >> > blk_-4056387895369608597_675174 blk_-8675430610673886073_647695
>> > >> > blk_-6642870230256028318_668211 blk_-3890408516362176771_677483
>> > >> > blk_-3569812563715986384_675231 blk_-5007142629771321873_674548
>> > >> > blk_-3345355191863431669_667066 blk_8296855057240604851_669285
>> > >> > blk_-6595462308187757470_672420 blk_-2583945228783203947_674607
>> > >> > blk_-346988625120916345_677063 blk_4449525876338684218_674496
>> > >> > blk_2617172363857549730_668201 blk_8738539715363739108_673682
>> > >> > blk_-208904675456598428_679286 blk_-497549341281882641_646477
>> > >> > blk_-6035315106100051103_674266 blk_-2356539038067297411_672388
>> > >> > blk_-3881703084497103249_668137 blk_2214397881104950315_646643
>> > >> > blk_-5907671443455357710_673223 blk_-2431880309956605679_669204
>> > >> > blk_6039172613960915602_674206 blk_5053643911633142711_669194
>> > >> > blk_-2636977729205236686_674664
>> > >> >
>> > >> > I assume the file loss is somehow related to this change and the
>> major
>> > >> > compaction that followed because the same scan that is failing
now
>> was
>> > >> > working fine yesterday and that is the only changed that happened
on
>> > the
>> > >> > cluster. Any suggestions what to do now?
>> > >> >
>> > >> > Thanks.
>> > >> >
>> > >> > -eran
>> > >> >
>> > >>
>> > >
>> >
>>
>
