hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Errors after major compaction
Date Fri, 01 Jul 2011 06:05:40 GMT
So, Eran, it seems as though two RegionServers were carrying the
region?  One deleted a file (a compaction on its side)?  Can you figure
out whether two servers did indeed have the same region?  (Check the
master logs for this region's assignments.)
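
One rough way to run that check, assuming the master log is readable from a
client box, is a grep-style scan like the sketch below. The log path is only a
guess, and the encoded region name is taken from the store file path in the
trace further down; both are placeholders.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    // Sketch: print every master-log line that mentions one region's encoded
    // name, which covers its open/close/assignment history.
    public class FindRegionAssignments {
      public static void main(String[] args) throws IOException {
        String masterLog = "/var/log/hbase/hbase-master.log";        // placeholder path
        String encodedRegion = "584dac5cc70d8682f71c4675a843c309";   // from the store file path below
        BufferedReader in = new BufferedReader(new FileReader(masterLog));
        String line;
        while ((line = in.readLine()) != null) {
          if (line.contains(encodedRegion)) {
            System.out.println(line);
          }
        }
        in.close();
      }
    }

If two different RegionServers show up opening that region in overlapping
windows, that would fit the double-assignment theory.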

What version of hbase?

St.Ack


On Thu, Jun 30, 2011 at 3:58 AM, Eran Kutner <eran@gigya.com> wrote:
> Hi,
> I have a cluster of 5 nodes with one large table that currently has around
> 12000 regions. Everything was working fine for a relatively long time, until
> now.
> Yesterday I significantly reduced the TTL on the table and initiated major
> compaction. This should have reduced the table size to about 20% of its
> original size.
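
For reference, a TTL change plus major compaction of that sort amounts to
roughly the following HBaseAdmin sketch. The table and family names are taken
from the store file path in the trace below; the TTL value is only an
illustration, and the exact modifyColumn signature differs a little between
HBase versions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    // Sketch: lower the TTL on one column family, then request a major compaction,
    // which rewrites the store files and drops cells older than the new TTL.
    public class LowerTtlAndCompact {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        String table = "gs_raw_events";

        admin.disableTable(table);
        HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes(table));
        HColumnDescriptor family = desc.getFamily(Bytes.toBytes("events"));
        family.setTimeToLive(7 * 24 * 60 * 60);        // TTL is in seconds; example value only
        admin.modifyColumn(table, "events", family);   // older clients take (table, column, descriptor)
        admin.enableTable(table);

        admin.majorCompact(table);                     // asynchronous request to compact all regions
      }
    }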
> Today, I'm getting errors about inaccessible files on HDFS, for example:
> java.io.IOException: Got error in response to OP_READ_BLOCK self=/10.1.104.2:58047, remote=/10.1.104.2:50010 for file /hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/events/1971818821800304360 for block 3674866614142268536_674205
>        at org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1487)
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1811)
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1948)
>        at java.io.DataInputStream.read(DataInputStream.java:132)
>        at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:105)
>        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>        at org.apache.hadoop.io.compress.BlockDecompressorStream.rawReadInt(BlockDecompressorStream.java:128)
>        at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:68)
>        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75)
>        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)
>        at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1094)
>        at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:1036)
>        at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.seekTo(HFile.java:1433)
>        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:139)
>        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:96)
>        at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:77)
>        at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1341)
>        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.<init>(HRegion.java:2269)
>        at org.apache.hadoop.hbase.regionserver.HRegion.instantiateInternalScanner(HRegion.java:1126)
>        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1118)
>        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1102)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1781)
>        at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
>
> I checked, and the file indeed doesn't exist on HDFS. Here are the name node
> logs for this block; apparently it was deleted:
> 2011-06-19 21:39:36,651 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/.tmp/2096863423111131624. blk_3674866614142268536_674205
> 2011-06-19 21:40:11,954 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.1.104.2:50010 is added to blk_3674866614142268536_674205 size 67108864
> 2011-06-19 21:40:11,954 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.1.104.3:50010 is added to blk_3674866614142268536_674205 size 67108864
> 2011-06-19 21:40:11,955 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.1.104.5:50010 is added to blk_3674866614142268536_674205 size 67108864
> 2011-06-29 20:20:01,662 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.1.104.2:50010 to delete  blk_3674866614142268536_674205
> 2011-06-29 20:20:13,671 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask
> 10.1.104.5:50010 to delete  blk_-4056387895369608597_675174
> blk_-5017882805850873821_672281 blk_702373987100607684_672288
> blk_-5357157478043290010_668506 blk_7118175133735412789_674903
> blk_-3569812563715986384_675231 blk_8296855057240604851_669285
> blk_-6483679172530609101_674268 blk_8738539715363739108_673682
> blk_1744841904626813502_675238 blk_-6035315106100051103_674266
> blk_-1789501623010070237_674908 blk_1944054629336265129_673689
> blk_3674866614142268536_674205 blk_7930425446738143892_647410
> blk_-3007186753042268449_669296 blk_-5482302621772778061_647416
> blk_-3765735404924932181_672004 blk_7476090998956811081_675169
> blk_7862291659285127712_646890 blk_-2666244746343584727_672013
> blk_6039172613960915602_674206 blk_-8470884397893086564_646899
> blk_4558230221166712802_668510
> 2011-06-29 20:20:46,698 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask
> 10.1.104.3:50010 to delete  blk_-7851606440036350812_671552
> blk_9214649160203453845_647566 blk_702373987100607684_672288
> blk_5958099369749234073_668143 blk_-5172218034084903173_673109
> blk_-2934555181472719276_646476 blk_-1409986679370073931_672552
> blk_-2786034325506235869_669086 blk_3674866614142268536_674205
> blk_510158930393283118_673225 blk_916244738216205237_677068
> blk_-4317027806407316617_670379 blk_8555705688850972639_673485
> blk_-3765735404924932181_672004 blk_-5482302621772778061_647416
> blk_-2461801145731752623_674605 blk_-8737702908048998927_672549
> blk_-8470884397893086564_646899 blk_4558230221166712802_668510
> blk_-4056387895369608597_675174 blk_-8675430610673886073_647695
> blk_-6642870230256028318_668211 blk_-3890408516362176771_677483
> blk_-3569812563715986384_675231 blk_-5007142629771321873_674548
> blk_-3345355191863431669_667066 blk_8296855057240604851_669285
> blk_-6595462308187757470_672420 blk_-2583945228783203947_674607
> blk_-346988625120916345_677063 blk_4449525876338684218_674496
> blk_2617172363857549730_668201 blk_8738539715363739108_673682
> blk_-208904675456598428_679286 blk_-497549341281882641_646477
> blk_-6035315106100051103_674266 blk_-2356539038067297411_672388
> blk_-3881703084497103249_668137 blk_2214397881104950315_646643
> blk_-5907671443455357710_673223 blk_-2431880309956605679_669204
> blk_6039172613960915602_674206 blk_5053643911633142711_669194
> blk_-2636977729205236686_674664
>
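
A direct way to confirm which store files are actually gone is to ask HDFS from
a small client program; a minimal sketch, with the path copied from the error
above (it assumes the Hadoop configuration files are on the classpath):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Sketch: check whether the store file from the error still exists on HDFS
    // and list whatever store files remain for that region and family.
    public class CheckStoreFile {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path storeFile = new Path(
            "/hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/events/1971818821800304360");
        System.out.println(storeFile + " exists? " + fs.exists(storeFile));
        for (FileStatus st : fs.listStatus(storeFile.getParent())) {
          System.out.println(st.getPath() + "  " + st.getLen() + " bytes");
        }
      }
    }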
> I assume the file loss is somehow related to this change and the major
> compaction that followed, because the same scan that is failing now was
> working fine yesterday, and that is the only change made on the cluster.
> Any suggestions on what to do now?
>
> Thanks.
>
> -eran
>
