accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Problem during compacting a table
Date Wed, 05 Aug 2015 15:24:28 GMT
I'm not really sure what that error message means without doing more 
digging. Copying your email to user@hadoop.apache.org might shed some 
light on what the error means if you want to try that.

mohit.kaushik wrote:
> These errors are shown in the logs of the Hadoop namenode and slaves...
>
> *Namenode log*
> 2015-08-05 12:05:14,518 INFO
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment
> at 391508
> 2015-08-05 12:05:14,664 INFO BlockStateChange: BLOCK* ask
> 192.168.10.121:50010 to replicate blk_1073780327_39560 to datanode(s)
> 192.168.10.122:50010
> 2015-08-05 12:05:14,664 INFO BlockStateChange: BLOCK* ask
> 192.168.10.121:50010 to replicate blk_1073780379_39612 to datanode(s)
> 192.168.10.122:50010
> 2015-08-05 12:05:24,621 INFO BlockStateChange: BLOCK* addStoredBlock:
> blockMap updated: 192.168.10.122:50010 is added to blk_1073782847_42080
> size 134217728
> 2015-08-05 12:05:26,665 INFO BlockStateChange: BLOCK* ask
> 192.168.10.121:50010 to replicate blk_1073780611_39844 to datanode(s)
> 192.168.10.122:50010
> 2015-08-05 12:05:27,232 INFO BlockStateChange: BLOCK* addStoredBlock:
> blockMap updated: 192.168.10.122:50010 is added to blk_1073793941_53178
> size 134217728
> 2015-08-05 12:05:27,950 INFO BlockStateChange: BLOCK* addStoredBlock:
> blockMap updated: 192.168.10.122:50010 is added to blk_1073783859_43092
> size 134217728
> 2015-08-05 12:05:28,798 INFO BlockStateChange: BLOCK* addStoredBlock:
> blockMap updated: 192.168.10.122:50010 is added to blk_1073793387_52620
> size 22496
> 2015-08-05 12:05:29,666 INFO BlockStateChange: BLOCK* ask
> 192.168.10.123:50010 to replicate blk_1073780678_39911 to datanode(s)
> 192.168.10.121:50010
> 2015-08-05 12:05:29,666 INFO BlockStateChange: BLOCK* ask
> 192.168.10.121:50010 to replicate blk_1073780682_39915 to datanode(s)
> 192.168.10.122:50010
> 2015-08-05 12:05:32,002 INFO BlockStateChange: BLOCK* addStoredBlock:
> blockMap updated: 192.168.10.122:50010 is added to
> blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
> primaryNodeIndex=-1,
> replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW],
> ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW],
> ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]}
> size 0
> 2015-08-05 12:05:32,072 INFO BlockStateChange: BLOCK* addStoredBlock:
> blockMap updated: 192.168.10.121:50010 is added to
> blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
> primaryNodeIndex=-1,
> replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW],
> ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW],
> ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]}
> size 0
> 2015-08-05 12:05:32,129 INFO BlockStateChange: BLOCK* addStoredBlock:
> blockMap updated: 192.168.10.123:50010 is added to
> blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
> primaryNodeIndex=-1,
> replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW],
> ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW],
> ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]}
> size 0
> ... and more
>
> *Slave log (too many)*
> k_1073794728_53972 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because
> the block scanner is disabled.
> 2015-08-05 11:50:30,438 INFO
> org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning
> suspicious block
> BP-2102462487-192.168.10.124-1436956492274:blk_1073794738_53982 on
> DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is
> disabled.
> 2015-08-05 11:50:31,024 INFO
> org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning
> suspicious block
> BP-2102462487-192.168.10.124-1436956492274:blk_1073794728_53972 on
> DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is
> disabled.
> 2015-08-05 11:50:31,027 INFO
> org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning
> suspicious block
> BP-2102462487-192.168.10.124-1436956492274:blk_1073794738_53982 on
> DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is
> disabled.
> 2015-08-05 11:50:31,095 INFO
> org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning
> suspicious block
> BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on
> DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is
> disabled.
> 2015-08-05 11:50:31,105 INFO
> org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning
> suspicious block
> BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on
> DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is
> disabled.
> 2015-08-05 11:50:31,136 INFO
> org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning
> suspicious block
> BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on
> DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is
> disabled.
> 2015-08-05 11:50:31,136 INFO
> org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning
> suspicious block
> BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on
> DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is
> disabled.
>
>
> I am using locality groups, so I *NEED* to compact the tables. Please
> explain how I can get rid of the suspicious blocks.
>
> Thanks
>
> On 08/05/2015 10:53 AM, mohit.kaushik wrote:
>> Yes, one of my datanodes was down because its disk was detached for some
>> time, and the tserver on that node was lost, but it is up and running again.
>>
>> fsck shows that the file system is healthy, but with many messages
>> reporting under-replicated blocks: my replication factor is 3, yet it
>> shows that 5 replicas are required.
>>
>> /user/root/.Trash/Current/accumulo/tables/+r/root_tablet/delete+A0000d29.rf+F0000d28.rf:
>> Under replicated
>> BP-2102462487-192.168.10.124-1436956492274:blk_1073796198_55442.
>> Target Replicas is 5 but found 3 replica(s).
>>
>> Thanks & Regards
>> Mohit Kaushik
>>
>> On 08/04/2015 09:18 PM, John Vines wrote:
>>> It looks like an hdfs issue. Did a datanode go down? Did you turn
>>> replication down to 1? The combination of those two errors would
>>> definitely cause the problems you're seeing, as the latter disables any
>>> sort of robustness of the underlying filesystem.
>>>
>>> On Tue, Aug 4, 2015 at 8:10 AM mohit.kaushik
>>> <mohit.kaushik@orkash.com> wrote:
>>>
>>>     On 08/04/2015 05:35 PM, mohit.kaushik wrote:
>>>>     Hello All,
>>>>
>>>>     I am using Apache Accumulo-1.6.3 with Apache Hadoop-2.7.0 on a 3
>>>>     node cluster. When I run the compact command from the shell, it
>>>>     gives the following warning.
>>>>
>>>>     root@orkash testScan> compact -w
>>>>     2015-08-04 17:10:52,702 [Shell.audit] INFO : root@orkash
>>>>     testScan> compact -w
>>>>     2015-08-04 17:10:52,706 [shell.Shell] INFO : Compacting table ...
>>>>     2015-08-04 17:12:53,986 [impl.ThriftTransportPool] *WARN :
>>>>     Thread "shell" stuck on IO  to orkash4:9999 (0) for at least
>>>>     120034 ms*
>>>>
>>>>
>>>>     The tablet servers show a problem with a data block, which looks
>>>>     something like HDFS-8659
>>>>     <https://issues.apache.org/jira/browse/HDFS-8659>
>>>>
>>>>     2015-08-04 15:00:27,825 [hdfs.DFSClient] WARN : Failed to
>>>>     connect to /192.168.10.121:50010
>>>>     for block, add to deadNodes and continue. java.io.IOException:
>>>>     Got error, status message opReadBlock
>>>>     BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911
>>>>     received exception
>>>>     org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException:
>>>>     Replica not found for
>>>>     BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911,
>>>>     for OP_READ_BLOCK, self=/192.168.10.121:38752,
>>>>     remote=/192.168.10.121:50010, for file
>>>>     /accumulo/tables/h/t-000016s/F000016t.rf, for pool
>>>>     BP-2102462487-192.168.10.124-1436956492274 block 1073780678_39911
>>>>     java.io.IOException: Got error, status message opReadBlock
>>>>     BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911
>>>>     received exception
>>>>     org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException:
>>>>     Replica not found for
>>>>     BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911,
>>>>     for OP_READ_BLOCK, self=/192.168.10.121:38752,
>>>>     remote=/192.168.10.121:50010, for file
>>>>     /accumulo/tables/h/t-000016s/F000016t.rf, for pool
>>>>     BP-2102462487-192.168.10.124-1436956492274 block 1073780678_39911
>>>>         at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
>>>>         at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:456)
>>>>         at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:424)
>>>>         at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:814)
>>>>         at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:693)
>>>>         at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:352)
>>>>         at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
>>>>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
>>>>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
>>>>         at java.io.DataInputStream.read(DataInputStream.java:149)
>>>>         at org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream$1.run(BoundedRangeFileInputStream.java:104)
>>>>         at org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream$1.run(BoundedRangeFileInputStream.java:100)
>>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>>         at org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:100)
>>>>         at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:159)
>>>>         at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:143)
>>>>         at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
>>>>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>>>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
>>>>         at java.io.FilterInputStream.read(FilterInputStream.java:83)
>>>>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>>>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$IndexBlock.readFields(MultiLevelIndex.java:269)
>>>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader.getIndexBlock(MultiLevelIndex.java:724)
>>>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader.access$100(MultiLevelIndex.java:497)
>>>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$Node.getNext(MultiLevelIndex.java:587)
>>>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$Node.getNextNode(MultiLevelIndex.java:593)
>>>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$IndexIterator.getNextNode(MultiLevelIndex.java:616)
>>>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$IndexIterator.next(MultiLevelIndex.java:659)
>>>>         at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader._next(RFile.java:559)
>>>>
>>>>     Regards
>>>>     Mohit Kaushik
>>>>
>>>     And the compaction never completes.
>>>

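The fsck entry quoted in this thread reports a target of 5 replicas against 3 found, for a file under .Trash. Below is a minimal sketch, assuming the `hdfs fsck` report has been saved as text and that its under-replicated entries follow the wording quoted above. It shows how such entries could be pulled out to see which paths ask for more replicas than were found. `parse_under_replicated` and the regular expression are illustrative names written for this example, not part of Hadoop or Accumulo.

```python
import re

# Assumed wording, copied from the fsck entry quoted in this thread:
#   <path>: Under replicated <block>. Target Replicas is N but found M replica(s).
UNDER_REPLICATED = re.compile(
    r"(?P<path>/\S+?):\s+Under replicated\s+(?P<block>\S+?)\.\s+"
    r"Target Replicas is (?P<target>\d+) but found (?P<found>\d+) replica\(s\)"
)


def parse_under_replicated(report):
    """Return (path, block, target, found) tuples from saved fsck text."""
    # Mail clients wrap long lines, so collapse all whitespace first.
    joined = " ".join(report.split())
    return [
        (m["path"], m["block"], int(m["target"]), int(m["found"]))
        for m in UNDER_REPLICATED.finditer(joined)
    ]


# The entry quoted earlier in this thread, with the mail wrapping left in.
sample = """
/user/root/.Trash/Current/accumulo/tables/+r/root_tablet/delete+A0000d29.rf+F0000d28.rf:
Under replicated
BP-2102462487-192.168.10.124-1436956492274:blk_1073796198_55442.
Target Replicas is 5 but found 3 replica(s).
"""

entries = parse_under_replicated(sample)
for path, block, target, found in entries:
    print(f"{path}: wants {target}, has {found} ({block})")
```

This only reads a saved report; it does not change replication settings or touch HDFS.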