accumulo-user mailing list archives

From "mohit.kaushik" <mohit.kaus...@orkash.com>
Subject Re: Problem during compacting a table
Date Wed, 05 Aug 2015 06:50:34 GMT
These errors are shown in the logs of the Hadoop namenode and the slaves...

Namenode log:
2015-08-05 12:05:14,518 INFO 
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment 
at 391508
2015-08-05 12:05:14,664 INFO BlockStateChange: BLOCK* ask 
192.168.10.121:50010 to replicate blk_1073780327_39560 to datanode(s) 
192.168.10.122:50010
2015-08-05 12:05:14,664 INFO BlockStateChange: BLOCK* ask 
192.168.10.121:50010 to replicate blk_1073780379_39612 to datanode(s) 
192.168.10.122:50010
2015-08-05 12:05:24,621 INFO BlockStateChange: BLOCK* addStoredBlock: 
blockMap updated: 192.168.10.122:50010 is added to blk_1073782847_42080 
size 134217728
2015-08-05 12:05:26,665 INFO BlockStateChange: BLOCK* ask 
192.168.10.121:50010 to replicate blk_1073780611_39844 to datanode(s) 
192.168.10.122:50010
2015-08-05 12:05:27,232 INFO BlockStateChange: BLOCK* addStoredBlock: 
blockMap updated: 192.168.10.122:50010 is added to blk_1073793941_53178 
size 134217728
2015-08-05 12:05:27,950 INFO BlockStateChange: BLOCK* addStoredBlock: 
blockMap updated: 192.168.10.122:50010 is added to blk_1073783859_43092 
size 134217728
2015-08-05 12:05:28,798 INFO BlockStateChange: BLOCK* addStoredBlock: 
blockMap updated: 192.168.10.122:50010 is added to blk_1073793387_52620 
size 22496
2015-08-05 12:05:29,666 INFO BlockStateChange: BLOCK* ask 
192.168.10.123:50010 to replicate blk_1073780678_39911 to datanode(s) 
192.168.10.121:50010
2015-08-05 12:05:29,666 INFO BlockStateChange: BLOCK* ask 
192.168.10.121:50010 to replicate blk_1073780682_39915 to datanode(s) 
192.168.10.122:50010
2015-08-05 12:05:32,002 INFO BlockStateChange: BLOCK* addStoredBlock: 
blockMap updated: 192.168.10.122:50010 is added to 
blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null, 
primaryNodeIndex=-1, 
replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW],
ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW],
ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]} size 0
2015-08-05 12:05:32,072 INFO BlockStateChange: BLOCK* addStoredBlock: 
blockMap updated: 192.168.10.121:50010 is added to 
blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null, 
primaryNodeIndex=-1, 
replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW],
ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW],
ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]} size 0
2015-08-05 12:05:32,129 INFO BlockStateChange: BLOCK* addStoredBlock: 
blockMap updated: 192.168.10.123:50010 is added to 
blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null, 
primaryNodeIndex=-1, 
replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW],
ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW],
ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]} size 0
... and more

Slave log (too many similar entries):
...blk_1073794728_53972 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because 
the block scanner is disabled.
2015-08-05 11:50:30,438 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning 
suspicious block 
BP-2102462487-192.168.10.124-1436956492274:blk_1073794738_53982 on 
DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is 
disabled.
2015-08-05 11:50:31,024 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning 
suspicious block 
BP-2102462487-192.168.10.124-1436956492274:blk_1073794728_53972 on 
DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is 
disabled.
2015-08-05 11:50:31,027 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning 
suspicious block 
BP-2102462487-192.168.10.124-1436956492274:blk_1073794738_53982 on 
DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is 
disabled.
2015-08-05 11:50:31,095 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning 
suspicious block 
BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on 
DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is 
disabled.
2015-08-05 11:50:31,105 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning 
suspicious block 
BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on 
DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is 
disabled.
2015-08-05 11:50:31,136 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning 
suspicious block 
BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on 
DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is 
disabled.
2015-08-05 11:50:31,136 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning 
suspicious block 
BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on 
DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is 
disabled.


I am using locality groups, so it is a *need* to compact tables. Please 
explain how I can get rid of the suspicious blocks.
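
A minimal sketch of how the suspicious blocks could be tracked down with the
standard HDFS tools (the /accumulo path and block ID are the ones from the
logs above; the fsck options assume a stock Hadoop 2.7 client):

  # Files whose blocks have lost every replica
  hdfs fsck / -list-corruptfileblocks

  # Map a reported block (e.g. blk_1073780678_39911) back to its RFile and
  # to the datanodes that should hold a replica
  hdfs fsck /accumulo -files -blocks -locations | grep 1073780678

  # Re-check overall health once the recovered datanode has caught up
  hdfs fsck /accumulo

Only if a file really has no replicas left anywhere would "hdfs fsck <path>
-delete" be worth considering; it removes the affected file outright, so it
is a last resort, not a fix for merely under-replicated blocks.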

Thanks

On 08/05/2015 10:53 AM, mohit.kaushik wrote:
> Yes, one of my datanodes was down because its disk was detached for some 
> time, and the tserver on that node was lost, but it is up and running again.
>
> fsck shows that the file system is healthy, but there are many messages 
> reporting under-replicated blocks: my replication factor is 3, yet it 
> says 5 replicas are required.
>
> /user/root/.Trash/Current/accumulo/tables/+r/root_tablet/delete+A0000d29.rf+F0000d28.rf:
> Under replicated 
> BP-2102462487-192.168.10.124-1436956492274:blk_1073796198_55442. 
> Target Replicas is 5 but found 3 replica(s).
>
> Thanks & Regards
> Mohit Kaushik
>
> On 08/04/2015 09:18 PM, John Vines wrote:
>> It looks like an HDFS issue. Did a datanode go down? Did you turn 
>> replication down to 1? The combination of those two errors would 
>> definitely cause the problems you're seeing, as the latter disables any 
>> sort of robustness of the underlying filesystem.
>>
>> On Tue, Aug 4, 2015 at 8:10 AM mohit.kaushik 
>> <mohit.kaushik@orkash.com> wrote:
>>
>>     On 08/04/2015 05:35 PM, mohit.kaushik wrote:
>>>     Hello All,
>>>
>>>     I am using Apache Accumulo 1.6.3 with Apache Hadoop 2.7.0 on a
>>>     3-node cluster. When I give the compact command from the shell,
>>>     it gives the following warning.
>>>
>>>     root@orkash testScan> compact -w
>>>     2015-08-04 17:10:52,702 [Shell.audit] INFO : root@orkash
>>>     testScan> compact -w
>>>     2015-08-04 17:10:52,706 [shell.Shell] INFO : Compacting table ...
>>>     2015-08-04 17:12:53,986 [impl.ThriftTransportPool] WARN :
>>>     Thread "shell" stuck on IO to orkash4:9999 (0) for at least
>>>     120034 ms
>>>
>>>
>>>     The tablet servers report a problem regarding a data block, which
>>>     looks like HDFS-8659
>>>     <https://issues.apache.org/jira/browse/HDFS-8659>
>>>
>>>     2015-08-04 15:00:27,825 [hdfs.DFSClient] WARN : Failed to
>>>     connect to /192.168.10.121:50010
>>>     for block, add to deadNodes and continue. java.io.IOException:
>>>     Got error, status message opReadBlock
>>>     BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911
>>>     received exception
>>>     org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException:
>>>     Replica not found for
>>>     BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911,
>>>     for OP_READ_BLOCK, self=/192.168.10.121:38752,
>>>     remote=/192.168.10.121:50010, for file
>>>     /accumulo/tables/h/t-000016s/F000016t.rf, for pool
>>>     BP-2102462487-192.168.10.124-1436956492274 block 1073780678_39911
>>>     java.io.IOException: Got error, status message opReadBlock
>>>     BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911
>>>     received exception
>>>     org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException:
>>>     Replica not found for
>>>     BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911,
>>>     for OP_READ_BLOCK, self=/192.168.10.121:38752,
>>>     remote=/192.168.10.121:50010, for file
>>>     /accumulo/tables/h/t-000016s/F000016t.rf, for pool
>>>     BP-2102462487-192.168.10.124-1436956492274 block 1073780678_39911
>>>         at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
>>>         at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:456)
>>>         at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:424)
>>>         at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:814)
>>>         at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:693)
>>>         at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:352)
>>>         at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
>>>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
>>>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
>>>         at java.io.DataInputStream.read(DataInputStream.java:149)
>>>         at org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream$1.run(BoundedRangeFileInputStream.java:104)
>>>         at org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream$1.run(BoundedRangeFileInputStream.java:100)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:100)
>>>         at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:159)
>>>         at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:143)
>>>         at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
>>>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
>>>         at java.io.FilterInputStream.read(FilterInputStream.java:83)
>>>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$IndexBlock.readFields(MultiLevelIndex.java:269)
>>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader.getIndexBlock(MultiLevelIndex.java:724)
>>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader.access$100(MultiLevelIndex.java:497)
>>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$Node.getNext(MultiLevelIndex.java:587)
>>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$Node.getNextNode(MultiLevelIndex.java:593)
>>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$IndexIterator.getNextNode(MultiLevelIndex.java:616)
>>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$IndexIterator.next(MultiLevelIndex.java:659)
>>>         at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader._next(RFile.java:559)
>>>
>>>     Regards
>>>     Mohit Kaushik
>>>
>>>
>>     And Compaction never completes
>>
