accumulo-user mailing list archives

From "mohit.kaushik" <mohit.kaus...@orkash.com>
Subject Re: Problem during compacting a table
Date Wed, 05 Aug 2015 09:28:48 GMT
After being stuck for a long time, the compaction has completed for the
table, but the question remains the same: why does the shell stay stuck
on IO for so long?
/
2015-08-05 12:28:50,583 [Shell.audit] INFO : root@orkash page_content> 
compact -w
2015-08-05 12:28:50,586 [shell.Shell] INFO : Compacting table ...
2015-08-05 12:30:51,563 [impl.ThriftTransportPool] WARN : Thread "shell" 
stuck on IO  to orkash4:9999 (0) for at least 120031 ms
2015-08-05 13:26:24,301 [impl.ThriftTransportPool] INFO : Thread "shell" 
no longer stuck on IO  to orkash4:9999 (0) sawError = false
2015-08-05 13:26:24,319 [shell.Shell] INFO : Compaction of table 
page_content completed for given range/
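
To put a number on "so long", here is a minimal sketch that reads the two
ThriftTransportPool lines above and estimates how long the shell thread was
blocked. The function names (`parse_ts`, `stuck_seconds`) are made up for
illustration, and the log lines are the ones above with the wrapped halves
re-joined. It assumes the "for at least N ms" figure in the warning should be
added to the gap between the two timestamps.

```python
import re
from datetime import datetime

# The two ThriftTransportPool lines from the shell output, re-joined.
LOG = """\
2015-08-05 12:30:51,563 [impl.ThriftTransportPool] WARN : Thread "shell" stuck on IO  to orkash4:9999 (0) for at least 120031 ms
2015-08-05 13:26:24,301 [impl.ThriftTransportPool] INFO : Thread "shell" no longer stuck on IO  to orkash4:9999 (0) sawError = false
"""

# Timestamp prefix: date, time, comma, milliseconds.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) ")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if absent."""
    m = TS.match(line)
    if m is None:
        return None
    base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return base.replace(microsecond=int(m.group(2)) * 1000)


def stuck_seconds(log_text):
    """Estimate the total blocked time in seconds.

    Gap between the first 'stuck on IO' warning and the 'no longer stuck'
    line, plus the time already elapsed when the warning was printed.
    """
    start = end = None
    reported_ms = 0
    for line in log_text.splitlines():
        ts = parse_ts(line)
        if ts is None:
            continue
        # Check the longer phrase first: it contains the shorter one.
        if "no longer stuck on IO" in line:
            end = ts
        elif "stuck on IO" in line and start is None:
            start = ts
            m = re.search(r"for at least (\d+) ms", line)
            reported_ms = int(m.group(1)) if m else 0
    if start is None or end is None:
        raise ValueError("need both a 'stuck' and a 'no longer stuck' line")
    return (end - start).total_seconds() + reported_ms / 1000.0


if __name__ == "__main__":
    print(f"{stuck_seconds(LOG) / 60:.1f} minutes")  # prints "57.5 minutes"
```

By this estimate the shell waited close to an hour on a single call to
orkash4:9999, which matches the gap between the "Compacting table" and
"completed" lines.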

On 08/05/2015 12:20 PM, mohit.kaushik wrote:
> These errors are shown in the logs of the Hadoop namenode and slaves...
>
> *Namenode log*
> /2015-08-05 12:05:14,518 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment 
> at 391508
> 2015-08-05 12:05:14,664 INFO BlockStateChange: BLOCK* ask 
> 192.168.10.121:50010 to replicate blk_1073780327_39560 to datanode(s) 
> 192.168.10.122:50010
> 2015-08-05 12:05:14,664 INFO BlockStateChange: BLOCK* ask 
> 192.168.10.121:50010 to replicate blk_1073780379_39612 to datanode(s) 
> 192.168.10.122:50010
> 2015-08-05 12:05:24,621 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 192.168.10.122:50010 is added to 
> blk_1073782847_42080 size 134217728
> 2015-08-05 12:05:26,665 INFO BlockStateChange: BLOCK* ask 
> 192.168.10.121:50010 to replicate blk_1073780611_39844 to datanode(s) 
> 192.168.10.122:50010
> 2015-08-05 12:05:27,232 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 192.168.10.122:50010 is added to 
> blk_1073793941_53178 size 134217728
> 2015-08-05 12:05:27,950 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 192.168.10.122:50010 is added to 
> blk_1073783859_43092 size 134217728
> 2015-08-05 12:05:28,798 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 192.168.10.122:50010 is added to 
> blk_1073793387_52620 size 22496
> 2015-08-05 12:05:29,666 INFO BlockStateChange: BLOCK* ask 
> 192.168.10.123:50010 to replicate blk_1073780678_39911 to datanode(s) 
> 192.168.10.121:50010
> 2015-08-05 12:05:29,666 INFO BlockStateChange: BLOCK* ask 
> 192.168.10.121:50010 to replicate blk_1073780682_39915 to datanode(s) 
> 192.168.10.122:50010
> 2015-08-05 12:05:32,002 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 192.168.10.122:50010 is added to 
> blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW],

> ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW],

> ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]}

> size 0
> 2015-08-05 12:05:32,072 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 192.168.10.121:50010 is added to 
> blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW],

> ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW],

> ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]}

> size 0
> 2015-08-05 12:05:32,129 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 192.168.10.123:50010 is added to 
> blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW],

> ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW],

> ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]}

> size 0/................and more
>
> *Slave log (too many)*
> /k_1073794728_53972 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, 
> because the block scanner is disabled.
> 2015-08-05 11:50:30,438 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning 
> suspicious block 
> BP-2102462487-192.168.10.124-1436956492274:blk_1073794738_53982 on 
> DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is 
> disabled.
> 2015-08-05 11:50:31,024 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning 
> suspicious block 
> BP-2102462487-192.168.10.124-1436956492274:blk_1073794728_53972 on 
> DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is 
> disabled.
> 2015-08-05 11:50:31,027 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning 
> suspicious block 
> BP-2102462487-192.168.10.124-1436956492274:blk_1073794738_53982 on 
> DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is 
> disabled.
> 2015-08-05 11:50:31,095 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning 
> suspicious block 
> BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on 
> DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is 
> disabled.
> 2015-08-05 11:50:31,105 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning 
> suspicious block 
> BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on 
> DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is 
> disabled.
> 2015-08-05 11:50:31,136 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning 
> suspicious block 
> BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on 
> DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is 
> disabled.
> 2015-08-05 11:50:31,136 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning 
> suspicious block 
> BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on 
> DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is 
> disabled./
>
>
> I am using locality groups, so I *NEED* to compact tables. Please
> explain how I can get rid of the suspicious blocks.
>
> Thanks
>
> On 08/05/2015 10:53 AM, mohit.kaushik wrote:
>> Yes, one of my datanodes was down because its disk was detached for
>> some time, and the tserver on that node was lost, but it is up and
>> running again.
>>
>> fsck shows that the file system is healthy, but with many messages
>> reporting under-replicated blocks: my replication factor is 3, yet it
>> shows that the required number is 5.
>>
>> //user/root/.Trash/Current/accumulo/tables/+r/root_tablet/delete+A0000d29.rf+F0000d28.rf:

>> Under replicated 
>> BP-2102462487-192.168.10.124-1436956492274:blk_1073796198_55442. 
>> Target Replicas is 5 but found 3 replica(s).///
>>
>> Thanks & Regards
>> Mohit Kaushik
>>
>> On 08/04/2015 09:18 PM, John Vines wrote:
>>> It looks like an HDFS issue. Did a datanode go down? Did you turn 
>>> replication down to 1? The combination of those two would 
>>> definitely cause the problems you're seeing, as the latter disables any 
>>> sort of robustness of the underlying filesystem.
>>>
>>> On Tue, Aug 4, 2015 at 8:10 AM mohit.kaushik 
>>> <mohit.kaushik@orkash.com <mailto:mohit.kaushik@orkash.com>> wrote:
>>>
>>>     On 08/04/2015 05:35 PM, mohit.kaushik wrote:
>>>>     Hello All,
>>>>
>>>>     I am using Apache Accumulo-1.6.3 with Apache Hadoop-2.7.0 on a
>>>>     3 node cluster. When I give the compact command from the shell, it
>>>>     gives the following warning.
>>>>
>>>>     root@orkash testScan> compact -w
>>>>     2015-08-04 17:10:52,702 [Shell.audit] INFO : root@orkash
>>>>     testScan> compact -w
>>>>     2015-08-04 17:10:52,706 [shell.Shell] INFO : Compacting table ...
>>>>     2015-08-04 17:12:53,986 [impl.ThriftTransportPool] *WARN :
>>>>     Thread "shell" stuck on IO  to orkash4:9999 (0) for at least
>>>>     120034 ms*
>>>>
>>>>
>>>>     The tablet servers show a problem with a data block, which
>>>>     looks something like HDFS-8659
>>>>     <https://issues.apache.org/jira/browse/HDFS-8659>
>>>>
>>>>     /2015-08-04 15:00:27,825 [hdfs.DFSClient] WARN : Failed to
>>>>     connect to /192.168.10.121:50010 <http://192.168.10.121:50010>
>>>>     for block, add to deadNodes and continue. java.io.IOException:
>>>>     Got error, status message opReadBlock
>>>>     BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911
>>>>     received exception
>>>>     org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica
>>>>     not found for
>>>>     BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911, for
>>>>     OP_READ_BLOCK, self=/192.168.10.121:38752
>>>>     <http://192.168.10.121:38752>, remote=/192.168.10.121:50010
>>>>     <http://192.168.10.121:50010>, for file
>>>>     /accumulo/tables/h/t-000016s/F000016t.rf, for pool
>>>>     BP-2102462487-192.168.10.124-1436956492274 block 1073780678_39911//
>>>>     //java.io.IOException: Got error, status message opReadBlock
>>>>     BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911
>>>>     received exception
>>>>     org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica
>>>>     not found for
>>>>     BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911, for
>>>>     OP_READ_BLOCK, self=/192.168.10.121:38752
>>>>     <http://192.168.10.121:38752>, remote=/192.168.10.121:50010
>>>>     <http://192.168.10.121:50010>, for file
>>>>     /accumulo/tables/h/t-000016s/F000016t.rf, for pool
>>>>     BP-2102462487-192.168.10.124-1436956492274 block 1073780678_39911//
>>>>     //        at
>>>>     org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)//
>>>>     //        at
>>>>     org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:456)//
>>>>     //        at
>>>>     org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:424)//
>>>>     //        at
>>>>     org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:814)//
>>>>     //        at
>>>>     org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:693)//
>>>>     //        at
>>>>     org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:352)//
>>>>     //        at
>>>>     org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)//
>>>>     //        at
>>>>     org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)//
>>>>     //        at
>>>>     org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)//
>>>>     //        at
>>>>     java.io.DataInputStream.read(DataInputStream.java:149)//
>>>>     //        at
>>>>     org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream$1.run(BoundedRangeFileInputStream.java:104)//
>>>>     //        at
>>>>     org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream$1.run(BoundedRangeFileInputStream.java:100)//
>>>>     //        at java.security.AccessController.doPrivileged(Native
>>>>     Method)//
>>>>     //        at
>>>>     org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:100)//
>>>>     //        at
>>>>     org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:159)//
>>>>     //        at
>>>>     org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:143)//
>>>>     //        at
>>>>     org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)//
>>>>     //        at
>>>>     java.io.BufferedInputStream.fill(BufferedInputStream.java:235)//
>>>>     //        at
>>>>     java.io.BufferedInputStream.read(BufferedInputStream.java:254)//
>>>>     //        at
>>>>     java.io.FilterInputStream.read(FilterInputStream.java:83)//
>>>>     //        at
>>>>     java.io.DataInputStream.readInt(DataInputStream.java:387)//
>>>>     //        at
>>>>     org.apache.accumulo.core.file.rfile.MultiLevelIndex$IndexBlock.readFields(MultiLevelIndex.java:269)//
>>>>     //        at
>>>>     org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader.getIndexBlock(MultiLevelIndex.java:724)//
>>>>     //        at
>>>>     org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader.access$100(MultiLevelIndex.java:497)//
>>>>     //        at
>>>>     org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$Node.getNext(MultiLevelIndex.java:587)//
>>>>     //        at
>>>>     org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$Node.getNextNode(MultiLevelIndex.java:593)//
>>>>     //        at
>>>>     org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$IndexIterator.getNextNode(MultiLevelIndex.java:616)//
>>>>     //        at
>>>>     org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$IndexIterator.next(MultiLevelIndex.java:659)//
>>>>     //        at
>>>>     org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader._next(RFile.java:559)/
>>>>
>>>>     Regards
>>>>     Mohit Kaushik
>>>>
>>>>     **
>>>>
>>>     And the compaction never completes.
>>>
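
The fsck message quoted earlier in this thread ("Target Replicas is 5 but
found 3 replica(s)") can be checked against the cluster size. The sketch
below parses fsck "Under replicated" lines and lists the files whose target
replica count is higher than the number of datanodes, since such a target
cannot be met no matter how healthy the cluster is. The helper name
`unsatisfiable` is made up for illustration. The sample line is the one from
this thread with the wrapped halves re-joined, and the count of 3 datanodes
is taken from the "3 node cluster" description, which assumes every node
runs a datanode.

```python
import re

# The fsck line from this thread, re-joined onto one line.
FSCK = """\
/user/root/.Trash/Current/accumulo/tables/+r/root_tablet/delete+A0000d29.rf+F0000d28.rf: Under replicated BP-2102462487-192.168.10.124-1436956492274:blk_1073796198_55442. Target Replicas is 5 but found 3 replica(s).
"""

# <path>: Under replicated <block>. Target Replicas is T but found F replica(s).
UNDER_REPLICATED = re.compile(
    r"^(?P<path>\S+):\s+Under replicated\s+(?P<block>\S+)\.\s+"
    r"Target Replicas is (?P<target>\d+) but found (?P<found>\d+) replica\(s\)\."
)


def unsatisfiable(fsck_text, datanodes):
    """Return (path, target, found) for each block whose target replica
    count exceeds the number of datanodes available."""
    hits = []
    for line in fsck_text.splitlines():
        m = UNDER_REPLICATED.match(line.strip())
        if m is None:
            continue
        target = int(m.group("target"))
        found = int(m.group("found"))
        if target > datanodes:
            hits.append((m.group("path"), target, found))
    return hits


if __name__ == "__main__":
    for path, target, found in unsatisfiable(FSCK, datanodes=3):
        print(f"{path}: wants {target}, has {found}, only 3 datanodes")
```

If every under-replicated file reported by fsck falls into this category,
the messages would be explained by the per-file replication target rather
than by lost data, though that reading is an assumption and not something
confirmed in this thread.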


-- 

*Mohit Kaushik*
Software Engineer
A Square,Plot No. 278, Udyog Vihar, Phase 2, Gurgaon 122016, India
*Tel:*+91 (124) 4969352 | *Fax:*+91 (124) 4033553


