hadoop-common-issues mailing list archives

From "Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-15898) 1 TB TeraGen fails to run with the following error
Date Sat, 03 Nov 2018 15:39:00 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Srinivas updated HADOOP-15898:
------------------------------
    Summary: 1 TB TeraGen fails to run with the following error   (was: WARN [main] org.apache.hadoop.mapred.YarnChild:
Exception running child  : java.io.IOException: java.io.IOException: All datanodes DatanodeInfoWithStorage
[[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK] are bad. Aborting...)

> 1 TB TeraGen fails to run with the following error 
> ---------------------------------------------------
>
>                 Key: HADOOP-15898
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15898
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: performance
>    Affects Versions: 2.6.0
>         Environment: Hadoop 2.6.0-cdh5.5.1
>  
>  
>            Reporter: Srinivas
>            Priority: Major
>              Labels: performance
>             Fix For: 2.6.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> A business-critical MR job runs every day at 2:00 PM PST on about 1 to 1.5 TB of data
(the size depends on the business day). The job should ideally finish in 4 hours, but
multiple mappers fail simultaneously with the following error, so the job sometimes
takes 11 or even 13 hours.
> Steps taken so far to prevent this problem: 1. Migrated the environment to YARN. 2. Increased
the ulimit. 3. Added extra nodes to the cluster. 4. Replaced disks regularly.
None of these helped.
> WARN [DataStreamer for file /analytical_profile/DMP_analytical_profile/Turn/SAUP/2018_11_02_tmp/tmp/part-01357.5789
> block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089]
> org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089
> in pipeline DatanodeInfoWithStorage[10.0.1.37:50010,DS-ed333d2e-839a-4029-a1c9-b6615c322ed2,DISK],
> DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK],
> DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK]:
> bad datanode DatanodeInfoWithStorage[10.0.1.37:50010,DS-ed333d2e-839a-4029-a1c9-b6615c322ed2,DISK]
>  
> WARN [DataStreamer for file /analytical_profile/DMP_analytical_profile/Turn/SAUP/2018_11_02_tmp/tmp/part-01357.5789
> block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089]
> org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089
> in pipeline DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK],
> DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK]:
> bad datanode DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK]
>  
> WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException:
> java.io.IOException: All datanodes DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK]
> are bad. Aborting...
>     at com.turn.platform.cheetah.storage.dmp.analytical_profile.merge.IncrementalProfileMergerMapper.close(IncrementalProfileMergerMapper.java:1185)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>  
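The warnings above show the write pipeline shrinking from three datanodes to two to one before the task aborts. When many mappers fail this way, it may help to tally which datanodes are reported bad across the task logs. The following is only a rough sketch of that idea, not part of the original report: the function names `bad_datanodes` and `aborted_on` are hypothetical, the regular expressions assume the log wording quoted above, and the storage IDs in `sample` are shortened placeholders.

```python
import re
from collections import Counter

# Captures "ip:port" from "DatanodeInfoWithStorage[ip:port,DS-...,DISK]".
# Optional whitespace before "[" is tolerated because archived logs may wrap there.
NODE = re.compile(r"DatanodeInfoWithStorage\s*\[+(\d+\.\d+\.\d+\.\d+:\d+),")


def bad_datanodes(log_text):
    """Count how often each datanode directly follows the words 'bad datanode'."""
    counts = Counter()
    for marker in re.finditer(r"bad datanode\s+", log_text):
        node = NODE.match(log_text, marker.end())
        if node:
            counts[node.group(1)] += 1
    return counts


def aborted_on(log_text):
    """Return the datanodes listed in the first 'All datanodes ... are bad' message."""
    found = re.search(r"All datanodes\s+(.*?)\s+are bad", log_text, re.S)
    return NODE.findall(found.group(1)) if found else []


# Shortened, illustrative log text (storage IDs are placeholders).
sample = (
    "Error Recovery for block blk_1 in pipeline "
    "DatanodeInfoWithStorage[10.0.1.37:50010,DS-a,DISK], "
    "DatanodeInfoWithStorage[74.120.143.19:50010,DS-b,DISK]: "
    "bad datanode DatanodeInfoWithStorage[10.0.1.37:50010,DS-a,DISK]\n"
    "All datanodes DatanodeInfoWithStorage[74.120.143.6:50010,DS-c,DISK] "
    "are bad. Aborting...\n"
)

print(bad_datanodes(sample))
print(aborted_on(sample))
```

If the same few addresses dominate the counts, that would point toward specific hosts or disks; an even spread might instead suggest a cluster-wide cause such as network or load. Either reading is an assumption to verify against the datanode logs.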



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


