hadoop-common-issues mailing list archives

From "Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-15898) 1 - 1.5 TB Data size fails to run with the following error
Date Sat, 03 Nov 2018 15:43:00 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Srinivas updated HADOOP-15898:
------------------------------
    Description: 
There is a business-critical MR job that runs every day at 2:00 PM PST and processes about
1 to 1.5 TB of data (depending on the business day). The ideal elapsed time of this job is 4 hours,
but multiple mappers of this job fail simultaneously with the following errors, so the job
sometimes takes 11 or even 13 hours.

Steps taken to prevent this problem:
1. Migrated the environment to YARN.
2. Increased the ulimit.
3. Added extra nodes to the cluster.
4. Disks are being replaced regularly.
5. Monitored the cluster and terminated other jobs that impact this job.

None of these resolved the problem.
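Because a process inherits its resource limits when it starts, a raised ulimit generally only takes effect for daemons and task JVMs started after the change. A minimal sketch for confirming the open-file limit a JVM actually sees, assuming a HotSpot/OpenJDK runtime on Unix where com.sun.management.UnixOperatingSystemMXBean is available; FdLimitCheck, maxOpenFiles and openFiles are hypothetical names, not part of Hadoop.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/**
 * Hypothetical helper: reports the open-file limit and current usage as seen
 * by the running JVM. Only works where the platform MXBean is a
 * UnixOperatingSystemMXBean (e.g. HotSpot/OpenJDK on Linux or macOS).
 */
class FdLimitCheck {

    /** Maximum number of file descriptors this JVM may open, or -1 if not exposed. */
    static long maxOpenFiles() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1;
    }

    /** Number of file descriptors currently open in this JVM, or -1 if not exposed. */
    static long openFiles() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        long max = maxOpenFiles();
        if (max < 0) {
            System.out.println("File-descriptor limits are not exposed by this JVM/platform.");
        } else {
            System.out.println("open files: " + openFiles() + " / limit: " + max);
        }
    }
}
```

The same two calls could, for example, be logged from a task's setup code to check the value inside YARN containers. HotSpot may also raise its own soft limit toward the hard limit at startup (-XX:+MaxFDLimit), so the reported value can be higher than what `ulimit -n` shows in a shell.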

WARN [DataStreamer for file /analytical_profile/DMP_analytical_profile/Turn/SAUP/2018_11_02_tmp/tmp/part-01357.5789
block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089] org.apache.hadoop.hdfs.DFSClient:
Error Recovery for block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089
in pipeline DatanodeInfoWithStorage[10.0.1.37:50010,DS-ed333d2e-839a-4029-a1c9-b6615c322ed2,DISK],
DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK],
DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK]:
bad datanode DatanodeInfoWithStorage[10.0.1.37:50010,DS-ed333d2e-839a-4029-a1c9-b6615c322ed2,DISK]

WARN [DataStreamer for file /analytical_profile/DMP_analytical_profile/Turn/SAUP/2018_11_02_tmp/tmp/part-01357.5789
block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089] org.apache.hadoop.hdfs.DFSClient:
Error Recovery for block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089
in pipeline DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK],
DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK]:
bad datanode DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK]

WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException:
java.io.IOException: All datanodes DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK] are bad. Aborting...
    at com.turn.platform.cheetah.storage.dmp.analytical_profile.merge.IncrementalProfileMergerMapper.close(IncrementalProfileMergerMapper.java:1185)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
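Read together, the log entries above show the write pipeline for block blk_4683766046_1108754130089 losing one datanode per recovery: 10.0.1.37 is dropped first, then 74.120.143.19, and the last remaining node, 74.120.143.6, is the one named in the final "All datanodes ... are bad. Aborting..." exception. The second pipeline lists only the two surviving nodes, so no replacement datanode appears in these entries. As a sketch for tracing this across many task logs, here is a hypothetical Java helper (PipelineTrace, RecoveryEvent and parse are illustrative names, not Hadoop APIs). It assumes each WARN entry has first been joined onto one line and only understands the message format shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical log-analysis helper (not part of Hadoop). Parses DFSClient
 * "Error Recovery ... in pipeline ...: bad datanode ..." messages and reports
 * which datanode was dropped and which were left in the write pipeline.
 */
class PipelineTrace {

    /** One recovery event: the pipeline as logged and the node marked bad. */
    static final class RecoveryEvent {
        final List<String> pipeline; // host:port of each datanode, in pipeline order
        final String badNode;        // host:port of the datanode marked bad

        RecoveryEvent(List<String> pipeline, String badNode) {
            this.pipeline = pipeline;
            this.badNode = badNode;
        }

        /** Pipeline members left after removing the bad node. */
        List<String> remaining() {
            List<String> rest = new ArrayList<>(pipeline);
            rest.remove(badNode);
            return rest;
        }
    }

    // Captures host:port from "DatanodeInfoWithStorage[host:port,storageId,type]".
    private static final Pattern NODE =
            Pattern.compile("DatanodeInfoWithStorage\\[([^,\\]]+),[^\\]]*\\]");

    // Separates the pipeline list from the node reported as bad.
    private static final Pattern RECOVERY = Pattern.compile(
            "in pipeline (.*?):\\s*bad datanode (DatanodeInfoWithStorage\\[[^\\]]*\\])");

    /** Returns the parsed event, or null if the line is not a pipeline-recovery message. */
    static RecoveryEvent parse(String line) {
        Matcher m = RECOVERY.matcher(line);
        if (!m.find()) {
            return null;
        }
        List<String> pipeline = new ArrayList<>();
        Matcher n = NODE.matcher(m.group(1));
        while (n.find()) {
            pipeline.add(n.group(1));
        }
        Matcher bad = NODE.matcher(m.group(2));
        String badNode = bad.find() ? bad.group(1) : null;
        return new RecoveryEvent(pipeline, badNode);
    }

    public static void main(String[] args) {
        // Message bodies copied from the two DataStreamer WARN entries in this report.
        String block = "BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089";
        String n37 = "DatanodeInfoWithStorage[10.0.1.37:50010,DS-ed333d2e-839a-4029-a1c9-b6615c322ed2,DISK]";
        String n19 = "DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK]";
        String n6 = "DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK]";
        String[] lines = {
            "Error Recovery for block " + block + " in pipeline "
                    + n37 + ", " + n19 + ", " + n6 + ": bad datanode " + n37,
            "Error Recovery for block " + block + " in pipeline "
                    + n19 + ", " + n6 + ": bad datanode " + n19,
        };
        for (String line : lines) {
            RecoveryEvent e = parse(line);
            System.out.println("pipeline=" + e.pipeline + " bad=" + e.badNode
                    + " remaining=" + e.remaining());
        }
    }
}
```

Run against the two entries, the first event leaves [74.120.143.19:50010, 74.120.143.6:50010] and the second leaves only [74.120.143.6:50010], matching the node in the final exception.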

 

  was:
There is a business-critical MR job that runs every day at 2:00 PM PST and processes about
1 to 1.5 TB of data (depending on the business day). The ideal elapsed time of this job is 4 hours,
but multiple mappers of this job fail simultaneously with the following errors, so the job
sometimes takes 11 or even 13 hours.

Steps taken to prevent this problem:
1. Migrated the environment to YARN.
2. Increased the ulimit.
3. Added extra nodes to the cluster.
4. Disks are being replaced regularly.

None of these resolved the problem.

WARN [DataStreamer for file /analytical_profile/DMP_analytical_profile/Turn/SAUP/2018_11_02_tmp/tmp/part-01357.5789
block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089] org.apache.hadoop.hdfs.DFSClient:
Error Recovery for block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089
in pipeline DatanodeInfoWithStorage[10.0.1.37:50010,DS-ed333d2e-839a-4029-a1c9-b6615c322ed2,DISK],
DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK],
DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK]:
bad datanode DatanodeInfoWithStorage[10.0.1.37:50010,DS-ed333d2e-839a-4029-a1c9-b6615c322ed2,DISK]

WARN [DataStreamer for file /analytical_profile/DMP_analytical_profile/Turn/SAUP/2018_11_02_tmp/tmp/part-01357.5789
block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089] org.apache.hadoop.hdfs.DFSClient:
Error Recovery for block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089
in pipeline DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK],
DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK]:
bad datanode DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK]

WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException:
java.io.IOException: All datanodes DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK] are bad. Aborting...
    at com.turn.platform.cheetah.storage.dmp.analytical_profile.merge.IncrementalProfileMergerMapper.close(IncrementalProfileMergerMapper.java:1185)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)

 


> 1 - 1.5 TB Data size fails to run with the following error 
> -----------------------------------------------------------
>
>                 Key: HADOOP-15898
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15898
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: performance
>    Affects Versions: 2.6.0
>         Environment: Hadoop 2.6.0-cdh5.5.1
>            Reporter: Srinivas
>            Priority: Major
>              Labels: performance
>             Fix For: 2.6.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


