crunch-user mailing list archives

From Everett Anderson <ever...@nuna.com>
Subject LeaseExpiredExceptions and temp side effect files
Date Fri, 14 Aug 2015 21:10:28 GMT
Hi,

I recently started running our Crunch pipeline on more data and have been
trying out different AWS instance types in anticipation of our storage and
compute needs.

I was using EMR 3.8 (so Hadoop 2.4.0) with Crunch 0.12 (patched with the
CRUNCH-553 <https://issues.apache.org/jira/browse/CRUNCH-553> fix).

Our pipeline finishes fine in these cluster configurations:

   - 50 c3.4xlarge Core, 0 Task
   - 10 c3.8xlarge Core, 0 Task
   - 25 c3.8xlarge Core, 0 Task

However, it always fails on the same data when using 10 cc2.8xlarge Core
instances.

The biggest obvious hardware difference is that the cc2.8xlarges use hard
disks instead of SSDs.

While it's a little hard to track down the exact originating failure, I
think it's from errors like:

2015-08-13 21:34:38,379 ERROR [IPC Server handler 24 on 45711]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task:
attempt_1439499407003_0028_r_000153_1 - exited :
org.apache.crunch.CrunchRuntimeException:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on
/tmp/crunch-970849245/p662/output/_temporary/1/_temporary/attempt_1439499407003_out7_0028_r_000153_1/out7-r-00153:
File does not exist. Holder
DFSClient_attempt_1439499407003_0028_r_000153_1_609888542_1 does not have
any open files.

Those paths look like the side effect files described here:
<https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapred/FileOutputFormat.html#getWorkOutputPath(org.apache.hadoop.mapred.JobConf)>
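For reference, the "_temporary/1/_temporary/attempt_..." segments appear to
follow the Hadoop 2.x FileOutputCommitter (v1) layout, as I understand it:
<output>/_temporary/<appAttemptId>/_temporary/<taskAttemptId>/<file>. A task
writes under that per-attempt work directory, and the file is only moved into
<output> when the task commits. Below is a small stdlib-only Java sketch that
splits such a path into its parts, assuming that layout. WorkPathParts is a
hypothetical helper for reading logs, not a Hadoop or Crunch API.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of Hadoop or Crunch). It splits a task work
 * path, assumed to follow the FileOutputCommitter v1 layout
 *   <outputDir>/_temporary/<appAttemptId>/_temporary/<taskAttemptId>/<fileName>
 * into its components.
 */
final class WorkPathParts {
    final String outputDir;
    final String appAttemptId;
    final String taskAttemptId;  // kept opaque, e.g. Crunch's "..._out7_..." form
    final String fileName;

    private WorkPathParts(String outputDir, String appAttemptId,
                          String taskAttemptId, String fileName) {
        this.outputDir = outputDir;
        this.appAttemptId = appAttemptId;
        this.taskAttemptId = taskAttemptId;
        this.fileName = fileName;
    }

    static WorkPathParts parse(String path) {
        String[] parts = path.split("/");
        int n = parts.length;
        // The last five segments must be: _temporary, appAttempt, _temporary, taskAttempt, file.
        if (n < 6 || !"_temporary".equals(parts[n - 5])
                  || !"_temporary".equals(parts[n - 3])) {
            throw new IllegalArgumentException("not a v1 task work path: " + path);
        }
        // Everything before the first "_temporary" is the final output directory.
        String outputDir = String.join("/", Arrays.copyOfRange(parts, 0, n - 5));
        return new WorkPathParts(outputDir, parts[n - 4], parts[n - 2], parts[n - 1]);
    }

    public static void main(String[] args) {
        WorkPathParts p = parse("/tmp/crunch-970849245/p662/output/_temporary/1/"
                + "_temporary/attempt_1439499407003_out7_0028_r_000153_1/out7-r-00153");
        System.out.println("output dir:   " + p.outputDir);
        System.out.println("app attempt:  " + p.appAttemptId);
        System.out.println("task attempt: " + p.taskAttemptId);
        System.out.println("file:         " + p.fileName);
    }
}
```

Run against the path from the log above, this reports the output directory
/tmp/crunch-970849245/p662/output, app attempt 1, and the file out7-r-00153.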

Could Crunch have generated applications that use side effect paths as input
across MapReduce applications, with something in HDFS cleaning up those paths
unaware of the higher-level dependencies? AWS configures Hadoop differently
for each instance type and might use more aggressive cleanup settings on
hard-disk instances, though this is a very uninformed hypothesis.

A sample full log is attached.

Thanks for any guidance!

- Everett

-- 
*DISCLAIMER:* The contents of this email, including any attachments, may 
contain information that is confidential, proprietary in nature, protected 
health information (PHI), or otherwise protected by law from disclosure, 
and is solely for the use of the intended recipient(s). If you are not the 
intended recipient, you are hereby notified that any use, disclosure or 
copying of this email, including any attachments, is unauthorized and 
strictly prohibited. If you have received this email in error, please 
notify the sender of this email. Please delete this and all copies of this 
email from your system. Any opinions either expressed or implied in this 
email and all attachments, are those of its author only, and do not 
necessarily reflect those of Nuna Health, Inc.
