hadoop-common-user mailing list archives

From "Holden Robbins" <h.robb...@paritycomputing.com>
Subject Bugs in 0.16.0?
Date Sat, 01 Mar 2008 19:23:20 GMT
Hello,
 
I'm just starting to dig into Hadoop and am testing its feasibility for large-scale development
work.
I was wondering whether anyone else is being affected by these issues when using Hadoop 0.16.0.
I searched Jira, and I'm not sure if I saw anything that specifically fit some of these:
 
1) The symlinks for the distributed cache in the task directory are being created as 'null'
directory links (stated another way, the name of the symbolic link in the directory is the
string literal "null").  Am I doing something wrong to cause this, or is this functionality
just not widely used?
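
One possible cause worth ruling out, assuming the link name is taken from the fragment of the cache URI (the part after '#'), is a URI registered without a fragment. The sketch below uses only java.net.URI to show how a missing fragment turns into the literal string "null" when it is concatenated into a path. The class name, method name, host, and paths are hypothetical, and this is not Hadoop's actual code.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class CacheLinkName {
    // Hypothetical helper: builds a link path the naive way, by concatenating
    // the URI fragment onto the task directory.
    public static String linkPath(String taskDir, String cacheUri) throws URISyntaxException {
        URI uri = new URI(cacheUri);
        // getFragment() returns null when the URI has no '#name' part;
        // string concatenation then renders that null as "null".
        return taskDir + "/" + uri.getFragment();
    }

    public static void main(String[] args) throws URISyntaxException {
        // No fragment: the link ends up being named "null".
        System.out.println(linkPath("/tmp/task", "hdfs://namenode:9000/cache/dict.txt"));
        // With a fragment: the link is named "dict".
        System.out.println(linkPath("/tmp/task", "hdfs://namenode:9000/cache/dict.txt#dict"));
    }
}
```

If that assumption holds, registering the cache file as `...dict.txt#dict` rather than `...dict.txt` would be the first thing to try.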
 
2) I'm running into an issue where the job is giving errors in the form:
08/03/01 09:44:25 INFO mapred.JobClient: Task Id : task_200803010908_0001_r_000002_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
08/03/01 09:44:25 WARN mapred.JobClient: Error reading task outputGo-Box4
08/03/01 09:44:25 WARN mapred.JobClient: Error reading task outputGo-Box4

Once this happens, the jobs appear never to finish the reduce phase.  The tasks themselves are
long-running map tasks (up to 10 minutes per input).  As far as I understand from the Jira
posts, this is related to MAX_FAILED_UNIQUE_FETCHES being hard-coded to 4.  Is there a
known workaround, or a fix in the pipeline?
 
Possibly related Jira issue: https://issues.apache.org/jira/browse/HADOOP-2220
Improving the way the shuffling mechanism works may also help: https://issues.apache.org/jira/browse/HADOOP-1339
 
I've tried setting:
<property>
  <name>mapred.reduce.copy.backoff</name>
  <value>1440</value>
  <description>The maximum amount of time (in seconds) a reducer spends on  fetching
one map output before declaring it as failed.</description>
</property>
which should be 24 minutes, but it had no effect.
 
 
3) Lastly, it would seem beneficial for jobs that have significant startup overhead and memory
requirements not to be run in a separate JVM for each task.  Along these lines, it looks like
someone submitted a patch for JVM reuse a while back, but it wasn't committed: https://issues.apache.org/jira/browse/HADOOP-249
 
This is probably a question for the dev mailing list, but if I wanted to modify Hadoop to run
tasks as threads rather than in independent JVMs, is there any reason this hasn't been done
yet?  Or am I overlooking something?
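
To make the idea concrete, here is a minimal sketch of what "threading tasks" could look like in plain Java: several tasks run on a thread pool inside one JVM and share a resource that is expensive to set up, so the setup cost is paid once rather than once per task. Everything here (ThreadedTasks, Model, runAll) is hypothetical and stands in for whatever the real startup overhead is. It is not Hadoop code and does not reflect how the TaskTracker launches tasks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadedTasks {
    // Counts how many times the expensive setup actually runs.
    public static final AtomicInteger loads = new AtomicInteger();

    // Hypothetical expensive resource, created once per JVM on first use
    // (initialization-on-demand holder idiom, which is thread-safe).
    static final class Model {
        private static final class Holder {
            static final Model INSTANCE = new Model();
        }
        private Model() { loads.incrementAndGet(); }
        static Model get() { return Holder.INSTANCE; }
        int process(int input) { return input * 2; }
    }

    // Runs one "task" per input on a fixed-size thread pool and collects
    // the results in input order.
    public static List<Integer> runAll(List<Integer> inputs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (final Integer in : inputs) {
                futures.add(pool.submit(() -> Model.get().process(in)));
            }
            List<Integer> out = new ArrayList<>();
            for (Future<Integer> f : futures) {
                out.add(f.get());
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(Arrays.asList(1, 2, 3), 2)); // [2, 4, 6]
        System.out.println(loads.get());                       // 1
    }
}
```

The obvious trade-off with this design is isolation: tasks in one JVM share static state and heap, and a task that crashes or exhausts memory takes the others down with it.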
 
 
Thanks,
-Holden
