hadoop-mapreduce-user mailing list archives

From Eric Yang <ey...@yahoo-inc.com>
Subject Re: Add user jars to mapreduce
Date Wed, 20 Jan 2010 17:52:20 GMT

Hi Victor,

Thanks for the detailed examination.  I will make sure to remove the URI
prefix in my code for now.

Regards,
Eric

On 1/20/10 5:36 AM, "Victor Hsieh" <victorhsieh@gmail.com> wrote:

> BTW, this issue has been reported:
> http://issues.apache.org/jira/browse/MAPREDUCE-752
> 
> On Wed, Jan 20, 2010 at 7:59 PM, Victor Hsieh <victorhsieh@gmail.com> wrote:
>> Hi Eric,
>> 
>> (I am new to this mailing list, so I don't have the original email to
>> reply to directly.)
>> 
>> I ran into exactly the same problem today, and finally found the cause.
>> 
>> In our case, we added some URIs to DistributedCache, as you did.
>> Unfortunately, the URIs themselves were the problem.  When we added
>> several jars by calling addFileToClassPath, the entries were joined
>> with colons, the default path separator in a Java classpath on
>> Unix-like systems.  A full URI contains colons of its own, so the
>> joined value cannot be split back into the original entries.
>> 
>> For example, if you have hdfs://example.com:9000/a.jar and
>> hdfs://example.com:9000/b.jar to add to classpath, your
>> mapred.job.classpath.files will look like (note these colons!):
>> 
>>  hdfs://example.com:9000/a.jar:hdfs://example.com:9000/b.jar
>> 
>> Then, when a worker tries to add them to the classpath (search for
>> getFileClassPaths in org.apache.hadoop.mapred.TaskRunner.java), it
>> actually adds "hdfs", "//example.com", "9000/a.jar", and so on, which
>> is not what we want.
>> 
>> Our solution is to remove the "hdfs://example.com:9000" part when
>> calling addFileToClassPath.  Hope it helps!
>> 
>> Victor
>> 
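The splitting problem Victor describes can be reproduced without Hadoop.
The sketch below is plain Java and does not call any Hadoop API; the class
and method names (ClasspathColonDemo, joinWithColon, splitOnColon, pathOnly)
are hypothetical and only mimic the join-then-split behaviour described in
the quoted message. It assumes ':' as the separator, as in the example
above, and uses java.net.URI.getPath() as one possible way to drop the
"hdfs://example.com:9000" prefix.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of Hadoop.
class ClasspathColonDemo {

    // Joins classpath entries with ':' as in the quoted example.
    static String joinWithColon(List<String> entries) {
        return String.join(":", entries);
    }

    // Splits the joined value on ':' again, the way a naive reader would.
    static List<String> splitOnColon(String joined) {
        return Arrays.asList(joined.split(":"));
    }

    // Workaround sketch: keep only the path component of the URI,
    // dropping the scheme, host, and port.
    static String pathOnly(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList(
                "hdfs://example.com:9000/a.jar",
                "hdfs://example.com:9000/b.jar");

        // Full URIs: the colons inside each URI break the round trip.
        System.out.println(splitOnColon(joinWithColon(jars)));

        // Path-only entries: the round trip returns the original entries.
        List<String> stripped = new ArrayList<>();
        for (String jar : jars) {
            stripped.add(pathOnly(jar));
        }
        System.out.println(splitOnColon(joinWithColon(stripped)));
    }
}
```

With full URIs the first line printed is
[hdfs, //example.com, 9000/a.jar, hdfs, //example.com, 9000/b.jar]; with
the prefix removed the second line is [/a.jar, /b.jar].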

