hadoop-common-user mailing list archives

From "Stuart White" <stuart.whi...@gmail.com>
Subject Re: -libjars with multiple jars broken when client and cluster reside on different OSs?
Date Tue, 30 Dec 2008 22:12:23 GMT
I agree.  Using a List<String> seems to make more sense.

FYI... I opened a jira for this:
https://issues.apache.org/jira/browse/HADOOP-4864
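
To illustrate the escaping concern raised in this thread, here is a minimal sketch in plain Java of packing a list of strings into one value with a fixed, platform-independent delimiter and escaping any delimiter embedded in an element. The class name PackedList and the methods pack and unpack are hypothetical and are not part of Hadoop. This is only one possible approach, and it is not the change proposed in the jira.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Hadoop API: packs a list into a single string
// using a fixed delimiter, escaping delimiters that occur inside elements.
class PackedList {
    private static final char SEP = ',';   // fixed, so it is the same on every platform
    private static final char ESC = '\\';  // escape character

    static String pack(List<String> items) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) sb.append(SEP);
            for (char c : items.get(i).toCharArray()) {
                // Prefix embedded delimiters and escape characters with ESC.
                if (c == SEP || c == ESC) sb.append(ESC);
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Note: an empty input string unpacks to a list holding one empty element,
    // so an empty list does not round-trip in this simple sketch.
    static List<String> unpack(String packed) {
        List<String> items = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean escaped = false;
        for (char c : packed.toCharArray()) {
            if (escaped) {
                cur.append(c);          // take the escaped character literally
                escaped = false;
            } else if (c == ESC) {
                escaped = true;
            } else if (c == SEP) {
                items.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        items.add(cur.toString());
        return items;
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList("C:\\libs\\a,b.jar", "/opt/lib/c.jar");
        String packed = pack(jars);
        System.out.println(packed);
        System.out.println(unpack(packed).equals(jars));
    }
}
```

Because the delimiter is a constant rather than the path.separator system property, the client and the cluster nodes would agree on it regardless of OS, and a comma inside a file name survives the round trip.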

On Tue, Dec 30, 2008 at 3:53 PM, Jason Venner <jason@attributor.com> wrote:

> The path separator is a major issue for several items in the configuration
> data set that pack multiple values together using the path separator:
> the class path
> the distributed cache
> the input path set
>
> All of these suffer from the path.separator issue for 2 reasons:
> 1. The separator differs across JVMs on different platforms, as indicated
> in the previous email (I had missed this!).
> 2. Separator characters that happen to be embedded in the individual
> elements are not escaped before the element is added to the existing set.
>
> Given all of the pain we have with these packed items, it may be simpler to
> serialize a List<String> for multi-element items rather than packing them
> with the path.separator system property.
>
>
>
> Aaron Kimball wrote:
>
>> Hi Stuart,
>>
>> Good sleuthing out that problem :) The correct way to submit patches is to
>> file a ticket on JIRA (https://issues.apache.org/jira/browse/HADOOP).
>> Create an account, create a new issue describing the bug, and then attach
>> the patch file. There'll be a discussion there and others can review your
>> patch and include it in the codebase.
>>
>> Cheers,
>> - Aaron
>>
>> On Fri, Dec 12, 2008 at 12:14 PM, Stuart White <stuart.white1@gmail.com> wrote:
>>
>>
>>
>>> Ok, I'll answer my own question.
>>>
>>> This is caused by the fact that Hadoop uses
>>> System.getProperty("path.separator") as the delimiter in the list of
>>> jar files passed via -libjars.
>>>
>>> If your job spans platforms, System.getProperty("path.separator")
>>> returns a different delimiter on each platform.
>>>
>>> My solution is to use a comma as the delimiter, rather than the
>>> path.separator.
>>>
>>> I realize comma is, perhaps, a poor choice for a delimiter because it
>>> is valid in filenames on both Windows and Linux, but the -libjars option
>>> already uses it as the delimiter when listing the additional required
>>> jars.  So, I figured if it's already being used as a delimiter, then
>>> it's reasonable to use it internally as well.
>>>
>>> I've attached a patch (against 0.19.0) that applies this change.
>>>
>>> Now, with this change, I can submit hadoop jobs (requiring multiple
>>> supporting jars) from my Windows laptop (via cygwin) to my 10-node
>>> Linux hadoop cluster.
>>>
>>> Any chance this change could be applied to the hadoop codebase?
>>>
>>>
>>>
>>
>>
>>
>
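
The mismatch described in the original report can be reproduced without two machines. The sketch below passes the separators in explicitly, as an assumption for illustration, rather than reading them from each JVM: ";" stands in for path.separator on a Windows client and ":" for path.separator on a Linux node. The class name SeparatorMismatch and its methods are hypothetical and do not come from the Hadoop code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class, not Hadoop code.
class SeparatorMismatch {
    // Join the jar list the way the submitting client would, with its own separator.
    static String join(List<String> jars, String clientSeparator) {
        return String.join(clientSeparator, jars);
    }

    // Split the packed value the way a cluster node would, with its own separator.
    // Pattern.quote keeps the separator from being treated as a regex.
    static List<String> split(String packed, String nodeSeparator) {
        return Arrays.asList(packed.split(Pattern.quote(nodeSeparator)));
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList("a.jar", "b.jar");
        String packed = join(jars, ";");          // as packed on a Windows client

        // A node splitting on ":" sees a single bogus entry "a.jar;b.jar".
        System.out.println(split(packed, ":"));
        // A node splitting on the same separator recovers both jars.
        System.out.println(split(packed, ";"));

        // The separator this JVM would actually use:
        System.out.println(System.getProperty("path.separator"));
    }
}
```

With a single jar there is nothing to split, which would explain why the problem only shows up when -libjars lists multiple jars.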
