hadoop-common-dev mailing list archives

From "Stuart White (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-4864) -libjars with multiple jars broken when client and cluster reside on different OSs
Date Sat, 13 Dec 2008 03:44:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stuart White updated HADOOP-4864:
---------------------------------

    Description: 
When submitting a hadoop job from Windows (Cygwin) to a Linux hadoop cluster (or vice versa),
and when you specify multiple additional jar files via the -libjars flag, hadoop throws a
ClassNotFoundException for any classes located in the additional jars specified via the -libjars
flag.

This happens because hadoop uses System.getProperty("path.separator") as the delimiter
in the list of jar files passed via -libjars.
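
To illustrate the cause, here is a minimal sketch that simulates the mismatch. It is not hadoop's actual code: the class and method names are hypothetical, and the separators are hard-coded (";" for a Windows JVM, ":" for a Linux JVM) so the example behaves the same on any machine.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from the hadoop source.
public class PathSeparatorMismatch {

    // Join jar names as a client would, using that client's path separator.
    static String join(List<String> jars, String separator) {
        return String.join(separator, jars);
    }

    // Split the joined list as a cluster node would, using its own path separator.
    static List<String> split(String joined, String separator) {
        return Arrays.asList(joined.split(Pattern.quote(separator)));
    }

    public static void main(String[] args) {
        String windowsSeparator = ";"; // value of path.separator on a Windows JVM
        String linuxSeparator = ":";   // value of path.separator on a Linux JVM

        List<String> jars = Arrays.asList("Foo.jar", "Bar.jar");

        // The Windows client writes the list with ";".
        String sent = join(jars, windowsSeparator);
        System.out.println(sent); // prints Foo.jar;Bar.jar

        // The Linux side splits on ":" and sees one bogus entry instead of two jars.
        List<String> seenByCluster = split(sent, linuxSeparator);
        System.out.println(seenByCluster.size()); // prints 1
    }
}
```

With a single jar there is no delimiter to misread, which is consistent with the failure appearing only when multiple jars are given.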

My suggested solution is to use a comma as the delimiter, rather than the path.separator.

I realize comma is, perhaps, a poor choice for a delimiter because it is valid in filenames
on both Windows and Linux, but the -libjars flag uses it as the delimiter when listing the
additional required jars.  So, I figured if it's already being used as a delimiter, then it's
reasonable to use it internally as well.
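
As a rough sketch of the suggested approach (not the contents of patch.txt; the class and method names are hypothetical), a fixed comma delimiter gives the same result on every platform because it does not depend on any system property:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a platform-independent delimiter.
public class CommaDelimitedJars {

    // A fixed delimiter, identical on the client and on the cluster.
    static final String DELIMITER = ",";

    // Join jar names into a single string for transport.
    static String join(List<String> jars) {
        return String.join(DELIMITER, jars);
    }

    // Split the string back into jar names, skipping empty entries.
    static List<String> split(String joined) {
        List<String> jars = new ArrayList<>();
        for (String part : joined.split(DELIMITER)) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                jars.add(trimmed);
            }
        }
        return jars;
    }

    public static void main(String[] args) {
        List<String> jars = new ArrayList<>();
        jars.add("Foo.jar");
        jars.add("Bar.jar");

        String sent = join(jars);
        System.out.println(sent);        // prints Foo.jar,Bar.jar
        System.out.println(split(sent)); // prints [Foo.jar, Bar.jar]
    }
}
```

The trade-off noted above still applies: a jar whose filename contains a comma would be split incorrectly by this scheme.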


  was:
When submitting a hadoop job from Windows (Cygwin) to a Linux hadoop cluster (or vice versa),
and when you specify multiple additional jar files via the -libjars flag, hadoop throws a
ClassNotFoundException for any classes located in the additional jars specified via the -libjars
flag.

This happens because hadoop uses System.getProperty("path.separator") as the delimiter
in the list of jar files passed via -libjars.

If your job spans platforms, System.getProperty("path.separator") returns a different delimiter
on each platform (";" on Windows, ":" on Linux).

My suggested solution is to use a comma as the delimiter, rather than the path.separator.

I realize comma is, perhaps, a poor choice for a delimiter because it is valid in filenames
on both Windows and Linux, but the -libjars flag uses it as the delimiter when listing the
additional required jars.  So, I figured if it's already being used as a delimiter, then it's
reasonable to use it internally as well.

I have a patch that applies my suggested change, but I don't see anywhere to upload it.  So,
I'll go ahead and create this JIRA and hope that I will have the opportunity to add a patch
later.

Now, with this change, I can submit hadoop jobs (requiring multiple
supporting jars) from my Windows laptop (via cygwin) to my 10-node
Linux hadoop cluster.

Any chance this change could be applied to the hadoop codebase?

To recreate the problem I'm seeing, do the following:

- Set up a hadoop cluster on linux

- Perform the remaining steps on cygwin, with a hadoop installation
configured to point to the linux cluster.  (set fs.default.name and
mapred.job.tracker)

- Extract the tarball.  Change into the created directory.
 tar xvfz Example.tar.gz
 cd Example

- Edit build.properties, set your hadoop.home appropriately, then
build the example.
 ant

- Load the file Example.in into your dfs
 hadoop dfs -copyFromLocal Example.in Example.in

- Execute the provided shell script, passing it testID 1.
 ./Example.sh 1
 This test does not use -libjars, and it completes successfully.

- Next, execute testID 2.
 ./Example.sh 2
 This test uses -libjars with 1 jarfile (Foo.jar), and it completes
successfully.

- Next, execute testID 3.
 ./Example.sh 3
 This test uses -libjars with 1 jarfile (Bar.jar), and it completes
successfully.

- Next, execute testID 4.
 ./Example.sh 4
 This test uses -libjars with 2 jarfiles (Foo.jar and Bar.jar), and
it fails with a ClassNotFoundException.

This behavior only occurs when calling from cygwin to linux or vice
versa.  If both the cluster and the client reside on either linux or
cygwin, the problem does not occur.

I'm continuing to dig to see what I can figure out, but since I'm very
new to hadoop (started using it this week), I thought I'd go ahead and
throw this out there to see if anyone can help.

Thanks!


> -libjars with multiple jars broken when client and cluster reside on different OSs
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-4864
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4864
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: filecache
>    Affects Versions: 0.19.0
>         Environment: When your hadoop job spans OSs.
>            Reporter: Stuart White
>            Priority: Minor
>         Attachments: patch.txt
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When submitting a hadoop job from Windows (Cygwin) to a Linux hadoop cluster (or vice
> versa), and when you specify multiple additional jar files via the -libjars flag, hadoop throws
> a ClassNotFoundException for any classes located in the additional jars specified via the
> -libjars flag.
> This happens because hadoop uses System.getProperty("path.separator") as the
> delimiter in the list of jar files passed via -libjars.
> My suggested solution is to use a comma as the delimiter, rather than the path.separator.
> I realize comma is, perhaps, a poor choice for a delimiter because it is valid in filenames
> on both Windows and Linux, but the -libjars flag uses it as the delimiter when listing the
> additional required jars.  So, I figured if it's already being used as a delimiter, then it's
> reasonable to use it internally as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

