pig-dev mailing list archives

From "John Gordon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PIG-2793) Make Pig Work on Windows without Cygwin
Date Sat, 07 Jul 2012 00:17:34 GMT
John Gordon created PIG-2793:

             Summary: Make Pig Work on Windows without Cygwin
                 Key: PIG-2793
                 URL: https://issues.apache.org/jira/browse/PIG-2793
             Project: Pig
          Issue Type: Bug
          Components: grunt, parser, tools
         Environment: Windows without Cygwin as a whole, but with some key binaries such as
perl, diff, gawk, gzip, sed.
            Reporter: John Gordon

For Pig to really work well on Windows, it needs Hadoop core changes.  Right now, those are
in progress in branch-1-win.  For this work, I am running Pig on Windows against branch-1-win
and removing Cygwin dependencies as capabilities open up.  Branch-1-win is fairly stable now,
and has opened up enough functionality to see the few things needed in Pig to run E2E on top
of a cross-platform Hadoop core without Cygwin.  This uber-JIRA should track the whole of
the work to get Pig running well on Windows without Cygwin.

There are a few types of work that I think are needed right now (I will break out sub-JIRAs
to track them):

1.) Tests that generate pig script strings with paths in them (e.g. dynamically build load/store
commands) need to have Pig escape ("\") characters encoded -- as they can now occur in both
Hadoop and local paths.

2.) Tests that generate local temporary files with createTempFile, and then try to use those
as HDFS paths need to remove ":" from the generated file name to create valid HDFS paths.

3.) Tests that hand-generate URIs via string concatenation (e.g. "file:" + strFileName) need
to use Util.generateURI instead to get a valid URI for the target platform.

4.) Tests that assume the first line in a script (e.g. #!/bin/sh) auto-resolves interpreters
need to explicitly call the interpreter (e.g. instead of calling "perlscript.pl" they should
call "perl perlscript.pl").

5.) Changes in quotes or command syntax between shells (e.g. " or ', dir or ls) need to be
tuned a little here and there.
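
A minimal sketch of items 1.) to 3.) above, using only plain JDK calls.  This is not
code from the Pig test suite: the class and method names are hypothetical, and File.toURI
stands in here for Util.generateURI, whose implementation is not shown in this message.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper class, for illustration only (not part of Pig).
class WindowsPathSketch {

    // Item 1: double every backslash so that a Windows path survives being
    // embedded in a quoted string inside a generated Pig script.
    static String escapeForPigScript(String path) {
        return path.replace("\\", "\\\\");
    }

    // Item 2: drop ':' (e.g. the drive-letter colon) from a local file name
    // before reusing it as an HDFS path.
    static String stripColons(String path) {
        return path.replace(":", "");
    }

    // Item 3: let the JDK build the file URI instead of concatenating
    // "file:" + name by hand.
    static String toFileUri(File file) {
        return file.toURI().toString();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("pigtest", ".txt");
        tmp.deleteOnExit();

        String local = tmp.getAbsolutePath();
        // Build a load statement with the escaped local path.
        String load = "a = load '" + escapeForPigScript(local) + "';";
        System.out.println(load);
        System.out.println(stripColons(local));
        System.out.println(toFileUri(tmp));
    }
}
```

On Linux the first two helpers leave a typical temp path unchanged, so the same test code
can run on both platforms.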


1.) The streaming interface needs to be fixed to run without a Cygwin dependency.

2.) The pig.additional.jars separator is currently hardcoded to ":", and should be File.pathSeparator
instead (":" on Linux, ";" on Windows) to be able to accept Windows paths (C:\file.jar, for example).

3.) The Grunt "sh" command directly exposes the behavior of the exec API.  If you use a shell
built-in, it fails with file not found.  This surfaces a lot of differences between shell
implementations (e.g. ls is an exe, but dir is a built-in) -- and many of the cases in TestGrunt
end up running (sh bash -c "command").  For portability and ease of use, sh should actually
exec "sh -c <command>" on Linux and "cmd /C <command>" on Windows.  That would make it possible
to use aliases and bat files on either platform and make the interface more platform-independent
for end users.

4.) (eventual) Update Pig's dependencies to pick up a stable Hadoop core that runs on Windows
from a release branch.
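
A rough sketch of what items 2.) and 3.) in the list above could look like, assuming plain
JDK APIs.  The class and method names are hypothetical and this is not the actual Pig or
Grunt code; it only illustrates splitting on a platform-specific separator and wrapping a
command in the platform's shell.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only (not part of Pig).
class PlatformSketch {

    // Item 2: split a jar list on the given separator rather than a hardcoded ":".
    // Callers would normally pass File.pathSeparator.
    static List<String> splitJars(String value, String separator) {
        List<String> jars = new ArrayList<>();
        for (String jar : value.split(Pattern.quote(separator))) {
            if (!jar.isEmpty()) {
                jars.add(jar);
            }
        }
        return jars;
    }

    // Item 3: wrap the user's command in the platform shell so that built-ins,
    // aliases and bat files resolve the way the user expects.
    static List<String> shellCommand(String command, boolean windows) {
        return windows
                ? Arrays.asList("cmd", "/C", command)
                : Arrays.asList("sh", "-c", command);
    }

    static boolean isWindows() {
        return System.getProperty("os.name").toLowerCase(Locale.ROOT).startsWith("windows");
    }

    public static void main(String[] args) throws Exception {
        // With ";" as the separator, the drive-letter colon no longer splits the path.
        System.out.println(splitJars("C:\\file.jar;D:\\other.jar", ";"));
        System.out.println("platform separator: " + File.pathSeparator);

        // "echo" is a built-in in cmd, so it only works when run through the shell.
        List<String> argv = shellCommand("echo hello", isWindows());
        Process process = new ProcessBuilder(argv).inheritIO().start();
        System.out.println("exit code: " + process.waitFor());
    }
}
```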

