hadoop-pig-dev mailing list archives

From "Dmitriy V. Ryaboy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
Date Wed, 19 Aug 2009 18:58:14 GMT

    [ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745164#action_12745164 ]

Dmitriy V. Ryaboy commented on PIG-924:

Daniel, you've hit the nail on the head.

This patch is specifically written to enable us to compile against all the versions of hadoop,
and let the user pick which one he wants at runtime (by virtue of including the right hadoop
on the path -- no flags needed).  In fact the default ant task in the shims directory compiles
all the shims at once.

The version string hack is safe, as long as hadoop is built correctly (the zebra version is
not, as it returns "Unknown", hence the last-resort hack of defaulting to 20).
If hadoop came from its own jar, I could use reflection to get the jar name and use that as
a fallback for an Unknown version -- but in pig, hadoop comes from the pig.jar!
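
For illustration, the version-string check with its last-resort default could be sketched as
below. This is only a sketch, not the code in the patch: the class name HadoopVersionProbe and
its methods are hypothetical, and the parsing assumes 0.x version strings such as "0.18.3".
The only Hadoop call used is org.apache.hadoop.util.VersionInfo.getVersion(), looked up
reflectively so the sketch also compiles without hadoop on the path.

```java
import java.lang.reflect.Method;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig or of the attached patch.
final class HadoopVersionProbe {
    // Last-resort default described in the message: assume Hadoop 0.20.
    static final String DEFAULT_SHIM = "20";

    // Assumes 0.x style versions: "0.18.3" has major "0" and minor "18".
    private static final Pattern MAJOR_MINOR = Pattern.compile("^(\\d+)\\.(\\d+)");

    /** Maps a version string such as "0.18.3" to a shim suffix such as "18". */
    static String shimSuffix(String version) {
        if (version == null) {
            return DEFAULT_SHIM;
        }
        Matcher m = MAJOR_MINOR.matcher(version.trim());
        // "Unknown" (or anything else unparseable) falls through to the default.
        return m.find() ? m.group(2) : DEFAULT_SHIM;
    }

    /** Returns the version reported by the Hadoop on the classpath, or null if none loads. */
    static String detectedVersion() {
        try {
            Class<?> info = Class.forName("org.apache.hadoop.util.VersionInfo");
            Method getVersion = info.getMethod("getVersion");
            return (String) getVersion.invoke(null); // static method, so no receiver
        } catch (ReflectiveOperationException | LinkageError e) {
            return null; // Hadoop is absent or failed to load
        }
    }
}
```

A caller would combine the two as HadoopVersionProbe.shimSuffix(HadoopVersionProbe.detectedVersion()),
which yields the default both when hadoop is missing and when it reports "Unknown".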

Ideally, Pig would compile all the versions of shims into its jars, and the pig jar would not
include hadoop. Then the user would include the right hadoop on the path (or bin/pig would
do it for him), and everything would happen automagically.
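
A minimal sketch of how such a runtime choice could look, assuming one shim class per hadoop
version is compiled into the jar and picked by name. All names here (ShimLoader, HadoopShims,
HadoopShims18, HadoopShims20, describe) are hypothetical and are not taken from the patch;
real shims would wrap the hadoop calls that differ between versions.

```java
// Hypothetical sketch of version-based shim selection; not the code in the patch.
final class ShimLoader {
    /** Version-neutral interface that the rest of Pig would program against. */
    interface HadoopShims {
        String describe();
    }

    /** Stand-in for a shim compiled against Hadoop 0.18. */
    static final class HadoopShims18 implements HadoopShims {
        public String describe() { return "shim for hadoop 0.18"; }
    }

    /** Stand-in for a shim compiled against Hadoop 0.20. */
    static final class HadoopShims20 implements HadoopShims {
        public String describe() { return "shim for hadoop 0.20"; }
    }

    /** Instantiates the shim whose class name ends in the given suffix, e.g. "18". */
    static HadoopShims load(String suffix) {
        // Binary name of a nested class: Outer$Inner.
        String name = ShimLoader.class.getName() + "$HadoopShims" + suffix;
        try {
            return (HadoopShims) Class.forName(name).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No shim for hadoop version suffix " + suffix, e);
        }
    }
}
```

Because the shim is looked up by name only at runtime, the calling code never references a
version-specific class directly, so no flag is needed beyond having the right hadoop on the path.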

By bundling hadoop into the jar, however, switching hadoop versions on the fly is next to
impossible (or at least I don't know how) -- we have multiple jars on the classpath, and the
classloader will use whatever is the latest (or is it earliest?). Finding the right resource
becomes fraught with peril.
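
On the ordering question: with the standard JVM application class loader, the first classpath
entry that contains a class wins, so a hadoop bundled in pig.jar shadows a later hadoop jar
whenever pig.jar comes first. One way to inspect this is to list every location that provides
a given class, as sketched below. ClasspathCopies and its methods are hypothetical names;
ClassLoader.getResources is the standard JDK call.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of Pig.
final class ClasspathCopies {
    /** Converts "org.example.Foo" to the resource path "org/example/Foo.class". */
    static String resourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Lists every location that provides the class, in the loader's search order. */
    static List<URL> copiesOf(String className, ClassLoader loader) {
        try {
            return Collections.list(loader.getResources(resourceName(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling ClasspathCopies.copiesOf("org.apache.hadoop.util.VersionInfo", ClassLoader.getSystemClassLoader())
with both pig.jar and a separate hadoop jar on the classpath would be expected to return more
than one URL, which is the ambiguous situation described above.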

If existing deployments need a single pig.jar with no external hadoop dependency, it might be
possible to create a new target (pig-all) that would create a statically bundled jar; but I
think the default behavior should be to not bundle, build all the shims, and use whatever
hadoop is on the path.

The current patch is written the way it is so that it can be applied to trunk now: people can
keep compiling statically, and switching to a dynamic compile later on (after 0.4, probably)
will only require a change to the ant build files.

> Make Pig work with multiple versions of Hadoop
> ----------------------------------------------
>                 Key: PIG-924
>                 URL: https://issues.apache.org/jira/browse/PIG-924
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Dmitriy V. Ryaboy
>         Attachments: pig_924.2.patch, pig_924.3.patch, pig_924.patch
> The current Pig build scripts package hadoop and other dependencies into the pig.jar.
> This means that if users upgrade Hadoop, they also need to upgrade Pig.
> Pig has relatively few dependencies on Hadoop interfaces that changed between 18, 19,
> and 20.  It is possible to write a dynamic shim that allows Pig to use the correct calls for
> any of the above versions of Hadoop. Unfortunately, the building process precludes us from
> doing this at runtime, and forces an unnecessary Pig rebuild even if dynamic shims
> are created.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.