hadoop-common-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11127) Improve versioning and compatibility support in native library for downstream hadoop-common users.
Date Thu, 25 Sep 2014 19:20:34 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148170#comment-14148170 ]

Chris Nauroth commented on HADOOP-11127:
----------------------------------------

It's unclear to me how libhadoop could be partitioned into separate logical units.  [~aw],
did you already have something in mind?  One idea that came to mind is to try to separate
functions that are end-user facing vs. functions used solely in the implementation details
of HDFS or YARN.  The intention would be similar to the recent hadoop-client split: give end
users a thinner dependency.  However, it's not immediately clear to me which functions fall
into which category today.  I'm sure there would be some overlap.  I'd anticipate maintenance
challenges with keeping the right functions in the right library.  It doesn't look beneficial
to me, but let me know if you disagree.

I really, really like option #3.  Today, we have a situation where applications rely on the
mechanics of Maven dependencies to guarantee that they're getting compatible versions.  They
can't completely rely on that for hadoop-common.jar, though, because it's tightly coupled (necessarily,
I believe) to libhadoop.so/hadoop.dll.  This essentially means that hadoop-common.jar is an
incomplete artifact, at least as published to Maven repos.

The snappy-java project demonstrates that it is possible to bundle a native library per supported
OS inside a jar, and extract it at runtime.

https://github.com/xerial/snappy-java

Likewise, our friends over at hadoop-lzo recently added similar functionality, thanks to [~sjlee0].

https://github.com/twitter/hadoop-lzo/pull/81
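
For illustration, the bundle-and-extract approach used by those projects could look roughly
like the Java sketch below.  This is not code from snappy-java, hadoop-lzo, or Hadoop.  The
class name BundledNativeLoader, its method names, and the /native/<os>-<arch>/ resource layout
are all hypothetical, chosen only to show the idea: pick a resource by platform, copy it out of
the jar to a temporary file, and load it by absolute path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

// Hypothetical helper; the names and the resource layout are assumptions for this sketch.
final class BundledNativeLoader {
    private BundledNativeLoader() {}

    // Maps OS name, architecture, and base name to a classpath resource,
    // e.g. /native/linux-amd64/libhadoop.so (assumed layout inside the jar).
    static String resourcePath(String osName, String osArch, String baseName) {
        String os = osName.toLowerCase(Locale.ROOT);
        String dir;
        String file;
        if (os.contains("mac") || os.contains("darwin")) {
            dir = "mac";
            file = "lib" + baseName + ".dylib";
        } else if (os.startsWith("windows")) {
            dir = "windows";
            file = baseName + ".dll";
        } else {
            // Treat everything else as Linux-like for the purposes of this sketch.
            dir = "linux";
            file = "lib" + baseName + ".so";
        }
        return "/native/" + dir + "-" + osArch.toLowerCase(Locale.ROOT) + "/" + file;
    }

    // Copies the bundled library out of the jar into a temporary file.
    static Path extract(String resource) throws IOException {
        try (InputStream in = BundledNativeLoader.class.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("No bundled native library at " + resource);
            }
            String fileName = resource.substring(resource.lastIndexOf('/') + 1);
            Path target = Files.createTempFile("bundled-", "-" + fileName);
            target.toFile().deleteOnExit();
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        }
    }

    // Extracts the library for the current platform and loads it by absolute path.
    static void load(String baseName) throws IOException {
        String resource = resourcePath(
            System.getProperty("os.name"), System.getProperty("os.arch"), baseName);
        System.load(extract(resource).toAbsolutePath().toString());
    }
}
```

With something like this, a caller would invoke BundledNativeLoader.load("hadoop") once before
touching any native methods, so the native code always comes from the same jar as the Java
classes that declare it.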

The trouble for hadoop-common.jar is that I have no idea how to make this play well with the
Apache Hadoop release process.  Apparently, we've made a prior decision to stop shipping a
native build and leave this as a distro concern, such as in Bigtop.  We'd have to reverse
that decision.  Then, we'd need to orchestrate the release such that a native build gets triggered
(potentially multiple native builds for multiple supported platforms), and then these disparate
pieces get pulled back to a central builder that can bundle the native artifacts inside the
published hadoop-common.jar.  It's a lot more build complexity, and I don't know how to make
it happen.  Maybe I need to chat with infra?

> Improve versioning and compatibility support in native library for downstream hadoop-common users.
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11127
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: native
>            Reporter: Chris Nauroth
>
> There is no compatibility policy enforced on the JNI function signatures implemented
> in the native library.  This library typically is deployed to all nodes in a cluster, built
> from a specific source code version.  However, downstream applications that want to run in
> that cluster might choose to bundle a hadoop-common jar at a different version.  Since there
> is no compatibility policy, this can cause link errors at runtime when the native function
> signatures expected by hadoop-common.jar do not exist in libhadoop.so/hadoop.dll.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
