hadoop-common-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11127) Improve versioning and compatibility support in native library for downstream hadoop-common users.
Date Thu, 25 Sep 2014 18:21:35 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148082#comment-14148082 ]

Colin Patrick McCabe commented on HADOOP-11127:

[~aw]: on a more serious note, what advantages do you see to splitting {{libhadoop.so}}?
I can't think of anything off the top of my head, but maybe I'm missing something.  I've never
liked the wide range of configurations we have to support in Hadoop-- it seems like splitting
libhadoop into N pieces would add another 2^N configurations (one for each combination of
pieces being present or absent).  More configurations means less testing for each config.
We've seen this with lzo and the other compression libraries... making a library optional
just creates a flood of user questions.  If I were starting Hadoop from scratch today, I
might make libhadoop.so mandatory, just because users accidentally forgetting to install
or configure it has caused so much grief over the years.  But of course that's not an option,
for backwards compatibility reasons.

[~cnauroth]: I admit that I think solution #2 is reasonable, since it seems to correspond
to the way that most users I know use Hadoop.  They don't generally run clients of version
N unless they have installed servers of version N first.  They may run clients of version
N-1 against servers of version N, but that would work with solution #2.  I can see how it's
not entirely in keeping with the ideal YARN deployment model, though.

Solution #3 is actually really interesting, because it would solve the CLASSPATH / LD_LIBRARY_PATH
issue forever.  We wouldn't have to worry about incorrect configurations as long as the users
had the jars with the native shared libraries inside them.  I still get questions about why
HDFS short-circuit reads don't work for one client or another, and 99% of the time, the answer
is that the client has not configured the path to libhadoop.so.  This would solve that.
Since most clients who use the native libraries use RPMs or debs specific to their platforms,
I think switching to this system wouldn't be too difficult from a user point of view.
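As an aside, the diagnosis step could at least be automated: a client can check whether
the library is actually present on {{java.library.path}} before assuming native features
will work.  A minimal sketch of that check (class and method names here are illustrative,
not actual Hadoop APIs):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class NativeLibLocator {
    // Platform-specific file name, e.g. "libhadoop.so" on Linux.
    static final String LIB_NAME = System.mapLibraryName("hadoop");

    /** Returns the java.library.path directories that actually contain the library. */
    static List<String> findLibrary() {
        List<String> hits = new ArrayList<>();
        String path = System.getProperty("java.library.path", "");
        for (String dir : path.split(File.pathSeparator)) {
            if (!dir.isEmpty() && new File(dir, LIB_NAME).isFile()) {
                hits.add(dir);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> hits = findLibrary();
        if (hits.isEmpty()) {
            System.out.println(LIB_NAME + " not found on java.library.path; "
                + "native features like short-circuit reads would be disabled");
        } else {
            System.out.println("found " + LIB_NAME + " in " + hits);
        }
    }
}
```

Printing a warning like this at startup would at least turn a silent misconfiguration
into a visible one.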

There's some interesting discussion of loading a native library from a jar here: http://frommyplayground.com/how-to-load-native-jni-library-from-jar/
Jars are just zip files, so we should be able to inject this file into the jar somewhere.
This may add some startup-time overhead, but I don't think it will be large, since libhadoop
is less than a megabyte...
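The extract-and-load trick from that article is pretty simple: copy the bundled library
out of the jar to a temp file, then {{System.load}} it.  A rough sketch (the resource
path layout and class names are hypothetical, not a proposed Hadoop layout):

```java
import java.io.*;
import java.nio.file.*;

public class JarNativeLoader {
    /**
     * Copies a native library shipped as a classpath resource to a temporary
     * file.  System.load() needs an absolute path on the local filesystem, so
     * the library cannot be linked directly from inside the jar.
     */
    static Path extractToTemp(InputStream in, String libName) throws IOException {
        Path tmp = Files.createTempFile("native-", "-" + libName);
        tmp.toFile().deleteOnExit();
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    /** Loads e.g. "/native/Linux-amd64/libhadoop.so" (hypothetical layout) from the jar. */
    static void loadFromJar(String resourcePath) throws IOException {
        String libName = resourcePath.substring(resourcePath.lastIndexOf('/') + 1);
        try (InputStream in = JarNativeLoader.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new FileNotFoundException(resourcePath + " not bundled in this jar");
            }
            System.load(extractToTemp(in, libName).toAbsolutePath().toString());
        }
    }
}
```

The deleteOnExit() call keeps repeated client invocations from leaking temp files.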

> Improve versioning and compatibility support in native library for downstream hadoop-common users.
> --------------------------------------------------------------------------------------------------
>                 Key: HADOOP-11127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11127
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: native
>            Reporter: Chris Nauroth
> There is no compatibility policy enforced on the JNI function signatures implemented
> in the native library.  This library typically is deployed to all nodes in a cluster, built
> from a specific source code version.  However, downstream applications that want to run in
> that cluster might choose to bundle a hadoop-common jar at a different version.  Since there
> is no compatibility policy, this can cause link errors at runtime when the native function
> signatures expected by hadoop-common.jar do not exist in libhadoop.so/hadoop.dll.

This message was sent by Atlassian JIRA
