hadoop-common-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7405) libhadoop is all or nothing
Date Mon, 20 Jun 2011 18:12:48 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052118#comment-13052118 ]

Allen Wittenauer commented on HADOOP-7405:
------------------------------------------

bq. Since Hadoop Kerberos Mac OS X support was never fully there, it is not possible to compile
libhadoop due to some compiler errors.

The compiler errors are fairly simple to fix on Darwin.  I don't know why, but it seems like
9 times out of 10, we favor BSD functionality when we go with something non-portable. 

bq. Because of this my take is that if we require native code to run Hadoop, we should provide
the full set of native code for each platform we are building for. 

Regardless of what happens in this jira, we need a test suite for the C code anyway.  OS X
actually proves that even if the code compiles, it doesn't necessarily work properly.
(See HADOOP-7367.)

bq. A while ago I've opened a HADOOP-7083 to enable running Hadoop with Kerberos ON without
relying on some libhadoop functionality and the argument there was that doing that was a security
risk. 

Right.  It wanted to create a third security mode where some stuff worked and some stuff didn't.
 That's not quite what I'm asking for here and it wouldn't actually fix the problem we're
hitting anyway. The security functionality is orthogonal to the compression functionality.
 That's the base, surface issue.  Since it is in one big chunk, we broke *both*.

(While I guess it wasn't obvious, I should probably state that I'm not looking for a "partially
working" security mode.  The scope of what constitutes a working unit would still need to
be defined.  It is more than reasonable to say that all of the functions that are directly
security related would need to be ported and treated like one block.  Asking libhadoop.so
if it "supports security" seems like a reasonable thing to ask it.)

The problem that we've got is that we have a lot of unrelated code sitting in libhadoop.so.
 Every time we add something we run the risk of regressing features out of platforms other
than Linux since those other platforms are an afterthought.  HADOOP-7206 may actually be a
great example of this:  if we go with a pure native implementation, we won't be able to support
Snappy on anything but Linux with the current state of things.  Lack of compression support
has a *direct* impact on the client.  I'd be surprised if the majority of shops are only using
Linux clients. 

Wouldn't it be great to be able to ask the lib "do you support gzip, do you support snappy,
do you support lzo, do you support security, ..."?  Then we could add code as needed, do ports
as needed, etc.  An alternative would be that we start breaking libhadoop up into at least
related functionality.
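
To make the "ask the lib" idea concrete, here is a minimal sketch in C of what a capability query could look like. Everything in it is hypothetical: the `hadoop_feature` flags, the `HADOOP_BUILT_FEATURES` macro, and the `hadoop_supported_features()` / `hadoop_supports()` functions are names invented for illustration and do not exist in libhadoop. The assumption is that each platform build sets the mask to whatever has actually been ported and vetted there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature flags; none of these names exist in libhadoop. */
enum hadoop_feature {
    HADOOP_FEATURE_GZIP     = 1 << 0,
    HADOOP_FEATURE_SNAPPY   = 1 << 1,
    HADOOP_FEATURE_LZO      = 1 << 2,
    HADOOP_FEATURE_SECURITY = 1 << 3  /* all security routines, as one block */
};

/*
 * Assumed to be set per platform at build time. This default pretends
 * that only gzip and lzo have been vetted; snappy and security are
 * treated as stubbed out.
 */
#ifndef HADOOP_BUILT_FEATURES
#define HADOOP_BUILT_FEATURES (HADOOP_FEATURE_GZIP | HADOOP_FEATURE_LZO)
#endif

/* Return the bitmask of features this build of the library supports. */
unsigned int hadoop_supported_features(void)
{
    return (unsigned int)HADOOP_BUILT_FEATURES;
}

/* Return 1 if the named feature is supported, 0 if it is not or is unknown. */
int hadoop_supports(const char *name)
{
    static const struct {
        const char *name;
        unsigned int flag;
    } table[] = {
        { "gzip",     HADOOP_FEATURE_GZIP },
        { "snappy",   HADOOP_FEATURE_SNAPPY },
        { "lzo",      HADOOP_FEATURE_LZO },
        { "security", HADOOP_FEATURE_SECURITY }
    };
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (strcmp(table[i].name, name) == 0) {
            return (hadoop_supported_features() & table[i].flag) != 0;
        }
    }
    return 0; /* unknown features are reported as unsupported */
}
```

With something along these lines, the Java side could check each feature separately after loading the library instead of assuming that a successful load means everything works. A port would then add one flag at a time as each block of routines is vetted on that platform.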

I suppose the other outcome might be that we as a community just admit that we don't support
Hadoop on anything but Linux and give up on any semblance of portability.  More and more code
is being added or rewritten in C.  I would be surprised if this trend changes.

> libhadoop is all or nothing
> ---------------------------
>
>                 Key: HADOOP-7405
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7405
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: native
>    Affects Versions: 0.20.203.0, 0.23.0
>         Environment: Everything not Linux
>            Reporter: Allen Wittenauer
>            Priority: Blocker
>              Labels: regression
>
> As a result of a ton of new code in libhadoop being added in 0.20.203/0.22, a lot of
> features that used to work no longer do reliably.  The most common problem is native compression,
> but other issues such as Mac OS X's group support broke as well.  The native code checks need
> to be refactored such that libhadoop.so should report what it supports rather than having
> the Java-side assume that if it loads, it is all supported.  This would allow us to stub routines
> until they've been vetted, removing the chances of such regressions appearing in the future.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
