hadoop-common-dev mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: DomainSocket issues on Solaris
Date Tue, 06 Oct 2015 10:01:02 GMT

On 6 Oct 2015, at 00:34, Alan Burlison <Alan.Burlison@oracle.com> wrote:

On 05/10/15 18:30, Colin P. McCabe wrote:

1. Don't get DomainSocket working on Solaris.  Rely on the legacy
short-circuit read instead.  It has poorer security guarantees, but
doesn't require domain sockets.  You can add a line of code to the
failing junit tests to skip them on Solaris.

I really don't want to do that as it relegates Solaris to only ever being a second-class citizen.

I know that Solaris matters to you 100%, and we've tried to be as supportive as we can, even
though it's not viewed as important to anyone else. We don't want to make it 2nd class, just
want to get it to be 1st class in a way which doesn't create lots of compatibility problems.


2. Use a separate "timer wheel" thread which implements coarse-grained
timeouts by calling shutdown() on domain sockets that have been active
for too long.  This thread could be global (one per JVM).
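
As a rough illustration of option 2, here is a minimal Java sketch of a single per-JVM watchdog that shuts down sockets whose deadline has passed. The names SocketWatchdog and Shutdownable are hypothetical and are not part of DomainSocket; a real version would call shutdown() on the domain socket through JNI.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: one watchdog per JVM that shuts down overdue sockets. */
final class SocketWatchdog {
    /** Stand-in for whatever wraps shutdown() on a domain socket (assumption). */
    interface Shutdownable { void shutdown(); }

    // socket -> deadline, in System.nanoTime() units
    private final Map<Shutdownable, Long> deadlines = new ConcurrentHashMap<>();

    /** Register a socket that must finish its operation within timeoutMs. */
    void watch(Shutdownable s, long timeoutMs) {
        deadlines.put(s, System.nanoTime() + timeoutMs * 1_000_000L);
    }

    /** Call when the operation completes in time. */
    void unwatch(Shutdownable s) { deadlines.remove(s); }

    /** One sweep: shuts down every overdue socket and returns how many. */
    int sweep(long nowNanos) {
        int n = 0;
        for (Map.Entry<Shutdownable, Long> e : deadlines.entrySet()) {
            boolean overdue = nowNanos - e.getValue() >= 0;
            if (overdue && deadlines.remove(e.getKey(), e.getValue())) {
                e.getKey().shutdown();
                n++;
            }
        }
        return n;
    }

    /** Starts the single daemon thread; the timeout is only as precise as intervalMs. */
    Thread start(long intervalMs) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    sweep(System.nanoTime());
                    Thread.sleep(intervalMs);
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }, "domain-socket-watchdog");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Because the sweep runs at a fixed interval, a socket can overrun its deadline by up to one interval, which is why this approach gives only coarse-grained timeouts.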

From what I can tell that won't stop all the test failures as they are written with the assumption
that per-socket timeouts are available and that they time out exactly when expected.


Is the per-socket timeout assumption used anywhere outside the tests?

3. Implement the poll/select loop you discussed earlier.  As Steve
commented, it would be easier to do this by adding new functions,
rather than by changing existing ones.  I don't think "ifdef skid
marks" are necessary since poll and select are supported on Linux and
so forth as well as Solaris.  You would just need some code in
DomainSocket.java to select the appropriate implementation at runtime
based on the OS.
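
The runtime selection described in option 3 could look something like the following Java sketch. The class and enum names are hypothetical; the only platform fact relied on is that the JVM reports os.name as "SunOS" on Solaris.

```java
import java.util.Locale;

/** Hypothetical sketch: pick a timeout strategy from the os.name property. */
final class TimeoutStrategySelector {
    enum Strategy {
        SOCKET_OPTION, // timeouts stored in the socket via setsockopt()
        POLL_LOOP      // timeouts enforced by poll()/select() before each call
    }

    /** Pure function so the choice can be exercised without changing the JVM. */
    static Strategy forOsName(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        boolean solaris = os.contains("sunos") || os.contains("solaris");
        return solaris ? Strategy.POLL_LOOP : Strategy.SOCKET_OPTION;
    }

    /** Strategy for the JVM this code is running in. */
    static Strategy current() {
        return forOsName(System.getProperty("os.name", ""));
    }
}
```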

I could switch the implementation over to use poll everywhere but I haven't done that - Linux
still uses socket timeouts. The issue is that in order to make poll() work I need to maintain
the read/write timeouts alongside the filehandle - I can't store the timeout 'inside' the
filehandle using setsockopt(). That means that the filehandle and the timeouts have to be
stored together somewhere. The logical place to put the timeouts is in the same DomainSocket
instance that holds the filehandle. If the DomainSocket JNI methods were all instance methods
then there wouldn't be a problem, but they aren't, they are static methods where the integer
filehandle is passed in as a parameter. And it wouldn't work if I change the native method
parameter lists to include the timeouts as they need to be read/write. The only non-vile way
I can come up with of doing this is to convert the JNI methods from static into instance methods.
Even if that's the only change I make and I still pass in the filehandle as a parameter, the
signatures will have changed as the 2nd parameter would now be an object reference and not
a class reference.

so we move from

function(fileHandle)

to function(Object), where object->fileHandle and object->timeout are both there?

what about

function(fileHandle, timeout)

where we retain

function(fileHandle) { return function(fileHandle, defaultTimeout)}?

And then never invoke it in our existing code, which now calls the new operation?

or if there's a call

setTimeout(fileHandle, timeout)

which on Linux sets the socket timeout, and on Solaris updates some handle->timeout map
used in the select() call.
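
To make the two suggestions above concrete, here is a minimal Java sketch that combines them: the existing one-argument entry point is kept and delegates to a new overload that takes a timeout, and setTimeout() records the value in a handle->timeout map. Everything here is hypothetical (TimeoutTable, readOp, DEFAULT_TIMEOUT_MS and its value); readOp is a stub that returns the timeout it would hand to poll() or select(), standing in for the real native call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a handle->timeout side table with a compatible overload. */
final class TimeoutTable {
    /** Placeholder default; not a value taken from Hadoop. */
    static final int DEFAULT_TIMEOUT_MS = 60_000;

    private static final Map<Integer, Integer> TIMEOUTS = new ConcurrentHashMap<>();

    /** On Linux this would call setsockopt(); here it only updates the map. */
    static void setTimeout(int fileHandle, int timeoutMs) {
        TIMEOUTS.put(fileHandle, timeoutMs);
    }

    /** Must be called on close, because the OS reuses descriptor numbers. */
    static void forget(int fileHandle) {
        TIMEOUTS.remove(fileHandle);
    }

    /** Existing signature, kept so old callers keep working. */
    static int readOp(int fileHandle) {
        return readOp(fileHandle, TIMEOUTS.getOrDefault(fileHandle, DEFAULT_TIMEOUT_MS));
    }

    /** New signature: stub that returns the timeout the native side would use. */
    static int readOp(int fileHandle, int timeoutMs) {
        return timeoutMs;
    }
}
```

The map keeps the static method signatures intact, at the cost of shared state that has to be cleaned up whenever a filehandle is closed.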


The other option is to effectively write a complete Solaris-only replacement for DomainSocket;
whether switching between that and the current one is done at compile or run-time isn't really
the point. There's a fairly even split between the Java & JNI components of DomainSocket,
so whichever way it's done there will be significant duplication of the overall logic and
most likely code duplication. That means that bug fixes in one place have to be exactly mirrored
in another, and that's unlikely to be sustainable.


It's not going to be maintained, or more precisely: it'll be broken on a regular basis and
you are the one left to handle it.


My goal has been to keep the current logic as unchanged as possible. My prototype does that
by literally prefixing each libc socket operation with a poll() call to check the filehandle
is ready. The rest of the logic in DomainSocket is completely unchanged. That means that the
behaviour between Linux and Solaris should be as identical as is possible.
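
The prototype described above does this in C inside the JNI layer. As an analogy only, the same "wait for readiness with a timeout, then do the unchanged read" pattern can be sketched in Java NIO, with a Selector playing the role of poll(). The class name PollThenRead is hypothetical, and this is not the prototype's code.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

/** Hypothetical sketch: check readiness with a timeout before the actual read. */
final class PollThenRead {
    /**
     * Waits up to timeoutMs for the channel to become readable, then reads.
     * timeoutMs must be positive: Selector.select(0) would block indefinitely.
     */
    static <C extends SelectableChannel & ReadableByteChannel> int read(
            C ch, ByteBuffer dst, long timeoutMs) throws IOException {
        ch.configureBlocking(false); // required before registering with a Selector
        try (Selector sel = Selector.open()) {
            ch.register(sel, SelectionKey.OP_READ);
            if (sel.select(timeoutMs) == 0) {
                // Nothing became readable in time: the caller sees a timeout
                throw new SocketTimeoutException("read timed out");
            }
            return ch.read(dst); // the read itself is unchanged
        }
    }
}
```

The timeout lives with the caller rather than in the descriptor, which is the reason the filehandle and its timeouts have to be stored together.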

Since you commented that Solaris is implementing timeout support in
the future, approaches #1 or #2 could be placeholders until that's
finished.

Unfortunately I can't predict when that might happen, though. My prototype probes for working
timeouts at configure time, so when they do become available they'll be used automatically.

I agree that there is no formal libhadoop.so compatibility policy and
that is frustrating.  This has been an issue for those who want to run
jars compiled against multiple different versions of hadoop through
the same YARN instance.  We've discussed it in the past, but never
really come up with a great solution.  The best approach really would
be to bundle libhadoop.so inside the hadoop jar files, so that it
could be integral to the Hadoop version itself.  However, nobody has
done the work to make that happen.  The second-best approach would be
to include the Hadoop version in the libhadoop name itself (so we'd
have libhadoop28.so for hadoop 2.8, and so forth.)  Anyway, I think we
can solve this particular issue without going down that rathole...
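
For illustration, the "version in the library name" idea could be sketched in Java as below. The naming scheme follows the libhadoop28.so example above; the class and method names are hypothetical, and nothing like this is claimed to exist in Hadoop.

```java
/** Hypothetical sketch: derive a versioned native library name. */
final class VersionedLibName {
    /** "2.8.0" -> "hadoop28": major and minor components only. */
    static String baseName(String hadoopVersion) {
        String[] parts = hadoopVersion.split("\\.");
        return "hadoop" + parts[0] + parts[1];
    }

    /** Platform-specific file name, such as libhadoop28.so on Linux and Solaris. */
    static String fileName(String hadoopVersion) {
        return System.mapLibraryName(baseName(hadoopVersion));
    }
}
```

Plain concatenation is ambiguous in principle (2.10 and 21.0 would both give hadoop210), so a real scheme would probably want a separator.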

As I said, I believe that ship has long since sailed. Changes that have already been let in
have, I believe, broken the backwards binary compatibility of the Java/JNI interface. Broken
is broken; arguing that this proposal shouldn't be allowed in because it simply adds more
brokenness to the existing brokenness is really missing the point. As far as I can tell, there
already is no backwards compatibility.

--
Alan Burlison
--

