hadoop-common-dev mailing list archives

From Alan Burlison <Alan.Burli...@oracle.com>
Subject Re: DomainSocket issues on Solaris
Date Tue, 06 Oct 2015 14:19:42 GMT
On 06/10/2015 11:01, Steve Loughran wrote:

>> I really don't want to do that as it relegates Solaris to only ever
>> being a second-class citizen.
> I know that Solaris matters to you 100%, and we've tried to be as
> supportive as we can, even though it's not viewed as important to
> anyone else. We don't want to make it 2nd class, just want to get it
> to be 1st class in a way which doesn't create lots of compatibility
> problems.

Yes, you have been supportive; I recognise that and I'm grateful for it 
:-) Although I'm the most visible Solaris person, I'm not the only one 
who is interested. And I fully get the backwards compatibility thing; 
it's one of the main features of Solaris. However, backwards binary 
compatibility is something you really have to decide on up front and 
design for. It's very difficult to add it as a constraint after the 
fact, as this scenario illustrates, and without internal or external 
library versioning support it's harder still.

> Is the per-socket timeout assumption used anywhere outside the
> tests?

I've no real idea yet, as I haven't got to the point where I have a 
'Full Fat JNI' version of Hadoop on Solaris. I do know that around 50% 
of the ~200 test failures I'm seeing are most likely related to timeout 
handling, which is why I'm concentrating on it.

> so we move from
> function(fileHandle)
> to function(Object), where object->fileHandle and object->timeout are both there?

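To be precise, the signature change I have at the moment is (for example) 
from a static native method that is passed the file descriptor:

(JNIEnv *env, jclass clazz, jint fd)

to an instance native method that is passed the socket object itself:

(JNIEnv *env, jobject obj)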

filehandle, readTimeout and writeTimeout are then accessed as members of 
the jobject.
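A minimal sketch in C of what the per-operation timeout amounts to once the native code can see the timeouts, assuming POSIX poll() is used instead of a kernel socket timeout. The struct sock_state and the function sock_read are hypothetical stand-ins for the jobject fields and the JNI entry point; real JNI code would fetch the fields with GetIntField and throw a Java exception rather than return errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the Java-side fields that the
 * (JNIEnv *env, jobject obj) signature would read from the object. */
struct sock_state {
    int fd;
    int read_timeout_ms;   /* <= 0 means "wait forever" in this sketch */
    int write_timeout_ms;  /* unused here; a write path would mirror this */
};

/* Read up to len bytes, waiting at most read_timeout_ms for data.
 * Returns the byte count, or -1 with errno set (ETIMEDOUT on timeout). */
ssize_t sock_read(const struct sock_state *s, void *buf, size_t len)
{
    struct pollfd pfd = { .fd = s->fd, .events = POLLIN };
    int timeout = s->read_timeout_ms > 0 ? s->read_timeout_ms : -1;
    int rc;

    /* Restart poll() if a signal interrupts it. A production version
     * would recompute the remaining time rather than restart in full. */
    do {
        rc = poll(&pfd, 1, timeout);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;          /* errno already set by poll() */
    if (rc == 0) {
        errno = ETIMEDOUT;  /* nothing arrived within the socket's timeout */
        return -1;
    }
    return read(s->fd, buf, len);
}
```

The point of the sketch is that the timeout travels with the socket object, so nothing outside the call needs to remember it.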

> what about
> function(fileHandle, timeout)
> where we retain
> function(fileHandle) { return function(fileHandle, defaultTimeout)}?
> And then never invoke it in our existing code, which now calls the new operation?
> or if there's a call
> setTimeout(fileHandle, timeout)
> which on Linux sets the socket timeout, and on Solaris updates some
> map handle->timeout used in the select() call.

Yes, I'd thought of that. The problem is the 'some map' bit. Maintaining 
that map would be clunky: file descriptor IDs are not going to be 
sequential and are reused, so we'd have to store them in some sort of 
shadow data structure and track each and every close, and that's fiddly.
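To make the bookkeeping concrete, here is a minimal C sketch of such a shadow map, assuming a fixed-size table indexed by descriptor number. The names fd_timeouts, set_fd_timeout, get_fd_timeout and close_fd, and the table size, are hypothetical. It illustrates the fiddly part: because descriptor numbers are reused, every close has to go through close_fd(), otherwise a stale timeout silently attaches itself to whatever socket gets that number next. It is also not thread-safe as written.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

#define MAX_TRACKED_FDS 1024        /* hypothetical table size */
#define DEFAULT_TIMEOUT_MS 120000   /* the 2-minute default */

/* Shadow table: timeout per descriptor, 0 means "not set".
 * No locking, so this sketch is not thread-safe. */
static int fd_timeouts[MAX_TRACKED_FDS];

int set_fd_timeout(int fd, int timeout_ms)
{
    if (fd < 0 || fd >= MAX_TRACKED_FDS)
        return -1;                  /* descriptor cannot be tracked */
    fd_timeouts[fd] = timeout_ms;
    return 0;
}

int get_fd_timeout(int fd)
{
    if (fd < 0 || fd >= MAX_TRACKED_FDS || fd_timeouts[fd] == 0)
        return DEFAULT_TIMEOUT_MS;
    return fd_timeouts[fd];
}

/* Every close must come through here: the kernel will hand the same
 * number to the next socket, which must not inherit this timeout. */
int close_fd(int fd)
{
    if (fd >= 0 && fd < MAX_TRACKED_FDS)
        fd_timeouts[fd] = 0;
    return close(fd);
}
```

Any code path (Java or native) that closes the descriptor without calling close_fd() leaves the table out of sync, which is exactly the tracking burden described above.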

And the 'default timeout' option is, I believe, a non-starter: the 
default timeout is 2 minutes, and many of the tests set a much shorter 
interval and expect the socket to time out at exactly that interval.
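A minimal C sketch of the wrapper-style alternative, assuming poll() and the hypothetical names wait_readable and wait_readable_default: the legacy single-argument entry point forwards to the new one with the 2-minute default. It shows why this fails the tests: a caller still on the old entry point has no way to pass its configured timeout, so it waits up to 120 seconds regardless.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <unistd.h>

#define DEFAULT_TIMEOUT_MS 120000   /* the 2-minute default */

/* New-style entry point: wait up to timeout_ms for fd to become ready.
 * Returns 1 if ready (readable or hung up), 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}

/* Legacy entry point kept for binary compatibility: it cannot learn the
 * per-socket timeout, so it can only apply the default. */
int wait_readable_default(int fd)
{
    return wait_readable(fd, DEFAULT_TIMEOUT_MS);
}
```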

The problem is that if we store the timeout alongside the filehandle, then 
we need access to an object pointer to retrieve it during the socket call. 
As the existing functions are static, no object pointer is available.

I've looked long and hard at this and have not come up with a mechanism 
that is both backwards binary compatible and not totally vile.

>> The other option is to effectively write a complete Solaris-only
>> replacement for DomainSocket, whether switching between that and the
>> current one is done at compile or run-time isn't really the point.
>> There's a fairly even split between the Java & JNI components of
>> DomainSocket, so whichever way it's done there will be significant
>> duplication of the overall logic and most likely code duplication.
>> That means that bug fixes in one place have to be exactly mirrored in
>> another, and that's unlikely to be sustainable.
> It's not going to be maintained, or more precisely: it'll be broken
> on a regular basis and you are the one left to handle it.

Exactly, which is why it is a non-starter. Whatever I do to fix this 
needs to be as minimal as possible and needs to disappear on platforms 
which don't need it.

>> Unfortunately I can't predict when that might happen by, though. In
>> my prototype it probes for working timeouts at configure time, so
>> when they do become available they'll be used automatically.
> I agree that there is no formal libhadoop.so compatibility policy and
> that is frustrating.  This has been an issue for those who want to run
> jars compiled against multiple different versions of hadoop through
> the same YARN instance.  We've discussed it in the past, but never
> really come up with a great solution.  The best approach really would
> be to bundle libhadoop.so inside the hadoop jar files, so that it
> could be integral to the Hadoop version itself.  However, nobody has
> done the work to make that happen.  The second-best approach would be
> to include the Hadoop version in the libhadoop name itself (so we'd
> have libhadoop28.so for hadoop 2.8, and so forth.)  Anyway, I think we
> can solve this particular issue without going down that rathole...
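The configure-time probing mentioned in the quoted text above could look roughly like the following C sketch. It is not the prototype's actual check; it assumes the test is "set SO_RCVTIMEO on an AF_UNIX socketpair and confirm that a read with no data really times out", and it uses alarm() as a safety net so that a platform which accepts but ignores the option fails the probe instead of hanging. The function name probe_unix_socket_timeouts is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

static void on_alarm(int sig) { (void)sig; }

/* Returns 1 if SO_RCVTIMEO works on AF_UNIX stream sockets, else 0. */
int probe_unix_socket_timeouts(void)
{
    int sv[2], ok = 0;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 0;

    struct timeval tv = { .tv_sec = 0, .tv_usec = 100000 };  /* 100 ms */
    if (setsockopt(sv[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) == 0) {
        struct sigaction sa, old;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_alarm;   /* no SA_RESTART, so read() gets EINTR */
        sigemptyset(&sa.sa_mask);
        sigaction(SIGALRM, &sa, &old);
        alarm(2);                   /* safety net if the option is ignored */

        char c;
        ssize_t n = read(sv[0], &c, 1);  /* no data: should time out */
        int saved = errno;

        alarm(0);
        sigaction(SIGALRM, &old, NULL);
        /* A working timeout fails with EAGAIN/EWOULDBLOCK, not EINTR. */
        ok = (n < 0 && (saved == EAGAIN || saved == EWOULDBLOCK));
    }
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

In a configure script this would be wrapped in a small main() whose exit status selects between the setsockopt() path and a poll()-based fallback.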

Unfortunately I don't think we can, not without adding a lot of 
scaffolding to what is already complicated code.

I don't understand how YARN and multiple Hadoop versions interact, but if 
they are all in the same JVM instance, then no amount of fiddling with 
shared objects will help, as you can't have multiple SOs providing the 
same APIs within the same process, or at least not without a lot of 
complicated, fragile and utterly platform-specific configuration and code.

Alan Burlison
