hadoop-common-user mailing list archives

From hmar...@umbc.edu
Subject Re: just because you can, it doesn't mean you should....
Date Thu, 10 Jun 2010 15:39:20 GMT

Okay, I was being facetious earlier with the 'COOL' comment.

This is a very bad idea. Well, not so much bad, but think about the
ramifications of what you are proposing. Putting a 'comm' code lib together
that facilitates comms and 'helps' with architecture issues also creates a
SPOF (as another gent pointed out); moreover, it creates a nice target
for exploitation, as the lib will undoubtedly become a repository of embedded
passwords, alternate dummy accounts, bypass routes, and all sorts of goop
to make things 'easier'. And since it has to be world readable and easy to
get access to, it will be very tough to protect - or easy to DoS/DDoS.
Anything and everything from random timing attacks, substitution spoofs,
TOCTOUs, you name it.

This whole thing is already a very nice open highway for distributing
embedded and tunneled 'items' of a certain unnatural nature; don't try to
override what security you already have by 'punching holes in the firewall'
and other silly stuff.

Long run, what might be better is a discovery agent that provides continual
validation of paths and service availability specific to Hadoop and its
subprograms. That way any outage or problem can be immediately addressed or
brought to the attention of the SysAds/Networkers - like a service
monitoring program. Just don't make it simple for the 'hats out there to
own you in under five minutes flat (especially with an RPC or SOAP call to
some lib or flat file - and ssh/ssl abso-lu-tely does not matter, trust
me). You can disagree, and I really don't mean to be a 'buzz kill', but if
you ask your local 'Sheriff', I think you'll be advised not to pursue this
path too heavily.

Have a good computational day...

Best, Hal

> Hadoop has some classes for controlling how sockets are used. See
> org.apache.hadoop.net.StandardSocketFactory, SocksSocketFactory.
> The socket factory implementation chosen is controlled by the
> hadoop.rpc.socket.factory.class.default configuration parameter. You
> could probably write your own SocketFactory that gives back socket
> implementations that tee the conversation to another port, or to a
> file, etc.
> So, "it's possible," but I don't know that anyone's implemented this. I
> think others may have examined Hadoop's protocols via wireshark or other
> external tools, but those don't have much insight into Hadoop's semantics.
> (Neither, for that matter, would the socket factory. You'd probably need
> to be pretty clever to introspect as to exactly what type of message is
> sent and actually do semantic analysis, etc.)
> Allen's suggestion is probably more "correct," but might incur work on
> your part.
> Cheers,
> - Aaron
> On Thu, Jun 10, 2010 at 3:54 PM, Allen Wittenauer
> <awittenauer@linkedin.com> wrote:
>> On Jun 10, 2010, at 3:25 AM, Ahmad Shahzad wrote:
>> > Reason for doing that is that I want all the communication to happen
>> > through a communication library that resolves every communication
>> > problem we can have, e.g. firewalls, NAT, non-routed paths, multi
>> > homing, etc. By using that library all the headache of communication
>> > will be gone. We will be able to use Hadoop quite easily and there
>> > will be no communication problems.
>> I know Owen pointed you towards using proxies, but anything remotely
>> complex would probably be better in an interposer library, as then it
>> is application agnostic.
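[Editor's note] The tee-ing SocketFactory Aaron describes could be sketched
roughly as below. This is a minimal illustration, not anything from the
Hadoop code base: TeeSocketFactory and TeeSocket are hypothetical names,
only the outgoing byte stream is copied, and a real implementation would
also have to tee the input side and cope with Hadoop's actual RPC usage.

```java
import javax.net.SocketFactory;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical sketch: a SocketFactory whose sockets copy ("tee")
// every outgoing byte into a side-channel OutputStream.
public class TeeSocketFactory extends SocketFactory {
    private final OutputStream log;  // every written byte is duplicated here

    public TeeSocketFactory(OutputStream log) {
        this.log = log;
    }

    @Override
    public Socket createSocket() {
        return new TeeSocket(log);  // unconnected; caller connects it
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        Socket s = createSocket();
        s.connect(new InetSocketAddress(host, port));
        return s;
    }

    @Override
    public Socket createSocket(String host, int port, InetAddress localAddr,
                               int localPort) throws IOException {
        Socket s = createSocket();
        s.bind(new InetSocketAddress(localAddr, localPort));
        s.connect(new InetSocketAddress(host, port));
        return s;
    }

    @Override
    public Socket createSocket(InetAddress host, int port) throws IOException {
        Socket s = createSocket();
        s.connect(new InetSocketAddress(host, port));
        return s;
    }

    @Override
    public Socket createSocket(InetAddress host, int port,
                               InetAddress localAddr, int localPort)
            throws IOException {
        Socket s = createSocket();
        s.bind(new InetSocketAddress(localAddr, localPort));
        s.connect(new InetSocketAddress(host, port));
        return s;
    }

    // Socket whose output stream duplicates each written byte into the log.
    static class TeeSocket extends Socket {
        private final OutputStream log;

        TeeSocket(OutputStream log) {
            this.log = log;
        }

        @Override
        public OutputStream getOutputStream() throws IOException {
            final OutputStream real = super.getOutputStream();
            return new OutputStream() {
                @Override
                public void write(int b) throws IOException {
                    real.write(b);  // send to the peer
                    log.write(b);   // and copy to the side channel
                }

                @Override
                public void flush() throws IOException {
                    real.flush();
                    log.flush();
                }
            };
        }
    }
}
```

To plug a factory like this in, its fully qualified class name would go in
the hadoop.rpc.socket.factory.class.default property that Aaron mentions
(set in core-site.xml), with the caveats he and Hal raise above.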
