From: hmarti2@umbc.edu
To: common-user@hadoop.apache.org
Date: Thu, 10 Jun 2010 11:39:20 -0400 (EDT)
Subject: Re: just because you can, it doesn't mean you should....

All,

Okay, I was being facetious earlier with the 'COOL' comment. This is a very bad idea. Well, not so much bad, but think about the ramifications of what you are proposing.

Putting a 'comm' code lib together that facilitates comms and 'helps' with architecture issues also creates a SPOF (as another gent pointed out); moreover, it creates a nice target for exploitation, as the lib will undoubtedly become a repository of embedded passwords, alternate dummy accounts, bypass routes, and all sorts of goop to make things 'easier'. And since it has to be world readable and easy to get access to, it will be very tough to protect - or easy to DoS/DDoS. Anything and everything from random timing attacks, substitution spoofs, TOCTOUs, you name it. This whole thing is already a very nice open highway to distribute embedded and tunneled 'items' of a certain unnatural nature; don't try to override what little security you have already by 'punching holes in the firewall' and other silly stuff.

Long run, what might be better is a discovery agent that provides continual validation of paths and service availability specific to Hadoop and its sub-programs. That way any outage or problem can be immediately addressed or brought to the attention of the SysAds/Networkers. Like a service monitoring program.
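Rough sketch of the kind of thing I mean, in plain Java - the host names and ports below are placeholders, not anyone's real cluster, and a real agent would page somebody rather than print:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy "discovery agent": keeps re-validating that the services we depend on
 * are still reachable, and yells when they are not. Endpoints are placeholders.
 */
public class ServiceAvailabilityChecker {

    public static void main(String[] args) throws InterruptedException {
        Map<String, InetSocketAddress> endpoints =
                new LinkedHashMap<String, InetSocketAddress>();
        // Substitute the real NameNode / JobTracker addresses for your cluster.
        endpoints.put("namenode-rpc", new InetSocketAddress("namenode.example.com", 8020));
        endpoints.put("jobtracker-rpc", new InetSocketAddress("jobtracker.example.com", 8021));

        while (true) {
            for (Map.Entry<String, InetSocketAddress> e : endpoints.entrySet()) {
                if (isReachable(e.getValue(), 2000)) {
                    System.out.println("OK      " + e.getKey() + " -> " + e.getValue());
                } else {
                    // Real life: notify the SysAds/Networkers, don't just print.
                    System.err.println("OUTAGE  " + e.getKey() + " -> " + e.getValue());
                }
            }
            Thread.sleep(60 * 1000L); // re-validate once a minute
        }
    }

    /** Plain TCP connect check: can we open the port within the timeout? */
    private static boolean isReachable(InetSocketAddress addr, int timeoutMs) {
        Socket s = new Socket();
        try {
            s.connect(addr, timeoutMs);
            return true;
        } catch (IOException ex) {
            return false;
        } finally {
            try { s.close(); } catch (IOException ignored) { }
        }
    }
}

Nothing fancy - just keep asking "can I still open the ports I depend on?" and escalate when the answer changes.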
Just don't make it simple for the 'hats out there to own you in under five minutes flat (especially with an rpc or soap call to some lib or flat file - and ssh/ssl abso-lu-tely does not matter, trust me).

You can disagree, and I really don't mean to be a 'buzz kill', but if you ask your local 'Sheriff', I think you'll be advised not to pursue this path too heavily.

Have a good computational day...

Best,
Hal

> Hadoop has some classes for controlling how sockets are used. See
> org.apache.hadoop.net.StandardSocketFactory, SocksSocketFactory.
>
> The socket factory implementation chosen is controlled by the
> hadoop.rpc.socket.factory.class.default configuration parameter. You could
> probably write your own SocketFactory that gives back socket implementations
> that tee the conversation to another port, or to a file, etc.
>
> So, "it's possible," but I don't know that anyone's implemented this. I think
> others may have examined Hadoop's protocols via wireshark or other external
> tools, but those don't have much insight into Hadoop's internals. (Neither,
> for that matter, would the socket factory. You'd probably need to be pretty
> clever to introspect as to exactly what type of message is being sent and
> actually do semantic analysis, etc.)
>
> Allen's suggestion is probably more "correct," but might incur additional
> work on your part.
>
> Cheers,
> - Aaron
>
> On Thu, Jun 10, 2010 at 3:54 PM, Allen Wittenauer wrote:
>
>> On Jun 10, 2010, at 3:25 AM, Ahmad Shahzad wrote:
>> > Reason for doing that is that i want all the communication to happen
>> > through a communication library that resolves every communication problem
>> > that we can have, e.g. firewalls, NAT, non-routed paths, multi-homing,
>> > etc. By using that library all the headache of communication will be
>> > gone. So, we will be able to use hadoop quite easily and there will be no
>> > communication problems.
>>
>> I know Owen pointed you towards using proxies, but anything remotely
>> complex would probably be better in an interposer library, as then it is
>> application agnostic.
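On Aaron's SocketFactory point above: the general shape would be something like the sketch below. The class name and log path are made up for illustration; you would still have to point hadoop.rpc.socket.factory.class.default at the class in core-site.xml, and depending on the Hadoop version you may also want to mimic the equals()/hashCode() that StandardSocketFactory provides so RPC connection caching behaves.

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import javax.net.SocketFactory;

/**
 * Sketch only: a SocketFactory whose sockets copy every outbound byte to a
 * local file as well as the wire. A real version would tag/synchronize writes
 * per connection instead of letting concurrent sockets interleave in one file.
 */
public class TeeSocketFactory extends SocketFactory {

    private static final String LOG_PATH = "/tmp/hadoop-rpc-tee.log"; // placeholder

    @Override
    public Socket createSocket() throws IOException {
        // Hadoop's RPC client typically asks for an unconnected socket and
        // connects it itself, so this no-arg variant is the important one.
        return new TeeSocket(new FileOutputStream(LOG_PATH, true));
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        Socket s = createSocket();
        s.connect(new InetSocketAddress(host, port));
        return s;
    }

    @Override
    public Socket createSocket(String host, int port, InetAddress localAddr, int localPort) throws IOException {
        Socket s = createSocket();
        s.bind(new InetSocketAddress(localAddr, localPort));
        s.connect(new InetSocketAddress(host, port));
        return s;
    }

    @Override
    public Socket createSocket(InetAddress host, int port) throws IOException {
        Socket s = createSocket();
        s.connect(new InetSocketAddress(host, port));
        return s;
    }

    @Override
    public Socket createSocket(InetAddress host, int port, InetAddress localAddr, int localPort) throws IOException {
        Socket s = createSocket();
        s.bind(new InetSocketAddress(localAddr, localPort));
        s.connect(new InetSocketAddress(host, port));
        return s;
    }

    /** Socket whose output stream writes to both the wire and the log file. */
    private static class TeeSocket extends Socket {
        private final OutputStream log;

        TeeSocket(OutputStream log) {
            this.log = log;
        }

        @Override
        public OutputStream getOutputStream() throws IOException {
            final OutputStream wire = super.getOutputStream();
            return new OutputStream() {
                @Override public void write(int b) throws IOException {
                    wire.write(b);
                    log.write(b);
                }
                @Override public void write(byte[] b, int off, int len) throws IOException {
                    wire.write(b, off, len);
                    log.write(b, off, len);
                }
                @Override public void flush() throws IOException {
                    wire.flush();
                    log.flush();
                }
                @Override public void close() throws IOException {
                    wire.close();
                    log.close();
                }
            };
        }
    }
}

And as Aaron says, this only ever sees raw bytes - working out which RPC message is going by, let alone doing semantic analysis on it, is a whole other project.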