From: Steve Loughran
Date: Fri, 10 Oct 2008 11:48:49 +0100
To: core-dev@hadoop.apache.org
Subject: Re: RPC versioning

Sanjay Radia wrote:
>
> On Oct 7, 2008, at 3:48 AM, Alejandro Abdelnur wrote:
>> How about something more simplistic?
>>
>> * The client handles a single version of Hadoop, the one it belongs to
>> * The server handles its version and a backwards range [vN, vN-M]
>> * All client RPCs always start with the Hadoop version they belong to
>> * On the server side an RpcSwitch reads the version, checks the
>> received RPC is within the valid version range, and delegates the call to
>> the corresponding handler. The vN-1 to vN-M handlers will be adapters to a
>> vN handler
>>
> You will need to solve the following problem also:
> Vn-1, Vn-2, ... each changed some field of some parameter class.
> You can switch to the right handler, but that handler in general only has
> a class definition of the class that matches its version. How does it
> read the objects with the older class definitions?
> You may be able to address this via class loader tricks or by tagging a
> version number to the name of the class and keeping the definitions of
> each of the older versions (it can be made a little simpler than I am
> describing through the magic of subclassing) ... but something like that
> will be needed.
>
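The RpcSwitch scheme in the quoted proposal can be sketched in Java, Hadoop's own language, as below. This is a minimal illustration, not Hadoop's IPC code: the RpcSwitch class, the Handler interface, the string-based calls and the "ls" to "list" rename are all hypothetical. It accepts versions in [vN-M, vN], routes each call by its version tag, and lets an older-version adapter translate the call and delegate to the current handler.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the quoted RpcSwitch idea; not Hadoop's IPC API.
public class RpcSwitch {

    // One handler per protocol version (hypothetical interface).
    public interface Handler {
        String call(String method, String arg);
    }

    private final int current;  // vN
    private final int maxBack;  // M: how many older versions are accepted
    private final Map<Integer, Handler> handlers = new HashMap<>();

    public RpcSwitch(int current, int maxBack, Handler currentHandler) {
        this.current = current;
        this.maxBack = maxBack;
        handlers.put(current, currentHandler);
    }

    private boolean inRange(int version) {
        return version <= current && version >= current - maxBack;
    }

    // Register an adapter for one of the older versions vN-1 .. vN-M.
    public void registerAdapter(int version, Handler adapter) {
        if (version == current || !inRange(version)) {
            throw new IllegalArgumentException("not an older supported version: " + version);
        }
        handlers.put(version, adapter);
    }

    // Read the version tag of an incoming call, check the range, delegate.
    public String dispatch(int clientVersion, String method, String arg) {
        Handler h = inRange(clientVersion) ? handlers.get(clientVersion) : null;
        if (h == null) {
            throw new IllegalStateException("unsupported client version " + clientVersion);
        }
        return h.call(method, arg);
    }

    public static void main(String[] args) {
        Handler v8 = (method, arg) -> method + "(" + arg + ")@v8";
        RpcSwitch rpcSwitch = new RpcSwitch(8, 2, v8);
        // Hypothetical v7 adapter: v7 clients called "ls", which v8 renamed to "list".
        rpcSwitch.registerAdapter(7, (method, arg) ->
                v8.call("ls".equals(method) ? "list" : method, arg));

        System.out.println(rpcSwitch.dispatch(8, "list", "/user")); // list(/user)@v8
        System.out.println(rpcSwitch.dispatch(7, "ls", "/user"));   // list(/user)@v8
        try {
            rpcSwitch.dispatch(5, "list", "/user");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // unsupported client version 5
        }
    }
}
```

This covers only the routing. As Sanjay points out, each adapter would still need a way to deserialize parameter objects written with older class definitions; the sketch sidesteps that by passing plain strings.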
That is starting to look very much like Java RMI, which scares me: although we use RMI a lot, it is incredibly brittle, and once you add OSGi to the mix it gets even worse (it is no longer enough to include the class name and class/interface ID; you need to include classloader info too). Once you go down this path, you start looking wistfully at XML formats where you can use XPath to manipulate the messages.

From my experience in SOAP-land, having a flexible wire format comes to nothing if either end doesn't handle it, or if you change the semantics of the interface, either explicitly (operations behave differently) or implicitly (something that didn't call the far end now does; or other transactional behaviours) [1]. It's those semantic changes that really burn your clients, even if you handle all the marshalling issues.

If you are going to handle multiple versions on a server (and I don't know if that is the right approach to take with any RPC mechanism), here's what I'd recommend:

- Connecting clients include version numbers in the endpoints they hit. All URLs include a version.
- IPC comms are negotiated. The first thing a client does is say "I'm v.8 and I'd like a v.8 connection"; the server gets to say "no", or return some reference (URL, endpoint reference, etc.) that can be used for all further comms.

That's extra work, but so is supporting multiple versions.

Here is what I think is better: define a hard split between the in-cluster protocol and the external one.

In-cluster is how the boxes talk to each other. It can either assume a secured network and stick with NFS-class security (users are who they say they are, everything is trusted), or (optionally) have a paranoid mode where all messages have to be signed, with some defences against replay attacks. The current (trust everything) model is a lot easier.
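The negotiated-connection recommendation earlier in this message can be sketched as below. It is a minimal, in-memory illustration under stated assumptions: the VersionNegotiator class, its methods and the cluster.example.org URLs are hypothetical, and a real implementation would run this exchange as the first message on a connection. The reference handed back is a versioned URL, which also illustrates the "all URLs include a version" suggestion.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a negotiated connection; not an existing Hadoop API.
public class VersionNegotiator {

    // Version -> versioned endpoint that the client uses for all further calls.
    private final Map<Integer, String> endpoints = new HashMap<>();

    public void support(int version, String endpoint) {
        endpoints.put(version, endpoint);
    }

    // Server side: answer "no" (empty) or hand back a reference for this version.
    public Optional<String> negotiate(int requestedVersion) {
        return Optional.ofNullable(endpoints.get(requestedVersion));
    }

    // Client side: open with our own version and fail fast if the server says no.
    public static String connect(VersionNegotiator server, int clientVersion) {
        return server.negotiate(clientVersion).orElseThrow(() ->
                new IllegalStateException("server refused v." + clientVersion));
    }

    public static void main(String[] args) {
        VersionNegotiator server = new VersionNegotiator();
        server.support(8, "http://cluster.example.org/ipc/v8/");
        server.support(9, "http://cluster.example.org/ipc/v9/");

        // "I'm v.8 and I'd like a v.8 connection"
        System.out.println(connect(server, 8)); // http://cluster.example.org/ipc/v8/
        try {
            connect(server, 6);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // server refused v.6
        }
    }
}
```

Refusing an unknown version outright, rather than guessing, keeps the server from silently applying new semantics to an old client, which is the failure mode described above for SOAP stacks.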
Things like management tools and filesystem front-ends would be expected to stay in sync with the in-cluster protocol; it's not private so much as "internal-unstable", if that were a Java keyword.

External is the public API for loosely coupled things that can submit work and work with a (remote) filesystem:

- command line applications
- web front ends
- IDE plugins

It would also have support for notifications that work through firewalls (Atom feeds and/or XMPP events). This front end would have security from the outset, be fairly RESTy and use a wire format that is resilient to change (JSON, XML). The focus here is robustness, security and long-haul comms over high performance.

The nice thing about this approach is that the current IPC gets declared the internal API and can evolve as appropriate; it's the external API that we need to design to be robust across versions.

We could maybe use WebDAV as the main filesystem API, with the Mac OS, Windows and Linux WebDAV filesystem clients talking to it. The job API is something we can do ourselves, possibly reviewing past work in this area (including OSGi designs), but doing something the REST discuss list would be proud of, or at least not unduly critical of.

FWIW, I did WebDAV-to-other-filesystem work many years ago (2000); it's not too bad, except that much of its filesystem semantics is that of Win9x. It has metadata, so you could set things like the replication factor using PROPPATCH/PROPFIND, while the MOVE and COPY operations let you rename and copy files/directories without pulling anything client-side. The big limitation is its (invalid) assumption that files are less than a few GB, so that you can PUT/GET them in one-off operations rather than breaking the upload into pieces. I don't think there's an APPEND operation either.

[1] Rethinking the Java SOAP Stack. http://www.hpl.hp.com/techreports/2005/HPL-2005-83.pdf