Subject: [ANN] Plasma MapReduce, PlasmaFS, version 0.3
From: Gerd Stolpmann
To: general@hadoop.apache.org
Date: Tue, 01 Feb 2011 17:10:55 +0100

Hi,

This is about the release of Plasma-0.3, an alternative and independent implementation of map/reduce with its own DFS. It may also interest Hadoop users and developers, because the project incorporates a number of new ideas. So far, Plasma works on smaller clusters and shows good signs of being scalable. HA support is still very incomplete.

--

Plasma consists (for now) of two parts: Plasma MapReduce, a map/reduce compute framework, and PlasmaFS, the underlying distributed filesystem.

Major changes in version 0.3:

* Optimized blocklist representation (extent-based)
* Improved block allocator to minimize disk seeks
* Datanode access tickets are allocated in advance
* Sophisticated RAM management
* The command-line utility "plasma" supports wildcards

Of course, there are also numerous bug fixes and performance improvements.

Plasma MapReduce is a distributed implementation of the map/reduce algorithm scheme, written in Ocaml. PlasmaFS is the underlying distributed filesystem, also written in Ocaml.
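To illustrate what an extent-based blocklist buys, here is a minimal sketch that compresses an ordered list of block numbers into (start, length) runs. It assumes a single replica and plain integer block numbers; the names `to_extents` and `from_extents` are hypothetical and not part of the PlasmaFS code base, whose actual representation is not described here.

```python
from typing import List, Tuple

# Hypothetical illustration: an extent is (first_block, length),
# i.e. a run of consecutive block numbers. Assumes one replica only.
Extent = Tuple[int, int]


def to_extents(blocks: List[int]) -> List[Extent]:
    """Compress an ordered blocklist into (start, length) extents."""
    extents: List[Extent] = []
    for b in blocks:
        if extents and extents[-1][0] + extents[-1][1] == b:
            # b directly follows the last extent: grow that extent by one
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            # gap (or first block): start a new extent
            extents.append((b, 1))
    return extents


def from_extents(extents: List[Extent]) -> List[int]:
    """Expand extents back into the full ordered blocklist."""
    return [start + i for start, length in extents for i in range(length)]


blocks = [100, 101, 102, 103, 200, 201, 57]
print(to_extents(blocks))  # [(100, 4), (200, 2), (57, 1)]
```

The fewer gaps the block allocator leaves, the fewer extents a file needs, which is why a compact blocklist representation and a seek-minimizing allocator tend to complement each other.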
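For readers new to the map/reduce scheme itself, the following single-process word-count sketch shows the three phases (map, group by key, reduce) that a framework like Plasma MapReduce distributes over a cluster. It is a generic illustration only: Plasma's real API is in Ocaml, and `map_reduce`, `wc_map` and `wc_reduce` are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    # Map phase: each record yields zero or more (key, value) pairs,
    # which are grouped by key (the "shuffle" step in a cluster).
    groups: Dict[str, List[int]] = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    # Reduce phase: one reducer call per key, with keys processed in sorted order.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


def wc_map(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated word.
    for word in line.split():
        yield word, 1


def wc_reduce(key: str, values: List[int]) -> int:
    # Sum the partial counts collected for one word.
    return sum(values)


print(map_reduce(["a b a", "b c"], wc_map, wc_reduce))  # {'a': 2, 'b': 2, 'c': 1}
```

In a distributed setting, the map and reduce calls run on different nodes and the grouped data travels through the distributed filesystem, which is the role PlasmaFS plays for Plasma MapReduce.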
The PlasmaFS approach in particular differs from HDFS in numerous ways:

* Data blocks are preallocated, and PlasmaFS takes care of block placement
* Blocklists are extent-based
* Metadata is stored in a PostgreSQL db
* Two-phase commit is used to distribute the metadata db
* The full set of file access functions is supported, including random writes
* File accesses can be transaction-based
* Shared memory can be used to speed up the data path to locally stored data blocks
* We _think_ it is not possible to corrupt the namenode by accident or by crashes
* PlasmaFS volumes can be mounted directly via NFS
* PlasmaFS uses ONC RPC as its protocol rather than a home-grown one (and one of the next releases will add security via GSS-API)
* We got rid of multi-threading

User programs need not be written in Ocaml, as Plasma also supports a streaming mode.

Both pieces of software are bundled together in one download. The project page with further links is

http://projects.camlcity.org/projects/plasma.html

There is now also a homepage at http://plasma.camlcity.org

This is an early alpha release (0.3). A lot of things already work, and you can already run distributed map/reduce jobs. However, it is by no means complete.

Plasma is installable via GODI for Ocaml 3.12.

For discussions on specifics of Plasma, there is a separate mailing list:

https://godirepo.camlcity.org/mailman/listinfo/plasma-list

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt, Germany
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------