hadoop-general mailing list archives

From Gerd Stolpmann <i...@gerd-stolpmann.de>
Subject [ANN] Plasma MapReduce, PlasmaFS, version 0.3
Date Tue, 01 Feb 2011 16:10:55 GMT

This announces the release of Plasma-0.3, an alternative and independent
implementation of map/reduce with its own distributed filesystem (DFS).
It may also be of interest to Hadoop users and developers, because the
project incorporates a number of new ideas. So far, Plasma works on
smaller clusters and shows good signs of being scalable. HA support is
still very incomplete.


Plasma consists of two parts (for now), namely Plasma MapReduce, a
map/reduce compute framework, and PlasmaFS, the underlying distributed
filesystem.

Major changes in version 0.3:

      * Optimized blocklist representation (extent-based)
      * Improved block allocator to minimize disk seeks
      * Allocating datanode access tickets in advance
      * Sophisticated RAM management
      * The command-line utility "plasma" supports wildcards

Of course, there are also numerous bug fixes and performance
improvements.

Plasma MapReduce is a distributed implementation of the map/reduce
algorithm scheme written in OCaml. PlasmaFS is the underlying
distributed filesystem, also written in OCaml. The PlasmaFS approach in
particular differs from HDFS in numerous ways:

      * Data blocks are preallocated, and PlasmaFS takes care of block
      * Blocklists are extent-based
      * Metadata is stored in a PostgreSQL db
      * 2-phase commit is used to distribute the metadata db
      * The full set of file access functions is supported, including
        random writes
      * File accesses can be transaction-based
      * Shared memory can be used for speeding up the data path to
        locally stored data blocks
      * We _think_ it is not possible to corrupt the namenode by
        accident or by crashes
      * PlasmaFS volumes can be directly mounted via NFS
      * PlasmaFS uses ONC RPC as its protocol rather than home-grown
        protocols (and one of the next releases will add security via
        GSS-API)
      * We got rid of multi-threading
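
To illustrate what "extent-based" means for a blocklist, here is a
minimal Python sketch. It is a generic illustration of the technique, not
PlasmaFS code: the function names and the (start, length) tuple layout
are assumptions made for this example. Instead of storing one entry per
block, runs of consecutive block numbers are stored as a single extent.

```python
from typing import List, Tuple

# Hypothetical representation: an extent is (first_block, length),
# i.e. a run of consecutive block numbers on a datanode.
Extent = Tuple[int, int]


def blocks_to_extents(blocks: List[int]) -> List[Extent]:
    """Collapse an ordered list of block numbers into (start, length) runs."""
    extents: List[Extent] = []
    for b in blocks:
        if extents and extents[-1][0] + extents[-1][1] == b:
            # b directly follows the last run, so extend that run by one.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            # b starts a new run.
            extents.append((b, 1))
    return extents


def lookup(extents: List[Extent], index: int) -> int:
    """Return the block number that holds the index-th block of the file."""
    if index < 0:
        raise IndexError("negative block index")
    for start, length in extents:
        if index < length:
            return start + index
        index -= length
    raise IndexError("block index past end of file")
```

With a block allocator that tends to hand out contiguous blocks, a large
file needs only a few extents, so the blocklist stays small.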

User programs do not need to be written in OCaml, as Plasma also
supports a streaming mode.
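
As a rough idea of what a non-OCaml user program could look like, here
is a word-count mapper in Python. This announcement does not specify the
streaming interface, so the sketch assumes a convention common in
streaming map/reduce systems: records arrive as text lines on an input
stream, and the mapper writes one tab-separated key/value pair per
output line. The function name map_stream is hypothetical.

```python
from typing import TextIO


def map_stream(inp: TextIO, out: TextIO) -> None:
    """Emit "<word>\t1" for every whitespace-separated word in the input.

    Assumed record format (not taken from the Plasma documentation):
    one input record per line, one "key<TAB>value" pair per output line.
    """
    for line in inp:
        for word in line.split():
            out.write(f"{word.lower()}\t1\n")
```

Run as a standalone program, the mapper would be invoked as
map_stream(sys.stdin, sys.stdout). The record format actually used by
Plasma's streaming mode should be checked against the Plasma
documentation.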

Both pieces of software are bundled together in one download. The
project page with further links is


There is now also a homepage at


This is an early alpha release (0.3). A lot of things work already, and
you can already run distributed map/reduce jobs. However, it is in no
way complete.

Plasma is installable via GODI for OCaml 3.12.

For discussions on specifics of Plasma there is a separate mailing list:


Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
