hadoop-general mailing list archives

From Scott Carey <sc...@richrelevance.com>
Subject Re: [ANN] Plasma MapReduce, PlasmaFS, version 0.4
Date Sat, 24 Mar 2012 02:25:32 GMT
I think your ideas here are useful, and it is sad the community mostly
ignored them.

Extents are powerful: they get around the HDFS flaw that one block == one
file on a node, which means contiguous regions on a node are only one block
in size. Functional data structures (a.k.a. immutable or persistent ones)
translate more easily (in far fewer lines of code) into disk-backed
structures for logging state changes.
Features such as append, random write, and truncation are significantly
simpler to implement with the design here.
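To make the extent and immutability points concrete, here is a minimal
sketch in Python of a persistent, extent-based blocklist. It is not PlasmaFS
code; `Extent`, `append_block`, `truncate` and `lookup` are hypothetical
names. Appending a block that is physically contiguous with the last one just
grows the last extent, and every update returns a new version while older
versions stay valid, which is what makes append and truncation cheap to log.

```python
from typing import NamedTuple, Tuple


class Extent(NamedTuple):
    """A run of contiguous blocks on one node (hypothetical layout):
    file blocks [file_start, file_start + length) live at disk blocks
    [disk_start, disk_start + length) on `node`."""
    node: str
    file_start: int
    disk_start: int
    length: int


Blocklist = Tuple[Extent, ...]  # immutable: every update returns a new tuple


def append_block(bl: Blocklist, node: str, disk_block: int) -> Blocklist:
    """Append one block, extending the last extent when the new block is
    physically contiguous with it on the same node."""
    if bl:
        last = bl[-1]
        if last.node == node and last.disk_start + last.length == disk_block:
            return bl[:-1] + (last._replace(length=last.length + 1),)
        next_file_block = last.file_start + last.length
    else:
        next_file_block = 0
    return bl + (Extent(node, next_file_block, disk_block, 1),)


def truncate(bl: Blocklist, nblocks: int) -> Blocklist:
    """Keep only the first `nblocks` file blocks; the old version is untouched."""
    out = []
    for e in bl:
        if e.file_start >= nblocks:
            break
        out.append(e._replace(length=min(e.length, nblocks - e.file_start)))
    return tuple(out)


def lookup(bl: Blocklist, file_block: int) -> Tuple[str, int]:
    """Map a file block index to (node, disk block)."""
    for e in bl:
        if e.file_start <= file_block < e.file_start + e.length:
            return e.node, e.disk_start + (file_block - e.file_start)
    raise IndexError(file_block)


# Usage: three contiguous blocks on node-a collapse into one extent.
v0: Blocklist = ()
v1 = v0
for blk in (100, 101, 102):
    v1 = append_block(v1, "node-a", blk)
# v1 == (Extent('node-a', 0, 100, 3),)
v2 = append_block(v1, "node-b", 7)  # not contiguous: starts a new extent
v3 = truncate(v2, 2)                # v1 and v2 are left unchanged
```

With one file per block, the same three blocks would be three separate
objects on node-a with no guarantee of physical contiguity.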


On 10/12/11 9:31 AM, "Gerd Stolpmann" <info@gerd-stolpmann.de> wrote:

>This is about the release of Plasma-0.4, an alternate and independent
>implementation of map/reduce with its own dfs. This might also be
>interesting for Hadoop users and developers, because this project
>incorporates a number of new ideas. So far, Plasma has proven to work on
>smaller clusters and shows good signs of being scalable. The design of
>PlasmaFS is certainly superior to that of HDFS - I did not want a
>quick'n'dirty solution, so please have a look at how to do it right.
>Concerning the features, these two pages compare Plasma and Hadoop:
>I hope you see the point.
>I currently have only limited resources for testing my implementation.
>If there is anybody interested in testing on bigger clusters, please let
>me know.
>Plasma consists of two parts (for now), namely Plasma MapReduce, a
>map/reduce compute framework, and PlasmaFS, the underlying distributed
>filesystem.
>Plasma MapReduce is a distributed implementation of the map/reduce
>algorithm scheme written in Ocaml. PlasmaFS is the underlying
>distributed filesystem, also written in Ocaml. Especially the PlasmaFS
>approach has numerous differences compared to HDFS:
>      * Data blocks are preallocated, and PlasmaFS takes care of block
>        placement
>      * Blocklists are extent-based
>      * Metadata is stored in a PostgreSQL db (you need an SSD for
>        getting good performance, however)
>      * 2-phase commit is used to distribute the metadata db
>      * the full set of file access functions is supported, including
>        random writes
>      * file accesses can be transaction-based
>      * shared memory can be used for speeding up the data path to
>        locally stored data blocks
>      * we _think_ it is not possible to corrupt the namenode by
>        accident or by crashes
>      * PlasmaFS volumes can be directly mounted via NFS (we support
>        full POSIX semantics, including random writes)
>      * There are symlinks.
>      * PlasmaFS uses ONCRPC as its protocol rather than home-grown
>        protocols. A security module is available.
>      * We got rid of multi-threading
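The list mentions that 2-phase commit is used to distribute the metadata db.
For readers who have not seen the protocol, here is a minimal in-memory
sketch in Python. It is not PlasmaFS code; `Participant` and
`two_phase_commit` are hypothetical names, and it leaves out what makes 2PC
hard in practice (durable prepare logs, timeouts, recovery after a
coordinator crash).

```python
from enum import Enum
from typing import Dict, List


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """One metadata replica (hypothetical). It stages a change on prepare
    and applies it only when the coordinator decides to commit."""

    def __init__(self, name: str, will_accept: bool = True) -> None:
        self.name = name
        self.will_accept = will_accept
        self.staged: Dict[str, str] = {}
        self.data: Dict[str, str] = {}

    def prepare(self, change: Dict[str, str]) -> Vote:
        if not self.will_accept:
            return Vote.NO
        self.staged = dict(change)  # a real node would make this durable first
        return Vote.YES

    def commit(self) -> None:
        self.data.update(self.staged)
        self.staged = {}

    def abort(self) -> None:
        self.staged = {}


def two_phase_commit(participants: List[Participant],
                     change: Dict[str, str]) -> bool:
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [p.prepare(change) for p in participants]
    # Phase 2: commit only if everyone voted YES, otherwise abort everywhere.
    if all(v is Vote.YES for v in votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


# Usage: both replicas accept, then one refuses and nothing is applied.
a, b = Participant("db1"), Participant("db2")
ok = two_phase_commit([a, b], {"/f": "inode 42"})
c = Participant("db3", will_accept=False)
failed = two_phase_commit([a, c], {"/g": "inode 43"})
```

The point of the second round is atomicity: either every replica applies the
metadata change or none does.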
>User programs do not need to be written in Ocaml, as files are
>accessible via NFS, and Plasma also supports a streaming mode. (But yes,
>it is nice to program map/reduce in a functional programming language!)
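Because of the streaming mode, a job can be any program that reads records
and writes records. The sketch below is a word count in Python; it assumes a
Hadoop-Streaming-like convention (one record per line, key and value
separated by a tab, reducer input sorted by key). Plasma's actual streaming
interface is not described in this message and may differ, and the script
layout is hypothetical.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per key. Input must already be sorted by key, which
    the framework's shuffle/sort phase is assumed to provide."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(v) for _, v in group)}"


# Only run as a filter when explicitly invoked as "map" or "reduce".
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

Saved as a hypothetical `wc.py`, it can be tried locally without any
framework: `cat input.txt | python wc.py map | LC_ALL=C sort | python wc.py reduce`.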
>Both pieces of software are bundled together in one download. The
>project page with further links is
>There is now also a homepage at
>This is an early alpha release (0.4). A lot of things work already, and
>you can already run distributed map/reduce jobs. However, it is in no
>way complete.
>Plasma is installable via GODI for Ocaml 3.12.
>For discussions on specifics of Plasma there is a separate mailing list:
>Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
>Creator of GODI and camlcity.org.
>Contact details:        http://www.camlcity.org/contact.html
>Company homepage:       http://www.gerd-stolpmann.de
>*** Searching for new projects! Need consulting for system
>*** programming in Ocaml? Gerd Stolpmann can help you.
