hadoop-general mailing list archives

From Gerd Stolpmann <i...@gerd-stolpmann.de>
Subject [ANN] Plasma MapReduce, PlasmaFS, version 0.4
Date Wed, 12 Oct 2011 16:31:51 GMT
Hi,

This is to announce the release of Plasma-0.4, an alternative and
independent implementation of map/reduce with its own distributed
filesystem. It may also be interesting for Hadoop users and developers,
because the project incorporates a number of new ideas. So far, Plasma
has proven to work on smaller clusters and shows good signs of being
scalable. The design of PlasmaFS is certainly superior to that of HDFS:
I did not want a quick'n'dirty solution, so please have a look at how
it can be done right.

Concerning the features, these two pages compare Plasma and Hadoop:

http://plasma.camlcity.org/plasma/dl/plasma-0.4/doc/html/Plasmafs_and_hdfs.html

http://plasma.camlcity.org/plasma/dl/plasma-0.4/doc/html/Plasmamr_and_hadoop.html

I hope these comparisons make the point clear.

I currently have only limited resources for testing my implementation.
If anybody is interested in testing on bigger clusters, please let
me know.

--

Plasma consists of two parts (for now), namely Plasma MapReduce, a
map/reduce compute framework, and PlasmaFS, the underlying distributed
filesystem.

Plasma MapReduce is a distributed implementation of the map/reduce
algorithm scheme, written in OCaml. PlasmaFS, the underlying
distributed filesystem, is also written in OCaml. The PlasmaFS approach
in particular differs from HDFS in numerous ways:

      * Data blocks are preallocated, and PlasmaFS takes care of block
        placement
      * Blocklists are extent-based
      * Metadata is stored in a PostgreSQL db (you need an SSD for
        getting good performance, however)
      * 2-phase commit is used to distribute the metadata db
      * The full set of file access functions is supported, including
        random writes
      * File accesses can be transaction-based
      * Shared memory can be used for speeding up the data path to
        locally stored data blocks
      * We _think_ it is not possible to corrupt the namenode by
        accident or by crashes
      * PlasmaFS volumes can be directly mounted via NFS (we support
        full POSIX semantics, including random writes)
      * There are symlinks.
      * PlasmaFS uses ONCRPC as protocol and not home-grown protocols.
        A security module is available.
      * We got rid of multi-threading
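
To illustrate what "extent-based" means for a blocklist, here is a
minimal sketch in OCaml. It is not taken from the PlasmaFS sources: the
record type and the function names (extent, to_extents, lookup) are
hypothetical, and the sketch ignores which datanode holds a block. The
idea shown is only that a run of contiguous blocks is stored as one
(start, length) entry instead of one entry per block.

```ocaml
(* Hypothetical types for illustration only; not the PlasmaFS data model. *)
type extent = {
  file_index : int;  (* index of the first block within the file *)
  disk_block : int;  (* number of the first block on disk *)
  length : int;      (* number of contiguous blocks in this extent *)
}

(* Collapse (file_index, disk_block) pairs, sorted by file_index,
   into extents. A pair extends the current extent only if it is
   contiguous both in the file and on disk. *)
let to_extents (blocks : (int * int) list) : extent list =
  let rec go acc cur = function
    | [] ->
      List.rev (match cur with None -> acc | Some e -> e :: acc)
    | (fi, db) :: rest ->
      (match cur with
       | Some e when fi = e.file_index + e.length
                  && db = e.disk_block + e.length ->
         go acc (Some { e with length = e.length + 1 }) rest
       | Some e ->
         go (e :: acc)
           (Some { file_index = fi; disk_block = db; length = 1 }) rest
       | None ->
         go acc
           (Some { file_index = fi; disk_block = db; length = 1 }) rest)
  in
  go [] None blocks

(* Find the disk block backing a given file block index, if any. *)
let rec lookup (extents : extent list) (fi : int) : int option =
  match extents with
  | [] -> None
  | e :: rest ->
    if fi >= e.file_index && fi < e.file_index + e.length
    then Some (e.disk_block + (fi - e.file_index))
    else lookup rest fi
```

With this representation, a file whose blocks were preallocated in long
contiguous runs needs only a handful of entries, however many blocks it
has.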

User programs need not be written in OCaml, as files are accessible
via NFS, and Plasma also supports a streaming mode. (But yes, it is
nice to program map/reduce in a functional programming language!)
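
As a rough illustration of why the map/reduce scheme fits a functional
language, here is a generic single-process sketch in OCaml. It is NOT
the Plasma MapReduce API: the names map_reduce and word_count are
hypothetical, and nothing here is distributed. It only shows the
scheme itself: a map function emits key/value pairs, the pairs are
grouped by key, and a reduce function folds each group.

```ocaml
(* Generic in-memory map/reduce; an illustration of the scheme only,
   not the Plasma MapReduce interface. *)
let map_reduce
    ~(map : 'a -> ('k * 'v) list)
    ~(reduce : 'k -> 'v list -> 'r)
    (inputs : 'a list) : ('k * 'r) list =
  let groups = Hashtbl.create 16 in
  (* Map phase: emit pairs and group the values by key. *)
  List.iter
    (fun x ->
       List.iter
         (fun (k, v) ->
            let old = try Hashtbl.find groups k with Not_found -> [] in
            Hashtbl.replace groups k (v :: old))
         (map x))
    inputs;
  (* Reduce phase: fold each group; values are passed in emission order.
     The order of keys in the result is unspecified. *)
  Hashtbl.fold
    (fun k vs acc -> (k, reduce k (List.rev vs)) :: acc)
    groups []

(* Word count: each input record is a list of words. *)
let word_count (records : string list list) : (string * int) list =
  map_reduce
    ~map:(fun words -> List.map (fun w -> (w, 1)) words)
    ~reduce:(fun _ counts -> List.fold_left (+) 0 counts)
    records
```

For example, word_count [["a"; "b"; "a"]; ["b"; "a"]] yields a count of
3 for "a" and 2 for "b".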

Both pieces of software are bundled together in one download. The
project page with further links is

http://projects.camlcity.org/projects/plasma.html

There is now also a homepage at

http://plasma.camlcity.org

This is an early alpha release (0.4). A lot of things work already, and
you can already run distributed map/reduce jobs. However, it is in no
way complete.

Plasma is installable via GODI for OCaml 3.12.

For discussions on specifics of Plasma there is a separate mailing list:

https://godirepo.camlcity.org/mailman/listinfo/plasma-list

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in Ocaml? Gerd Stolpmann can help you.
------------------------------------------------------------

