hadoop-general mailing list archives

From Gerd Stolpmann <i...@gerd-stolpmann.de>
Subject Re: [ANN] Plasma MapReduce, PlasmaFS, version 0.4
Date Thu, 20 Oct 2011 22:55:40 GMT
On Thursday, 13.10.2011, at 13:48 +0100, Steve Loughran wrote:
> On 12/10/11 17:31, Gerd Stolpmann wrote:
> > Hi,
> >
> > This is about the release of Plasma-0.4, an alternate and independent
> > implementation of map/reduce with its own dfs. This might also be
> > interesting for Hadoop users and developers, because this project
> > incorporates a number of new ideas. So far, Plasma has proven to work on
> > smaller clusters and shows good signs of being scalable. The design of
> > PlasmaFS is certainly superior to that of HDFS - I did not want a
> > quick'n'dirty solution, so please have a look how to do it right.
> >
> > Concerning the features, these two pages compare Plasma and Hadoop:
> >
> > http://plasma.camlcity.org/plasma/dl/plasma-0.4/doc/html/Plasmafs_and_hdfs.html
> >
> - without block checksums your code contains assumptions about HDD 
> integrity that do not stand up to the classic works by Pinheiro or 
> Schroeder. Essentially you appear to be assuming that HDDs don't corrupt 
> data, yet both HDD and their interconnects can play up. For a recent 
> summary of Hadoop integrity, I would point you at [Loughran2011]
> http://www.slideshare.net/steve_l/did-you-reallywantthatdata

Interesting points. Checksums are now on my TODO list.

> -Hadoop NNs benefit from SSD too.

Well, "benefit" was meant here as "massive benefit". Plasma syncs after
each commit. Admittedly, being dependent on SSD technology is a
drawback, although a manageable one.

> -auth and security has improved recently, though I'd still run it in a 
> private subnet just to be sure
>  > 
> http://plasma.camlcity.org/plasma/dl/plasma-0.4/doc/html/Plasmamr_and_hadoop.html
>  >
>  > I hope you see where the point is.
> Again, support for small block size is relevant in small situations. In 
> larger clusters you will not only have larger block sizes; if you do 
> work on small blocks, the sheer number of task trackers reporting back to 
> the JT can overload it.

This is not the whole story. I have recently written an article about
this point:


Of course, if you want to support smaller sized blocks, you have to
change other things, too. For example, PlasmaFS supports block list
compression. Also, Plasma's map/reduce creates way fewer tasks than
Hadoop's, and contacts the namenode less frequently. That is ultimately
what I meant by "support" - you have to somehow compensate for the
additional overhead induced by it.
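
To illustrate what block list compression can mean, here is a minimal
sketch that stores runs of consecutive block numbers as (start, length)
extents. This is an assumption chosen for illustration, not the actual
PlasmaFS encoding; the names compress_block_list and expand_block_list
are hypothetical.

```python
from typing import Iterable, List, Tuple

# An extent is (first_block_number, number_of_consecutive_blocks).
Extent = Tuple[int, int]


def compress_block_list(blocks: Iterable[int]) -> List[Extent]:
    """Collapse runs of consecutive block numbers into extents."""
    extents: List[Extent] = []
    for block in blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            # The block directly follows the previous run: extend that run.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            # Otherwise start a new run of length 1.
            extents.append((block, 1))
    return extents


def expand_block_list(extents: Iterable[Extent]) -> List[int]:
    """Inverse of compress_block_list."""
    return [start + i for start, length in extents for i in range(length)]


if __name__ == "__main__":
    blocks = [100, 101, 102, 103, 500, 501, 7]
    packed = compress_block_list(blocks)
    print(packed)
    assert expand_block_list(packed) == blocks
```

With small blocks a file has many more entries in its block list, so the
saving from such an encoding grows when the allocator manages to place
blocks contiguously.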

> > I have currently only limited resources for testing my implementation.
> > If there is anybody interested in testing on bigger clusters, please let
> > me know.
> That's one of the issues with the Plasma design: I'm not sure how well 
> things like Posix semantics, esp. locking and writes with offsets scale. 
> That's why the very large filesystems, HDFS included, tend to drop them. 
> Look at how much effort it took to get Append to work reliably.

You know, I'm a functional programmer, and this really helped in
getting it right. E.g. for offset writes there are three important
conceptual things:

- Offset writes can be implemented by allocating replacement blocks
  (i.e. we do a copy instead of mutating existing blocks). Very FP-ish:
  Create a new version of the same instead of overwriting.
- It's then only a matter of supporting the right data structures.
  Plasma uses a specially crafted FP-style immutable data type for
  representing block lists. It's highly efficient, and provably
- Finally, these complex data structures must be made persistent.
  Plasma builds upon transactions that are isolated against each
  other. This is also very FP-ish: we just write a new version, and
  replace the old one atomically.
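
The three points above can be sketched in a few lines. This is a toy
in-memory model written for illustration, not code from Plasma: the
names ToyStore, FileVersion and BLOCK_SIZE are hypothetical, and a
single reference assignment stands in for the atomic transaction
commit.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


@dataclass(frozen=True)
class FileVersion:
    """Immutable description of one version of a file."""
    blocks: Tuple[int, ...]  # block ids, in file order
    size: int                # logical file size in bytes


class ToyStore:
    def __init__(self) -> None:
        self._data: Dict[int, bytes] = {}  # block id -> block contents
        self._next_id = 0
        self.current = FileVersion(blocks=(), size=0)

    def _alloc(self, payload: bytes) -> int:
        """Store payload in a fresh block; existing blocks are never mutated."""
        block_id = self._next_id
        self._next_id += 1
        self._data[block_id] = payload
        return block_id

    def read(self, version: FileVersion) -> bytes:
        return b"".join(self._data[b] for b in version.blocks)[: version.size]

    def write_at(self, offset: int, payload: bytes) -> FileVersion:
        old = self.current
        end = offset + len(payload)
        new_size = max(old.size, end)
        n_blocks = -(-new_size // BLOCK_SIZE)  # ceiling division
        blocks: List[Optional[int]] = list(old.blocks)
        blocks += [None] * (n_blocks - len(blocks))
        for i in range(n_blocks):
            lo = max(offset, i * BLOCK_SIZE)
            hi = min(end, (i + 1) * BLOCK_SIZE)
            if lo >= hi and blocks[i] is not None:
                continue  # untouched block: shared with the old version
            # Copy the old contents (or zeros for a hole) into a replacement block.
            old_bytes = self._data[blocks[i]] if blocks[i] is not None else b""
            buf = bytearray(old_bytes.ljust(BLOCK_SIZE, b"\0"))
            if lo < hi:
                base = i * BLOCK_SIZE
                buf[lo - base : hi - base] = payload[lo - offset : hi - offset]
            blocks[i] = self._alloc(bytes(buf))
        new = FileVersion(blocks=tuple(blocks), size=new_size)
        self.current = new  # the "commit": swap in the new version in one step
        return new


if __name__ == "__main__":
    store = ToyStore()
    v1 = store.write_at(0, b"abcdefgh")
    v2 = store.write_at(5, b"XY")
    print(store.read(v1), store.read(v2))
```

A reader that still holds the old FileVersion keeps seeing the old
contents, and blocks not covered by the write are shared between both
versions rather than copied.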

I don't think there is any scalability issue (maybe except that you then
really need complex things like transactions, and these are no longer

> Without evidence of working at scale, I'm not sure how the claim "the 
> design of Plasma is certainly superior to HDFS" is defensible. Sorry.

Well, I cannot prove right now how well all this scales. I agree this is
important, but I am currently not able to jump over this barrier (lack
of hardware). All I can do is test with small clusters (the largest was
4 machines so far) and draw conclusions. I'm quite sure these tests
indicate it would also work with 40 machines (by simulating more
clients), but I cannot say where the limit is at which you see a
non-linear slowdown.

Also, let me say that I do not agree with the assumption that it is only
scalability that counts. Quality has many dimensions. How useful is a
highly scalable system when it does not support the features you need?
You see I'm focusing on a feature set that is recognizably different
from HDFS's.

> That said, using SunOS RPC/NFS as an FS protocol is nice as it does make 
> mounting straightforward. And as NFS locking isn't guaranteed in NFS, 
> you may be able to get away without it.

NFS is only a secondary protocol here - the primary PlasmaFS protocol
has more features like SQL-style transactions. But of course it has some
similarity with NFS, and supports all the features you finally need for

Adding locking (lockf) wouldn't be so difficult as such, but the
challenge is more to make it fast (locks need to be persistent in a
DFS). The simple locking method based on exclusive file creation works
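
For readers unfamiliar with locking by exclusive file creation, here is
a minimal sketch against an ordinary POSIX-style file system, using
O_CREAT together with O_EXCL. The helper names try_lock and unlock are
hypothetical, and whether a particular mount honours O_EXCL atomically
is an assumption that has to be checked for each file system.

```python
import os


def try_lock(path: str) -> bool:
    """Try to take the lock by creating path exclusively.

    Returns True if this call created the file (lock acquired) and
    False if the file already existed (someone else holds the lock).
    """
    try:
        # O_CREAT | O_EXCL makes the call fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the owner, for debugging only
    return True


def unlock(path: str) -> None:
    """Release the lock by removing the lock file."""
    os.remove(path)


if __name__ == "__main__":
    import tempfile

    lock_path = os.path.join(tempfile.mkdtemp(), "demo.lock")
    print(try_lock(lock_path))  # first attempt creates the file
    print(try_lock(lock_path))  # second attempt finds it already there
    unlock(lock_path)
```

The lock lives as a directory entry, so it is as persistent as the file
system itself; the price is that a crashed holder leaves a stale lock
file behind that has to be cleaned up by some other means.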

Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in Ocaml? Gerd Stolpmann can help you.
