From: Nathan Rutman <nrutman@gmail.com>
Subject: Re: TestDFSIO on Lustre vs HDFS
Date: Fri, 28 Jan 2011 10:39:20 -0800
To: hdfs-user@hadoop.apache.org
Message-Id: <3CD7F5B9-1702-4EA8-AE0B-0F1C689C027B@gmail.com>
References: <4D7507BA-75D1-4F2D-A83D-428A3EB5D579@gmail.com>

Hi Rita, thanks for a great response.

On Jan 27, 2011, at 7:31 PM, Rita wrote:

> Comparing apples and oranges.

Certainly some factors are comparable, others are not. I was primarily interested in the performance of Hadoop IO.

> Lustre is a great filesystem but has no native fault tolerance. If you want a POSIX filesystem with high performance, then Lustre does it. However, if you want to access data in a heterogeneous environment and don't need POSIX compliance, then HDFS is the tool.

I am so on the same page as you :)

Your storage choice should depend on the kind of data you're storing, the quantity, the reliability, scalability, heterogeneity, data access patterns, the applications you're using, performance requirements, and system cost. My point in posting this stuff is not to say that Lustre should be your choice for a Hadoop backend in all situations.
It was really to show that HDFS was designed for a particular usage pattern and scale, and that using it outside of that realm may not be the best choice. I was looking to the HDFS community to poke holes in my arguments.

> I've read an earlier thread from you; before you choose a filesystem, some things to consider:
>
> Cost: Any exotic software or hardware needed? (Lustre and HDFS can run very well on commodity hardware.)
> Transparency: Any application change needed? Lustre wins in this! With HDFS you would have to convert or make changes in the way you access the data.
> Scalability: Both scale well.
> Implementation cost: The cost of implementing a solution and maintaining it. HDFS wins. It will run on any server that will run Java. No kernel modules, no kernel configuration, etc. It just works out of the box.

I'd say that HDFS probably wins on the "exotic hardware" requirements -- Lustre failover typically requires standalone RAID boxes, redundant servers, and redundant network pathing in order to achieve data access reliability. (It can run without this stuff, but that introduces single points of failure.) Also, to get improved Hadoop performance, the network needs to be more expensive than 1GigE. And Lustre requires more sysadmin care and understanding, which adds to the total cost of ownership.

But all of that is a "fixed" cost -- it does not scale linearly with your storage size. If you double your storage requirement, you'll pay ~1.2x for RAID parity and spare space with Lustre, but you'll pay 3x for HDFS disks. The Lustre initial costs are higher, so at some scale there will necessarily be a cost crossover.

Some other factors: there is the cost per megabyte, and there is also a cost per megabyte per second. If performance is important to you (again, it becomes more of an issue at larger scales), then that also must enter the calculation.
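The cost-crossover argument above can be sketched numerically. This is just a toy model of fixed cost plus capacity cost times replication/parity overhead; all dollar figures are hypothetical placeholders, not numbers from this thread -- only the ~1.2x and 3x multipliers come from the message.

```python
# Toy cost model: fixed infrastructure cost plus raw-disk cost scaled by
# the storage overhead (RAID parity/spares for Lustre, 3x replication for
# HDFS). Dollar values below are invented for illustration.

def total_cost(fixed, overhead, disk_cost_per_tb, usable_tb):
    """Fixed cost plus disk cost for the raw capacity behind usable_tb."""
    return fixed + usable_tb * overhead * disk_cost_per_tb

DISK = 50.0               # $/TB raw (assumed)
LUSTRE_FIXED = 100_000.0  # RAID boxes, redundant servers/network (assumed)
HDFS_FIXED = 10_000.0     # commodity nodes, minimal extras (assumed)

for tb in (100, 500, 1000, 2000):
    lustre = total_cost(LUSTRE_FIXED, 1.2, DISK, tb)
    hdfs = total_cost(HDFS_FIXED, 3.0, DISK, tb)
    print(f"{tb:>5} TB: Lustre ${lustre:,.0f}  HDFS ${hdfs:,.0f}")

# Crossover where the totals meet:
#   (LUSTRE_FIXED - HDFS_FIXED) / ((3.0 - 1.2) * DISK)
# which is 1000 TB under these assumed numbers; below it HDFS is cheaper,
# above it Lustre is.
```

With these made-up inputs, Lustre is the more expensive system at 100-500 TB and the cheaper one past 1000 TB, which is the shape of the argument: a higher intercept but a shallower slope.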
Or, if you only care about 100% data availability, that also will influence your choice. Are you just using Hadoop or HBase, or do you need to run other distributed software?

Thanks all for your time and responses.

> On Thu, Jan 27, 2011 at 4:44 PM, Nathan Rutman <nrutman@gmail.com> wrote:
> In case others are interested, I ran a comparison of TestDFSIO on HDFS vs Lustre.
> This is on an 8-node InfiniBand-connected cluster. For the Lustre test, we replaced the HTTP transfer during the shuffle phase with a simple hardlink to the data (since all data is always visible on all nodes with Lustre).
>
> Max Map Threads = 80; Max Reduce Threads = 1; File Size = 512MB; Scheduler = JobQueue; Buffer Size = Default; Number of Nodes = 8; Drive Speed = 80MB/s
>
> The conclusion is that Lustre TestDFSIO performance is significantly better than HDFS when using a fast network (as it theoretically should be). On a slower network (e.g. 1GigE), I would not expect Lustre to show much advantage over HDFS.
>
> --
> --- Get your facts first, then you can distort them as you please.--
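The hardlink trick in the quoted benchmark setup -- replacing the shuffle's HTTP transfer with a link when map output sits on a shared filesystem -- can be sketched as below. The function name and path layout are illustrative, not Hadoop's actual shuffle code or directory structure.

```python
# Sketch of a hardlink "shuffle": on a shared filesystem like Lustre, a
# reducer can hardlink a map task's partition file into its own working
# directory instead of fetching it over HTTP. No bytes are copied; both
# names point at the same inode. Names/paths here are hypothetical.
import os

def fetch_map_output(shared_map_dir: str, map_id: str, reduce_id: int,
                     local_work_dir: str) -> str:
    """Link one map task's partition for this reducer and return its path."""
    src = os.path.join(shared_map_dir, map_id, f"part-{reduce_id:05d}")
    dst = os.path.join(local_work_dir, f"{map_id}.part-{reduce_id:05d}")
    os.link(src, dst)  # requires src and dst on the same filesystem
    return dst
```

The O(1) link replaces an O(file size) network transfer, which is why the gain shows up most on fast networks where the shuffle would otherwise be disk- rather than network-bound.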