From: Nathan Rutman <nrutman@gmail.com>
Subject: Re: TestDFSIO on Lustre vs HDFS
Date: Fri, 28 Jan 2011 10:39:20 -0800
To: hdfs-user@hadoop.apache.org
Message-Id: <3CD7F5B9-1702-4EA8-AE0B-0F1C689C027B@gmail.com>
References: <4D7507BA-75D1-4F2D-A83D-428A3EB5D579@gmail.com>

Hi Rita, thanks for a great response.

On Jan 27, 2011, at 7:31 PM, Rita wrote:

> Comparing apples and oranges.

Certainly some factors are comparable, others are not. I was primarily interested in the performance of Hadoop IO.

> Lustre is a great filesystem but has no native fault tolerance. If you want a POSIX filesystem with high performance, then Lustre does it. However, if you want to access data in a heterogeneous environment and don't need POSIX compliance, then HDFS is the tool.

I am so on the same page as you :)

Your storage choice should depend on the kind of data you're storing, the quantity, the reliability, scalability, heterogeneity, data access patterns, the applications you're using, performance requirements, and system cost. My point in posting this stuff is not to say that Lustre should be your choice for a Hadoop backend in all situations.
It was really to show that HDFS was designed for a particular usage pattern and scale, and that using it outside of that realm may not be the best choice. I was looking to the HDFS community to poke holes in my arguments.

> I've read an earlier thread from you; before you choose a filesystem, some things to consider:
>
> Cost: Any exotic software or hardware needed? (Lustre and HDFS can run very well on commodity hardware.)
> Transparency: Any application change needed? Lustre wins in this! With HDFS you would have to convert or make changes in the way you access the data.
> Scalability: Both scale well.
> Implementation cost: The cost of implementing a solution and maintaining it. HDFS wins. It will run on any server that will run Java. No kernel modules, no kernel configuration, etc. It just works out of the box.

I'd say that HDFS probably wins on the "exotic hardware" requirements -- Lustre failover typically requires standalone RAID boxes, redundant servers, and redundant network pathing in order to achieve data access reliability. (It can run without this stuff, but that introduces single points of failure.) Also, to get improved Hadoop performance, the network needs to be more expensive than 1GigE. And Lustre requires more sysadmin care and understanding, which adds to the total cost of ownership.

But all of that is a "fixed" cost -- it does not scale linearly with your storage size. If you double your storage requirement, you'll pay ~1.2x for RAID parity and spare space with Lustre, but you'll pay 3x for HDFS disks. The Lustre initial costs are higher, so at some scale there will necessarily be a cost crossover.

Some other factors: there is the cost per megabyte, and there is also a cost per megabyte per second. If performance is important to you (again, it becomes more of an issue at larger scales), then that also must enter the calculation.
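The cost-crossover argument above can be sketched numerically. This is just a toy model of fixed cost plus capacity cost times replication/parity overhead; all dollar figures are hypothetical placeholders, not numbers from this thread -- only the ~1.2x and 3x multipliers come from the message.

```python
# Toy cost model: fixed infrastructure cost plus raw-disk cost scaled by
# the storage overhead (RAID parity/spares for Lustre, 3x replication for
# HDFS). Dollar values below are invented for illustration.

def total_cost(fixed, overhead, disk_cost_per_tb, usable_tb):
    """Fixed cost plus disk cost for the raw capacity behind usable_tb."""
    return fixed + usable_tb * overhead * disk_cost_per_tb

DISK = 50.0               # $/TB raw (assumed)
LUSTRE_FIXED = 100_000.0  # RAID boxes, redundant servers/network (assumed)
HDFS_FIXED = 10_000.0     # commodity nodes, minimal extras (assumed)

for tb in (100, 500, 1000, 2000):
    lustre = total_cost(LUSTRE_FIXED, 1.2, DISK, tb)
    hdfs = total_cost(HDFS_FIXED, 3.0, DISK, tb)
    print(f"{tb:>5} TB: Lustre ${lustre:,.0f}  HDFS ${hdfs:,.0f}")

# Crossover where the totals meet:
#   (LUSTRE_FIXED - HDFS_FIXED) / ((3.0 - 1.2) * DISK)
# which is 1000 TB under these assumed numbers; below it HDFS is cheaper,
# above it Lustre is.
```

With these made-up inputs, Lustre is the more expensive system at 100-500 TB and the cheaper one past 1000 TB, which is the shape of the argument: a higher intercept but a shallower slope.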
Or, if you only care about 100% data availability, that also will influence your choice. Are you just using Hadoop or HBase, or do you need to run other distributed software?

Thanks all for your time and responses.

> On Thu, Jan 27, 2011 at 4:44 PM, Nathan Rutman <nrutman@gmail.com> wrote:
> In case others are interested, I ran a comparison of TestDFSIO on HDFS vs Lustre.
> This is on an 8-node InfiniBand-connected cluster. For the Lustre test, we replaced the HTTP transfer during the shuffle phase with a simple hardlink to the data (since all data is always visible on all nodes with Lustre).
>
> Max Map Threads = 80; Max Reduce Threads = 1; File Size = 512MB; Scheduler = JobQueue; Buffer Size = Default; Number of Nodes = 8; Drive Speed = 80MB/s
>
> The conclusion is that Lustre TestDFSIO performance is significantly better than HDFS when using a fast network (as it theoretically should be). On a slower network (e.g. 1GigE), I would not expect Lustre to show much advantage over HDFS.
>
> --
> --- Get your facts first, then you can distort them as you please.--
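The hardlink trick in the quoted benchmark setup -- replacing the shuffle's HTTP transfer with a link when map output sits on a shared filesystem -- can be sketched as below. The function name and path layout are illustrative, not Hadoop's actual shuffle code or directory structure.

```python
# Sketch of a hardlink "shuffle": on a shared filesystem like Lustre, a
# reducer can hardlink a map task's partition file into its own working
# directory instead of fetching it over HTTP. No bytes are copied; both
# names point at the same inode. Names/paths here are hypothetical.
import os

def fetch_map_output(shared_map_dir: str, map_id: str, reduce_id: int,
                     local_work_dir: str) -> str:
    """Link one map task's partition for this reducer and return its path."""
    src = os.path.join(shared_map_dir, map_id, f"part-{reduce_id:05d}")
    dst = os.path.join(local_work_dir, f"{map_id}.part-{reduce_id:05d}")
    os.link(src, dst)  # requires src and dst on the same filesystem
    return dst
```

The O(1) link replaces an O(file size) network transfer, which is why the gain shows up most on fast networks where the shuffle would otherwise be disk- rather than network-bound.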