Date: Fri, 10 Apr 2009 21:51:48 +0300
Subject: Re: HDFS read/write speeds, and read optimization
From: Stas Oskin
To: core-user@hadoop.apache.org

Hi.

> Depends on what kind of I/O you do - are you going to be using MapReduce
> and co-locating jobs and data? If so, it's possible to get close to those
> speeds if you are I/O bound in your job and read right through each chunk.
> If you have multiple disks mounted individually, you'll need the number of
> streams equal to the number of disks. If you're going to do I/O that's not
> through MapReduce, you'll probably be bound by the network interface.

By the way, this is what I wanted to ask as well:

Is it more efficient to unify the disks into one volume (RAID or LVM) and
then present them as a single space? Or is it better to specify each disk
separately?

Reliability-wise, the latter sounds more correct, as one or several (up to 3)
disks going down won't take the whole node with them. But perhaps there is a
performance penalty?
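
For concreteness, the second option (each disk listed separately) might be configured along these lines. This is only a minimal sketch, assuming the `dfs.data.dir` DataNode property in the site configuration file (hadoop-site.xml or hdfs-site.xml, depending on the version); the mount points /mnt/disk1 to /mnt/disk3 are hypothetical:

```xml
<!-- Sketch only: one data directory per physical disk, comma-separated. -->
<!-- The /mnt/diskN paths are hypothetical mount points, not from this thread. -->
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data,/mnt/disk3/hdfs/data</value>
</property>
```

With the first option (RAID or LVM), the same property would presumably hold a single directory on the unified volume instead.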