RE: Using HBase on other file systems
From: "Gibbon, Robert, VF-Group"
To: hbase-user@hadoop.apache.org
Date: Sat, 15 May 2010 22:19:57 +0200

Todd, thanks for replying.

4x 7200 RPM spindles and no RAID = approx. 360 IOPS to/from the backend storage, as a minimum, per node, to run an HBase cluster. Right?

cheers
Robert

-----Original Message-----
From: Todd Lipcon [mailto:todd@cloudera.com]
Sent: Sat 5/15/2010 3:51 AM
To: hbase-user@hadoop.apache.org
Subject: Re: Using HBase on other file systems

On Fri, May 14, 2010 at 2:15 PM, Gibbon, Robert, VF-Group <Robert.Gibbon@vodafone.com> wrote:

> Hmm. What level of IOPS does HBase need in order to support a reasonably
> responsive level of service? How much latency in transfer times is
> acceptable before the nodes start to fail? Do you use asynchronous IO
> queueing? Write-through caching? Prefetching?
>

Hi Robert.

Have you read the Bigtable paper? It's a good description of the general IO architecture of Bigtable. You can also read the original paper on log-structured merge-tree (LSM-tree) storage from back in the '90s.

To answer your questions in brief:
- Typical clusters run on between 4 and 12 7200RPM SATA disks per node.
  Some people run on 10k RPM disks to get more random reads per second, but that isn't necessary.
- Latency in transfer times is a matter of what your application needs, not of what HBase needs.
- No, we do not asynchronously queue reads. AIO support is lacking in Java 6, and even in the current previews of Java 7 it is a thin wrapper around thread pools and synchronous IO APIs.
- HBase uses log-structured storage, which is in some ways similar to write-through caching. We never do random writes (in fact, they're impossible in HDFS).

-Todd

>
> On Fri, May 14, 2010 at 12:02 PM, Gibbon, Robert, VF-Group <Robert.Gibbon@vodafone.com> wrote:
>
> > My thinking is around separation of concerns - at an OU level, not just at a
> > system integration level. Walrus gives me a consistent, usable abstraction
> > layer to transparently substitute the storage implementation - for example
> > from Symmetrix <--> Isilon or anything in between. Walrus is storage
> > subsystem agnostic, so it need not be configured for inconsistency like the
> > Amazon service it emulates.
> >
> > Tight coupling for lock-in is a great commercial technique often seen with
> > suppliers. But it is a bad one. Very bad.
> >
>
> However, reasonably tight coupling between a database (HBase) and its
> storage layer (HDFS) is IMHO absolutely necessary to achieve a certain level
> of correctness and performance. In HBase's case we use the Hadoop FileSystem
> interface, so in theory it will work on any filesystem that implements that
> interface, but I wouldn't run a production instance on anything but HDFS.
>
> It's worth noting that most commercial databases operate on direct block
> devices rather than on top of filesystems, so that they don't have to deal
> with the varying semantics and performance of ext3, ext4, XFS, UFS, and the
> myriad other single-node filesystems that exist.
>
> -Todd
>
> >
> > -----Original Message-----
> > From: Andrew Purtell [mailto:apurtell@apache.org]
> > Sent: Thu 5/13/2010 11:54 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: RE: Using HBase on other file systems
> >
> > You really want to run HBase backed by Eucalyptus' Walrus? What do you have
> > behind that?
> >
> > > From: Gibbon, Robert, VF-Group
> > > Subject: RE: Using HBase on other file systems
> > [...]
> > > NB. I checked out running HBase over Walrus (an AWS S3
> > > clone): bork - you want me to file a Jira on that?
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
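Robert's figure of roughly 360 IOPS for four 7200 RPM spindles can be sanity-checked with the common rule of thumb that one random I/O on a spinning disk costs an average seek plus half a rotation. The sketch below is a back-of-the-envelope model under stated assumptions, not a benchmark: the 8.5 ms average seek time is an assumed typical value for a 7200 RPM SATA drive, and the class and method names are hypothetical.

```java
// Rough random-IOPS estimate for spinning disks (rule-of-thumb model, not a measurement).
public final class SpindleIops {

    /** Average rotational latency in ms: on average the platter turns half a revolution. */
    static double rotationalLatencyMs(int rpm) {
        double msPerRevolution = 60_000.0 / rpm;
        return msPerRevolution / 2.0;
    }

    /** Random IOPS for one disk, assuming each I/O pays one average seek plus half a rotation. */
    static double perDiskIops(int rpm, double avgSeekMs) {
        return 1000.0 / (avgSeekMs + rotationalLatencyMs(rpm));
    }

    /**
     * Aggregate random IOPS for independent disks with no RAID.
     * Simply additive, which assumes requests are spread evenly across the disks.
     */
    static double nodeIops(int disks, int rpm, double avgSeekMs) {
        return disks * perDiskIops(rpm, avgSeekMs);
    }

    public static void main(String[] args) {
        int rpm = 7200;
        double assumedSeekMs = 8.5; // ASSUMPTION: typical average seek for a 7200 RPM SATA drive
        int disks = 4;

        System.out.printf("rotational latency: %.2f ms%n", rotationalLatencyMs(rpm));
        System.out.printf("per-disk IOPS:      %.0f%n", perDiskIops(rpm, assumedSeekMs));
        System.out.printf("%d-disk node IOPS:   %.0f%n", disks, nodeIops(disks, rpm, assumedSeekMs));
    }
}
```

With these assumptions the model gives about 79 random IOPS per disk and about 316 for four disks. Robert's 360 corresponds to roughly 90 IOPS per spindle, which sits inside the 75 to 100 range often quoted for 7200 RPM drives. Since Todd notes that HBase never does random writes, a number like this mostly bounds uncached random reads rather than writes.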
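Todd's point that HBase never does random writes follows from its log-structured design: each edit is appended to a log and buffered in a sorted in-memory structure, which is periodically written out as a new immutable sorted file. The toy class below illustrates only that write path, under heavy simplifying assumptions (in-memory "files", string keys, no deletes, no compaction, no log truncation). The field names echo HBase terminology (write-ahead log, memstore), but `ToyLsmStore` and its methods are hypothetical and are not HBase code or API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Toy log-structured store (hypothetical, illustration only).
 * Persistent-style writes only ever append: to the log, and on flush as a brand-new
 * immutable sorted run. Nothing already flushed is modified in place.
 */
public final class ToyLsmStore {
    private final List<String> wal = new ArrayList<>();                          // append-only write-ahead log
    private TreeMap<String, String> memstore = new TreeMap<>();                   // sorted in-memory buffer
    private final List<NavigableMap<String, String>> runs = new ArrayList<>();  // flushed runs, oldest first
    private final int flushThreshold;

    public ToyLsmStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    public void put(String key, String value) {
        wal.add(key + "=" + value);   // 1. sequential append to the log
        memstore.put(key, value);     // 2. sorted insert in memory
        if (memstore.size() >= flushThreshold) {
            flush();
        }
    }

    /** Write the memstore out as a new immutable sorted run (stand-in for a store file). */
    private void flush() {
        runs.add(Collections.unmodifiableNavigableMap(memstore));
        memstore = new TreeMap<>();   // the flushed map is never touched again
    }

    /** Read newest data first: memstore, then runs from newest to oldest. */
    public String get(String key) {
        String v = memstore.get(key);
        if (v != null) {
            return v;
        }
        for (int i = runs.size() - 1; i >= 0; i--) {
            v = runs.get(i).get(key);
            if (v != null) {
                return v;
            }
        }
        return null;
    }

    public int runCount() {
        return runs.size();
    }

    public int logLength() {
        return wal.size();
    }

    public static void main(String[] args) {
        ToyLsmStore store = new ToyLsmStore(2);
        store.put("row1", "a");
        store.put("row2", "b");   // reaches the threshold: flushes run #1 = {row1=a, row2=b}
        store.put("row1", "c");   // the newer value lives in the memstore; run #1 is untouched
        System.out.println(store.get("row1") + " " + store.get("row2") + " runs=" + store.runCount());
    }
}
```

Running `main` prints `c b runs=1`: the update to `row1` shadows the older flushed value instead of overwriting it, which is why the underlying storage only ever sees appends and whole new files. A real system also needs compaction to merge runs and reclaim shadowed values, which this sketch omits.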