Subject: RE: Using HBase on other file systems
Date: Sat, 15 May 2010 22:19:57 +0200
From: "Gibbon, Robert, VF-Group" <Robert.Gibbon@vodafone.com>
To: <hbase-user@hadoop.apache.org>

Todd, thanks for replying. 4x 7200 RPM spindles and no RAID = approx 360
IOPS to/from the backend storage, minimum and per node, to run an HBase
cluster.

Right?

cheers
Robert
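
For reference, the 360 IOPS figure above can be reproduced with a rough
back-of-the-envelope model: one random IO costs about one average seek plus
half a platter rotation. This is only a sketch. The 7 ms average seek time
and the function names are assumptions for illustration, not values from
this thread, and real disks vary.

```python
def disk_random_iops(rpm, avg_seek_ms):
    """Estimate random IOPS for one spinning disk.

    Model (an assumption): each random IO costs one average seek plus
    half a rotation of the platter.
    """
    rotational_latency_ms = 0.5 * 60_000.0 / rpm  # half a revolution, in ms
    return 1000.0 / (avg_seek_ms + rotational_latency_ms)


def node_random_iops(disks, rpm, avg_seek_ms):
    """Without RAID, independent spindles add up roughly linearly."""
    return disks * disk_random_iops(rpm, avg_seek_ms)


# Hypothetical 7 ms average seek for a 7200 RPM SATA disk.
per_disk = disk_random_iops(7200, 7.0)
per_node = node_random_iops(4, 7200, 7.0)
print(round(per_disk), round(per_node))  # prints "90 358"
```

Under these assumptions, four spindles come out at roughly 360 random IOPS
per node, which matches the estimate above.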

-----Original Message-----
From: Todd Lipcon [mailto:todd@cloudera.com]
Sent: Sat 5/15/2010 3:51 AM
To: hbase-user@hadoop.apache.org
Subject: Re: Using HBase on other file systems

On Fri, May 14, 2010 at 2:15 PM, Gibbon, Robert, VF-Group <
Robert.Gibbon@vodafone.com> wrote:

> Hmm. What level of IOPS does HBase need in order to support a reasonably
> responsive level of service? How much latency in transfer times is
> acceptable before the nodes start to fail? Do you use asynchronous IO
> queueing? Write-through caching? Prefetching?
>
>
Hi Robert. Have you read the Bigtable paper? It's a good description of the
general IO architecture of Bigtable. You can also read the original paper on
log-structured merge tree storage from back in the 90s.

To answer your questions in brief:
- Typical clusters run on between 4 and 12x 7200 RPM SATA disks. Some people
  run on 10k RPM disks to get more random reads per second, but that is not
  necessary.
- Latency in transfer times is a matter of what your application needs, not
  a matter of what HBase needs.
- No, we do not asynchronously queue reads. AIO support is lacking in Java 6,
  and even in the current previews of Java 7 it is a thin wrapper around
  thread pools and synchronous IO APIs.
- HBase uses log-structured storage, which is somewhat the same as
  write-through caching in a way. We never do random writes (in fact they're
  impossible in HDFS).

-Todd
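
To illustrate the log-structured storage point above: writes are only ever
appended or written out as new sorted files, never updated in place. The
following is a minimal in-memory sketch of that general idea, not HBase's
actual implementation. The class name TinyLSM, its methods, and the flush
threshold are all hypothetical, and plain Python lists stand in for files.

```python
import bisect


class TinyLSM:
    """Toy log-structured store: append-only log, memtable, sorted segments."""

    def __init__(self, flush_threshold=2):
        self.log = []        # append-only write-ahead log (stand-in for a file)
        self.memtable = {}   # recent writes, held in memory
        self.segments = []   # immutable sorted runs, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.log.append((key, value))  # sequential append, never a random write
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self._flush()

    def _flush(self):
        # Write the memtable out as a new sorted, immutable segment.
        self.segments.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.log = []  # logged entries are now durable in the segment

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newest segment wins, so search from most recent to oldest.
        for segment in reversed(self.segments):
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None
```

An overwrite never touches an older segment: the new value simply lands in a
newer one and shadows the old value on read. A real system would also merge
(compact) segments in the background, which this sketch leaves out.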


>
> On Fri, May 14, 2010 at 12:02 PM, Gibbon, Robert, VF-Group <
> Robert.Gibbon@vodafone.com> wrote:
>
> >
> > My thinking is around separation of concerns - at an OU level, not just
> > at a system integration level. Walrus gives me a consistent, usable
> > abstraction layer to transparently substitute the storage
> > implementation - for example from Symmetrix <--> Isilon or anything in
> > between. Walrus is storage subsystem agnostic, so it need not be
> > configured for inconsistency like the Amazon service it emulates.
> >
> > Tight coupling for lock-in is a great commercial technique often seen
> > with suppliers. But it is a bad one. Very bad.
> >
>
> However, reasonably tight coupling between a database (HBase) and its
> storage layer (HDFS) is IMHO absolutely necessary to achieve a certain
> level of correctness and performance. In HBase's case we use the Hadoop
> FileSystem interface, so in theory it will work on anything that
> implements said interface, but I wouldn't run a production instance on
> anything but HDFS.
>
> It's worth noting that most commercial databases operate on direct block
> devices rather than on top of filesystems, so that they don't have to
> deal with varying semantics/performance between ext3, ext4, xfs, ufs, and
> the myriad other single-node filesystems that exist.
>
> -Todd
>
>
> >
> >
> > -----Original Message-----
> > From: Andrew Purtell [mailto:apurtell@apache.org]
> > Sent: Thu 5/13/2010 11:54 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: RE: Using HBase on other file systems
> >
> > You really want to run HBase backed by Eucalyptus' Walrus? What do you
> > have behind that?
> >
> > > From: Gibbon, Robert, VF-Group
> > > Subject: RE: Using HBase on other file systems
> > [...]
> > > NB. I checked out running HBase over Walrus (an AWS S3
> > > clone): bork - you want me to file a Jira on that?
> >
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera

