From: "Gibbon, Robert, VF-Group"
To: hbase-user@hadoop.apache.org
Subject: RE: Using HBase on other file systems
Date: Thu, 13 May 2010 22:22:54 +0200

Yo, I feel the need to speak up. GlusterFS is pretty configurable. It doesn't rely on HBAs, but it does support them; Gigabit or 10G Ethernet are also supported options. I would love to see HBase become GlusterFS-aware, because the architecture is, frankly, more flexible than HDFS, with fewer SPoF concerns. GlusterFS is node-aware with the Disco MapReduce framework - why not HBase?

NB. I checked out running HBase over Walrus (an AWS S3 clone): bork - you want me to file a Jira on that?

-----Original Message-----
From: Ryan Rawson [mailto:ryanobjc@gmail.com]
Sent: Thu 5/13/2010 9:46 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Using HBase on other file systems

Hey,

I think one of the key features of HDFS is its ability to be run on standard hardware and integrate nicely in a standardized datacenter environment. I never would have got my project off the ground if I had to convince my company to invest in InfiniBand switches.
So in the situation you described, you are getting only 50TB of storage on 20 nodes, and the parts list would be something like:

- 20 storage "bricks" w/ InfiniBand and gigE ports
- InfiniBand switch, min 20 ports - probably better to get more
- 20 more HBase nodes; I'd like to have machines with 16+ GB RAM, ideally 24GB and above

At this point we could compare to my cluster setup, which has 67TB of raw space reported by HDFS:

- 20 HBase+HDFS nodes, 4TB/node, 16 cores w/ 24GB RAM

In my case I am paying about $3-4k/node (depending on when you bought them and from whom) and I can leverage the gigE switching fabric (lower cost per port).

So Gluster sounds like an interesting option, but it also sounds like at least 2x as expensive for less space. Presumably the performance benefits would make up for it, but if the clients aren't connected by InfiniBand, would you really see it? At $1,000+ per port I'm not sure it's really worth it.

On Thu, May 13, 2010 at 8:09 AM, Edward Capriolo wrote:
> On Thu, May 13, 2010 at 12:26 AM, Jeff Hammerbacher wrote:
>
>> Some projects sacrifice stability and manageability for performance (see,
>> e.g., http://gluster.org/pipermail/gluster-users/2009-October/003193.html).
>>
>> On Wed, May 12, 2010 at 11:15 AM, Edward Capriolo wrote:
>>
>> > On Wed, May 12, 2010 at 1:30 PM, Andrew Purtell wrote:
>> >
>> > > Before recommending Gluster I suggest you set up a test cluster and
>> > > then randomly kill bricks.
>> > >
>> > > Also, as pointed out in another mail, you'll want to colocate
>> > > TaskTrackers on Gluster bricks to get I/O locality, yet there is no way
>> > > for Gluster to export stripe locations back to Hadoop.
>> > >
>> > > It seems a poor choice.
>> > >
>> > >   - Andy
>> > >
>> > > > From: Edward Capriolo
>> > > > Subject: Re: Using HBase on other file systems
>> > > > To: "hbase-user@hadoop.apache.org"
>> > > > Date: Wednesday, May 12, 2010, 6:38 AM
>> > > > On Tuesday, May 11, 2010, Jeff Hammerbacher wrote:
>> > > > > Hey Edward,
>> > > > >
>> > > > >> I do think that if you compare GoogleFS to HDFS, GFS looks more
>> > > > >> full featured.
>> > > > >
>> > > > > What features are you missing? Multi-writer append was explicitly
>> > > > > called out by Sean Quinlan as a bad idea, and rolled back. From
>> > > > > internal conversations with Google engineers, erasure coding of
>> > > > > blocks suffered a similar fate. Native client access would
>> > > > > certainly be nice, but FUSE gets you most of the way there.
>> > > > > Scalability/availability of the NN, RPC QoS, and alternative block
>> > > > > placement strategies are second-order features which didn't exist
>> > > > > in GFS until later in its lifecycle of development as well. HDFS
>> > > > > is following a similar path and has JIRA tickets with active
>> > > > > discussions. I'd love to hear your feature requests, and I'll be
>> > > > > sure to translate them into JIRA tickets.
>> > > > >
>> > > > >> I do believe my logic is reasonable. HBase has a lot of code
>> > > > >> designed around HDFS. We know these tickets that get cited all
>> > > > >> the time, for better random reads, or for sync() support. HBase
>> > > > >> gets the benefits of HDFS and has to deal with its drawbacks.
>> > > > >> Other key value stores handle storage directly.
>> > > > >
>> > > > > Sync() works and will be in the next release, and its absence was
>> > > > > simply a result of the youth of the system. Now that that
>> > > > > limitation has been removed, please point to another place in the
>> > > > > code where using HDFS rather than the local file system is forcing
>> > > > > HBase to make compromises. Your initial attempts on this front
>> > > > > (caching, HFile, compactions) were, I hope, debunked by my previous
>> > > > > email. It's also worth noting that Cassandra does all three,
>> > > > > despite managing its own storage.
>> > > > >
>> > > > > I'm trying to learn from this exchange and always enjoy
>> > > > > understanding new systems. Here's what I have so far from your
>> > > > > arguments:
>> > > > > 1) HBase inherits both the advantages and disadvantages of HDFS. I
>> > > > > clearly agree on the general point; I'm pressing you to name some
>> > > > > specific disadvantages, in hopes of helping prioritize our
>> > > > > development of HDFS. So far, you've named things which are either
>> > > > > a) not actually disadvantages or b) no longer true. If you can come
>> > > > > up with the disadvantages, we'll certainly take them into account.
>> > > > > I've certainly got a number of them on our roadmap.
>> > > > > 2) If you don't want to use HDFS, you won't want to use HBase.
>> > > > > Also certainly true, but I'm not sure there's much to learn from
>> > > > > this assertion. I'd once again ask: why would you not want to use
>> > > > > HDFS, and what is your choice in its stead?
>> > > > >
>> > > > > Thanks,
>> > > > > Jeff
>> > > > >
>> > > >
>> > > > Jeff,
>> > > >
>> > > > Let me first mention that you have mentioned some things as fixed
>> > > > that are only fixed in trunk. I consider trunk futureware, and I do
>> > > > not like to have temporal conversations. Even when trunk becomes
>> > > > current, there is no guarantee that the entire problem is solved.
>> > > > After all, appends were fixed in .19 - or not, or again?
>> > > >
>> > > > I rescanned the GFS white paper to support my argument that HDFS is
>> > > > stripped down. Found:
>> > > > - Writes at offset ARE supported
>> > > > - Checkpoints
>> > > > - Application-level checkpoints
>> > > > - Snapshot
>> > > > - Shadow read-only master
>> > > >
>> > > > HDFS chose the features it wanted and ignored others; that is why I
>> > > > called it a pure MapReduce implementation.
>> > > >
>> > > > My main point is that HBase by nature needs high-speed random read
>> > > > and random write. HDFS by nature is bad at these things. If you
>> > > > cannot keep a high cache hit rate via a large block cache in RAM,
>> > > > HBase is going to slam HDFS doing large block reads for small parts
>> > > > of files.
>> > > >
>> > > > So you ask me what I would use instead. I do not think there is a
>> > > > viable alternative in the 100 TB and up range, but I do think for
>> > > > people in the 20 TB range something like Gluster that is very
>> > > > performance focused might deliver amazing results in some
>> > > > applications.
>> > > >
>> > >
>> >
>> > I did not recommend anything.
>> >
>> > "people in the 20 TB range something like Gluster that is very
>> > performance focused might deliver amazing results in some
>> > applications."
>> >
>> > I used words like "something. like. might."
>> >
>> > It may just be an interesting avenue of research.
>> >
>> > And since you mentioned
>> >
>> > "also as pointed out in another mail, you'll want to colocate
>> > TaskTrackers on Gluster bricks to get I/O locality, yet there is no way
>> > for Gluster to export stripe locations back to Hadoop."
>> >
>> > 1) I am sure if someone was so inclined they could find a way to export
>> > that information from Gluster.
>> >
>> > 2) I think you meant DataNode, not TaskTracker. In any case, I remember
>> > reading on list that a RegionServer is not guaranteed to be colocated
>> > with a datanode, especially after a restart. Someone was going to open a
>> > ticket for it.
>> >
>>
> Posting a single link from the mailing list is anecdotal. I can point to
> many posts on the Hadoop-user and HBase-user lists, and those of every
> other product in the world, and come to the determination that the product
> is unstable as a result. (I am a member of gluster-users, fyi)
>
> As for Gluster, people are pushing it to do much more than Hadoop does.
> Most are implementing caching and POSIX locks on Gluster, as it works as a
> true filesystem, not a userspace filesystem with limited semantics like
> HDFS, so it is going to be more complex and have more problems, but you can
> do things with it that you cannot do with Hadoop.
>
> I am not claiming that GlusterFS is more/less buggy or performs
> better/worse than HDFS.
>
> What I am hypothesizing is: GlusterFS might have a sweet spot. 20 Gluster
> bricks connected by InfiniBand, with a total storage capacity of 50 TB.
> Throw HBase on that InfiniBand bad boy and maybe get amazing performance.
> Just maybe.
>
> Sure, HBase & Hadoop will almost assuredly scale better on the high end,
> but take into account my hypothesis and use case. Maybe I have a fixed data
> size but want the best performance possible. It is all about the sweet spot
> for your needs. I think HDFS is great, better than great, but I do not
> think it is the apex of storage technology, perfect for every use case. I
> am not going to stop researching, theorizing, and trying alternative
> systems and implementations.
>
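As a rough illustration of the cost comparison Ryan sketches above, here is a small back-of-the-envelope calculation. The per-node and per-port prices are only the approximate figures quoted in this thread ($3.5k/node as a midpoint of the $3-4k range, $1,000 per InfiniBand port), not actual vendor pricing.

// Back-of-the-envelope cost-per-raw-TB comparison using the thread's figures.
public class ClusterCostSketch {
    public static void main(String[] args) {
        // HDFS cluster described above: 20 nodes, 4 TB each, ~$3,500/node.
        double hdfsCost = 20 * 3500.0;
        double hdfsRawTb = 20 * 4.0;      // ~80 TB raw (67 TB reported by HDFS)

        // Gluster layout described above: 20 bricks plus 20 HBase nodes,
        // plus a 20-port InfiniBand switch at roughly $1,000/port, for ~50 TB.
        double glusterCost = 40 * 3500.0 + 20 * 1000.0;
        double glusterRawTb = 50.0;

        System.out.printf("HDFS:    ~$%.0f per raw TB%n", hdfsCost / hdfsRawTb);
        System.out.printf("Gluster: ~$%.0f per raw TB%n", glusterCost / glusterRawTb);
    }
}

On these assumed numbers the Gluster layout comes out to roughly 3-4x the cost per raw terabyte, consistent with the "at least 2x as expensive for less space" estimate above.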
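On the point about exporting stripe/block locations back to Hadoop: the locality information being discussed is what Hadoop's FileSystem API exposes through getFileBlockLocations(). Below is a minimal sketch of reading it against HDFS (the path is a placeholder); a Gluster-backed FileSystem would have to return meaningful host lists from this call before TaskTrackers or region servers could exploit data locality.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Prints which hosts hold each block of a file, i.e. the locality data
// HDFS reports and that schedulers use to place work near the data.
public class BlockLocalityDump {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml
        FileSystem fs = FileSystem.get(conf);
        // Placeholder path; pass a real file as the first argument.
        Path path = new Path(args.length > 0 ? args[0] : "/hbase/somefile");
        FileStatus status = fs.getFileStatus(path);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                + " length=" + block.getLength()
                + " hosts=" + java.util.Arrays.toString(block.getHosts()));
        }
    }
}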