hbase-user mailing list archives

From Eran Kutner <e...@gigya-inc.com>
Subject Re: Questions while evaluating HBase
Date Thu, 04 Mar 2010 21:26:54 GMT
Thanks J-D, that's very helpful.

Your information about Thrift is great. Our primary development language is
C#, so using Thrift will let us connect our existing code to HBase, and the
penalty seems low enough to be worth it.
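
For illustration, here is a minimal sketch of a write through the Thrift
gateway, assuming the 0.20-era Hbase.thrift interface and its generated Java
bindings (the C# bindings generated from the same IDL look very similar).
The host, table, and column names are made up, and 9090 is the gateway's
default port:

    import java.util.Collections;

    import org.apache.hadoop.hbase.thrift.generated.Hbase;
    import org.apache.hadoop.hbase.thrift.generated.Mutation;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;

    public class ThriftGatewaySketch {
        public static void main(String[] args) throws Exception {
            // Connect to the Thrift gateway, not to a region server directly.
            TSocket transport = new TSocket("thrift-gateway-host", 9090);
            Hbase.Client client = new Hbase.Client(new TBinaryProtocol(transport));
            transport.open();

            // Write one cell; in this API a column is named "family:qualifier".
            Mutation m = new Mutation(false, "info:name".getBytes(), "eran".getBytes());
            client.mutateRow("mytable".getBytes(), "row1".getBytes(),
                    Collections.singletonList(m));

            transport.close();
        }
    }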

Unfortunately, that is also our Achilles' heel: we are far from being Java
experts, and it will probably take us a long time to become expert enough to
debug and fix problems the way you do. My thinking was to build two
independent clusters with cyclic replication, so if one crashes we can switch
to the other while we figure out how to fix the first. However, doing that
requires solid replication capabilities. Can I understand from your
description that you have cyclic, selective replication working in production
already? I see that it's scheduled to be released in 0.21; is it possible to
get it to work on 0.20?
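
To make the failover idea concrete, here is a rough sketch using the 0.20
Java client API. FailoverTable is a hypothetical wrapper of our own, not an
HBase class, and it assumes each HTable was built from a configuration
pointing at a different cluster's ZooKeeper quorum:

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;

    /** Hypothetical client-side failover between two replicated clusters. */
    public class FailoverTable {
        private final HTable primary;  // the cluster we normally write to
        private final HTable standby;  // the replica we switch to on failure

        public FailoverTable(HTable primary, HTable standby) {
            this.primary = primary;
            this.standby = standby;
        }

        public void put(Put put) throws IOException {
            try {
                primary.put(put);
            } catch (IOException e) {
                // Primary cluster unreachable: write to the standby and rely
                // on replication to reconcile once the primary is back up.
                standby.put(put);
            }
        }
    }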

As for the issue with shutting down the master node, what I see is that
running "hbase-daemon.sh stop master" continues printing dots forever.
Looking at the code for that script, it is trying to run "./hbase master
stop". If I run that command manually, it seems to ignore the stop parameter
and tries to load another instance of the server, which fails in my case
because the server is already running and the JMX port is busy. There is
nothing in the log, and the .out file only has the exception thrown when JMX
tries to bind to the busy socket.

Thanks again, I really appreciate the information.

-eran


On Thu, Mar 4, 2010 at 20:28, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> Inline.
>
> J-D
>
> >   1. I assume you've seen this benchmark by Yahoo (
> >   http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf and
> >   http://www.brianfrankcooper.net/pubs/ycsb.pdf). They show three main
> >   problems: latency goes up quite significantly when doing more
> >   operations, operations/sec are capped at about half of the other tested
> >   platforms, and adding new nodes interrupts the normal operation of the
> >   cluster for a while. Do you consider these results a problem, and if so
> >   are there any plans to address them?
>
> Please see our answer
>
> http://www.search-hadoop.com/m?id=7c962aed1002091610q14f2d6f0gc420ddade319fe60@mail.gmail.com
>
> >   2. While running our tests (most were done using 0.20.2) we've had a
> >   few incidents where a table went into "transition" without ever going
> >   out of it. We had to restart the cluster to release the stuck tables.
> >   Is this a common issue?
>
> 0.20.3 has a much better story; 0.20.4 will include even more reliability
> fixes.
>
> >   3. If I understand correctly, any major upgrade requires completely
> >   shutting down the cluster while doing the upgrade, as well as deploying
> >   a new version of the application compiled against the new client
> >   version? Did I get it correctly? Is there any strategy for upgrading
> >   while the cluster is still running?
>
> There are lots of different reasons why: Hadoop RPC is versioned, a new
> Hadoop major version requires filesystem upgrades, etc.
>
> So for HBase, you currently can do rolling restarts between minor
> versions until told otherwise (in the release notes). See
> http://wiki.apache.org/hadoop/Hbase/RollingRestart
>
> Also Hadoop RPC will probably be replaced in the future with Avro and
> by then all releases should be backward compatible (we hope).
>
> >   4. This is more a bug report than a question, but it seems that in
> >   0.20.3 the master server doesn't stop cleanly and has to be killed
> >   manually. Is someone else seeing it too?
>
> Can you provide more details? Logs and stack traces appreciated.
>
> >   5. Are there any performance benchmarks for the Thrift gateway? Do you
> >   have an estimate of the performance penalty of using the gateway
> >   compared to using the native API?
>
> The good thing with Thrift servers is that they have long-lived clients,
> so their cache is always full and HotSpot does its magic. In our tests (we
> use Thrift servers in production here at StumbleUpon), it's maybe adding 1
> or 2 ms per request...
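
For comparison, here is a sketch of the same single-cell write through the
native 0.20 Java client, which talks to the region servers directly and
skips the gateway hop; the table and column names are again made up:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class NativeClientSketch {
        public static void main(String[] args) throws Exception {
            // The cluster location comes from hbase-site.xml on the classpath.
            HTable table = new HTable(new HBaseConfiguration(), "mytable");

            Put put = new Put(Bytes.toBytes("row1"));
            put.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("eran"));
            table.put(put);
            table.flushCommits(); // flush the client-side write buffer, if enabled
        }
    }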
>
> >   6. Right now, my biggest concern about HBase is its administration
> >   complexity and cost. If anyone can share their experience, that would
> >   be a huge help. How many servers do you have in the cluster? How much
> >   ongoing effort does it take to administrate it? What uptime levels are
> >   you seeing (including upgrades)? Do you have any good strategy for
> >   running one cluster across two data centers, or replicating between
> >   two clusters in two different DCs? Did you have any serious
> >   problems/crashes/downtime with HBase?
>
> HBase does require a knowledgeable admin, but which DB doesn't when used
> on a very large scale? We have a full-time DBA here for our MySQL
> clusters, but the difference is that those are easier to find than HBase
> admins, right? So here are some stats that we can make public:
>
> - We have a production cluster, another one for processing, and a few
> others for dev and testing (we have 3 HBase committers on staff so... we
> need machines!). The production clusters have somewhat beefy nodes: i7s
> with 24GB of RAM and 4x1TB disks in JBOD. None has more than 40 nodes.
>
> - Cluster replication is actually a feature I'm working on. See
> http://issues.apache.org/jira/browse/HBASE-1295. We currently have 2
> clusters replicating to each other, each hosted in a different city, and
> around 50M rows are sent each day (we aren't replicating everything,
> though).
>
> - We did have some good crashes, and we even run unofficial releases
> sometimes, but since we are very knowledgeable we are able to fix those
> ourselves, and we always get the fixes committed.
>
> - I can't disclose our uptime since it would give hints about the uptime
> of one of our products. I can say, though, that it's getting better with
> every release, but eh, HBase is still very bleeding edge.
>
> >
> >
> > Thanks a lot,
> > Eran Kutner
> >
>
