hbase-user mailing list archives

From Cristofer Weber <cristofer.we...@neogrid.com>
Subject RE: [maybe off-topic?] article: Solving Big Data Challenges for Enterprise Application Performance Management
Date Thu, 30 Aug 2012 23:28:43 GMT
Being one of the guys who are "selling" the HBase idea at work (I presented a PoC this week,
by the way!), I know that at some point I will have to explain the conclusions of articles like
this one, and this kind of conclusion will probably be really hard to explain. I will try
to reach the authors to find out which kinds of failures they faced and which performance improvements
they made in their clusters, but sadly this will not change the publication.

On the other hand, I think that I can help in one way or another: documenting undocumented features,
collecting more data on the effects of changing default values, relating these changes to
different HBase use cases, etc. It's hard to start contributing to Open Source projects as sophisticated
as HBase, but it can be a bit easier to contribute by documenting features and running experiments,
and I think that there are others wondering if they can contribute to HBase as well, but
- speaking for myself - a lot of guidance is needed. Hope to get this guidance here ;-)

Best regards,
From: saint.ack@gmail.com [saint.ack@gmail.com] on behalf of Stack [stack@duboce.net]
Sent: Thursday, August 30, 2012 19:04
To: user@hbase.apache.org
Subject: Re: [maybe off-topic?] article: Solving Big Data Challenges for Enterprise Application
Performance Management

On Thu, Aug 30, 2012 at 7:51 AM, Cristofer Weber
<cristofer.weber@neogrid.com> wrote:
> About HMasters, yes, it's not clear.
> In section 6.1 they say that "Since we focused on a setup with a maximum of 12 nodes,
> we did not assign the master node and jobtracker to separate nodes instead we deployed them
> with data nodes."
> But in section 4.1 they say that "The configuration was done using a dedicated node
> for the running master processes (NameNode and SecondaryNameNode), therefore for all the benchmarks
> the specified number of servers correspond to nodes running slave processes (DataNodes and
> TaskTrackers) as well as HBase's region server processes."
> About configurations, the first paragraph on "6. EXPERIENCES" contains this: "In our
> initial test runs, we ran every system with the default configuration, and then tried to
> improve the performance by changing various tuning parameters. We dedicated at least a week
> for configuring and tuning each system (concentrating on one system at a time) to get a fair
> I agree that would be nice to see this experiment with 0.94.1, but 0.90.4 was released
> a year ago, so I understand that this version was the official version when these experiments
> were conducted.

It's a bit tough going back in time fixing 0.90.4 results.  The
"...failed frequently in non-deterministic ways..." is an ugly mark to
have hanging over hbase in a paper like this that will probably be
around a while.  I wonder what the cause was (I don't think that is
typical of 0.90.4, IIRC).

On how to improve read performance, if it's not in here,
http://hbase.apache.org/book.html#performance, in the refguide, then
the tuning option might as well not exist (Anyone see anything

We consistently do badly in these tests, though our actual operational
experience seems much better than what is shown in these benchmarks.
As has been said elsewhere on this thread, the takeaway is improved
defaults and auto-tuning, but the only time we get interested in
addressing these issues is the once a year when one of these reports
comes out; otherwise, we seem to have other priorities when messing in
the hbase code base.
