hbase-user mailing list archives

From Cristofer Weber <cristofer.we...@neogrid.com>
Subject RE: [maybe off-topic?] article: Solving Big Data Challenges for Enterprise Application Performance Management
Date Thu, 30 Aug 2012 14:51:53 GMT
About HMasters, yes, it's not clear. 

In section 6.1 they say that "Since we focused on a setup with a maximum of 12 nodes, we
did not assign the master node and jobtracker to separate nodes instead we deployed them with
data nodes."

But in section 4.1 they say that "The configuration was done using a dedicated node for the
running master processes (NameNode and SecondaryNameNode), therefore for all the benchmarks
the specified number of servers correspond to nodes running slave processes (DataNodes and
TaskTrackers) as well as HBase’s region server processes."

About configurations, the first paragraph of "6. EXPERIENCES" contains this: "In our initial
test runs, we ran every system with the default configuration, and then tried to improve
the performance by changing various tuning parameters. We dedicated at least a week for configuring
and tuning each system (concentrating on one system at a time) to get a fair comparison."

I agree that it would be nice to see this experiment with 0.94.1, but 0.90.4 was released a year
ago, so I understand that this version was the official version when these experiments were

Best regards,

-----Original Message-----
From: Dave Wang [mailto:dsw@cloudera.com]
Sent: Thursday, August 30, 2012 10:49
To: user@hbase.apache.org
Subject: Re: [maybe off-topic?] article: Solving Big Data Challenges for Enterprise Application
Performance Management

My reading of the paper is that they are actually not clear about whether or not HMasters
were deployed on datanodes.

I'm going to guess that they just used default configurations for HBase and YCSB, but the
paper again is not specific enough.

Why were they using 0.90.4 in 2012?  Would have been nice to see some of the more recent work
done in the area of performance.

One thing the paper does touch on is the relative difficulty of standing up the cluster, which
has not changed since 0.90.4.  I think that's definitely something that could be improved

- Dave

On Thu, Aug 30, 2012 at 6:27 AM, Cristofer Weber <cristofer.weber@neogrid.com> wrote:

> Just read this article, "Solving Big Data Challenges for Enterprise
> Application Performance Management", published this month in Volume 5,
> No. 12 of Proceedings of the VLDB Endowment, where they measured 6
> different databases - Project Voldemort, Redis, HBase, Cassandra,
> MySQL Cluster and VoltDB - with YCSB on two different kinds of
> clusters, Memory-bound and Disk-bound, and I have doubts about the
> results for HBase since:
> * HBase version was 0.90.4
> * Master nodes were deployed together with data nodes
> * They didn't report tuning parameters
> There's also a paragraph where they reported that HBase failed
> frequently in non-deterministic ways while running YCSB.
> My intention with this e-mail is to look for opinions from you, who
> are more experienced with HBase, on where this experiment's setup
> could be changed to improve read operations, since in this setup HBase
> did not perform as well as Cassandra and Project Voldemort.
> Here's the article:
> http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf and Volume 5
> home: http://vldb.org/pvldb/vol5.html