hbase-user mailing list archives

From Cristofer Weber <cristofer.we...@neogrid.com>
Subject RE: HBase and unit tests
Date Fri, 31 Aug 2012 12:33:44 GMT
Hi Nicolas!

For the other adapters (Cassandra, Cassandra + Thrift, Cassandra + Astyanax, etc.), the unit tests are split into
Internal and External sets, and there is also a profile for Performance and Concurrent tests. The External
and Performance/Concurrent tests run against a live database instance; only the Internal tests are expected
to start a database per test case, while running the same tests as the External set. The HBase adapter
already has External and Performance/Concurrent tests, so I'm trying to provide the Internal set, whose
objective is to test the Titan|HBase interaction.

And my goal is to achieve better times than Cassandra :-)

A singleton seems to be a good option, but I have to check whether Maven Surefire can keep the same JVM process
across JUnit test cases.
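Whether a singleton helps across test classes depends on the JVM lifetime: a static field lives only as long as the JVM that runs the tests, so the instance is shared between test classes only if Surefire runs them in the same JVM. A minimal sketch of a lazily started, JVM-wide singleton follows; `SharedCluster` and `ClusterHandle` are hypothetical names, and the actual cluster start is only simulated with a counter.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical stand-in for the handle tests use to reach HBase.
 * Starting it is only simulated here; it counts how often that happens.
 */
final class ClusterHandle {
    // Number of "cluster starts" in this JVM.
    static final AtomicInteger STARTS = new AtomicInteger();

    final String name;

    private ClusterHandle(String name) {
        this.name = name;
    }

    static ClusterHandle start() {
        int n = STARTS.incrementAndGet();
        // A real suite would do the expensive work here, e.g. start a
        // mini cluster or connect to an existing cluster.
        return new ClusterHandle("cluster-" + n);
    }
}

/**
 * Lazy JVM-wide singleton using the initialization-on-demand holder idiom:
 * Holder's static initializer runs once, on the first call to get(), and
 * the JVM guarantees that class initialization is thread-safe.
 */
final class SharedCluster {
    private SharedCluster() {}

    private static final class Holder {
        static final ClusterHandle INSTANCE = ClusterHandle.start();
    }

    static ClusterHandle get() {
        return Holder.INSTANCE;
    }
}
```

Each test class would call `SharedCluster.get()` from its setup code instead of starting its own cluster. Because no single test class owns the instance, one option is to stop it from a JVM shutdown hook registered with `Runtime.getRuntime().addShutdownHook(...)`.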

Because Titan works with adapters for different databases and manages table/CF creation when the tables
don't exist, I don't think it will be possible to prefix table names per test without changing some core
components of Titan. That seems too invasive to change now, and deletion is fast enough that we can keep
the same table.


Best regards,

-----Original Message-----
From: n keywal [mailto:nkeywal@gmail.com]
Sent: Friday, August 31, 2012 07:59
To: user@hbase.apache.org
Subject: Re: HBase and unit tests

Hi Cristofer,

HBase starts a minicluster for many of its tests because we have a lot of destructive tests; otherwise,
the non-destructive tests would be impacted by the destructive ones. When writing a client application,
you usually don't need to do that: you can rely on the same instance for all your tests.

It's also useful to write the tests so that they work against a real cluster or a pseudo-distributed
one. Sometimes, when a test fails, you want to look at what the code wrote or found in HBase, and you
won't have that with a mini cluster. It also saves a startup.

I don't know if there is a blog entry on this, but it's not very difficult to do (though, as usual, not
that easy when you start). I've personally done it with a singleton class, plus prefixing the table names
with a random key (to allow multiple tests to run in parallel on the same cluster without relying on
cleanup), plus getProperty to decide between starting a mini cluster and connecting to an existing cluster.
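Two pieces of that recipe, the property switch and the random table-name prefix, can be sketched with only the JDK. This is not Nicolas's actual code: the property name `test.hbase.mode` and the helper class `TestTables` are hypothetical and come from neither HBase nor Titan.

```java
import java.util.UUID;

/**
 * Hypothetical test helper: a system property picks the target cluster,
 * and a random per-JVM key prefixes every table name.
 */
final class TestTables {
    /** Hypothetical property name, e.g. -Dtest.hbase.mode=external */
    static final String MODE_PROPERTY = "test.hbase.mode";

    enum Mode { MINI_CLUSTER, EXTERNAL_CLUSTER }

    // Random key chosen once per JVM: two runs sharing a cluster get
    // disjoint table names, so neither depends on the other's cleanup.
    private static final String RUN_KEY =
            UUID.randomUUID().toString().replace("-", "").substring(0, 8);

    private TestTables() {}

    /** Defaults to a mini cluster unless the property says "external". */
    static Mode mode() {
        String value = System.getProperty(MODE_PROPERTY, "mini");
        return "external".equalsIgnoreCase(value)
                ? Mode.EXTERNAL_CLUSTER
                : Mode.MINI_CLUSTER;
    }

    static String runKey() {
        return RUN_KEY;
    }

    /** Maps a logical table name to the prefixed name actually created. */
    static String tableName(String logicalName) {
        return RUN_KEY + "_" + logicalName;
    }
}
```

In `MINI_CLUSTER` mode the shared instance would start a mini cluster (for example through an `HBaseTestingUtility` instance's `startMiniCluster()`); in `EXTERNAL_CLUSTER` mode it would only connect. Running the same tests with `-Dtest.hbase.mode=external` leaves the data on the real cluster for inspection after a failure.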



On Fri, Aug 31, 2012 at 12:28 PM, Cristofer Weber <cristofer.weber@neogrid.com> wrote:

> Hi Sonal, Stack and Ulrich!
> Yes, I should provide more details :$
> I came across the links you provided when I was searching for a way to
> start HBase with JUnit. Starting from the defaults, the only params I have
> changed are the ZooKeeper port and the number of nodes, which is 1 in my case.
> Based on the logs, I suspect that most of the time is spent on HDFS, and
> that's why I asked if there is a way to start a standalone instance of
> HBase. The amount of data written in each test case would probably fit
> in the memstore anyway, and table cleanup between test methods is handled
> by a loop of deletes.
> At least 15 seconds are spent starting the mini cluster for each
> test case.
> I just remembered that I should turn off the WAL when running unit
> tests :-), but this will not affect startup time.
> Thanks!!
> Best regards,
> Cristofer
> ________________________________________
> From: Ulrich Staudinger [ustaudinger@gmail.com]
> Sent: Friday, August 31, 2012 2:21
> To: user@hbase.apache.org
> Subject: Re: HBase and unit tests
> As general advice, although you probably already take care of this,
> instantiate the mini cluster only once in your JUnit test constructor
> and not in every test method. At the end of each test, either clean up
> your HBase or use a different "area" per test.
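"A different area per test" can be implemented in several ways. One possible reading, sketched here with only the JDK, is a unique row-key prefix per test, so each test only reads and checks the rows it wrote. `TestArea` and its methods are hypothetical. Note that JUnit creates a new instance of the test class for every test method, so "only once" is usually done in a static `@BeforeClass` method (JUnit 4) rather than in the constructor.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper for "a different area per test": each test gets its
 * own row-key prefix, so tests can share one table without clearing it.
 * Assumes test names contain no '/' characters.
 */
final class TestArea {
    // Counter shared by all tests in the JVM; makes every prefix unique.
    private static final AtomicLong NEXT = new AtomicLong();

    private final String prefix;

    private TestArea(String prefix) {
        this.prefix = prefix;
    }

    /** Allocates a fresh area, e.g. at the start of each test method. */
    static TestArea allocate(String testName) {
        return new TestArea(testName + "-" + NEXT.incrementAndGet() + "/");
    }

    String prefix() {
        return prefix;
    }

    /** Builds a row key inside this area. */
    String rowKey(String key) {
        return prefix + key;
    }

    /** True if the row key was created inside this area. */
    boolean owns(String rowKey) {
        return rowKey.startsWith(prefix);
    }
}
```

The trailing `/` keeps one prefix from being a prefix of another (`x-1/` never matches rows under `x-10/`), so a scan restricted to an area's prefix sees only that test's rows.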
> best regards,
> ulrich
> --
> connect on xing or linkedin. sent from my tablet.
> On 31.08.2012, at 06:46, Stack <stack@duboce.net> wrote:
> > On Thu, Aug 30, 2012 at 4:44 PM, Cristofer Weber 
> > <cristofer.weber@neogrid.com> wrote:
> >> Hi there!
> >>
> >> After I started studying HBase, I searched for open source projects
> >> backed by HBase and found the Titan distributed graph database (you
> >> have probably heard about it). As soon as I read in their documentation
> >> that the HBase adapter is experimental and suboptimal (disclaimer here:
> >> https://github.com/thinkaurelius/titan/wiki/Using-HBase), I volunteered
> >> to help improve this adapter. Since then I have made a few changes to
> >> speed up the test runs (reduced from hours to minutes) and also an
> >> improvement to the search feature.
> >>
> >> Now I'm trying to break the dependency on a pre-installed HBase for
> >> unit tests, and I found the mini cluster inside the HBase tests, but it
> >> takes too much time to start and I don't know if tweaking the configuration
> >> will improve that significantly. Is there a way to start a 'lightweight'
> >> instance, for example by programmatically starting a standalone instance?
> >>
> >
> > How much is 'too much time', Cristofer? Do you want a standalone
> > cluster at all?
> > St.Ack
> > P.S. If you're digging in this area, you might find the blog post by the
> > sematextians useful:
> >
> > http://blog.sematext.com/2010/08/30/hbase-case-study-using-hbasetestingutility-for-local-testing-development/
