hbase-dev mailing list archives

From stack <st...@duboce.net>
Subject Re: scanner is returning everything in parent region plus one of the daughters?
Date Mon, 15 Jun 2009 17:08:42 GMT
On Sun, Jun 14, 2009 at 10:54 AM, Andrew Purtell <apurtell@apache.org> wrote:

> Hi J-D,
> I agree on all your points. Regarding test hosting, I wonder if anyone
> has resources available to dedicate on a long term basis. I have a 4 node
> testbed which could conceivably run some suite once per day and generate
> some automated report, but I can't guarantee the availability of it. We
> might also consider EC2, as long as the tests are all self contained, all
> I/O between instances only, no data in/out or S3 charges. Using the usage
> calculator (http://calculator.s3.amazonaws.com/calc5.html), it seems that
> 5 extra large instances running for 5 hours once per day will cost $140/
> month. 10 of them would cost $280, etc. That is not a large figure.

IIRC, Amazon donated time to the Hadoop project.  Let me see if I can find
out more about the state of this resource and if we can get in on it.

Yes, to what J-D says.  Let's do some thinking and dev. around testing.  The
bulk of our unit tests start up mini clusters and try stuff.  Often they are
susceptible to failure when run on different hardware.  The pattern should
be more testing of individual components.  We need to work on mock objects
to help make testing easier.

Also, our unit tests are crusty.   The bulk were written for another time
for earlier interfaces.  They have been carried down through time but their
effectiveness wanes.

I'd like to suggest that we develop testing tiers: unit tests that are run
on every checkin and up on Hudson, and integration tests that are run on big
checkins and before releases (these can be done as unit tests if it makes
sense but maybe we need to work out some kinda scripting framework).  The
latter we might run periodically up on EC2 or so, as Andrew suggests.

> Further, this 'test.rb' thing is a distillation of some of the HBase usage
> of my crawler application, the write path. I may also simulate some of the
> scan/read path, the document processing bits. It would be great if we can
> get other contributions of test cases that simulate real world
> applications. Maybe there are examples to draw on from stuff running at
> Powerset, Streamy, Openspaces, etc.


Have you looked at TestSplit in the regionserver package?  Is it very
different from the test.rb content (I suppose the latter is run from the
client side)?


>   - Andy
> ________________________________
> From: Jean-Daniel Cryans <jdcryans@apache.org>
> To: hbase-dev@hadoop.apache.org
> Sent: Sunday, June 14, 2009 9:59:26 AM
> Subject: Re: scanner is returning everything in parent region plus one of
> the daughters?
> Andrew,
> +1 I think it's a great idea.
> Building on that, I think we should have system-level tests to make
> sure we don't break performance and reliability. For example, an
> intensive and simultaneous read/write test of a couple of millions of
> rows. We could even think of killing a region server or two during
> that test (and a master of course). Currently, I don't think it's
> easily doable on Hudson so someone would have to host it on a small
> cluster.
> J-D
> On Sun, Jun 14, 2009 at 12:52 PM, Andrew Purtell <apurtell@apache.org>
> wrote:
> > This possibly belongs in one of the new existing/open issues put up over
> > the past few days:
> >
> > Insert 1000 rows with random row keys, and induce a split (see test.rb
> > attached to HBASE-1500). I would expect that no more than 1000 rows
> > should be returned from a row count. However, the following is a series
> > of row counts obtained after running the test, with total
> > reinitialization in between, 5 times:
> >
> >    1516
> >    1492
> >    1497
> >    1509
> >    1501
> >
> > Also the shell provides an additional clue:
> >
> >    Current count: 1000, row: ffdcee2a75742697b375edef62fa4b75
> >
> >    1516 row(s) in 2.9530 seconds
> >
> > Looks like the parent region is fully iterated first, then in addition
> > one of the daughters?
> >
> > Also, as these issues come up, kindly consider adding test cases to the
> > test suite to catch these regressions. It seems the current coverage for
> > scanners is letting big issues pass unnoticed.
> >
> > One thing we could do right away is commit my 'test.rb' reimplemented
> > as Java/JUnit into the suite, with some additional logic to test that
> > the scanners return the count of unique row keys inserted. If no -1 I
> > will go ahead and do that.
> >
> >  - Andy
> >
> >
> >
