hbase-dev mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: scanner is returning everything in parent region plus one of the daughters?
Date Mon, 15 Jun 2009 01:54:24 GMT
Hey,

Yes, 1304 has revealed weaknesses in the automated tests.  It would be nice
if they fully covered all edge cases and concurrent scenarios, but so it
goes.

I'm not sure we need to be renting EC2 time... I have clusters, and so do
the pset folks, and we do run tests and verification on them.  It's just
that 1304 landed right when there were a hundred slides and 3 talks to
prep.  We had hoped there would be few bugs, but 1304 really caused some
neato bugs.

I appreciate test.rb - but moving forward I think all tests should remain in
Java.  Dynamic scripting languages on the JVM are very difficult to debug
top to bottom.  JUnit is best really :-)

-ryan

On Sun, Jun 14, 2009 at 10:54 AM, Andrew Purtell <apurtell@apache.org> wrote:

> Hi J-D,
>
> I agree on all your points. Regarding test hosting, I wonder if anyone
> has resources available to dedicate on a long term basis. I have a 4 node
> testbed which could conceivably run some suite once per day and generate
> some automated report, but I can't guarantee the availability of it. We
> might also consider EC2, as long as the tests are all self contained, all
> I/O between instances only, no data in/out or S3 charges. Using the usage
> calculator (http://calculator.s3.amazonaws.com/calc5.html), it seems that
> 5 extra large instances running for 5 hours once per day will cost
> $140/month. 10 of them would cost $280, etc. That is not a large figure.
>
> Further, this 'test.rb' thing is a distillation of some of the HBase usage
> of my crawler application, the write path. I may also simulate some of the
> scan/read path, the document processing bits. It would be great if we can
> get other contributions of test cases that simulate real world
> applications. Maybe there are examples to draw on from stuff running at
> Powerset, Streamy, Openspaces, etc.
>
>   - Andy
>
>
>
>
> ________________________________
> From: Jean-Daniel Cryans <jdcryans@apache.org>
> To: hbase-dev@hadoop.apache.org
> Sent: Sunday, June 14, 2009 9:59:26 AM
> Subject: Re: scanner is returning everything in parent region plus one of
> the daughters?
>
> Andrew,
>
> +1 I think it's a great idea.
>
> Building on that, I think we should have system-level tests to make
> sure we don't break performance and reliability. For example, an
> intensive, simultaneous read/write test of a couple of million
> rows. We could even think of killing a region server or two during
> that test (and a master of course). Currently, I don't think it's
> easily doable on Hudson so someone would have to host it on a small
> cluster.
>
> J-D
>
> On Sun, Jun 14, 2009 at 12:52 PM, Andrew Purtell <apurtell@apache.org>
> wrote:
> > This possibly belongs in one of the new existing/open issues put up
> > over the past few days:
> >
> > Insert 1000 rows with random row keys, and induce a split (see test.rb
> > attached to HBASE-1500). I would expect that no more than 1000 rows
> > should be returned from a row count. However, the following is a series
> > of row counts obtained after running the test, with total
> > reinitialization in between, 5 times:
> >
> >    1516
> >    1492
> >    1497
> >    1509
> >    1501
> >
> > Also the shell provides an additional clue:
> >
> >    Current count: 1000, row: ffdcee2a75742697b375edef62fa4b75
> >
> >    1516 row(s) in 2.9530 seconds
> >
> > Looks like the parent region is fully iterated first, then in addition
> > one of the daughters?
> >
> > Also, as these issues come up, kindly consider adding test cases to the
> > test suite to catch these regressions. It seems the current coverage for
> > scanners is letting big issues pass unnoticed.
> >
> > One thing we could do right away is commit my 'test.rb' reimplemented
> > as Java/JUnit into the suite, with some additional logic to test that
> > the scanners return the count of unique row keys inserted. If no -1 I
> > will go ahead and do that.
> >
> >  - Andy
> >
> >
> >
>
>
>
>
>
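
The check Andy proposes at the bottom of the thread (a scan should return
exactly the count of unique row keys inserted) can be sketched in plain
Java. This is only an in-memory model under stated assumptions, not the
HBase client API and not the actual test.rb or its JUnit port: the class
ScanCountCheck and all of its methods are hypothetical names, the "table"
is a sorted set of keys, and the faulty scan simply replays the hypothesis
from the thread (all parent rows, then one daughter's rows again) with the
split assumed to fall at the middle key.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

/** In-memory model of the row-count check; no HBase calls are made. */
public class ScanCountCheck {

    /** Generates n distinct random 32-character hex row keys. */
    public static TreeSet<String> randomRowKeys(int n, long seed) {
        Random rnd = new Random(seed);
        TreeSet<String> keys = new TreeSet<>();
        while (keys.size() < n) {
            keys.add(String.format("%016x%016x", rnd.nextLong(), rnd.nextLong()));
        }
        return keys;
    }

    /**
     * Hypothetical faulty scan: every row of the parent, followed by the
     * rows of the upper daughter again. The split point is assumed to be
     * the middle key of the parent.
     */
    public static List<String> scanParentPlusDaughter(TreeSet<String> parent) {
        List<String> sorted = new ArrayList<>(parent);
        String splitKey = sorted.get(sorted.size() / 2);
        List<String> out = new ArrayList<>(sorted);
        out.addAll(parent.tailSet(splitKey, true));
        return out;
    }

    /** Number of rows returned more than once; a correct scan yields 0. */
    public static int duplicateCount(List<String> scanned) {
        Set<String> unique = new HashSet<>(scanned);
        return scanned.size() - unique.size();
    }

    public static void main(String[] args) {
        TreeSet<String> inserted = randomRowKeys(1000, 42L);
        List<String> good = new ArrayList<>(inserted);
        List<String> bad = scanParentPlusDaughter(inserted);
        System.out.println("correct scan: " + good.size() + " rows, "
                + duplicateCount(good) + " duplicates");
        System.out.println("faulty scan:  " + bad.size() + " rows, "
                + duplicateCount(bad) + " duplicates");
    }
}
```

With a mid-key split the modeled faulty scan returns 1500 rows for 1000
inserted keys, which is in the same range as the counts reported in the
thread (1492 to 1516). That only shows the numbers fit the "parent plus
one daughter" reading; it does not confirm it. A real JUnit version would
replace the sorted set with an actual table and scanner and assert that
the duplicate count is zero after a split.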
