hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: scanner is returning everything in parent region plus one of the daughters?
Date Sun, 14 Jun 2009 17:54:37 GMT
Hi J-D,

I agree on all your points. Regarding test hosting, I wonder if anyone
has resources available to dedicate on a long-term basis. I have a 4-node
testbed which could conceivably run some suite once per day and generate
an automated report, but I can't guarantee its availability. We
might also consider EC2, as long as the tests are all self-contained: all
I/O between instances only, no data in/out or S3 charges. According to the usage
calculator (http://calculator.s3.amazonaws.com/calc5.html), 5 extra large
instances running for 5 hours once per day would cost $140 per month, and
10 of them would cost $280. That is not a large figure.

Further, this 'test.rb' thing is a distillation of some of the HBase usage
of my crawler application, the write path. I may also simulate some of the
scan/read path, the document processing bits. It would be great if we can
get other contributions of test cases that simulate real world
applications. Maybe there are examples to draw on from stuff running at 
Powerset, Streamy, Openspaces, etc. 

   - Andy




________________________________
From: Jean-Daniel Cryans <jdcryans@apache.org>
To: hbase-dev@hadoop.apache.org
Sent: Sunday, June 14, 2009 9:59:26 AM
Subject: Re: scanner is returning everything in parent region plus one of the daughters?

Andrew,

+1 I think it's a great idea.

Building on that, I think we should have system-level tests to make
sure we don't break performance and reliability. For example, an
intensive, simultaneous read/write test over a couple of million
rows. We could even kill a region server or two during
that test (and a master, of course). Currently, I don't think that's
easily doable on Hudson, so someone would have to host it on a small
cluster.

J-D

On Sun, Jun 14, 2009 at 12:52 PM, Andrew Purtell <apurtell@apache.org> wrote:
> This possibly belongs in one of the open issues filed over the
> past few days:
>
> Insert 1000 rows with random row keys, and induce a split (see test.rb
> attached to HBASE-1500). I would expect that no more than 1000 rows should
> be returned from a row count. However, the following is a series of row
> counts obtained after running the test, with total reinitialization in
> between, 5 times:
>
>    1516
>    1492
>    1497
>    1509
>    1501
>
> Also the shell provides an additional clue:
>
>    Current count: 1000, row: ffdcee2a75742697b375edef62fa4b75
>
>    1516 row(s) in 2.9530 seconds
>
> Looks like the parent region is fully iterated first, then in addition
> one of the daughters?
>
> Also, as these issues come up, kindly consider adding test cases to the
> test suite to catch these regressions. It seems the current coverage for
> scanners is letting big issues pass unnoticed.
>
> One thing we could do right away is commit my 'test.rb' reimplemented
> as Java/JUnit into the suite, with some additional logic to test that
> the scanners return the count of unique row keys inserted. If no -1 I
> will go ahead and do that.
>
>  - Andy
>
>
>
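
A minimal sketch of the check proposed in the quoted message (the scanner
must return exactly the unique row keys that were inserted), assuming the
scanned row keys have already been collected into a list. It does not call
HBase at all: the class name, method names, and key format are hypothetical,
and the 516 repeated rows in main() only mimic the reported count of 1516.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical helper, not part of HBase or its test suite.
public class ScanCountCheck {

    /** Generates n distinct 32-character hex row keys (assumed key format). */
    public static Set<String> randomRowKeys(int n, long seed) {
        Random rnd = new Random(seed);
        Set<String> keys = new HashSet<>();
        while (keys.size() < n) {
            keys.add(String.format("%016x%016x", rnd.nextLong(), rnd.nextLong()));
        }
        return keys;
    }

    /** Number of scanned rows beyond the first occurrence of each key. */
    public static int duplicateCount(List<String> scanned) {
        return scanned.size() - new HashSet<>(scanned).size();
    }

    /** True when the scan returned every inserted key exactly once and nothing else. */
    public static boolean scanMatchesInserted(Set<String> inserted, List<String> scanned) {
        return scanned.size() == inserted.size()
                && new HashSet<>(scanned).equals(inserted);
    }

    public static void main(String[] args) {
        Set<String> inserted = randomRowKeys(1000, 42L);

        // Simulate the symptom: every row once (the parent region),
        // plus 516 of the same rows again (one daughter region).
        List<String> scanned = new ArrayList<>(inserted);
        scanned.addAll(new ArrayList<>(inserted).subList(0, 516));

        System.out.println("scanned=" + scanned.size()
                + " duplicates=" + duplicateCount(scanned)
                + " ok=" + scanMatchesInserted(inserted, scanned));
    }
}
```

In a real JUnit test the scanned list would be filled from an actual table
scan after the split, and scanMatchesInserted would be the assertion.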



      