hbase-user mailing list archives

From Ondřej Stašek <ondrej.sta...@firma.seznam.cz>
Subject Re: Problems with scan after lot of Puts
Date Fri, 01 Jun 2012 08:07:28 GMT
Hallo J-D.

I'm currently tied to 0.90.6-cdh3u4, and this 1-row skip seems to be the
result of some strange RS restart. My test job has now been running for
several hours without error. I'll investigate further and report what
I find.

Regards

   Ondrej Stasek

On 31.5.2012 19:45, Jean-Daniel Cryans wrote:
> There's a concurrent thread on the mailing list that refers to
> atomicity issues in 0.90 and issues with scans, may I suggest you run
> the test on 0.92.1 or 0.94.0? I did my testing on 0.94 and didn't get
> any issues after fixing the scanner.
>
> J-D
>
> On Thu, May 31, 2012 at 3:05 AM, Ondřej Stašek
> <ondrej.stasek@firma.seznam.cz>  wrote:
>> Hallo J-D.
>>
>>   Thanks for the reply. I've modified my code to use scanner copies -
>> table.getScanner(new Scan(scan)) - and ran it again. Even after that I
>> got an error:
>>
>> 12/05/31 10:42:39 INFO hbase.TestPutScan: Run 5 put 1000000 rows
>> 12/05/31 10:44:09 INFO hbase.TestPutScan: Run 5 scan + del every 10th row
>> 12/05/31 10:44:33 ERROR hbase.TestPutScan: Expected value: value 0402040 0000005, got: value 0402041 0000004
>>
>> It seems that 1 row was skipped during scan. Strange.
>>
>> I'll keep testing.
>>
>>   Ondrej Stasek
>>
>>
>> On 30.5.2012 21:05, Jean-Daniel Cryans wrote:
>>> There you go:
>>>
>>> 12/05/30 18:54:17 DEBUG client.MetaScanner: Scanning .META. starting at row=testtable,,00000000000000 for max=10 rows using org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@f593af
>>> 12/05/30 18:54:17 DEBUG client.HConnectionManager$HConnectionImplementation: Cached location for testtable,test_row_0496107,1338404055995.e9c7a4ca97eb2be372445af4d3772031. is sv4r25s44:62023
>>> 12/05/30 18:54:17 DEBUG client.HConnectionManager$HConnectionImplementation: Removed testtable,,1338404055995.9389fe5538f19a6f2df27e3958dcb434. for tableName=testtable from cache because of test_row_0012550
>>> 12/05/30 18:54:17 DEBUG client.HConnectionManager$HConnectionImplementation: Cached location for testtable,,1338404055995.9389fe5538f19a6f2df27e3958dcb434. is sv4r25s44:62023
>>> 12/05/30 18:57:47 INFO hbase.TestPutScan: Run 5 scan
>>> 12/05/30 18:57:47 ERROR hbase.TestPutScan: Expected value: value 0000001 0000005, got: value 0496107 0000005
>>>
>>> That's a split so the ClientScanner did a reset on the start row. So
>>> I'm going to fix your code and see if I can get anything else.
>>>
>>> J-D
>>>
>>> On Wed, May 30, 2012 at 11:56 AM, Jean-Daniel Cryans
>>> <jdcryans@apache.org>    wrote:
>>>> I'm running it here, but I just remembered about this issue:
>>>>
>>>> "HTable.ClientScanner needs to clone the Scan object"
>>>> https://issues.apache.org/jira/browse/HBASE-4891
>>>>
>>>> And since you are reusing that Scan object, you could definitely hit this
>>>> issue.
>>>>
>>>> J-D
>>>>
>>>> On Tue, May 29, 2012 at 11:37 PM, Ondřej Stašek
>>>> <ondrej.stasek@firma.seznam.cz>    wrote:
>>>>> Here it is:
>>>>>
>>>>> http://pastebin.com/0AgsQjur
>>>>>
>>>>>
>>>>> On 29.5.2012 22:44, Jean-Daniel Cryans wrote:
>>>>>> Care to share that TestPutScan? Just attach it in a pastebin
>>>>>>
>>>>>> Thx,
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Tue, May 29, 2012 at 6:13 AM, Ondřej Stašek
>>>>>> <ondrej.stasek@firma.seznam.cz>      wrote:
>>>>>>> My program writes changes to an HBase table by issuing lots of Puts
>>>>>>> (autoCommit turned off, flush at the end) and afterwards uses a
>>>>>>> ResultScanner on the whole table to read all rows and act upon
>>>>>>> them. My problem is that on several occasions the scan does not
>>>>>>> return the expected rows. Either the scan does not start at the
>>>>>>> beginning of the table, or somewhere during the scan I get old data
>>>>>>> (not the data written by the preceding Puts).
>>>>>>>
>>>>>>> I have even written a simple test application to simulate this
>>>>>>> behavior:
>>>>>>> 1. write 1M simple numbered rows to a table
>>>>>>> 2. scan through table to test output, delete every 10th row
>>>>>>> 3. scan again after delete
>>>>>>> 4. repeat until error found
>>>>>>>
>>>>>>> Sample output:
>>>>>>>
>>>>>>> 12/05/29 00:32:12 INFO hbase.TestPutScan: Run 342 put 1000000 rows
>>>>>>> 12/05/29 00:32:35 INFO hbase.TestPutScan: Run 342 scan + del every 10th row
>>>>>>> 12/05/29 00:33:29 INFO hbase.TestPutScan: Run 342 scan
>>>>>>> 12/05/29 00:33:29 ERROR hbase.TestPutScan: Expected value: value 0000001 0000342, got: value 0281999 0000342
>>>>>>>
>>>>>>> This means that the program expected to get the first row, but got
>>>>>>> the 281999th.
>>>>>>>
>>>>>>> This test ran on a "minicluster" of 2 regionservers running
>>>>>>> Cloudera's cdh3u4 distribution.
>>>>>>>
>>>>>>> Today I got 3 errors like that, and from the RS's log it seems that
>>>>>>> at the same time the HBase balancer issued a reassign command for
>>>>>>> this table's region (the table has only 1 region).
>>>>>>>
>>>>>>> Any pointers on what to check or what to send you to help resolve
>>>>>>> this issue?
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Ondrej Stasek
>>>>>>>
>>>>> --
>>>>> Ondřej Stašek
>>>>> Senior Programmer
>>>>> Seznam.cz, a.s.
>>>>> Nádražní 159/21
>>>>> 370 01 České Budějovice 6
>>>>>
>>>>> tel.: +420 386 325 467
>>>>> gsm: +420 603 857 602
>>>>> icq: 164660005
>>>>> ondrej.stasek@firma.seznam.cz
>>>>> http://www.seznam.cz
>>>>>
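
The start-row symptom discussed in this thread (HBASE-4891, a reused Scan
object whose start row was reset after a split) can be illustrated without a
cluster. The sketch below is not HBase code: the Scan class, scanAll, and
buildTable are hypothetical stand-ins, a TreeMap plays the table, and the
"region boundary" is just a row key at which the toy scanner writes a new
start row back into the caller's Scan. It only shows why handing each scanner
its own copy, as in table.getScanner(new Scan(scan)), avoids that side effect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class ScanCopyDemo {
    /** Hypothetical stand-in for a scan description: only a mutable start row. */
    static final class Scan {
        private String startRow;
        Scan(String startRow) { this.startRow = startRow; }
        Scan(Scan other) { this.startRow = other.startRow; } // copy constructor
        String getStartRow() { return startRow; }
        void setStartRow(String row) { this.startRow = row; }
    }

    /** Builds a toy table with keys test_row_0000001 .. test_row_<n>. */
    static TreeMap<String, String> buildTable(int n) {
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= n; i++) {
            table.put(String.format("test_row_%07d", i), "value " + i);
        }
        return table;
    }

    /**
     * Toy scanner: returns all row keys from the Scan's start row onward.
     * Assumption for illustration: when it crosses regionBoundary it resets
     * the start row on the Scan it was given, mutating the caller's object.
     */
    static List<String> scanAll(TreeMap<String, String> table, Scan scan, String regionBoundary) {
        List<String> rows = new ArrayList<>();
        for (String row : table.tailMap(scan.getStartRow(), true).keySet()) {
            if (row.compareTo(regionBoundary) >= 0
                    && scan.getStartRow().compareTo(regionBoundary) < 0) {
                scan.setStartRow(regionBoundary); // side effect on the caller's Scan
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = buildTable(10);
        String boundary = "test_row_0000006";

        // Reusing one Scan: the second pass inherits the mutated start row.
        Scan shared = new Scan("");
        scanAll(table, shared, boundary);
        System.out.println("reused Scan, second pass starts at: "
                + scanAll(table, shared, boundary).get(0));

        // Passing a fresh copy each time leaves the template untouched.
        Scan template = new Scan("");
        scanAll(table, new Scan(template), boundary);
        System.out.println("copied Scan, second pass starts at: "
                + scanAll(table, new Scan(template), boundary).get(0));
    }
}
```

In this toy model the reused Scan begins its second pass at the boundary key
rather than at the first row, which mirrors the "expected 0000001, got
0496107" error quoted above, while the copied Scan starts from the first row
both times. It does not model the later 1-row skip, which the thread
tentatively attributes to a regionserver restart on 0.90.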
