accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: Accumulo on MapR Continued - LargeRowTest
Date Mon, 16 Apr 2012 22:32:32 GMT
I added a little program to the git repo that seeks the rfiles directly
for a row.  I want to see if it's possible to reproduce the problem
outside of the tablet server.   The program is called
LargeRowDirectQuery.  Looking at what you sent, the scan failed on the
61st generated row.  In the tablet server logs, I could see that it
was trying to read from
/user/mapr/accumulo-SE-test-04-15370/tables/2/t-0000007/F000000w.rf.
So you could run the following command after the test fails.

  accumulo org.apache.accumulo.server.test.functional.LargeRowDirectQuery \
      61 /user/mapr/accumulo-SE-test-04-15370/tables/2/t-0000007/F000000w.rf

Let me know how running this goes.

Keith

On Fri, Apr 13, 2012 at 8:22 PM, Keys Botzum <kbotzum@maprtech.com> wrote:
> Keith,
>
> Once again, thank you for your help. I appreciate your taking the time to create a debug
> version with more trace.
>
> Attached is everything I think you wanted:
>
> output from running ./run.py -t large row -v 10 -d
>
>
>
>
> contents of temporary log directory:
>
>
>
>
> If needed, I can provide whatever else you might want. As I'm using your build, compression
> is back on. If for some reason that makes it harder for you to debug this, let me know and
> I can run it again with compression off.
>
> Thanks,
> Keys
> ________________________________
> Keys Botzum
> Senior Principal Technologist
> WW Systems Engineering
> kbotzum@maprtech.com
> 443-718-0098
> MapR Technologies
> http://www.mapr.com
>
>
>
> On Apr 13, 2012, at 7:04 PM, Keith Turner wrote:
>
>> Keys
>>
>> I created a version of Accumulo 1.4.0 with more debugging for this
>> problem on github.  If you have changes, you can send me a pull
>> request.
>>
>>    https://github.com/keith-turner/accumulo-1.4.0-MapR
>>
>>
>> If you pull this down and run it, it should print info in the tablet
>> server and test.  I would really like to see the Verify call count
>> that the test prints, because verify is called multiple times in the
>> test.  So far I do not know which one of these verify calls is failing
>> for you.
>>
>>    ./run.py -t largerow -v 10 -d
>>      .
>>      .
>>      .
>>    Verify Call Count 6
>>      .
>>      .
>>      .
>>    Creating Range at row 23 initial bytes are: YlX58$iWq'57eW:[cd@?@?OF.<GHgN
>> )
>>    key = YlX58$iWq'57eW:[cd@?@?OF.<GHgN
>> )vF2;h$?Ja%aO&]LNeFdTQQP/o1#)%t1W... TRUNCATED : [] 1334170440123
>> false
>>
>>
>> The above is the last scan for row YlX58....  With the added debugging,
>> I can go to the tserver log and see the following info about this
>> read.  I can find when a scan was started for this range and see
>> everything that rfile did (except for index reads).
>>
>>    13 18:44:55,438 [tabletserver.TabletServer] DEBUG: Starting scan,
>> range= [YlX58$iWq'57eW:[cd@?@?OF.<GHgN
>> )vF2;h$?Ja%aO&]LNeFdTQQP/o1#)%t1W... TRUNCATED : []
>> 9223372036854775807 false,YlX58$iWq'57eW:[cd@?@?OF.<GHgN
>> )vF2;h$?Ja%aO&]LNeFdTQQP/o1#)%t1W... TRUNCATED : []
>> 9223372036854775807 false)
>>    13 18:44:55,465 [rfile.RFile] DEBUG: Getting block offset=6480397
>> csize=107994 rsize=131093 entries=1 key=YlX58$iWq'57eW:[cd@?@?OF.<GHgN
>> )vF2;h$?Ja%aO&]LNeFdTQQP/o1#)%t1W... TRUNCATED : [] 1334170440123
>> false
>>    13 18:44:55,466 [rfile.RelativeKey] DEBUG: entering fastSkip()
>>    13 18:44:55,467 [rfile.RelativeKey] DEBUG: fieldsSame = 0
>>    13 18:44:55,467 [rfile.RelativeKey] DEBUG: len = 131072
>>    13 18:44:55,469 [rfile.RelativeKey] DEBUG: data =
>> YlX58$iWq'57eW:[cd@?@?OF.<GHgN )
>>    13 18:44:55,470 [rfile.RelativeKey] DEBUG: len = 0
>>    13 18:44:55,470 [rfile.RelativeKey] DEBUG: data =
>>    13 18:44:55,470 [rfile.RelativeKey] DEBUG: len = 0
>>    13 18:44:55,470 [rfile.RelativeKey] DEBUG: data =
>>    13 18:44:55,470 [rfile.RelativeKey] DEBUG: len = 0
>>    13 18:44:55,470 [rfile.RelativeKey] DEBUG: data =
>>    13 18:44:55,470 [rfile.RelativeKey] DEBUG: Read ts 1334170440123
>>    13 18:44:55,470 [rfile.RelativeKey] DEBUG: len = 2
>>    13 18:44:55,470 [rfile.RelativeKey] DEBUG: data = 23
>>    13 18:44:55,472 [rfile.RFile] DEBUG: Getting block offset=6588391
>> csize=107991 rsize=131093 entries=1
>> key=Z"?7-,mE:5Di&ou.4/4.i+9zGo0K8%%TsSt#!&a!&s
>> :OKl:2"cp>]yT(ZePrtEh... TRUNCATED : [] 1334170440149 false
>>    13 18:44:55,476 [rfile.RelativeKey] DEBUG: fieldsSame = 0
>>    13 18:44:55,477 [rfile.RelativeKey] DEBUG: len = 131072
>>    13 18:44:55,478 [rfile.RelativeKey] DEBUG: data =
>> Z"?7-,mE:5Di&ou.4/4.i+9zGo0K8%%T
>>    13 18:44:55,478 [rfile.RelativeKey] DEBUG: len = 0
>>    13 18:44:55,478 [rfile.RelativeKey] DEBUG: data =
>>    13 18:44:55,478 [rfile.RelativeKey] DEBUG: len = 0
>>    13 18:44:55,478 [rfile.RelativeKey] DEBUG: data =
>>    13 18:44:55,478 [rfile.RelativeKey] DEBUG: len = 0
>>    13 18:44:55,478 [rfile.RelativeKey] DEBUG: data =
>>    13 18:44:55,478 [rfile.RelativeKey] DEBUG: Read ts 1334170440149
>>    13 18:44:55,479 [data.Value] DEBUG: len = 2
>>    13 18:44:55,479 [data.Value] DEBUG: val = 49
>>    13 18:44:55,479 [tabletserver.TabletServer] DEBUG: ScanSess tid
>> 144.51.26.32:63594 2 1 entries in 0.04 secs, nbTimes = [40 40 40.00 1]
>>
>> So maybe you can run this and we can see what it looks like for the
>> failed scan.
>>
>> Keith
>
>
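
For comparing the tablet server output of a passing run against a failing
one, the "Getting block" lines quoted above can be pulled out with a small
script. The following is only a sketch in Python and is not part of Accumulo
or of the debug build: parse_block_reads and blocks_are_contiguous are
hypothetical helper names, and reading csize as the compressed on-disk size
of a block (so that one block starts where the previous one ends) is an
assumption, although the two reads in the sample log are consistent with it.

```python
import re

# Matches the fields of a "Getting block" line from the debug build's
# rfile.RFile logger, as seen in the quoted tserver log.
BLOCK_RE = re.compile(r"offset=(\d+)\s+csize=(\d+)\s+rsize=(\d+)\s+entries=(\d+)")

# Two block reads copied from the log in this thread (keys omitted).
SAMPLE = """\
>>    13 18:44:55,465 [rfile.RFile] DEBUG: Getting block offset=6480397
>> csize=107994 rsize=131093 entries=1
>>    13 18:44:55,472 [rfile.RFile] DEBUG: Getting block offset=6588391
>> csize=107991 rsize=131093 entries=1
"""


def parse_block_reads(log_text):
    """Return one dict per 'Getting block' entry found in log_text."""
    # Mail clients wrap long log lines, so strip '>' quote markers and
    # join everything into one line before matching.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    fields = ("offset", "csize", "rsize", "entries")
    return [
        dict(zip(fields, (int(v) for v in match.groups())))
        for match in BLOCK_RE.finditer(flat)
    ]


def blocks_are_contiguous(blocks):
    """True if each block starts where the previous block ended.

    Assumption (not confirmed by the thread): offset + csize of one block
    is the offset of the next block read.
    """
    return all(a["offset"] + a["csize"] == b["offset"]
               for a, b in zip(blocks, blocks[1:]))


if __name__ == "__main__":
    reads = parse_block_reads(SAMPLE)
    for read in reads:
        print(read)
    print("contiguous:", blocks_are_contiguous(reads))
```

Running the same functions over the log from the failed scan would show
whether the offsets and sizes requested there differ from the ones above.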
