accumulo-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3140) Compaction did not run durning RW test
Date Wed, 17 Sep 2014 23:39:33 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138209#comment-14138209
] 

Keith Turner commented on ACCUMULO-3140:
----------------------------------------

I think I figured this out.   I looked at the other tablets in the table and saw that all
of them compacted. While looking at the other other tablets, I noticed that the first compaction
for another tablet started before the minor compaction finished on the one tablet that did
not major compact.  Below are the logs, 

{noformat}
2014-09-16 18:35:04,049 [tabletserver.Compactor] DEBUG: Compaction 1w;000050;00004c 83,164
read | 81,599 written | 128,936 entries/sec |  0.645 secs
2014-09-16 18:35:04,053 [tabletserver.Tablet] DEBUG: Logs for memory compacted: 1w;000050;00004c
10.1.2.26+9997/1bf8ebed-e73e-460b-b54f-0b29b3d3c19c

2014-09-16 18:35:04,495 [tabletserver.Tablet] DEBUG: Starting MajC 1w;00000c;000008 (USER)
[/t-00001ls/F000036m.rf] --> /t-00001ls/A000047a.rf_tmp  [name:RegExFilter, priority:21,
class:org.apache.accumulo.core.iterators.user.RegExFilter, properties:{matchSubstring=false,
negate=true, colqRegex=^[0-9a-f][d].*, orFields=false}]

2014-09-16 18:35:04,501 [tabletserver.Tablet] TABLET_HIST: 1w;000050;00004c MinC [memory]
-> /t-00001mf/F0000476.rf
2014-09-16 18:35:04,501 [tabletserver.Tablet] DEBUG: MinC finish lock 0.00 secs 1w;000050;00004c
{noformat}

The major compaction for table 1w should not start until all 1w tablets have minor compacted/flushed.
 The code waits for flush/minor compaction and then does a major compaction.   The code that
waits for minor compaction/flush waits for all flush ids in metadata table to be greater than
an expected number.   I think this code is working correctly.  

A tablet does the folllowing after finishing a minor comaction/flush.

 # update metadata table
 # write finish compaction to walog
 # update in memory data structs

This allowed the master to see the flush/minc was finished before the tablet updated the in
memory data structs.   In the logs step 2 took almost .5 second.  Looking at the code and
the log messages, I think this allowed the following happened.

 # Master initiates flush/minc
 # Tserver X writes to metadata for tablet Y updating its files and compaction id 
 # Master sees flush/minc is finished and initates major compactions
 # Tserver X decides to do no major compaction for tablet Y because it has no files (according
to in memory information)
 # Tserver X finishes walog update for tablet Y that records flush/minc finished
 # Tserver X adds a file to tablet Y's in mem data structs



> Compaction did not run durning RW test
> --------------------------------------
>
>                 Key: ACCUMULO-3140
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3140
>             Project: Accumulo
>          Issue Type: Bug
>         Environment: 1.5.2 RC1, Hadoop 2.3.0, Zookeeper 3.4.5, CentOS 6, 20 node EC2
>            Reporter: Keith Turner
>             Fix For: 1.5.3, 1.6.1, 1.7.0
>
>
> Saw the following failure while running RW test against 1.5.2 RC1 
> {noformat}
> java.lang.Exception: Error running node Shard.xml
>         at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:285)
>         at org.apache.accumulo.test.randomwalk.Framework.run(Framework.java:63)
>         at org.apache.accumulo.test.randomwalk.Framework.main(Framework.java:122)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.accumulo.start.Main$1.run(Main.java:107)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.Exception: Error running node Verify
>         at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:285)
>         at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:254)
>         ... 8 more
> Caused by: java.lang.Exception: index rebuild mismatch 000050 100z:bda1000000000000 []
1410899561685 false 000050 100z:9d20000000000000 [] 1410892435393 false ST_index_ip_10_1_2_29_ec2_internal_3328_1410892364707
ST_index_ip_10_1_2_29_ec2_internal_3328_1410892364707_tmp
>         at org.apache.accumulo.test.randomwalk.shard.VerifyIndex.visit(VerifyIndex.java:55)
>         at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:254)
>         ... 9 more
> {noformat}
> Determined that document ID {{9d20000000000000}} existed in the index, but not the document
table.  I found in the RW logs that a filtering compaction with the pattern {noformat}^[0-9a-f][d].*{noformat}
should have removed this document from the index.  However, the compaction did not run on
the relevant tablet {{1w;000050;00004c}}.   The test shortly after ran a filtering compaction
with the pattern {noformat}^[0-9a-f][1].*{noformat}, which did cause a corresponding compaction.
 Below are the tserver and RW logs interleaved by time.  Document {{9d20000000000000}} was
indexed in shard {{000050}}.
> {noformat}
> TSERVER 2014-09-16 18:32:50,125 [tabletserver.Tablet] TABLET_HIST: 1w<;00004c split
1w;000050;00004c 1w<;000050
> TSERVER 2014-09-16 18:32:50,126 [tabletserver.Tablet] TABLET_HIST: 1w;000050;00004c opened

> TSERVER 2014-09-16 18:32:57,288 [tabletserver.TabletServer] INFO : Adding 1 logs for
extent 1w;000050;00004c as alias 187
> RWLOG   16 18:33:55,294 [shard.Insert] DEBUG: Inserted document 9d20000000000000
> TSERVER 2014-09-16 18:35:02,985 [tabletserver.MinorCompactor] DEBUG: Begin minor compaction
/accumulo/tables/1w/t-00001mf/F0000476.rf_tmp 1w;000050;00004c
> TSERVER 2014-09-16 18:35:04,049 [tabletserver.Compactor] DEBUG: Compaction 1w;000050;00004c
83,164 read | 81,599 written | 128,936 entries/sec |  0.645 secs
> TSERVER 2014-09-16 18:35:04,053 [tabletserver.Tablet] DEBUG: Logs for memory compacted:
1w;000050;00004c 10.1.2.26+9997/1bf8ebed-e73e-460b-b54f-0b29b3d3c19c
> TSERVER 2014-09-16 18:35:04,501 [tabletserver.Tablet] TABLET_HIST: 1w;000050;00004c MinC
[memory] -> /t-00001mf/F0000476.rf
> TSERVER 2014-09-16 18:35:04,501 [tabletserver.Tablet] DEBUG: MinC finish lock 0.00 secs
1w;000050;00004c
> RWLOG   16 18:35:14,641 [shard.CompactFilter] DEBUG: Filtered documents using compaction
iterators ^[0-9a-f][d].* 32451 19802
> TSERVER 2014-09-16 18:35:41,433 [tabletserver.Tablet] DEBUG: Starting MajC 1w;000050;00004c
(USER) [/t-00001mf/F0000476.rf] --> /t-00001mf/A000048e.rf_tmp  [name:RegExFilter, priority:21,
class:org.apache.accumulo.core.iterators.user.RegExFilter, properties:{matchSubstring=false,
negate=true, colqRegex=^[0-9a-f][1].*, orFields=false}]
> TSERVER 2014-09-16 18:35:41,960 [tabletserver.Compactor] DEBUG: Compaction 1w;000050;00004c
81,599 read | 73,110 written | 187,583 entries/sec |  0.435 secs
> TSERVER 2014-09-16 18:35:42,079 [tabletserver.Tablet] TABLET_HIST: 1w;000050;00004c MajC
[/t-00001mf/F0000476.rf] --> /t-00001mf/A000048e.rf
> RWLOG   16 18:35:43,854 [shard.CompactFilter] DEBUG: Filtered documents using compaction
iterators ^[0-9a-f][1].* 18648 10103
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message