accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-1831) Write ahead logs from upgrade prematurely GCed
Date Mon, 18 Nov 2013 22:19:21 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825847#comment-13825847 ]

Keith Turner commented on ACCUMULO-1831:
----------------------------------------

There is code in {{GarbageCollectWriteAheadLogs.removeFiles()}} for removing sorted WALs.

> Write ahead logs from upgrade prematurely GCed
> ----------------------------------------------
>
>                 Key: ACCUMULO-1831
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1831
>             Project: Accumulo
>          Issue Type: Sub-task
>          Components: master, tserver
>            Reporter: Keith Turner
>            Assignee: Eric Newton
>            Priority: Blocker
>             Fix For: 1.6.0
>
>
> I was running {{test/system/upgrade_test.sh dirty}} and the test hung.  Upon inspection, the WALs from 1.5 were deleted before all tablets were recovered.
> Some tablets from 1.5 recovered fine.
> {noformat}
> 2013-10-29 20:29:26,475 [log.SortedLogRecovery] INFO : Recovery complete for !!R<< using hdfs://nnhost:6093/rktl/accumulo-upt/recovery/754f171b-c260-42dd-b17e-bd15064608c7
> {noformat}
> Then the GC kicked in and deleted files before tablets were finished recovering.
> {noformat}
> 2013-10-29 20:29:30,421 [gc.GarbageCollectWriteAheadLogs] DEBUG: Removing WAL for offline server hdfs://nnhost:6093/rktl/accumulo-upt/wal/127.0.0.1+9997/754f171b-c260-42dd-b17e-bd15064608c7
> 2013-10-29 20:29:30,428 [gc.GarbageCollectWriteAheadLogs] DEBUG: Removing sorted WAL hdfs://nnhost:6093/rktl/accumulo-upt/recovery/754f171b-c260-42dd-b17e-bd15064608c7
> {noformat}
> A tablet then failed to recover.
> {noformat}
> 2013-10-29 20:29:30,858 [tabletserver.TabletServer] WARN : exception trying to assign tablet 1<;row_0000180000 /default_tablet
> java.lang.RuntimeException: java.io.IOException: Unable to find recovery files for extent 1<;row_0000180000 logEntry: 1<; 754f171b-c260-42dd-b17e-bd15064608c7 (19)
>         at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1398)
>         at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1233)
>         at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1088)
>         at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1076)
> {noformat}
> I had set my GC delay to 30 secs while testing another issue, and that's why I ran into this issue.
> Looking at the code, I do not think it's properly converting relative paths from 1.5 to absolute paths.  I think the code should convert everything to relative paths (just UUIDs) to avoid problems caused by differing configurations.
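
The suggestion in the issue description, identifying WALs by UUID alone, can be sketched in Java roughly as follows. This is only an illustration and not the Accumulo code: the class WalNames and its methods walId and sameWal are hypothetical, and the relative 1.5-style reference used in main is an assumed format modeled on the paths in the logs above.

```java
import java.util.UUID;

// Hypothetical helper, not part of Accumulo: reduces any WAL reference
// (absolute URI, relative path, or bare name) to its trailing UUID.
class WalNames {

    // Returns the last path component of ref, validated as a UUID.
    // Throws IllegalArgumentException if that component is not a UUID.
    static String walId(String ref) {
        String trimmed = ref.trim();
        // Ignore any trailing slashes so "a/b/" and "a/b" are treated alike.
        while (trimmed.endsWith("/")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1);
        }
        String last = trimmed.substring(trimmed.lastIndexOf('/') + 1);
        return UUID.fromString(last).toString();
    }

    // Two references name the same WAL if their UUIDs match, regardless of
    // the filesystem, base directory, or server component in front of them.
    static boolean sameWal(String a, String b) {
        return walId(a).equals(walId(b));
    }

    public static void main(String[] args) {
        // Absolute path taken from the GC log above.
        String absolute =
            "hdfs://nnhost:6093/rktl/accumulo-upt/wal/127.0.0.1+9997/754f171b-c260-42dd-b17e-bd15064608c7";
        // Assumed shape of a relative reference written by 1.5.
        String relative = "127.0.0.1+9997/754f171b-c260-42dd-b17e-bd15064608c7";

        System.out.println(walId(absolute));             // 754f171b-c260-42dd-b17e-bd15064608c7
        System.out.println(sameWal(absolute, relative)); // true
    }
}
```

With a comparison like this, a WAL still referenced by a relative 1.5 entry would match the same file seen under an absolute path, so the GC would have a way to tell that it is still in use.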



--
This message was sent by Atlassian JIRA
(v6.1#6144)
