accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-1940) Data file in !METADATA differs from in memory data
Date Fri, 29 Nov 2013 18:38:35 GMT


Eric Newton commented on ACCUMULO-1940:

The memory manager can start a minor compaction as soon as it learns that a tablet exists.
It learns this when the tablet server reports how much memory it is using. Unfortunately,
that report happens right after recovery, inside the Tablet's constructor. In this case, the
memory manager started a minor compaction before the tablet was online, which caused the
AssignmentManager's minor compaction to fail. The AssignmentManager reloaded the tablet later
and no data was lost, but the updates to the METADATA table made by the MemoryManager's MinC
were not seen until the consistency check.
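
To make the ordering problem concrete, here is a minimal Java sketch of one possible guard. It is not taken from the actual ACCUMULO-1940 fix. SketchTablet, SketchMemoryManager and reportIfOnline are hypothetical stand-ins, not Accumulo's Tablet, TabletServer or MemoryManager APIs. The sketch assumes the memory manager can only choose tablets it has heard about. Under that assumption, withholding a tablet's memory report until the tablet is marked online keeps a memory-driven minor compaction from starting while recovery is still running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class MemoryReportGateSketch {
    // Hypothetical stand-in for a tablet; not Accumulo's Tablet class.
    static final class SketchTablet {
        final String extent;
        // Set once recovery has finished and the tablet is serving.
        private final AtomicBoolean online = new AtomicBoolean(false);

        SketchTablet(String extent) {
            this.extent = extent;
        }

        void markOnline() {
            online.set(true);
        }

        boolean isOnline() {
            return online.get();
        }
    }

    // Hypothetical memory manager: it can only pick tablets it has been told about.
    static final class SketchMemoryManager {
        final List<String> known = new ArrayList<>();

        void report(SketchTablet t) {
            known.add(t.extent);
        }
    }

    // The guard: skip the memory report while the tablet is still offline,
    // so the memory manager never learns of it mid-recovery.
    static void reportIfOnline(SketchTablet t, SketchMemoryManager mm) {
        if (t.isOnline()) {
            mm.report(t);
        }
    }

    public static void main(String[] args) {
        SketchMemoryManager mm = new SketchMemoryManager();
        SketchTablet t = new SketchTablet("c;79d0ab;7870a");

        reportIfOnline(t, mm); // still recovering: not reported
        System.out.println(mm.known.size()); // prints 0

        t.markOnline();
        reportIfOnline(t, mm); // online: now eligible for a minor compaction
        System.out.println(mm.known.size()); // prints 1
    }
}
```

The online flag is an AtomicBoolean so that a reporting thread sees markOnline() from another thread without extra locking. A real tablet server would also have to keep the check and the start of the compaction from racing, which this single-threaded sketch does not attempt.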

> Data file in !METADATA differs from in memory data
> --------------------------------------------------
>                 Key: ACCUMULO-1940
>                 URL:
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.5.0
>            Reporter: Josh Elser
>             Fix For: 1.4.5, 1.5.1, 1.6.0
> Found during CI run with agitation.
> Got the first two error messages 5 times (presumably from a retry-on-failure block):
> {noformat}
> Failed to do close consistency check for tablet c;79d0ab;7870a
> 	java.lang.RuntimeException: Data file in !METADATA differ from in memory data c;79d0ab;7870a
> {/t-0005h1j/A0005n8k.rf=797350457 19198312, /t-0005h1j/C0005skm.rf=798078368 19322025, /t-0005h1j/C0005tet.rf=89783168 2196349, /t-0005h1j/C0005u20.rf=90979448 2227972, /t-0005h1j/F0005u0v.rf=23410023 582233, /t-0005h1j/F0005u2p.rf=21958551 547159, /t-0005h1j/F0005u3g.rf=14395121 358893}
> {/t-0005h1j/A0005n8k.rf=797350457 19198312, /t-0005h1j/C0005skm.rf=798078368 19322025, /t-0005h1j/C0005tet.rf=89783168 2196349, /t-0005h1j/C0005u20.rf=90979448 2227972, /t-0005h1j/F0005u2p.rf=21958551 547159, /t-0005h1j/F0005u3g.rf=14395121
> 		at org.apache.accumulo.server.tabletserver.Tablet.closeConsistencyCheck(
> 		at org.apache.accumulo.server.tabletserver.Tablet.completeClose(
> 		at org.apache.accumulo.server.tabletserver.Tablet.close(
> 		at org.apache.accumulo.server.tabletserver.TabletServer$
> 		at
> 		at
> 		at java.util.concurrent.ThreadPoolExecutor.runWorker(
> 		at java.util.concurrent.ThreadPoolExecutor$
> 		at
> 		at
> 		at
> {noformat}
> Then we logged that the consistency check had failed:
> {noformat}
> Consistency check fails, retrying java.lang.RuntimeException: Failed to do close consistency check for tablet c;79d0ab;7870a
> {noformat}
> In the end, we gave up and closed it anyway:
> {noformat}
> Tablet closed consistency check has failed for c;79d0ab;7870a giving up and closing
> {noformat}
> Before all of this happened, we tried to bring this tablet online on a new tserver after a
> failure. During the minc that is part of the recovery process, we failed to get the lease on
> the .rf_tmp file we tried to create. This failed a couple of times, but we eventually got the
> tmp file we needed, the recovery process completed, and we could bring the tablet online.
> The difference between the in-memory version and the !METADATA version was this one flushed
> rfile that we created during the recovery process.
> The problem eventually fixed itself because the tablet was migrated to a different server,
> which simply used what was (correctly) in the !METADATA table.
> It is still unknown how we missed the flushed RFile in the DatafileManager's
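
The close-time consistency check in the quoted report comes down to comparing two lists of files, retrying on a mismatch, and finally giving up. The following minimal Java sketch shows how a set difference singles out the one file on which the two logged sets disagree. It assumes hypothetical names (CloseCheckSketch, missingFrom, checkThenClose) and a limit of five attempts, which is based only on the reporter's guess. It uses just the file names from the log, because the second set is cut off in this archive. Treating the first logged set as the !METADATA side is an inference from the report, not something the log states.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Supplier;

public class CloseCheckSketch {
    // Assumption: five attempts, based only on the reporter seeing the error 5 times.
    static final int MAX_ATTEMPTS = 5;

    // Files present in a but absent from b, sorted for stable output.
    static Set<String> missingFrom(Set<String> a, Set<String> b) {
        Set<String> diff = new TreeSet<>(a);
        diff.removeAll(b);
        return diff;
    }

    // Hypothetical close path: compare both views, retry on mismatch,
    // then give up and close anyway. Returns the attempt that passed, or -1.
    static int checkThenClose(Supplier<Set<String>> metadataView, Supplier<Set<String>> memoryView) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            Set<String> meta = metadataView.get();
            Set<String> mem = memoryView.get();
            if (meta.equals(mem)) {
                return attempt;
            }
            System.out.println("attempt " + attempt + ", retrying: only in metadata view "
                    + missingFrom(meta, mem) + ", only in memory view " + missingFrom(mem, meta));
        }
        System.out.println("consistency check failed " + MAX_ATTEMPTS + " times, giving up and closing");
        return -1;
    }

    public static void main(String[] args) {
        String dir = "/t-0005h1j/";
        // File names copied from the two logged sets (sizes omitted).
        Set<String> firstLogged = Set.of(dir + "A0005n8k.rf", dir + "C0005skm.rf", dir + "C0005tet.rf",
                dir + "C0005u20.rf", dir + "F0005u0v.rf", dir + "F0005u2p.rf", dir + "F0005u3g.rf");
        Set<String> secondLogged = Set.of(dir + "A0005n8k.rf", dir + "C0005skm.rf", dir + "C0005tet.rf",
                dir + "C0005u20.rf", dir + "F0005u2p.rf", dir + "F0005u3g.rf");

        System.out.println(missingFrom(firstLogged, secondLogged)); // prints [/t-0005h1j/F0005u0v.rf]

        // Assumption: the first logged set is the !METADATA view.
        int result = checkThenClose(() -> firstLogged, () -> secondLogged);
        System.out.println(result); // prints -1 after five failed attempts
    }
}
```

Because the comparison is recomputed on every attempt, retrying only helps if one of the two views can change between attempts. In the scenario above it did not, so every attempt failed until the tablet was closed anyway.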

This message was sent by Atlassian JIRA
