ignite-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ivan Veselovsky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-2938) IgfsBackupsDualAsyncSelfTest.testAppendParentMissing and IgfsBackupsDualAsyncSelfTest.testAppendParentMissingPartially fail sometimes on master
Date Thu, 21 Apr 2016 19:35:26 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252520#comment-15252520
] 

Ivan Veselovsky commented on IGNITE-2938:
-----------------------------------------

The problem description: 
IgfsDataManager#dataBlock() method has the following structure (simplified code):
{code}
  Future<byte[]> f = dataCache.getAsync(blockKey);

  f = f.chain({
       byte[] result = f0.get();

       if (result == null) {
          result = secondaryReader.read(...);
       }

       putSafe(blockKey, result);

       return result;
  })

  return f;
{code} 

, where {code}putSafe{code} is a kind of async put to data cache executed in a special pool.

This code invoked from 2 places: 
1) IgfsMetaManager#appendDual() -- in this case it reads the last incomplete block of the
file to append data to it.
2) IgfsInputStreamImpl -- in this case we just read all the blocks of the file.

Original problem observed in test is that the datacache.put() operation invoked asynchronously
from putSafe() can overwrite the new block written during the append action.

The following statement seem to be true:
Only the last incomplete block of the file can cause such conflict. This is true because all
the other blocks can be written only once from IGFS side, because the IGFS API only allows
to append file, no random access write is possible. Also, IGFS secondary fs model does not
assume that a file can be modified "externally" on the secondary file system. If that happens,
the file should be completely re-read, and this new file will have new Ids in all caches.
So, all the blocks except the last, are constant. But we should bear in mind that they can
disappear from data cache as a result of eviction, and in such case they will be re-read from
secondary file system again. 

So, the following change is proposed to fix the problem:
0) Introduce a new filed of IgfsEntryInfo#version (exact name TBD), that would uniquely identify
the file data state.
1) in case of append operation we may put the last incomplete block to the data cache *only*
before we acquired write lock (meta#invokeLock()) on the file. 
Namely , the pseudo-code of putSafe task should look like this:
  {code}  
  try (Transaction tx = metaCache.startTx()) {
    IgfsFileInfo info = metaCahce.get(fileId); 
    long currentVersion = info.version();
    IgniteUuid lockId = info.lockId();
    if (currentVersion == initialVersion && lockId == null) {
       dataCache.putIfAbsent(blockKey, block);   // "put if absent" because there is no use
case when we should overwrite anything there.
    } 
  }
{code}
currentVersion == initialVersion condition is because if a file was appended several times,
the block we have is obsolete -- we should drop it.
lockId == null conditiuon is because after we acquired the lock, the new block may be written
at any time. We do "putIfAbsent", so it seems, we may put even after the lock is acquired.
But this is not the case because after append operation has put a new block, this block may
be evicted immediately, and the block is absent (null) again. In this case we'll get again
a wrong situation , when an old block will be written over the new one.

In IgfsInputStreamInpl read implementation we should apply exactly the same async put logic
for the last incomplete block. 
(The last block index can always be easily calculated provided we have IgfsEntryInfo: this
is fileLength / blockSize if fileLength % blockSize != 0.)

When we read other blocks (except the last incomplete one) in IgfsInputStreamInpl, we should
*only* check that the file still exists. Otherwise we'll just cache a garbage.
We don't need any version or lockId checks because these blocks cannot be modified (if they
are modified externally , then we should re-create the IGFS file completely).
So, this code should look like 
{code}
  try (Transaction tx = metaCache.startTx()) {
    IgfsFileInfo info = metaCahce.get(fileId); 
    if (info != null) {
      IgniteUuid lockId = info.lockId();
      // Note that we don't check if lockId is null or not, because these blocks must be constant:
      if (lockId != DELETE_LOCK_ID) {
         dataCache.putIfAbsent(blockKey, block); //We cannot assert result there because concurrent
InputStream might have performed the same put already.
      } 
   }
  }
{code}

This logic

> IgfsBackupsDualAsyncSelfTest.testAppendParentMissing and IgfsBackupsDualAsyncSelfTest.testAppendParentMissingPartially
fail sometimes on master
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-2938
>                 URL: https://issues.apache.org/jira/browse/IGNITE-2938
>             Project: Ignite
>          Issue Type: Test
>          Components: IGFS
>    Affects Versions: 1.5.0.final
>            Reporter: Ivan Veselovsky
>            Assignee: Ivan Veselovsky
>             Fix For: 1.6
>
>
> Tests 
>   IgfsBackupsDualAsyncSelfTest.testAppendParentMissing	
>   IgfsBackupsDualAsyncSelfTest.testAppendParentMissingPartially	
> fail from time to time on master -- need to investigate. 
> It looks like that started to happen after fix  https://issues.apache.org/jira/browse/IGNITE-1631
.
> The failure happens with probability ~1/50 :
> {code}
> [18:14:50,772][INFO ][main][root] >>> Starting test: IgfsBackupsDualAsyncSelfTest#testAppendParentMissingPartially
<<<
> [18:14:50,792][ERROR][main][root] Test failed.
> java.io.IOException: Inconsistent file's data block (incorrectly written?) [path=/dir/subdir/file,
blockIdx=0, blockSize=128, expectedBlockSize=256, fileBlockSize=524288, fileLen=256]
> 	at org.apache.ignite.internal.processors.igfs.IgfsInputStreamImpl.block(IgfsInputStreamImpl.java:485)
> 	at org.apache.ignite.internal.processors.igfs.IgfsInputStreamImpl.blockFragmentizerSafe(IgfsInputStreamImpl.java:399)
> 	at org.apache.ignite.internal.processors.igfs.IgfsInputStreamImpl.readFromStore(IgfsInputStreamImpl.java:373)
> 	at org.apache.ignite.internal.processors.igfs.IgfsInputStreamImpl.readFully(IgfsInputStreamImpl.java:222)
> 	at org.apache.ignite.internal.processors.igfs.IgfsInputStreamImpl.readFully(IgfsInputStreamImpl.java:216)
> 	at org.apache.ignite.internal.processors.igfs.IgfsAbstractSelfTest.checkFileContent(IgfsAbstractSelfTest.java:2999)
> 	at org.apache.ignite.internal.processors.igfs.IgfsAbstractSelfTest.checkFile(IgfsAbstractSelfTest.java:2969)
> 	at org.apache.ignite.internal.processors.igfs.IgfsDualAbstractSelfTest.testAppendParentMissingPartially(IgfsDualAbstractSelfTest.java:1364)
> 	at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at junit.framework.TestCase.runTest(TestCase.java:176)
> 	at org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:1759)
> 	at org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:118)
> 	at org.apache.ignite.testframework.junits.GridAbstractTest$4.run(GridAbstractTest.java:1697)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message