hadoop-hdfs-issues mailing list archives

From "Mickael Olivier (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6515) testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
Date Wed, 02 Jul 2014 08:28:24 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049734#comment-14049734 ]

Mickael Olivier commented on HDFS-6515:
---------------------------------------

As mentioned on https://issues.apache.org/jira/browse/HDFS-6608, the bug might be related to the hard-coded limit maxBytes = 65536 bytes, assigned at the beginning of TestFsDatasetCache.java as follows:

// Most Linux installs allow a default of 64KB locked memory
private static final long CACHE_CAPACITY = 64 * 1024;
conf.setLong(DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY,
        CACHE_CAPACITY);

Then in FsDatasetCache we have

this.maxBytes = dataset.datanode.getDnConf().getMaxLockedMemory();

which retrieves that value.

So I tried changing it to something like
private static final long CACHE_CAPACITY = 16 * 64 * 1024;

After running the test again, I do indeed see maxBytes : 1048576 in the logs! But the count value is now capped at 4096, which is odd. So I end up with

verifyExpectedCacheUsage: have 20480/327680 bytes cached; 5/5 blocks cached. memlock limit = 1125899906842624. Waiting...

each time the supplier checks that the cache is used as expected.
Even though all 5 blocks appear to be cached, the osPageSize used to round the cache size is still 4096, which is what should change!
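
For what it's worth, those numbers are consistent with each block being rounded to a 4096-byte page while the test expects 65536-byte pages. A quick sketch of the arithmetic (assuming, as the log suggests, 5 blocks that each reserve exactly one rounded page; the class below is made up for illustration):

// Sanity check of the numbers in the log line above (illustrative only).
// Assumes each of the 5 test blocks reserves exactly one rounded page.
public class CacheUsageArithmetic {
  public static void main(String[] args) {
    int blocks = 5;
    long smallPage = 4096;   // page size apparently used for the rounding
    long largePage = 65536;  // page size expected on PPC64
    System.out.println("reported : " + blocks * smallPage); // 20480
    System.out.println("expected : " + blocks * largePage); // 327680
  }
}

The rounding in question is this method: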

public long round(long count) {
  long newCount =
      (count + (osPageSize - 1)) / osPageSize;
  return newCount * osPageSize;
}

private final long osPageSize =
        NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize();

On PPC64 that should give 65536 again, so when reserving 512 bytes we should get newCount = 1 and reserve 65536 bytes.
Why is that not the case? (The first step is @@reserve:: count : 4096 | next : 4096 | maxBytes : 1048576)
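
Here is a minimal standalone sketch of that rounding for comparison; the page size is passed in explicitly here, whereas the real code reads it from NativeIO as quoted above, and the class and method below are hypothetical, for illustration only:

// Standalone sketch of the page rounding quoted above, with the OS page size
// passed in explicitly instead of being read from NativeIO.
public class PageRounderSketch {
  static long round(long count, long osPageSize) {
    long newCount = (count + (osPageSize - 1)) / osPageSize;
    return newCount * osPageSize;
  }

  public static void main(String[] args) {
    System.out.println(round(512, 65536)); // 65536 -- what we expect on PPC64
    System.out.println(round(512, 4096));  // 4096  -- what the trace actually shows
  }
}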


> testPageRounder   (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-6515
>                 URL: https://issues.apache.org/jira/browse/HDFS-6515
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.4.0
>         Environment: Linux on PPC64
>            Reporter: Tony Reix
>            Priority: Blocker
>              Labels: test
>
> I have an issue with the test:
>    testPageRounder
>   (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
> on Linux/PowerPC.
> On Linux/Intel, the test runs fine.
> On Linux/PowerPC, I have:
> testPageRounder(org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)  Time elapsed: 64.037 sec  <<< ERROR!
> java.lang.Exception: test timed out after 60000 milliseconds
> Looking at the details, I see that some "Failed to cache" messages appear in the traces: only 10 on Intel, but 186 on PPC64.
> On PPC64, it looks like some thread is waiting for something that never happens, generating a timeout.
> I'm now using the IBM JVM, but I've just checked that the issue also appears with OpenJDK.
> I'm now using the latest Hadoop; however, the issue first appeared with Hadoop 2.4.0.
> I need help understanding what the test is doing and what traces are expected, in order to find what/where the root cause is.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
