hadoop-hdfs-issues mailing list archives

From "Mickael Olivier (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-6515) testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
Date Fri, 04 Jul 2014 12:07:34 GMT

     [ https://issues.apache.org/jira/browse/HDFS-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mickael Olivier updated HDFS-6515:
----------------------------------

                 Tags: testPageRounder, FsDatasetCache
     Target Version/s: 3.0.0
    Affects Version/s: 3.0.0
         Release Note: Tested with Hadoop 3.0.0 SNAPSHOT, on RHEL 6.5, on Ubuntu 14.0, on
Fedora 19, using mvn -Dtest=TestFsDatasetCache#testPageRounder -X test
               Status: Patch Available  (was: Open)

This patch makes two changes. Neither should alter existing behavior on systems with a
PAGE_SIZE of 4096:

1 - Change in TestFsDatasetCache.java

- private static final long CACHE_CAPACITY = 64 * 1024;
+ private static final long CACHE_CAPACITY = 16 * PAGE_SIZE;

2 - Change in NativeIO.java, class NoMlockCacheManipulator

- public long getOperatingSystemPageSize() { return 4096; }
+ public long getOperatingSystemPageSize() { return NativeIO.getOperatingSystemPageSize(); }

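The first change is needed because on systems with a larger page size, e.g. 65536 bytes,
the fixed 64 * 1024 byte capacity holds only a single page, so the test could reserve only
one page in the cache. With 16 * PAGE_SIZE the capacity is always 16 pages, which is the
same 65536 bytes as before when PAGE_SIZE is 4096.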
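
The arithmetic behind the first change can be sketched as follows. This is only an illustration, not code from the patch: the class CapacitySketch and the helper pagesThatFit are hypothetical names, and the two page sizes are the ones mentioned in this message.

```java
public class CapacitySketch {
    // Hypothetical helper: number of whole pages that fit in a cache
    // of the given capacity (both values in bytes).
    static long pagesThatFit(long capacityBytes, long pageSize) {
        return capacityBytes / pageSize;
    }

    public static void main(String[] args) {
        long fixedCapacity = 64 * 1024; // old CACHE_CAPACITY
        for (long pageSize : new long[] {4096L, 65536L}) {
            long scaledCapacity = 16 * pageSize; // new CACHE_CAPACITY
            System.out.println("pageSize=" + pageSize
                + " fixed capacity holds " + pagesThatFit(fixedCapacity, pageSize) + " page(s),"
                + " scaled capacity holds " + pagesThatFit(scaledCapacity, pageSize) + " page(s)");
        }
    }
}
```

With a 4096-byte page both capacities are identical, which is why the change should be invisible on such systems.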

The second change is needed because on systems with a page size of, e.g. 65536 bytes,
reporting a page size of 4096 caused the method verifyExpectedCacheUsage to fail even when
the correct number of blocks had been reserved. The expected usage never matched, so the
test ran into its timeout.
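
A minimal sketch of why the two page sizes disagree, assuming cache usage is accounted in whole pages. The class PageSizeMismatchSketch, the helper roundUpToPage, the 512-byte block size and the block count are all hypothetical values chosen for illustration; where exactly Hadoop applies each page size is not shown in this message.

```java
public class PageSizeMismatchSketch {
    // Hypothetical helper: round a byte count up to the next multiple of pageSize.
    static long roundUpToPage(long bytes, long pageSize) {
        return ((bytes + pageSize - 1) / pageSize) * pageSize;
    }

    public static void main(String[] args) {
        long blockBytes = 512;          // illustrative block smaller than a page
        int blocks = 4;                 // illustrative number of cached blocks
        long realPageSize = 65536;      // page size on the affected systems
        long hardCodedPageSize = 4096;  // value returned before the patch

        // Usage computed with the real page size versus the hard-coded one.
        long usageWithRealPage = blocks * roundUpToPage(blockBytes, realPageSize);
        long usageWithHardCoded = blocks * roundUpToPage(blockBytes, hardCodedPageSize);

        System.out.println("usage with real page size:       " + usageWithRealPage);
        System.out.println("usage with hard-coded page size: " + usageWithHardCoded);
        System.out.println("values match: " + (usageWithRealPage == usageWithHardCoded));
    }
}
```

When both sides use the same page size the two values agree, which is the situation the second change restores.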

> testPageRounder   (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-6515
>                 URL: https://issues.apache.org/jira/browse/HDFS-6515
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.4.0, 3.0.0
>         Environment: Linux on PPC64
>            Reporter: Tony Reix
>            Priority: Blocker
>              Labels: test
>
> I have an issue with test :
>    testPageRounder
>   (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
> on Linux/PowerPC.
> On Linux/Intel, test runs fine.
> On Linux/PowerPC, I have:
> testPageRounder(org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)  Time elapsed: 64.037 sec  <<< ERROR!
> java.lang.Exception: test timed out after 60000 milliseconds
> Looking at details, I see that some "Failed to cache " messages appear in the traces.
> Only 10 on Intel, but 186 on PPC64.
> On PPC64, it looks like some thread is waiting for something that never happens, generating
> a TimeOut.
> I'm now using IBM JVM, however I've just checked that the issue also appears with OpenJDK.
> I'm now using Hadoop latest, however, the issue appeared within Hadoop 2.4.0 .
> I need help for understanding what the test is doing, what traces are expected, in order
> to understand what/where is the root cause.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
