hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-9263) tests are using /test/build/data; breaking Jenkins
Date Thu, 22 Oct 2015 21:28:27 GMT

     [ https://issues.apache.org/jira/browse/HDFS-9263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HDFS-9263:
    Attachment: HDFS-9263-002.patch

[~stevel@apache.org], regarding the side discussion on HADOOP-11880, I have traced the problem
to {{TestMiniDFSCluster}}, and the problem only occurs while running with the HDFS-9263 patch
applied.  I hope you don't mind, but I'm attaching a v002 patch with a small modification
to fix it.

My only change is in {{GenericTestUtils}}.  Prepare to smack forehead.  Here is the patch
v001 code:

  public static final String DEFAULT_TEST_DATA_DIR =
      "target " + File.pathSeparator + "test" + File.pathSeparator + "data";

Here is my change in v002:

  public static final String DEFAULT_TEST_DATA_DIR =
      "target" + File.separator + "test" + File.separator + "data";

I removed the extra space character at the end of the "target" string literal, and I switched
from {{File.pathSeparator}} (i.e. classpath separator, ':' on *nixes) to {{File.separator}}
(i.e. file system path separator, '/' on *nixes).  I constantly mix up those two myself.  I wish
they had clearer names.
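
For illustration only, here is a minimal sketch of the difference between the two constants. The class {{SeparatorDemo}} and its {{join}} helper are made up for this example and are not part of the patch; {{java.nio.file.Paths}} is shown only as one way to avoid naming a separator at all.

```java
import java.io.File;
import java.nio.file.Paths;

// Hypothetical demo class, not part of the HDFS-9263 patch.
final class SeparatorDemo {

  // Joins the given elements with the given separator string.
  static String join(String sep, String... parts) {
    return String.join(sep, parts);
  }

  public static void main(String[] args) {
    // File.pathSeparator splits entries in a list of paths
    // (':' on *nix, ';' on Windows), so this is not a usable directory path.
    System.out.println(join(File.pathSeparator, "target", "test", "data"));

    // File.separator splits the components of a single path
    // ('/' on *nix, '\' on Windows).
    System.out.println(join(File.separator, "target", "test", "data"));

    // Paths.get chooses the file system separator itself.
    System.out.println(Paths.get("target", "test", "data"));
  }
}
```

The last two lines print the same relative path on a given platform; the first does not.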

As to why {{TestMiniDFSCluster}} exposed this, one of the tests in that suite specifically
removes the {{test.build.data}} property to check if the mini-cluster can still start using
defaults.  After running that test suite, I could see it created the funny paths containing
spaces and colons.  For code that passes the path through a {{URI}}, it would end up encoding
the space to %20 too.
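
A small sketch of that encoding effect, assuming a *nix-style path. The class {{UriEncodingDemo}}, the helper {{toFileUri}} and the {{/tmp}} prefix are hypothetical and chosen only for the example; the multi-argument {{java.net.URI}} constructor is used because it quotes illegal characters in the path.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; /tmp is an arbitrary example location.
final class UriEncodingDemo {

  // Builds a file: URI from an absolute path. The multi-argument URI
  // constructor percent-encodes characters that are illegal in a path.
  static URI toFileUri(String absolutePath) {
    try {
      return new URI("file", null, absolutePath, null);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    // The shape of path the v001 constant yields on *nix:
    // a trailing space after "target" and ':' between the elements.
    URI uri = toFileUri("/tmp/target :test:data");
    System.out.println(uri);            // space encoded as %20, colons kept
    System.out.println(uri.getPath());  // decoded path, space restored
  }
}
```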

bq. If we not only consolidate test dir setup, but do it in a way that isolates it for each
test suite, we get that isolation.

I'm on board with the consolidation aspect, but it's still unclear to me that there is a benefit
to adding another random string into the path.  I suppose if the sub-directory were named to
match the test suite, then that has some benefit for post-mortem analysis after a test failure.
You could go back and inspect metadata and blocks, and you'd know that you were looking at
files specific to that test suite.
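
To make the suite-named variant concrete, here is a rough sketch under stated assumptions. The class {{SuiteTestDirs}} and the method {{dirFor}} are hypothetical names, not the patch's API; only the {{test.build.data}} property and the default path come from the discussion above.

```java
import java.io.File;

// Hypothetical helper; the name and directory layout are assumptions.
final class SuiteTestDirs {

  static final String DEFAULT_TEST_DATA_DIR =
      "target" + File.separator + "test" + File.separator + "data";

  // Returns <base>/<SuiteSimpleName>, where base is the test.build.data
  // system property if set, else the default above.
  static File dirFor(Class<?> suite) {
    String base = System.getProperty("test.build.data", DEFAULT_TEST_DATA_DIR);
    return new File(base, suite.getSimpleName());
  }

  public static void main(String[] args) {
    // Any class works as a stand-in for a test suite here.
    System.out.println(dirFor(SuiteTestDirs.class));
  }
}
```

With a layout like this, the directory name alone tells you which suite left the files behind.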

OTOH, this has the side effect of using many more directories, and they won't get cleaned
up in between runs of different suites.  Typically, the data gets wiped between suite runs,
either explicitly via {{FileUtil#fullyDelete}}, or implicitly via things like a NameNode format.
I tried a full test run of hadoop-hdfs, and then I saw this:

> du -hs ~/git/hadoop-common/hadoop-hdfs-project/hadoop-hdfs/target/test/data
6.1G	/home/cnauroth/git/hadoop-common/hadoop-hdfs-project/hadoop-hdfs/target/test/data

That's more disk consumption than I'm used to seeing from a test run.  I'm pretty sure I'd
need to reallocate volumes on some of my wimpier VMs to accommodate this.
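
If per-suite directories were kept, each suite could wipe its own directory when it finishes. The following is only a sketch of that idea using plain {{java.nio.file}} as a stand-in for {{FileUtil#fullyDelete}}; the class {{SuiteCleanup}} and the method {{deleteRecursively}} are hypothetical names, and this is not what the patch does.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper; a JDK-only stand-in for a recursive delete.
final class SuiteCleanup {

  // Deletes root and everything beneath it. Does nothing if root is absent.
  static void deleteRecursively(Path root) {
    if (!Files.exists(root)) {
      return;
    }
    try (Stream<Path> walk = Files.walk(root)) {
      // Reverse order visits children before their parent directories.
      walk.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A suite could call this from its class-level teardown on its own directory, which would keep the post-mortem benefit only for suites that are still running or that skip the cleanup on failure.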

> tests are using /test/build/data; breaking Jenkins
> --------------------------------------------------
>                 Key: HDFS-9263
>                 URL: https://issues.apache.org/jira/browse/HDFS-9263
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.0.0
>         Environment: Jenkins
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Blocker
>         Attachments: HDFS-9263-001.patch, HDFS-9263-002.patch
> Some of the HDFS tests are using the path {{test/build/data}} to store files, so leaking
files which fail the new post-build RAT test checks on Jenkins (and dirtying all development
systems with paths which {{mvn clean}} will miss).
> fix
