hadoop-hdfs-dev mailing list archives

From Mai Haohui <ricet...@gmail.com>
Subject Re: upstream jenkins build broken?
Date Fri, 13 Mar 2015 20:38:53 GMT
Any updates on this issue? It seems that all HDFS Jenkins builds are
still failing.

Regards,
Haohui

On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <vinayakumarb@apache.org> wrote:
> I think the problem started from here.
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
>
> As Chris mentioned, TestDataNodeVolumeFailure is changing the permissions.
> But with this patch, ReplicationMonitor hit an NPE and received a terminate
> signal, which caused MiniDFSCluster.shutdown() to throw an exception.
>
> TestDataNodeVolumeFailure#tearDown() restores those permissions only after
> shutting down the cluster. So in this case, IMO, the permissions were never
> restored.
>
>
>   @After
>   public void tearDown() throws Exception {
>     if(data_fail != null) {
>       FileUtil.setWritable(data_fail, true);
>     }
>     if(failedDir != null) {
>       FileUtil.setWritable(failedDir, true);
>     }
>     if(cluster != null) {
>       cluster.shutdown();
>     }
>     for (int i = 0; i < 3; i++) {
>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)), true);
>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)), true);
>     }
>   }
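Read literally, the tearDown above restores the data-dir permissions only after cluster.shutdown(), so a shutdown exception skips the restore loop entirely. A minimal, JDK-only sketch of the obvious hardening, with a hypothetical shutdownCluster() standing in for MiniDFSCluster.shutdown() (illustrative only, not the actual test code):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Sketch: restore permissions in a finally block so the restore runs even
// when cluster shutdown throws. shutdownCluster() is a made-up stand-in.
public class TearDownSketch {
    static boolean restored = false;

    static void shutdownCluster() {
        // Simulate shutdown failing, as when ReplicationMonitor hit the NPE.
        throw new RuntimeException("shutdown failed");
    }

    static void restorePermissions(File dataDir) {
        // Mirrors the loop in tearDown(): re-grant execute on data1..data6.
        for (int i = 1; i <= 6; i++) {
            File d = new File(dataDir, "data" + i);
            if (d.exists()) {
                d.setExecutable(true);
            }
        }
        restored = true;
    }

    public static void main(String[] args) throws IOException {
        File dataDir = Files.createTempDirectory("dfs-data").toFile();
        try {
            shutdownCluster();
        } catch (RuntimeException swallowed) {
            // A real test would surface this; swallowed so the demo completes.
        } finally {
            restorePermissions(dataDir); // runs whether or not shutdown threw
        }
        System.out.println("restored=" + restored);
        dataDir.delete();
    }
}
```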
>
>
> Regards,
> Vinay
>
> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <vinayakumarb@apache.org>
> wrote:
>
>> Looking at the history of these kinds of builds, all of them failed on
>> node H9.
>>
>> I think some uncommitted patch must have created the problem and left it
>> there.
>>
>>
>> Regards,
>> Vinay
>>
>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <busbey@cloudera.com> wrote:
>>
>>> You could rely on a destructive git clean call instead of maven to do the
>>> directory removal.
>>>
>>> --
>>> Sean
>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cmccabe@alumni.cmu.edu> wrote:
>>>
>>> > Is there a maven plugin or setting we can use to simply remove
>>> > directories that have no executable permissions on them?  Clearly we
>>> > have the permission to do this from a technical point of view (since
>>> > we created the directories as the jenkins user); it's simply that the
>>> > code refuses to do it.
>>> >
>>> > Otherwise I guess we can just fix those tests...
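One workaround along the lines of the question above, as a rough JDK-only sketch (the class and method names here are made up, and this is not maven-clean or Hadoop code): since the jenkins user owns the directories, re-granting execute and write permission on the way down lets an ordinary recursive delete succeed where maven-clean gives up.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Sketch: restore traversal/write permissions before deleting, bottom-up.
public class ForceDelete {
    static boolean deleteRecursively(File f) {
        f.setExecutable(true); // execute is needed to traverse a directory
        f.setWritable(true);   // write is needed to unlink its entries
        File[] children = f.listFiles(); // null for regular files
        if (children != null) {
            for (File c : children) {
                deleteRecursively(c);
            }
        }
        return f.delete();
    }

    public static void main(String[] args) throws IOException {
        File root = Files.createTempDirectory("data3").toFile();
        File inner = new File(root, "current");
        inner.mkdir();
        inner.setExecutable(false); // mimic a test leaving permissions broken
        System.out.println("deleted=" + deleteRecursively(root));
    }
}
```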
>>> >
>>> > Colin
>>> >
>>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <lei@cloudera.com> wrote:
>>> > > Thanks a lot for looking into HDFS-7722, Chris.
>>> > >
>>> > > In HDFS-7722:
>>> > > TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>> > > TearDown().
>>> > > TestDataNodeHotSwapVolumes resets permissions in a finally clause.
>>> > >
>>> > > Also, I ran mvn test several times on my machine and all tests passed.
>>> > >
>>> > > However, since DiskChecker#checkDirAccess() rejects anything that is
>>> > > not a directory:
>>> > >
>>> > > private static void checkDirAccess(File dir) throws DiskErrorException {
>>> > >   if (!dir.isDirectory()) {
>>> > >     throw new DiskErrorException("Not a directory: "
>>> > >                                  + dir.toString());
>>> > >   }
>>> > >
>>> > >   checkAccessByFileMethods(dir);
>>> > > }
>>> > >
>>> > > one potentially safer alternative is replacing the data dir with a
>>> > > regular file to simulate disk failures.
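A sketch of that safer alternative (plain JDK; checkDir below is a local stand-in mirroring the quoted logic, not the real DiskChecker class): swap the data directory for a regular file so the "is it a directory?" probe fails, with no chmod that could leak past the test.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Sketch: simulate a failed volume without touching permissions.
public class FakeDiskFailure {
    static void checkDir(File dir) throws IOException {
        if (!dir.isDirectory()) {
            throw new IOException("Not a directory: " + dir);
        }
    }

    public static void main(String[] args) throws IOException {
        File dataDir = Files.createTempDirectory("dfs").toFile();
        File volume = new File(dataDir, "data3");
        volume.mkdir();
        checkDir(volume); // a healthy volume passes the check

        // "Fail" the volume: remove the directory and recreate it as a file.
        volume.delete();
        volume.createNewFile();
        try {
            checkDir(volume);
        } catch (IOException expected) {
            System.out.println("disk failure detected");
        }

        // Cleanup is a plain delete; there are no permissions to restore.
        volume.delete();
        dataDir.delete();
    }
}
```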
>>> > >
>>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>> > > <cnauroth@hortonworks.com> wrote:
>>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>> > >> TestDataNodeVolumeFailureReporting, and
>>> > >> TestDataNodeVolumeFailureToleration all remove executable permissions
>>> > >> from directories like the one Colin mentioned to simulate disk
>>> > >> failures at data nodes.  I reviewed the code for all of those, and
>>> > >> they all appear to be doing the necessary work to restore executable
>>> > >> permissions at the end of the test.  The only recent uncommitted patch
>>> > >> I've seen that makes changes in these test suites is HDFS-7722.  That
>>> > >> patch still looks fine though.  I don't know if there are other
>>> > >> uncommitted patches that changed these test suites.
>>> > >>
>>> > >> I suppose it's also possible that the JUnit process unexpectedly died
>>> > >> after removing executable permissions but before restoring them.  That
>>> > >> always would have been a weakness of these test suites, regardless of
>>> > >> any recent changes.
>>> > >>
>>> > >> Chris Nauroth
>>> > >> Hortonworks
>>> > >> http://hortonworks.com/
>>> > >>
>>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <atm@cloudera.com> wrote:
>>> > >>
>>> > >>>Hey Colin,
>>> > >>>
>>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's going on
>>> > >>>with these boxes. He took a look and concluded that some perms are
>>> > >>>being set in those directories by our unit tests which are precluding
>>> > >>>those files from getting deleted. He's going to clean up the boxes for
>>> > >>>us, but we should expect this to keep happening until we can fix the
>>> > >>>test in question to properly clean up after itself.
>>> > >>>
>>> > >>>To help narrow down which commit it was that started this, Andrew
>>> > >>>sent me this info:
>>> > >>>
>>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>> > >>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>>> > >>>has 500 perms, so I'm guessing that's the problem. Been that way since
>>> > >>>9:32 UTC on March 5th."
>>> > >>>
>>> > >>>--
>>> > >>>Aaron T. Myers
>>> > >>>Software Engineer, Cloudera
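For context on the "500 perms" observation above: mode 500 (r-x------) leaves the directory readable and traversable but not writable, and unlinking an entry requires write permission on its parent directory, which is exactly what makes maven-clean fail. A POSIX-only JDK sketch (hypothetical file names; note that root would bypass the permission check):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Sketch: why a directory left at mode 500 blocks deletion of its contents.
public class PermDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("data3");
        Path block = Files.createFile(dir.resolve("blk_1234"));

        // Leave the directory the way the failed test did: 500 (r-x------).
        Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("r-x------"));

        // Unlinking needs write permission on the parent directory, so this
        // returns false for a normal user (root bypasses the check).
        System.out.println("deleted under 500: " + block.toFile().delete());

        // Restoring write permission lets cleanup proceed normally.
        Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("rwx------"));
        Files.deleteIfExists(block);
        Files.delete(dir);
    }
}
```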
>>> > >>>
>>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmccabe@apache.org>
>>> > >>>wrote:
>>> > >>>
>>> > >>>> Hi all,
>>> > >>>>
>>> > >>>> A very quick (and not thorough) survey shows that I can't find any
>>> > >>>> jenkins jobs that succeeded in the last 24 hours.  Most of them seem
>>> > >>>> to be failing with some variant of this message:
>>> > >>>>
>>> > >>>> [ERROR] Failed to execute goal
>>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>>> > >>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>> > >>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>> > >>>> -> [Help 1]
>>> > >>>>
>>> > >>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>> > >>>> permissions?
>>> > >>>>
>>> > >>>> Colin
>>> > >>>>
>>> > >>
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Lei (Eddy) Xu
>>> > > Software Engineer, Cloudera
>>> >
>>>
>>
>>
