hadoop-hdfs-dev mailing list archives

From Haohui Mai <whe...@apache.org>
Subject Re: upstream jenkins build broken?
Date Mon, 16 Mar 2015 20:44:35 GMT
+1 for git clean.

Colin, can you please get this in ASAP? Currently, due to the jenkins
issues, we cannot close the 2.7 blockers.

Thanks,
Haohui



On Mon, Mar 16, 2015 at 11:54 AM, Colin P. McCabe <cmccabe@apache.org> wrote:
> If all it takes is someone creating a test that makes a directory
> without -x, this is going to happen over and over.
>
> Let's just fix the problem at the root by running "git clean -fqdx" in
> our jenkins scripts.  If there are no objections, I will add this in and
> un-break the builds.
>
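> As a rough sketch (hypothetical placement, assuming the workspace root
> is the working directory), the cleanup step could look like this. Note
> that a chmod first may be needed, since directories left without write
> or execute permission can't be traversed or emptied:
>
>     # restore owner perms that tests may have dropped (assumption:
>     # everything under the workspace is owned by the jenkins user)
>     chmod -R u+rwx . || true
>     # then force-remove all untracked and ignored files and directories
>     git clean -fqdx
>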
> best,
> Colin
>
> On Fri, Mar 13, 2015 at 1:48 PM, Lei Xu <lei@cloudera.com> wrote:
>> I filed HDFS-7917 to change the way to simulate disk failures.
>>
>> But I think we still need the infrastructure folks to help with the
>> jenkins scripts to clean up the directories that were left behind today.
>>
>> On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ricetons@gmail.com> wrote:
>>> Any updates on this issue? It seems that all HDFS jenkins builds are
>>> still failing.
>>>
>>> Regards,
>>> Haohui
>>>
>>> On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <vinayakumarb@apache.org> wrote:
>>>> I think the problem started from here:
>>>>
>>>> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
>>>>
>>>> As Chris mentioned, TestDataNodeVolumeFailure changes the permissions.
>>>> But with this patch, ReplicationMonitor hit an NPE and received a
>>>> terminate signal, due to which MiniDFSCluster.shutdown() threw an
>>>> exception.
>>>>
>>>> However, TestDataNodeVolumeFailure#tearDown() restores the executable
>>>> permissions only after shutting down the cluster. So in this case,
>>>> IMO, the permissions were never restored.
>>>>
>>>>
>>>>   @After
>>>>   public void tearDown() throws Exception {
>>>>     if(data_fail != null) {
>>>>       FileUtil.setWritable(data_fail, true);
>>>>     }
>>>>     if(failedDir != null) {
>>>>       FileUtil.setWritable(failedDir, true);
>>>>     }
>>>>     if(cluster != null) {
>>>>       cluster.shutdown();
>>>>     }
>>>>     for (int i = 0; i < 3; i++) {
>>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)), true);
>>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)), true);
>>>>     }
>>>>   }
>>>>
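>>>> Just to illustrate (a sketch only, not from any committed patch):
>>>> moving the permission restores into a finally block would make them
>>>> run even when MiniDFSCluster.shutdown() throws:
>>>>
>>>>   @After
>>>>   public void tearDown() throws Exception {
>>>>     try {
>>>>       if (cluster != null) {
>>>>         cluster.shutdown();
>>>>       }
>>>>     } finally {
>>>>       // Always restore permissions, even if shutdown() threw,
>>>>       // so jenkins can later delete the workspace.
>>>>       if (data_fail != null) {
>>>>         FileUtil.setWritable(data_fail, true);
>>>>       }
>>>>       if (failedDir != null) {
>>>>         FileUtil.setWritable(failedDir, true);
>>>>       }
>>>>       for (int i = 0; i < 3; i++) {
>>>>         FileUtil.setExecutable(new File(dataDir, "data" + (2 * i + 1)), true);
>>>>         FileUtil.setExecutable(new File(dataDir, "data" + (2 * i + 2)), true);
>>>>       }
>>>>     }
>>>>   }
>>>>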
>>>>
>>>> Regards,
>>>> Vinay
>>>>
>>>> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <vinayakumarb@apache.org>
>>>> wrote:
>>>>
>>>>> When I look at the history of these kinds of builds, all of them
>>>>> failed on node H9.
>>>>>
>>>>> I think some uncommitted patch or other created the problem and left
>>>>> it there.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Vinay
>>>>>
>>>>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <busbey@cloudera.com> wrote:
>>>>>
>>>>>> You could rely on a destructive git clean call instead of maven to
>>>>>> do the directory removal.
>>>>>>
>>>>>> --
>>>>>> Sean
>>>>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cmccabe@alumni.cmu.edu> wrote:
>>>>>>
>>>>>> > Is there a maven plugin or setting we can use to simply remove
>>>>>> > directories that have no executable permissions on them?  Clearly
>>>>>> > we have the permission to do this from a technical point of view
>>>>>> > (since we created the directories as the jenkins user); it's simply
>>>>>> > that the code refuses to do it.
>>>>>> >
>>>>>> > Otherwise I guess we can just fix those tests...
>>>>>> >
>>>>>> > Colin
>>>>>> >
>>>>>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <lei@cloudera.com> wrote:
>>>>>> > > Thanks a lot for looking into HDFS-7722, Chris.
>>>>>> > >
>>>>>> > > In HDFS-7722:
>>>>>> > > The TestDataNodeVolumeFailureXXX tests reset data dir permissions
>>>>>> > > in tearDown().
>>>>>> > > TestDataNodeHotSwapVolumes resets permissions in a finally clause.
>>>>>> > >
>>>>>> > > Also, I ran mvn test several times on my machine and all tests passed.
>>>>>> > >
>>>>>> > > However, since DiskChecker#checkDirAccess() rejects anything
>>>>>> > > that is not a directory:
>>>>>> > >
>>>>>> > > private static void checkDirAccess(File dir) throws DiskErrorException {
>>>>>> > >   if (!dir.isDirectory()) {
>>>>>> > >     throw new DiskErrorException("Not a directory: "
>>>>>> > >                                  + dir.toString());
>>>>>> > >   }
>>>>>> > >
>>>>>> > >   checkAccessByFileMethods(dir);
>>>>>> > > }
>>>>>> > >
>>>>>> > > one potentially safer alternative is replacing the data dir with
>>>>>> > > a regular file to simulate disk failures.
>>>>>> > >
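>>>>>> > > A rough sketch of that idea (the helper name here is made up,
>>>>>> > > not from an actual patch): swapping the directory for a plain
>>>>>> > > file trips the isDirectory() check above without ever touching
>>>>>> > > permission bits, so there is nothing to restore in tearDown().
>>>>>> > >
>>>>>> > > import java.io.File;
>>>>>> > > import java.io.IOException;
>>>>>> > > import org.apache.hadoop.fs.FileUtil;
>>>>>> > >
>>>>>> > > // Hypothetical helper: "fail" a volume by replacing its data
>>>>>> > > // directory with a regular file of the same name.
>>>>>> > > private static void simulateVolumeFailure(File dataDir)
>>>>>> > >     throws IOException {
>>>>>> > >   FileUtil.fullyDelete(dataDir);     // remove the directory tree
>>>>>> > >   if (!dataDir.createNewFile()) {    // leave a plain file instead
>>>>>> > >     throw new IOException("Could not create " + dataDir);
>>>>>> > >   }
>>>>>> > > }
>>>>>> > >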
>>>>>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
>>>>>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>>> > >> TestDataNodeVolumeFailureReporting, and
>>>>>> > >> TestDataNodeVolumeFailureToleration all remove executable
>>>>>> > >> permissions from directories like the one Colin mentioned to
>>>>>> > >> simulate disk failures at data nodes.  I reviewed the code for
>>>>>> > >> all of those, and they all appear to be doing the necessary work
>>>>>> > >> to restore executable permissions at the end of the test.  The
>>>>>> > >> only recent uncommitted patch I've seen that makes changes in
>>>>>> > >> these test suites is HDFS-7722.  That patch still looks fine
>>>>>> > >> though.  I don't know if there are other uncommitted patches that
>>>>>> > >> changed these test suites.
>>>>>> > >>
>>>>>> > >> I suppose it's also possible that the JUnit process unexpectedly
>>>>>> > >> died after removing executable permissions but before restoring
>>>>>> > >> them.  That always would have been a weakness of these test
>>>>>> > >> suites, regardless of any recent changes.
>>>>>> > >>
>>>>>> > >> Chris Nauroth
>>>>>> > >> Hortonworks
>>>>>> > >> http://hortonworks.com/
>>>>>> > >>
>>>>>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <atm@cloudera.com> wrote:
>>>>>> > >>
>>>>>> > >>>Hey Colin,
>>>>>> > >>>
>>>>>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's going
>>>>>> > >>>on with these boxes. He took a look and concluded that some perms
>>>>>> > >>>are being set in those directories by our unit tests which are
>>>>>> > >>>precluding those files from getting deleted. He's going to clean
>>>>>> > >>>up the boxes for us, but we should expect this to keep happening
>>>>>> > >>>until we can fix the test in question to properly clean up after
>>>>>> > >>>itself.
>>>>>> > >>>
>>>>>> > >>>To help narrow down which commit it was that started this,
>>>>>> > >>>Andrew sent me this info:
>>>>>> > >>>
>>>>>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>>>>>> > >>>has 500 perms, so I'm guessing that's the problem. Been that way
>>>>>> > >>>since 9:32 UTC on March 5th."
>>>>>> > >>>
>>>>>> > >>>--
>>>>>> > >>>Aaron T. Myers
>>>>>> > >>>Software Engineer, Cloudera
>>>>>> > >>>
>>>>>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmccabe@apache.org> wrote:
>>>>>> > >>>
>>>>>> > >>>> Hi all,
>>>>>> > >>>>
>>>>>> > >>>> A very quick (and not thorough) survey shows that I can't find
>>>>>> > >>>> any jenkins jobs that succeeded in the last 24 hours.  Most of
>>>>>> > >>>> them seem to be failing with some variant of this message:
>>>>>> > >>>>
>>>>>> > >>>> [ERROR] Failed to execute goal
>>>>>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>>>>>> > >>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>>>> > >>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>> > >>>> -> [Help 1]
>>>>>> > >>>>
>>>>>> > >>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>>>>> > >>>> permissions?
>>>>>> > >>>>
>>>>>> > >>>> Colin
>>>>>> > >>>>
>>>>>> > >>
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > > --
>>>>>> > > Lei (Eddy) Xu
>>>>>> > > Software Engineer, Cloudera
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>
>>
>>
>> --
>> Lei (Eddy) Xu
>> Software Engineer, Cloudera
