hadoop-hdfs-dev mailing list archives

From Vinayakumar B <vinayakum...@apache.org>
Subject Re: upstream jenkins build broken?
Date Tue, 17 Mar 2015 07:53:40 GMT
It seems that all PreCommit-HDFS-Build builds are failing with the problem below.

FATAL: Command "git clean -fdx" returned status code 1:
stdout:
stderr: hudson.plugins.git.GitException: Command "git clean -fdx" returned status code 1:
stdout:
stderr:



Can someone remove "git clean -fdx" from the build configuration of
PreCommit-HDFS-Build?
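
For reference, a minimal sketch of how a cleanup step could remove such a
directory itself: restore the owner's permissions on each directory before
deleting its contents. This is not what "git clean" or the maven-clean-plugin
does. The class name ForceClean and the helpers forceDelete and demo are
hypothetical, and the sketch uses plain java.io.File calls rather than
Hadoop's FileUtil.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class ForceClean {
    /**
     * Recursively deletes a tree. Before descending into a directory, the
     * owner is given read, write and execute permission on it so that its
     * entries can be listed and unlinked. Assumes the caller owns the files.
     */
    static boolean forceDelete(File f) {
        if (f.isDirectory() && !Files.isSymbolicLink(f.toPath())) {
            f.setReadable(true);
            f.setWritable(true);
            f.setExecutable(true);
            File[] children = f.listFiles();
            if (children != null) {
                for (File child : children) {
                    forceDelete(child);
                }
            }
        }
        return f.delete();
    }

    /** Builds a small tree containing a non-writable directory and removes it. */
    static boolean demo() throws IOException {
        File root = Files.createTempDirectory("workspace").toFile();
        File data3 = new File(root, "data3");
        data3.mkdir();
        new File(data3, "block").createNewFile();
        // Roughly the "500 perms" state reported in the thread (assumption).
        data3.setWritable(false);
        boolean deleted = forceDelete(root);
        return deleted && !root.exists();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("workspace removed: " + demo());
    }
}
```

The permissions are restored top-down, so a parent is made traversable
before its children are visited.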


Regards,
Vinay

On Tue, Mar 17, 2015 at 12:59 PM, Vinayakumar B <vinayakumarb@apache.org>
wrote:

> I simulated the problem in my environment and verified that neither
> 'git clean -xdf' nor 'mvn clean' removes the directory.
> mvn fails, whereas git silently ignores the problem (it does not even
> display a warning).
>
>
>
> Regards,
> Vinay
>
> On Tue, Mar 17, 2015 at 2:32 AM, Sean Busbey <busbey@cloudera.com> wrote:
>
>> Can someone point me to an example build that is broken?
>>
>> On Mon, Mar 16, 2015 at 3:52 PM, Sean Busbey <busbey@cloudera.com> wrote:
>>
>> > I'm on it. HADOOP-11721
>> >
>> > On Mon, Mar 16, 2015 at 3:44 PM, Haohui Mai <wheat9@apache.org> wrote:
>> >
>> >> +1 for git clean.
>> >>
>> >> Colin, can you please get it in ASAP? Currently due to the jenkins
>> >> issues, we cannot close the 2.7 blockers.
>> >>
>> >> Thanks,
>> >> Haohui
>> >>
>> >>
>> >>
>> >> On Mon, Mar 16, 2015 at 11:54 AM, Colin P. McCabe <cmccabe@apache.org>
>> >> wrote:
>> >> > If all it takes is someone creating a test that makes a directory
>> >> > without -x, this is going to happen over and over.
>> >> >
>> >> > Let's just fix the problem at the root by running "git clean -fqdx" in
>> >> > our jenkins scripts.  If there are no objections I will add this in and
>> >> > un-break the builds.
>> >> >
>> >> > best,
>> >> > Colin
>> >> >
>> >> > On Fri, Mar 13, 2015 at 1:48 PM, Lei Xu <lei@cloudera.com> wrote:
>> >> >> I filed HDFS-7917 to change the way to simulate disk failures.
>> >> >>
>> >> >> But I think we still need infrastructure folks to help with jenkins
>> >> >> scripts to clean the dirs left today.
>> >> >>
>> >> >> On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ricetons@gmail.com> wrote:
>> >> >>> Any updates on this issue? It seems that all HDFS jenkins builds are
>> >> >>> still failing.
>> >> >>>
>> >> >>> Regards,
>> >> >>> Haohui
>> >> >>>
>> >> >>> On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <vinayakumarb@apache.org> wrote:
>> >> >>>> I think the problem started from here.
>> >> >>>>
>> >> >>>> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
>> >> >>>>
>> >> >>>> As Chris mentioned, TestDataNodeVolumeFailure changes the permissions.
>> >> >>>> But with this patch, ReplicationMonitor hit an NPE and got a terminate
>> >> >>>> signal, which caused MiniDFSCluster.shutdown() to throw an exception.
>> >> >>>>
>> >> >>>> TestDataNodeVolumeFailure#tearDown() restores those permissions only
>> >> >>>> after shutting down the cluster. So in this case, IMO, the permissions
>> >> >>>> were never restored.
>> >> >>>>
>> >> >>>>
>> >> >>>>   @After
>> >> >>>>   public void tearDown() throws Exception {
>> >> >>>>     if(data_fail != null) {
>> >> >>>>       FileUtil.setWritable(data_fail, true);
>> >> >>>>     }
>> >> >>>>     if(failedDir != null) {
>> >> >>>>       FileUtil.setWritable(failedDir, true);
>> >> >>>>     }
>> >> >>>>     if(cluster != null) {
>> >> >>>>       cluster.shutdown();
>> >> >>>>     }
>> >> >>>>     for (int i = 0; i < 3; i++) {
>> >> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)), true);
>> >> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)), true);
>> >> >>>>     }
>> >> >>>>   }
>> >> >>>>
>> >> >>>>
>> >> >>>> Regards,
>> >> >>>> Vinay
>> >> >>>>
>> >> >>>> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <vinayakumarb@apache.org>
>> >> >>>> wrote:
>> >> >>>>
>> >> >>>>> Looking at the history of these builds, all of them failed on
>> >> >>>>> node H9.
>> >> >>>>>
>> >> >>>>> I think some uncommitted patch created the problem and left it there.
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> Regards,
>> >> >>>>> Vinay
>> >> >>>>>
>> >> >>>>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <busbey@cloudera.com> wrote:
>> >> >>>>>
>> >> >>>>>> You could rely on a destructive git clean call instead of maven to do the
>> >> >>>>>> directory removal.
>> >> >>>>>>
>> >> >>>>>> --
>> >> >>>>>> Sean
>> >> >>>>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cmccabe@alumni.cmu.edu> wrote:
>> >> >>>>>>
>> >> >>>>>> > Is there a maven plugin or setting we can use to simply remove
>> >> >>>>>> > directories that have no executable permissions on them?  Clearly we
>> >> >>>>>> > have the permission to do this from a technical point of view (since
>> >> >>>>>> > we created the directories as the jenkins user), it's simply that the
>> >> >>>>>> > code refuses to do it.
>> >> >>>>>> >
>> >> >>>>>> > Otherwise I guess we can just fix those tests...
>> >> >>>>>> >
>> >> >>>>>> > Colin
>> >> >>>>>> >
>> >> >>>>>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <lei@cloudera.com> wrote:
>> >> >>>>>> > > Thanks a lot for looking into HDFS-7722, Chris.
>> >> >>>>>> > >
>> >> >>>>>> > > In HDFS-7722:
>> >> >>>>>> > > TestDataNodeVolumeFailureXXX tests reset data dir permissions in TearDown().
>> >> >>>>>> > > TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>> >> >>>>>> > >
>> >> >>>>>> > > Also I ran mvn test several times on my machine and all tests passed.
>> >> >>>>>> > >
>> >> >>>>>> > > However, since DiskChecker#checkDirAccess() rejects anything that is not
>> >> >>>>>> > > a directory:
>> >> >>>>>> > >
>> >> >>>>>> > > private static void checkDirAccess(File dir) throws DiskErrorException {
>> >> >>>>>> > >   if (!dir.isDirectory()) {
>> >> >>>>>> > >     throw new DiskErrorException("Not a directory: "
>> >> >>>>>> > >                                  + dir.toString());
>> >> >>>>>> > >   }
>> >> >>>>>> > >
>> >> >>>>>> > >   checkAccessByFileMethods(dir);
>> >> >>>>>> > > }
>> >> >>>>>> > >
>> >> >>>>>> > > one potentially safer alternative is replacing the data dir with a regular
>> >> >>>>>> > > file to simulate disk failures.
>> >> >>>>>> > >
>> >> >>>>>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
>> >> >>>>>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>> >> >>>>>> > >> TestDataNodeVolumeFailureReporting, and
>> >> >>>>>> > >> TestDataNodeVolumeFailureToleration all remove executable permissions from
>> >> >>>>>> > >> directories like the one Colin mentioned to simulate disk failures at data
>> >> >>>>>> > >> nodes.  I reviewed the code for all of those, and they all appear to be
>> >> >>>>>> > >> doing the necessary work to restore executable permissions at the end of
>> >> >>>>>> > >> the test.  The only recent uncommitted patch I've seen that makes changes
>> >> >>>>>> > >> in these test suites is HDFS-7722.  That patch still looks fine though.  I
>> >> >>>>>> > >> don't know if there are other uncommitted patches that changed these test
>> >> >>>>>> > >> suites.
>> >> >>>>>> > >>
>> >> >>>>>> > >> I suppose it's also possible that the JUnit process unexpectedly died
>> >> >>>>>> > >> after removing executable permissions but before restoring them.  That
>> >> >>>>>> > >> always would have been a weakness of these test suites, regardless of any
>> >> >>>>>> > >> recent changes.
>> >> >>>>>> > >>
>> >> >>>>>> > >> Chris Nauroth
>> >> >>>>>> > >> Hortonworks
>> >> >>>>>> > >> http://hortonworks.com/
>> >> >>>>>> > >>
>> >> >>>>>> > >>
>> >> >>>>>> > >>
>> >> >>>>>> > >>
>> >> >>>>>> > >>
>> >> >>>>>> > >>
>> >> >>>>>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <atm@cloudera.com> wrote:
>> >> >>>>>> > >>
>> >> >>>>>> > >>>Hey Colin,
>> >> >>>>>> > >>>
>> >> >>>>>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's going on with
>> >> >>>>>> > >>>these boxes. He took a look and concluded that some perms are being set in
>> >> >>>>>> > >>>those directories by our unit tests which are precluding those files from
>> >> >>>>>> > >>>getting deleted. He's going to clean up the boxes for us, but we should
>> >> >>>>>> > >>>expect this to keep happening until we can fix the test in question to
>> >> >>>>>> > >>>properly clean up after itself.
>> >> >>>>>> > >>>
>> >> >>>>>> > >>>To help narrow down which commit it was that started this, Andrew sent me
>> >> >>>>>> > >>>this info:
>> >> >>>>>> > >>>
>> >> >>>>>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>> >> >>>>>> > >>>has 500 perms, so I'm guessing that's the problem. Been that way since 9:32
>> >> >>>>>> > >>>UTC on March 5th."
>> >> >>>>>> > >>>
>> >> >>>>>> > >>>--
>> >> >>>>>> > >>>Aaron T. Myers
>> >> >>>>>> > >>>Software Engineer, Cloudera
>> >> >>>>>> > >>>
>> >> >>>>>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmccabe@apache.org>
>> >> >>>>>> > >>>wrote:
>> >> >>>>>> > >>>
>> >> >>>>>> > >>>> Hi all,
>> >> >>>>>> > >>>>
>> >> >>>>>> > >>>> A very quick (and not thorough) survey shows that I can't find any
>> >> >>>>>> > >>>> jenkins jobs that succeeded from the last 24 hours.  Most of them seem
>> >> >>>>>> > >>>> to be failing with some variant of this message:
>> >> >>>>>> > >>>>
>> >> >>>>>> > >>>> [ERROR] Failed to execute goal
>> >> >>>>>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>> >> >>>>>> > >>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>> >> >>>>>> > >>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>> >> >>>>>> > >>>> -> [Help 1]
>> >> >>>>>> > >>>>
>> >> >>>>>> > >>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>> >> >>>>>> > >>>> permissions?
>> >> >>>>>> > >>>>
>> >> >>>>>> > >>>> Colin
>> >> >>>>>> > >>>>
>> >> >>>>>> > >>
>> >> >>>>>> > >
>> >> >>>>>> > >
>> >> >>>>>> > >
>> >> >>>>>> > > --
>> >> >>>>>> > > Lei (Eddy) Xu
>> >> >>>>>> > > Software Engineer, Cloudera
>> >> >>>>>> >
>> >> >>>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Lei (Eddy) Xu
>> >> >> Software Engineer, Cloudera
>> >>
>> >
>> >
>> >
>> > --
>> > Sean
>> >
>>
>>
>>
>> --
>> Sean
>>
>
>
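
A minimal sketch of the teardown ordering issue discussed in this thread:
if the permission restore runs in a finally block, a shutdown() that throws
can no longer leave the directory without execute permission. The Cluster
interface is a hypothetical stand-in for MiniDFSCluster, and plain
java.io.File calls stand in for FileUtil. It is an illustration only, not
the change made under any of the JIRAs mentioned above.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class TeardownSketch {
    /** Hypothetical stand-in for MiniDFSCluster, used to simulate a failing shutdown. */
    interface Cluster {
        void shutdown() throws Exception;
    }

    /**
     * Shuts the cluster down and restores directory permissions.
     * The restore sits in a finally block, so it runs even when
     * shutdown() throws; the exception still propagates to the caller.
     */
    static void tearDown(Cluster cluster, File... dirs) throws Exception {
        try {
            if (cluster != null) {
                cluster.shutdown();
            }
        } finally {
            for (File dir : dirs) {
                dir.setReadable(true);
                dir.setWritable(true);
                dir.setExecutable(true);
            }
        }
    }

    /** Returns true if the directory is executable again after a failing shutdown. */
    static boolean restoredAfterFailedShutdown() throws IOException {
        File dir = Files.createTempDirectory("data3").toFile();
        dir.setExecutable(false); // simulate the test's "failed volume"
        try {
            tearDown(() -> {
                throw new IllegalStateException("simulated shutdown failure");
            }, dir);
        } catch (Exception expected) {
            // The shutdown failure is still reported; only the cleanup is guaranteed.
        }
        boolean restored = dir.canExecute();
        dir.delete();
        return restored;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("permissions restored: " + restoredAfterFailedShutdown());
    }
}
```

This does not cover the case Chris raised, where the JUnit process dies
before teardown runs at all; a workspace cleanup step is still needed for that.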
