Mailing-List: contact user-help@mesos.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@mesos.apache.org
Delivered-To: mailing list user@mesos.apache.org
MIME-Version: 1.0
Date: Thu, 7 Nov 2013 15:31:29 -0500
Subject: Re: Jenkins mesos plugin failing
From: Whitney Sorenson
To: user@mesos.apache.org
Content-Type: multipart/alternative; boundary=001a11c3eb2c92630104ea9c269c

--001a11c3eb2c92630104ea9c269c
Content-Type: text/plain; charset=ISO-8859-1

I should also point out the scheduler didn't seem to survive a reboot of
Jenkins - I had to delete the mesos cloud and reenter the parameters.


On Thu, Nov 7, 2013 at 3:26 PM, Whitney Sorenson wrote:

> Looks like we're using authentication on our slaves. So you either need to
> pass
>
> -jnlpCredentials user:pass
>
> on the command line, or change around the permissions in Jenkins to allow
> anonymous users to connect/run jobs.
>
> I'm not sure if it would make sense or not to add the user/pass in the
> Jenkins plugin configuration screen or if it should be fetched another way.
>
>
>
>
> On Thu, Nov 7, 2013 at 2:52 PM, Vinod Kone wrote:
>
>> Great. Let us know once you figure it out. Maybe I can add a FAQ to the
>> plugin's README to help others (or you can contribute too :)).
>>
>>
>> On Thu, Nov 7, 2013 at 11:40 AM, Whitney Sorenson wrote:
>>
>>> I added the jenkins user on the slave - this was the missing piece. I'll
>>> add this to my PR for the readme. Got much further now; now I'm getting a
>>> 403 on the fetch:
>>>
>>> /jenkins/computer/mesos-jenkins-6f4719c8-1c61-4b28-b5ab-ba298e846840/slave-agent.jnlp:
>>> 403 Forbidden at
>>> hudson.remoting.Launcher.parseJnlpArguments(Launcher.java:261) at
>>> hudson.remoting.Launcher.run(Launcher.java:215)
>>>
>>> and corresponding log on jenkins master:
>>>
>>> Nov 7, 2013 2:38:39 PM winstone.Logger logInternal INFO: While serving
>>> http://localhost:8080/jenkins/computer/mesos-jenkins-6f4719c8-1c61-4b28-b5ab-ba298e846840/slave-agent.jnlp:
>>> hudson.security.AccessDeniedException2: anonymous is missing the
>>> Slave/Connect permission
>>>
>>> Going to look into what this means.
>>>
>>>
>>>
>>> On Thu, Nov 7, 2013 at 2:21 PM, Vinod Kone wrote:
>>>
>>>> I looked at the code and it looks like there are a few places the
>>>> executor might fail before it fetches the URI. Most of them have to do with
>>>> incorrect permissions. The code was written to have any errors reported
>>>> either in slave log or console or executor logs (there might be a bug here
>>>> if we are in fact swallowing errors). IIUC, the executor log directory is
>>>> empty in your case which suggests the executor died before it could even
>>>> create "stdout" or "stderr" files in its sandbox (Is this true?).
>>>>
>>>> Couple of questions:
>>>>
>>>> What user is Jenkins master running as? Is that user known to the host
>>>> on which mesos slave is running?
>>>>
>>>> How are you starting the mesos slave (e.g., cmd line flags)?
>>>>
>>>>
>>>>
>>>> On Thu, Nov 7, 2013 at 11:00 AM, Whitney Sorenson <wsorenson@hubspot.com> wrote:
>>>>
>>>>> The gist was compiled from that log.
>>>>> Here is the complete log from
>>>>> toggling the jenkins plugin on / off (you see the ping statements
>>>>> in between):
>>>>>
>>>>> https://gist.github.com/wsorenson/8bf64e44fd42da354fa0
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Nov 7, 2013 at 1:57 PM, Vinod Kone wrote:
>>>>>
>>>>>> What does mesos-slave.err say?
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 7, 2013 at 10:49 AM, Whitney Sorenson <wsorenson@hubspot.com> wrote:
>>>>>>
>>>>>>> Hi Vinod,
>>>>>>>
>>>>>>> It's 0.14.0-rc4 in both.
>>>>>>>
>>>>>>> I believe we have logging working:
>>>>>>>
>>>>>>> -rw-r--r-- 1 root root        0 Oct 22 23:48 mesos-slave.out
>>>>>>> lrwxrwxrwx 1 root root       63 Oct 22 23:48 mesos-slave.INFO -> mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
>>>>>>> lrwxrwxrwx 1 root root       66 Oct 22 23:49 mesos-slave.WARNING -> mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
>>>>>>> drwxr-xr-x 2 root root     4096 Oct 22 23:49 .
>>>>>>> -rw-rw-r-- 1 root root     4827 Nov  1 20:34 mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
>>>>>>> -rw-rw-r-- 1 root root 10408140 Nov  7 18:44 mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
>>>>>>> -rw-r--r-- 1 root root 53759705 Nov  7 18:45 mesos-slave.err
>>>>>>>
>>>>>>> Is there something else to check? Is it possible the executor is
>>>>>>> failing before it even attempts to fetch URIs?
>>>>>>>
>>>>>>> Ray - Thanks - yeah I found the jenkins logs. I was able to wget the
>>>>>>> slave.jar, and even run it. The mesos-jenkins slaves are dead now, so I
>>>>>>> can't connect to their slave-agent - but the jar does run. Not sure if the
>>>>>>> window for trying to connect to one of the mesos launched slaves is long
>>>>>>> enough to try before it is terminated due to failures. Interestingly, when
>>>>>>> I try to connect to one of the existing slaves I get a 403.
>>>>>>>
>>>>>>> -Whitney
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Nov 7, 2013 at 1:34 PM, Vinod Kone wrote:
>>>>>>>
>>>>>>>> Hey Whitney,
>>>>>>>>
>>>>>>>> What version of mesos are you using (both in the cluster and the
>>>>>>>> plugin)?
>>>>>>>>
>>>>>>>> The slave should print stuff to console when it is launching
>>>>>>>> executor (e.g., "Fetching resources..."). I don't see that in the gist you
>>>>>>>> pasted. Are you capturing stdout/stderr of the slave?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Nov 7, 2013 at 10:30 AM, Whitney Sorenson <wsorenson@hubspot.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks Ray.
>>>>>>>>>
>>>>>>>>> I have a very similar issue (empty executor directories) - but don't
>>>>>>>>> have any issues curling the slave.jar URI - and I don't have any existing
>>>>>>>>> JNLP process running. I don't have a jenkins user - is that the only setup
>>>>>>>>> you did on the slave?
>>>>>>>>>
>>>>>>>>> -Whitney
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Nov 7, 2013 at 1:16 PM, Ray Rodriguez <rayrod2030@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Whitney I would have a look at this github issue where I work
>>>>>>>>>> through some of my jenkins mesos-plugin issues with Vinod. Might be some
>>>>>>>>>> of the same issues you are seeing.
>>>>>>>>>> https://github.com/jenkinsci/mesos-plugin/issues/2
>>>>>>>>>>
>>>>>>>>>> Ray
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Nov 7, 2013 at 1:07 PM, Whitney Sorenson <wsorenson@hubspot.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all!
>>>>>>>>>>>
>>>>>>>>>>> I am trying to get the Jenkins Mesos plugin functioning. I was
>>>>>>>>>>> able to get it installed on our Jenkins master.
>>>>>>>>>>>
>>>>>>>>>>> However, it's unclear if there are any required steps for
>>>>>>>>>>> setting up the slaves. When a framework task is launched, it fails
>>>>>>>>>>> instantly and there are no logs in the runs folder.
>>>>>>>>>>>
>>>>>>>>>>> Here's a gist with relevant logs from the slave:
>>>>>>>>>>>
>>>>>>>>>>> https://gist.github.com/wsorenson/b3562c3e4a8992f9a46f/raw/ea5821c442d826456291330452208d8d7ac8418f/failing+jenkins+logs
>>>>>>>>>>>
>>>>>>>>>>> Any help on how to debug? At first, I thought maybe we needed
>>>>>>>>>>> slave.jar or something but it looks like it's trying to fetch that from the
>>>>>>>>>>> master using the URIs. To clarify, I have done no special jenkins related
>>>>>>>>>>> setup (as per readme.md) on any of the slaves.
>>>>>>>>>>>
>>>>>>>>>>> -Whitney

--001a11c3eb2c92630104ea9c269c
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
I should also point out the scheduler didn't seem to survive a reboot of Jenkins - I had to delete the mesos cloud and reenter the parameters.


On Thu, Nov 7, 2013 at 3:26 PM, Whitney Sorenson <wsorenson@hubspot.com> wrote:
Looks like we're using authentication on our slaves. So you either need to pass

-jnlpCredentials user:pass

on the command line, or change around the permissions in Jenkins to allow anonymous users to connect/run jobs.

I'm not sure if it would make sense or not to add the user/pass in the Jenkins plugin configuration screen or if it should be fetched another way.
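
For anyone following along, here is a minimal sketch of how the agent launch command could be assembled once credentials are needed. It assumes the agent is started from `slave.jar` with its `-jnlpUrl` and `-jnlpCredentials` options; the `build_agent_command` helper and its defaults are hypothetical, and how the Mesos plugin itself should obtain the credentials is left open in this thread.

```python
from typing import List, Optional


def build_agent_command(jnlp_url: str,
                        credentials: Optional[str] = None,
                        jar_path: str = "slave.jar") -> List[str]:
    """Return the argv for launching a Jenkins JNLP agent (hypothetical helper).

    `credentials` is a "user:pass" string. When it is omitted the agent
    connects anonymously, which only works if Jenkins grants the anonymous
    user permission to connect.
    """
    argv = ["java", "-jar", jar_path, "-jnlpUrl", jnlp_url]
    if credentials is not None:
        if ":" not in credentials:
            raise ValueError("credentials must look like 'user:pass'")
        argv += ["-jnlpCredentials", credentials]
    return argv
```

Returning a list rather than a single string keeps the password out of shell quoting problems if the command is later handed to `subprocess.run`.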




On Thu, Nov 7, 2013 at 2:52 PM, Vinod Kone <vinodkone@gmail.com> wrote:
Great. Let us know once you figure it out. Maybe I can add a FAQ to the plugin's README to help others (or you can contribute too :)).


On Thu, Nov 7, 2013 at 11:40 AM, Whitney Sorenson <wsorenson@hubspot.com> wrote:
I added the jenkins user on the slave - this was the missing piece. I'll add this to my PR for the readme. Got much further now; now I'm getting a 403 on the fetch:

/jenkins/computer/mesos-jenkins-6f4719c8-1c61-4b28-b5ab-ba298e846840/slave-agent.jnlp: 403 Forbidden
    at hudson.remoting.Launcher.parseJnlpArguments(Launcher.java:261)
    at hudson.remoting.Launcher.run(Launcher.java:215)

and corresponding log on jenkins master:

Nov 7, 2013 2:38:39 PM winstone.Logger logInternal INFO: While serving http://localhost:8080/jenkins/computer/mesos-jenkins-6f4719c8-1c61-4b28-b5ab-ba298e846840/slave-agent.jnlp: hudson.security.AccessDeniedException2: anonymous is missing the Slave/Connect permission

Going to look into what this means.
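
One way to confirm that the 403 is purely an authentication issue is to request the same `slave-agent.jnlp` URL once anonymously and once with credentials, then compare the status codes. The sketch below assumes Jenkins accepts HTTP Basic credentials for this URL; the function names are made up for illustration and nothing here comes from the plugin itself.

```python
import base64
import urllib.error
import urllib.request
from typing import Optional


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic `Authorization` header."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def probe_jnlp(url: str, auth: Optional[str] = None, timeout: float = 5.0) -> int:
    """Return the HTTP status Jenkins gives for an agent's JNLP URL."""
    request = urllib.request.Request(url)
    if auth is not None:
        request.add_header("Authorization", auth)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # A 403 on the anonymous request corresponds to the
        # "anonymous is missing the Slave/Connect permission" log line.
        return err.code
```

If `probe_jnlp(url)` returns 403 while `probe_jnlp(url, basic_auth_header(user, password))` returns 200, the agent most likely just needs credentials or a permission change on the master.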


On Thu, Nov 7, 2013 at 2:21 PM, Vinod Kone <vinodkone@gmail.com> wrote:
I looked at the code and it looks like there are a few places the executor might fail before it fetches the URI. Most of them have to do with incorrect permissions. The code was written to have any errors reported either in slave log or console or executor logs (there might be a bug here if we are in fact swallowing errors). IIUC, the executor log directory is empty in your case which suggests the executor died before it could even create "stdout" or "stderr" files in its sandbox (Is this true?).
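
The check described above (did the executor get far enough to create `stdout` or `stderr` in its sandbox?) can be scripted roughly as follows. This is only a sketch: the sandbox path has to be supplied by whoever runs it, since the thread does not give the directory layout, and both helper names are hypothetical.

```python
import os
from typing import Dict


def sandbox_log_status(sandbox_dir: str) -> Dict[str, bool]:
    """Report which of the executor's log files exist in its sandbox."""
    return {
        name: os.path.isfile(os.path.join(sandbox_dir, name))
        for name in ("stdout", "stderr")
    }


def executor_died_early(sandbox_dir: str) -> bool:
    """True when neither stdout nor stderr was ever created."""
    return not any(sandbox_log_status(sandbox_dir).values())
```

An empty sandbox points at a failure before the fetch step, such as a permissions or user problem, rather than at a problem downloading `slave.jar`.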

Couple of questions:

What user is Jenkins master running as? Is that user known to the host on which mesos slave is running?

How are you starting the mesos slave (e.g., cmd line flags)?
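
The first question can be answered directly on the slave host. A minimal sketch, assuming a Unix host and the `jenkins` account name mentioned elsewhere in this thread (substitute whatever user the Jenkins master actually runs as):

```python
import pwd


def user_known(username: str = "jenkins") -> bool:
    """Return True if `username` resolves through the host's passwd database."""
    try:
        pwd.getpwnam(username)
        return True
    except KeyError:
        # getpwnam raises KeyError when no such account exists.
        return False
```

Running `user_known()` on each slave before enabling the plugin shows whether the account still has to be created there.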



On Thu, Nov 7, 2013 at 11:00 AM, Whitney Sorenson <wsorenson@hubspot.com> wrote:
The gist was compiled from that log. Here is the complete log from toggling the jenkins plugin on / off (you see the ping statements in between):

https://gist.github.com/wsorenson/8bf64e44fd42da354fa0



On Thu, Nov 7, 2013 at 1:57 PM, Vinod Kone <vinodkone@gmail.com> wrote:
What does mesos-slave.err say?


On Thu, Nov 7, 2013 at 10:49 AM, Whitney Sorenson <wsorenson@hubspot.com> wrote:
Hi Vinod,

It's 0.14.0-rc4 in both.

I believe we have logging working:

-rw-r--r-- 1 root root        0 Oct 22 23:48 mesos-slave.out
lrwxrwxrwx 1 root root       63 Oct 22 23:48 mesos-slave.INFO -> mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
lrwxrwxrwx 1 root root       66 Oct 22 23:49 mesos-slave.WARNING -> mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
drwxr-xr-x 2 root root     4096 Oct 22 23:49 .
-rw-rw-r-- 1 root root     4827 Nov  1 20:34 mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
-rw-rw-r-- 1 root root 10408140 Nov  7 18:44 mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
-rw-r--r-- 1 root root 53759705 Nov  7 18:45 mesos-slave.err

Is there something else to check? Is it possible the executor is failing before it even attempts to fetch URIs?

Ray - Thanks - yeah I found the jenkins logs. I was able to wget the slave.jar, and even run it. The mesos-jenkins slaves are dead now, so I can't connect to their slave-agent - but the jar does run. Not sure if the window for trying to connect to one of the mesos launched slaves is long enough to try before it is terminated due to failures. Interestingly, when I try to connect to one of the existing slaves I get a 403.

-Whitney



On Thu, Nov 7, 2013 at 1:34 PM, Vinod Kone <vinodkone@gmail.com> wrote:
Hey Whitney,

What version of mesos are you using (both in the cluster and the plugin)?

The slave should print stuff to console when it is launching executor (e.g., "Fetching resources..."). I don't see that in the gist you pasted. Are you capturing stdout/stderr of the slave?


On Thu, Nov 7, 2013 at 10:30 AM, Whitney Sorenson <wsorenson@hubspot.com> wrote:
Thanks Ray.

I have a very similar issue (empty executor directories) - but don't have any issues curling the slave.jar URI - and I don't have any existing JNLP process running. I don't have a jenkins user - is that the only setup you did on the slave?

-Whitney



On Thu, Nov 7, 2013 at 1:16 PM, Ray Rodriguez <rayrod2030@gmail.com> wrote:
Hi Whitney I would have a look at this github issue where I work through some of my jenkins mesos-plugin issues with Vinod. Might be some of the same issues you are seeing. https://github.com/jenkinsci/mesos-plugin/issues/2

Ray



On Thu, Nov 7, 2013 at 1:07 PM, Whitney Sorenson <wsorenson@hubspot.com> wrote:
Hi all!

I am trying to get the Jenkins Mesos plugin functioning. I was able to get it installed on our Jenkins master.

However, it's unclear if there are any required steps for setting up the slaves. When a framework task is launched, it fails instantly and there are no logs in the runs folder.

Here's a gist with relevant logs from the slave:

https://gist.github.com/wsorenson/b3562c3e4a8992f9a46f/raw/ea5821c442d826456291330452208d8d7ac8418f/failing+jenkins+logs

Any help on how to debug? At first, I thought maybe we needed slave.jar or something but it looks like it's trying to fetch that from the master using the URIs. To clarify, I have done no special jenkins related setup (as per readme.md) on any of the slaves.

-Whitney











--001a11c3eb2c92630104ea9c269c--