mesos-user mailing list archives

From Marco Massenzio <ma...@mesosphere.io>
Subject Re: Assertion `data.isNone()' failed
Date Tue, 18 Aug 2015 17:45:00 GMT
Hi Ashwanth,

I've pushed a fix out for review <https://reviews.apache.org/r/37584/>,
we'll see if it makes it in time for 0.24.
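
For illustration only, here is a minimal sketch of the pattern behind the crash and one way to avoid it. `MiniTry` is a hypothetical stand-in for stout's `Try<T>`, and `checkHadoopClient` is an invented name. This is an assumption about the shape of the fix, not the actual patch under review.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for stout's Try<T> (NOT the real class).
// As in the real one, error() may only be called when an error is held.
template <typename T>
class MiniTry
{
public:
  static MiniTry some(const T& value) { return MiniTry(value, "", false); }

  static MiniTry failure(const std::string& message)
  {
    return MiniTry(T(), message, true);
  }

  bool isError() const { return failed_; }

  const T& get() const
  {
    assert(!failed_);
    return value_;
  }

  const std::string& error() const
  {
    assert(failed_);  // Aborts if a plain value (e.g. `false`) is held.
    return message_;
  }

private:
  MiniTry(const T& value, const std::string& message, bool failed)
    : value_(value), message_(message), failed_(failed) {}

  T value_;
  std::string message_;
  bool failed_;
};

// Returns an empty string when the client is usable, otherwise the reason.
// The two failure modes are handled separately, so error() is consulted
// only on the isError() branch.
std::string checkHadoopClient(const MiniTry<bool>& available)
{
  if (available.isError()) {
    return "Skipping fetch with Hadoop Client: " + available.error();
  }

  if (!available.get()) {
    return "Skipping fetch with Hadoop Client as Hadoop Client not available";
  }

  return "";
}
```

With this split, `checkHadoopClient(MiniTry<bool>::some(false))` returns a message instead of tripping the assertion inside `error()`.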

As for the version, you can quickly verify it by running `mesos-master
--version` (or just look at the very beginning of the logs, which will tell
you a bunch of stuff about version, build, etc.).

I am sorry, I don't really know enough about setting up Hadoop on Mesos to
give you any useful guidance; from a quick glance at the code, it seems to
me that, if the URI is an `hdfs://` one, the only way to retrieve the
tarball is via HDFS (so you will need the hdfs client to be available on
the Slave(s)).
If you do use an HTTP URI (http://....) then it should work just fine.
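
To check whether an `hdfs` binary is reachable, something along these lines could be compiled and run as the user the slave runs under. It is a rough sketch: `findInPath` and `findInEnvPath` are hypothetical helpers written for this example, not part of Mesos, and they only mimic a basic PATH search.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>

#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: returns the first executable regular file called
// `name` found in the colon-separated `path`, or an empty string if none.
std::string findInPath(const std::string& name, const std::string& path)
{
  assert(!name.empty());

  std::istringstream dirs(path);
  std::string dir;

  while (std::getline(dirs, dir, ':')) {
    if (dir.empty()) {
      dir = ".";  // An empty PATH entry means the current directory.
    }

    const std::string candidate = dir + "/" + name;

    struct stat info;
    if (stat(candidate.c_str(), &info) == 0 &&
        S_ISREG(info.st_mode) &&
        access(candidate.c_str(), X_OK) == 0) {
      return candidate;
    }
  }

  return "";
}

// Looks `name` up in the PATH of the current process.
std::string findInEnvPath(const std::string& name)
{
  const char* path = std::getenv("PATH");
  return findInPath(name, path == nullptr ? "" : path);
}
```

Calling `findInEnvPath("hdfs")` from a small `main` returns the full path of the binary if it is found, and an empty string otherwise. The result depends on the environment of the calling process, so it only says something about the slave if it is run with the same user and PATH.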

Hopefully others will be able to chime in with a more informed view.

*Marco Massenzio*

*Distributed Systems Engineer*
http://codetrips.com

On Tue, Aug 18, 2015 at 2:46 AM, Ashwanth Kumar <ashwanth@indix.com> wrote:

> Thanks Marco for the update.
>
> My understanding of the Hadoop Mesos framework was that the executor would
> download the Hadoop distro from mapred.mesos.executor.uri and execute the
> TTs. I didn't know that to download from HDFS it needs the `hdfs` binary in
> PATH. I don't have a Hadoop setup on the Mesos slave. Should I go ahead and
> add one?
>
> Regarding the line number mismatch, I installed the package through
> Mesosphere; not sure if that's the reason.
>
>
> On Tue, Aug 18, 2015 at 1:22 PM, Marco Massenzio <marco@mesosphere.io>
> wrote:
>
>> Are you sure this is a 0.21.1 cluster? The line numbers in the logs match
>> the code in Mesos 0.23.0.
>>
>> This is, however, a genuine bug (src/launcher/fetcher.cpp#L99):
>>
>>   Try<bool> available = hdfs.available();
>>
>>   if (available.isError() || !available.get()) {
>>     return Error("Skipping fetch with Hadoop Client as"
>>                  " Hadoop Client not available: " + available.error());
>>   }
>>
>> The root cause is that (probably) the HDFS client is not available on the
>> slave; however, we do not 'error()' but rather return a 'false' - this is
>> all good.
>> The bug is exposed in the return line, where we try to retrieve
>> available.error() (which we should not - it's just `false`).
>>
>> This was a 'latent' bug that *may* have been exposed by (my) recent
>> refactoring of os::shell which is used by hdfs.available() under the covers.
>> (this is a bit unclear, though, as that refactoring is post-0.23)
>>
>> Be that as it may, I've filed
>> https://issues.apache.org/jira/browse/MESOS-3287: the fix is trivial and
>> I may be able to sneak it into 0.24 (which we're cutting now).
>>
>> Thanks for reporting!
>>
>> PS - bad code aside, the root cause is that the `hdfs` binary seems to be
>> unreachable on the slave: is it installed in the PATH of the user under
>> which the slave binary executes?
>>
>>
>>
>> *Marco Massenzio*
>>
>> *Distributed Systems Engineer*
>> http://codetrips.com
>>
>> On Mon, Aug 17, 2015 at 10:46 PM, Ashwanth Kumar <ashwanth@indix.com>
>> wrote:
>>
>>> We have a 20-node Mesos cluster running Mesos v0.21.1. We have run Marathon on
>>> top of this setup without any problems for ~4 months now. I'm now trying to
>>> get hadoop mesos <https://github.com/mesos/hadoop/> integration working,
>>> but I see the TaskTrackers that get launched are failing with the
>>> following error:
>>>
>>> I0818 05:36:35.058688 24428 fetcher.cpp:409] Fetcher Info:
>>> {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20150706-075218-1611773194-5050-28439-S473\/hadoop","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"hdfs:\/\/hdfs.prod:54310\/user\/ashwanth\/hadoop-with-mesos-2.6.0-cdh5.4.4.tar.gz"}}],"sandbox_directory":"\/var\/lib\/mesos\/slaves\/20150706-075218-1611773194-5050-28439-S473\/frameworks\/20150706-075218-1611773194-5050-28439-4532\/executors\/executor_Task_Tracker_4129\/runs\/c26f52d4-4055-46fa-b999-11d73f2096dd","user":"hadoop"}
>>> I0818 05:36:35.059806 24428 fetcher.cpp:364] Fetching URI
>>> 'hdfs://hdfs.prod:54310/user/ashwanth/hadoop-with-mesos-2.6.0-cdh5.4.4.tar.gz'
>>> I0818 05:36:35.059821 24428 fetcher.cpp:238] Fetching directly into the
>>> sandbox directory
>>> I0818 05:36:35.059835 24428 fetcher.cpp:176] Fetching URI
>>> 'hdfs://hdfs.prod:54310/user/ashwanth/hadoop-with-mesos-2.6.0-cdh5.4.4.tar.gz'
>>> *mesos-fetcher:
>>> /tmp/mesos-build/mesos-repo/3rdparty/libprocess/3rdparty/stout/include/stout/try.hpp:90:
>>> const string& Try<T>::error() const [with T = bool; std::string =
>>> std::basic_string<char>]: Assertion `data.isNone()' failed.*
>>> *** Aborted at 1439876195 (unix time) try "date -d @1439876195" if you
>>> are using GNU date ***
>>> PC: @       0x343ee32635 (unknown)
>>> *** SIGABRT (@0x5f6c) received by PID 24428 (TID 0x7f988832f820) from
>>> PID 24428; stack trace: ***
>>>     @       0x343f20f710 (unknown)
>>>     @       0x343ee32635 (unknown)
>>>     @       0x343ee33e15 (unknown)
>>>     @       0x343ee2b75e (unknown)
>>>     @       0x343ee2b820 (unknown)
>>>     @           0x408b0a Try<>::error()
>>>     @           0x40cbcf download()
>>>     @           0x4098a3 main
>>>     @       0x343ee1ed5d (unknown)
>>>     @           0x40aeb5 (unknown)
>>> Failed to synchronize with slave (it's probably exited)
>>>
>>> Environment
>>> - EC2 Machines
>>> - Output of lsb_release -a
>>> LSB Version:
>>>  :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
>>> Distributor ID: CentOS
>>> Description:  CentOS release 6.5 (Final)
>>> Release:  6.5
>>> Codename: Final
>>>
>>> Any ideas what I'm doing wrong?
>>>
>>> --
>>> -- Ashwanth Kumar
>>>
>>
>>
>
>
> --
> -- Ashwanth Kumar
>
