mesos-issues mailing list archives

From "Marc Villacorta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-6486) Mesos on Alpine Linux: JVM Segmentation fault
Date Wed, 26 Oct 2016 15:49:59 GMT

    [ https://issues.apache.org/jira/browse/MESOS-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15608833#comment-15608833 ]

Marc Villacorta commented on MESOS-6486:
----------------------------------------

What do you think? Is this a problem with _libjvm.so_ or perhaps a JNI problem in _libmesos-1.0.1.so_?

> Mesos on Alpine Linux: JVM Segmentation fault
> ---------------------------------------------
>
>                 Key: MESOS-6486
>                 URL: https://issues.apache.org/jira/browse/MESOS-6486
>             Project: Mesos
>          Issue Type: Wish
>    Affects Versions: 1.0.1
>         Environment: *Docker*
> {code:none}
> ➜  ~ docker version
> Client:
>  Version:      1.12.1
>  API version:  1.24
>  Go version:   go1.7.1
>  Git commit:   6f9534c
>  Built:        Thu Sep  8 10:31:18 2016
>  OS/Arch:      darwin/amd64
> Server:
>  Version:      1.12.1
>  API version:  1.24
>  Go version:   go1.6.3
>  Git commit:   23cf638
>  Built:        Thu Aug 18 17:52:38 2016
>  OS/Arch:      linux/amd64
> {code}
> *Alpine*
> {code:none}
> ---------------  S Y S T E M  ---------------
> OS:NAME="Alpine Linux"
> ID=alpine
> VERSION_ID=3.4.4
> PRETTY_NAME="Alpine Linux v3.4"
> HOME_URL="http://alpinelinux.org"
> BUG_REPORT_URL="http://bugs.alpinelinux.org"
> uname:Linux 4.4.20-moby #1 SMP Thu Sep 15 12:10:20 UTC 2016 x86_64
> libc:glibc 2.9 NPTL
> rlimit: STACK 8192k, CORE infinity, NPROC infinity, NOFILE 1048576, AS infinity
> load average:0.01 0.39 0.89
> {code}
> *Java*
> {code:none}
> # JRE version: OpenJDK Runtime Environment (8.0_101-b13) (build 1.8.0_101-b13)
> # Java VM: OpenJDK 64-Bit Server VM (25.101-b13 mixed mode linux-amd64 compressed oops)
> # Derivative: IcedTea 3.1.0
> # Distribution: Custom build (Tue Aug 30 20:38:19 GMT 2016)
> {code}
>            Reporter: Marc Villacorta
>            Priority: Minor
>         Attachments: hs_err_pid1677.log
>
>
> I have compiled Mesos 1.0.1 inside a Docker container using Alpine Linux (Dockerfile below):
> {code:none}
> # Set the base image for subsequent instructions:
> FROM alpine:3.4
> MAINTAINER Marc Villacorta Morera <marc.villacorta@gmail.com>
> # Environment variables:
> ENV TAG="1.0.1" \
>     PREFIX="/usr/local" \
>     JAVA_HOME="/usr/lib/jvm/default-jvm" \
>     JAVA_JVM_LIBRARY="/usr/lib/jvm/default-jvm/jre/lib/amd64/server/libjvm.so" \
>     LD_LIBRARY_PATH="/usr/lib/jvm/default-jvm/jre/lib/amd64/server" \
>     EDGE_REPO="http://nl.alpinelinux.org/alpine/edge"
> # Install mesos:
> RUN apk add -U --no-cache -t dev git autoconf automake libtool g++ \
>     zlib-dev fts-dev apr-dev curl-dev file cyrus-sasl-dev cyrus-sasl-crammd5 \
>     subversion-dev make patch linux-headers binutils && apk add -U --no-cache \
>     -t dev openjdk8 maven --repository ${EDGE_REPO}/community && apk add -U \
>     --no-cache libstdc++ libgcc subversion-libs libcurl fts zlib coreutils \
>     && git clone https://git-wip-us.apache.org/repos/asf/mesos.git && cd mesos \
>     && { [ "${TAG}" != "master" ] && git checkout tags/${TAG} -b ${TAG}; }; \
>     ./bootstrap && mkdir build && cd build && ../configure --prefix=${PREFIX} \
>     --disable-dependency-tracking --disable-maintainer-mode --disable-python \
>     --enable-optimize --enable-silent-rules \
>     && CORES=$(cat /proc/cpuinfo | grep processor | wc -l) \
>     && make -j${CORES} && make install && cd && rm -rf /mesos ${PREFIX}/include \
>     && find ${PREFIX} -type f -perm /u=x,g=x,o=x | xargs strip -s 2>/dev/null; \
>     apk del --purge dev && rm -rf /var/cache/apk/*
> # Command:
> CMD ["/bin/sh"]
> {code}
> Some tests fail, and my biggest concern is this one:
> {code:none}
> make check GTEST_FILTER="ExamplesTest.JavaFramework"
> {code}
> {code:none}
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from ExamplesTest
> [ RUN      ] ExamplesTest.JavaFramework
> ../../src/tests/script.cpp:80: Failure
> Failed
> java_framework_test.sh terminated with signal Segmentation fault
> [  FAILED  ] ExamplesTest.JavaFramework (5655 ms)
> [----------] 1 test from ExamplesTest (5656 ms total)
> [----------] Global test environment tear-down
> [==========] 1 test from 1 test case ran. (5689 ms total)
> [  PASSED  ] 0 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] ExamplesTest.JavaFramework
> {code}
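
The harness only reports that `java_framework_test.sh` was killed by a signal. The JVM banner further down names it as SIGSEGV (0xb), i.e. signal 11, and shells such as bash, dash and busybox ash report a child killed by signal N as exit status 128 + N, which is 139 here. A minimal sketch of that convention, using a throwaway `sh -c` that sends SIGSEGV to itself as a stand-in for the crashing JVM (nothing Mesos-specific is assumed):

```shell
#!/bin/sh
# Stand-in for the crashing JVM: a child shell that sends SIGSEGV to itself.
# Depending on `ulimit -c`, this may leave a core file in the current directory.
status=0
sh -c 'kill -SEGV $$' || status=$?

# A status above 128 means "terminated by signal (status - 128)".
if [ "$status" -gt 128 ]; then
    sig=$((status - 128))
    # `kill -l N` prints the signal name without the SIG prefix.
    echo "terminated by signal $sig ($(kill -l "$sig"))"
else
    echo "exited with status $status"
fi
```

The same check applies to `java_framework_test.sh` itself when it is run by hand outside `make check`.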
> The kernel delivers an ugly SIGSEGV. It looks like _libjvm.so_ is the offending library, but I am not at all sure:
> {code:none}
> I1026 15:19:54.843340  1706 replica.cpp:712] Persisted action at 7
> I1026 15:19:54.843683  1706 replica.cpp:691] Replica received learned notice for position 7 from @0.0.0.0:0
> I1026 15:19:54.864063  1706 leveldb.cpp:341] Persisting action (690 bytes) to leveldb took 20.333769ms
> I1026 15:19:54.864123  1706 replica.cpp:712] Persisted action at 7
> I1026 15:19:54.864131  1706 replica.cpp:697] Replica learned APPEND action at position 7
> I1026 15:19:54.864936  1705 registrar.cpp:509] Successfully updated the 'registry' in 31.458048ms
> I1026 15:19:54.864989  1700 log.cpp:596] Attempting to truncate the log to 7
> I1026 15:19:54.865267  1706 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 8
> I1026 15:19:54.866050  1706 slave.cpp:1095] Registered with master master@172.17.0.2:37015; given agent ID 7d8d36ff-5d82-4e91-aba8-46267acc8536-S2
> I1026 15:19:54.866025  1700 master.cpp:4619] Registered agent 7d8d36ff-5d82-4e91-aba8-46267acc8536-S2 at slave(1)@172.17.0.2:37015 (2a2f454552b6) with cpus(*):2; mem(*):10240; disk(*):55318; ports(*):[31000-32000]
> I1026 15:19:54.866127  1702 hierarchical.cpp:478] Added agent 7d8d36ff-5d82-4e91-aba8-46267acc8536-S2 (2a2f454552b6) with cpus(*):2; mem(*):10240; disk(*):55318; ports(*):[31000-32000] (allocated: )
> I1026 15:19:54.866257  1700 status_update_manager.cpp:181] Resuming sending status updates
> I1026 15:19:54.866878  1706 slave.cpp:1155] Forwarding total oversubscribed resources
> I1026 15:19:54.866969  1706 master.cpp:5002] Received update of agent 7d8d36ff-5d82-4e91-aba8-46267acc8536-S2 at slave(1)@172.17.0.2:37015 (2a2f454552b6) with total oversubscribed resources
> I1026 15:19:54.867280  1705 hierarchical.cpp:542] Agent 7d8d36ff-5d82-4e91-aba8-46267acc8536-S2 (2a2f454552b6) updated with oversubscribed resources  (total: cpus(*):2; mem(*):10240; disk(*):55318; ports(*):[31000-32000], allocated: )
> I1026 15:19:54.867350  1706 replica.cpp:537] Replica received write request for position 8 from (67)@172.17.0.2:37015
> I1026 15:19:54.876315  1706 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 8.874131ms
> I1026 15:19:54.876348  1706 replica.cpp:712] Persisted action at 8
> I1026 15:19:54.876600  1705 replica.cpp:691] Replica received learned notice for position 8 from @0.0.0.0:0
> I1026 15:19:54.885751  1705 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 9.032464ms
> I1026 15:19:54.885886  1705 leveldb.cpp:399] Deleting ~2 keys from leveldb took 39508ns
> I1026 15:19:54.885917  1705 replica.cpp:712] Persisted action at 8
> I1026 15:19:54.885938  1705 replica.cpp:697] Replica learned TRUNCATE action at position 8
> I1026 15:19:55.790892  1705 master.cpp:2424] Received SUBSCRIBE call for framework 'Test Framework (Java)' at scheduler-b2956950-fa7e-49c3-88ed-efcef624b837@172.17.0.2:37015
> I1026 15:19:55.791019  1705 master.cpp:2500] Subscribing framework Test Framework (Java) with checkpointing enabled and capabilities [  ]
> I1026 15:19:55.791221  1705 hierarchical.cpp:271] Added framework 7d8d36ff-5d82-4e91-aba8-46267acc8536-0000
> I1026 15:19:55.791256  1700 sched.cpp:743] Framework registered with 7d8d36ff-5d82-4e91-aba8-46267acc8536-0000
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00007fcc6d6dcc64, pid=1677, tid=0x00007fcc54193ab0
> #
> # JRE version: OpenJDK Runtime Environment (8.0_101-b13) (build 1.8.0_101-b13)
> # Java VM: OpenJDK 64-Bit Server VM (25.101-b13 mixed mode linux-amd64 compressed oops)
> # Derivative: IcedTea 3.1.0
> # Distribution: Custom build (Tue Aug 30 20:38:19 GMT 2016)
> # Problematic frame:
> # C  [libjvm.so+0x300c64]
> #
> # Core dump written. Default location: /mesos/build/src/examples/java/core or core.1677
> #
> # An error report file with more information is saved as:
> # /mesos/build/src/examples/java/hs_err_pid1677.log
> I1026 15:19:55.792402  1705 master.cpp:5725] Sending 3 offers to framework 7d8d36ff-5d82-4e91-aba8-46267acc8536-0000 (Test Framework (Java)) at scheduler-b2956950-fa7e-49c3-88ed-efcef624b837@172.17.0.2:37015
> #
> # If you would like to submit a bug report, please include
> # instructions on how to reproduce the bug and visit:
> #   http://icedtea.classpath.org/bugzilla
> #
> Segmentation fault (core dumped)
> {code}
> The _/mesos/build/src/examples/java/hs_err_pid1677.log_ file is attached.
> Here is also a GDB backtrace (_bt_) for those who can read it:
> {code:none}
> warning: Can't read pathname for load map: No error information.
> Core was generated by `/usr/lib/jvm/default-jvm/bin/java -cp /mesos/build/src/java/target/protobuf-jav'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00007fcc6e31dd08 in abort () from /lib/ld-musl-x86_64.so.1
> [Current thread is 1 (LWP 1700)]
> (gdb) bt
> #0  0x00007fcc6e31dd08 in abort () from /lib/ld-musl-x86_64.so.1
> #1  0x00007fcc54192d28 in ?? ()
> #2  0x00007fcc6d93ac91 in ?? () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #3  0x00007fcc6da0947c in ?? () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #4  0x00007fcc6d940a40 in JVM_handle_linux_signal () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #5  0x00007fcc6d939b21 in ?? () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #6  <signal handler called>
> #7  0x00007fcc6d6dcc64 in ?? () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #8  0x00007fcc6d7f4a36 in ?? () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #9  0x00007fcc6d7f4c4c in ?? () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #10 0x00007fcc6d8218c4 in ?? () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #11 0x00007fcc5812057a in JNIScheduler::registered(mesos::SchedulerDriver*, mesos::FrameworkID const&, mesos::MasterInfo const&) ()
>    from /mesos/build/src/.libs/libmesos-1.0.1.so
> #12 0x00007fcc577e18df in mesos::internal::SchedulerProcess::registered(process::UPID const&, mesos::FrameworkID const&, mesos::MasterInfo const&) () from /mesos/build/src/.libs/libmesos-1.0.1.so
> #13 0x00007fcc577f51e4 in void ProtobufProcess<mesos::internal::SchedulerProcess>::handler2<mesos::internal::FrameworkRegisteredMessage, mesos::FrameworkID const&, mesos::FrameworkID const&, mesos::MasterInfo const&, mesos::MasterInfo const&>(mesos::internal::SchedulerProcess*, void (mesos::internal::SchedulerProcess::*)(process::UPID const&, mesos::FrameworkID const&, mesos::MasterInfo const&), mesos::FrameworkID const& (mesos::internal::FrameworkRegisteredMessage::*)() const, mesos::MasterInfo const& (mesos::internal::FrameworkRegisteredMessage::*)() const, process::UPID const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /mesos/build/src/.libs/libmesos-1.0.1.so
> #14 0x00007fcc577df4aa in std::_Function_handler<void (process::UPID const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), std::_Bind<void (*(mesos::internal::SchedulerProcess*, void (mesos::internal::SchedulerProcess::*)(process::UPID const&, mesos::FrameworkID const&, mesos::MasterInfo const&), mesos::FrameworkID const& (mesos::internal::FrameworkRegisteredMessage::*)() const, mesos::MasterInfo const& (mesos::internal::FrameworkRegisteredMessage::*)() const, std::_Placeholder<1>, std::_Placeholder<2>))(mesos::internal::SchedulerProcess*, void (mesos::internal::SchedulerProcess::*)(process::UPID const&, mesos::FrameworkID const&, mesos::MasterInfo const&), mesos::FrameworkID const& (mesos::internal::FrameworkRegisteredMessage::*)() const, mesos::MasterInfo const& (mesos::internal::FrameworkRegisteredMessage::*)() const, process::UPID const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)> >::_M_invoke(std::_Any_data const&, process::UPID const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /mesos/build/src/.libs/libmesos-1.0.1.so
> #15 0x00007fcc577e9d0a in ProtobufProcess<mesos::internal::SchedulerProcess>::visit(process::MessageEvent const&) () from /mesos/build/src/.libs/libmesos-1.0.1.so
> #16 0x00007fcc580cef73 in process::ProcessManager::resume(process::ProcessBase*) () from /mesos/build/src/.libs/libmesos-1.0.1.so
> #17 0x00007fcc580cf8d7 in std::thread::_Impl<std::_Bind_simple<process::ProcessManager::init_threads()::{unnamed type#1} ()> >::_M_run() () from /mesos/build/src/.libs/libmesos-1.0.1.so
> #18 0x00007fcc6d147c8a in execute_native_thread_routine () from /usr/lib/libstdc++.so.6
> #19 0x00007fcc6e35154d in ?? () from /lib/ld-musl-x86_64.so.1
> #20 0x0000000000000000 in ?? ()
> (gdb)
> {code}
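
Frame #7 (0x00007fcc6d6dcc64) is exactly the pc in the JVM's SIGSEGV line, which the JVM reports as `libjvm.so+0x300c64`; frames #0 to #5 are the JVM's crash handler (`JVM_handle_linux_signal` ending in `abort()`). Subtracting that offset gives the address libjvm.so was loaded at in this process, and from there the other unresolved libjvm.so frames can be rewritten as library-relative offsets that do not depend on the load address. A minimal POSIX sh sketch; `hex_sub` is a hypothetical helper, and the sketch assumes the shell does 64-bit arithmetic on hex literals (the usual case for dash, bash and busybox ash on x86_64):

```shell
#!/bin/sh
# hex_sub A B: print A - B in hex (hypothetical helper, 64-bit shell math).
hex_sub() {
    printf '0x%x\n' $(( $1 - $2 ))
}

# From the hs_err header: pc=0x00007fcc6d6dcc64, "C  [libjvm.so+0x300c64]".
base=$(hex_sub 0x00007fcc6d6dcc64 0x300c64)
echo "libjvm.so loaded at $base"

# libjvm.so frames from the first backtrace: #2-#5 and #7-#10.
for addr in 0x00007fcc6d93ac91 0x00007fcc6da0947c 0x00007fcc6d940a40 \
            0x00007fcc6d939b21 0x00007fcc6d6dcc64 0x00007fcc6d7f4a36 \
            0x00007fcc6d7f4c4c 0x00007fcc6d8218c4; do
    echo "$addr -> libjvm.so+$(hex_sub "$addr" "$base")"
done
```

These offsets only identify locations inside this particular libjvm.so build; turning them into HotSpot function names would need debug symbols for the same build, assuming such symbols are available.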
> ... and the same _bt_ after I installed the _musl-dbg_ package:
> {code:none}
> warning: Can't read pathname for load map: No error information.
> Core was generated by `/usr/lib/jvm/default-jvm/bin/java -cp /mesos/build/src/java/target/protobuf-jav'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  a_crash () at ./arch/x86_64/atomic_arch.h:108
> 108	./arch/x86_64/atomic_arch.h: No such file or directory.
> [Current thread is 1 (LWP 1700)]
> (gdb) bt
> #0  a_crash () at ./arch/x86_64/atomic_arch.h:108
> #1  abort () at src/exit/abort.c:11
> #2  0x00007fcc6d93ac91 in ?? () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #3  0x00007fcc6da0947c in ?? () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #4  0x00007fcc6d940a40 in JVM_handle_linux_signal () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #5  0x00007fcc6d939b21 in ?? () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #6  0x00007fcc6e345d04 in sigwaitinfo (mask=<optimized out>, si=<optimized out>) at src/signal/sigwaitinfo.c:5
> #7  0x0000000000000001 in ?? ()
> #8  0x0000000000000000 in ?? ()
> (gdb)
> {code}
> I have tried both _openjdk7_ and _openjdk8_ (from 3.4.4 and edge), with no luck.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
