cloudstack-dev mailing list archives

From Andrija Panic <andrija.pa...@gmail.com>
Subject Re: Agent dies every night/morning.... memory violation
Date Tue, 24 Feb 2015 09:04:24 GMT
Thanks guys,

I have already disabled all cron jobs and everything else (I did not disable
logrotate, though...) - will share my findings.

Thanks a lot for the hint.

On 23 February 2015 at 17:30, Simon Weller <sweller@ena.com> wrote:

> I agree with Marcus. I suggest you start monitoring everything that's
> going on around this time frame.
> Maybe dump available memory, and IO (both disk and network) to a file
> every minute or so, and see if you can correlate it to something in
> particular that might be happening on the underlying server, or the network
> connectivity to that server. Maybe slowly move VMs one at a time to a
> different host and see if the issue follows a particular VM.
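
A minimal sketch of the per-minute sampling suggested above, assuming a Linux host where /proc/meminfo, /proc/diskstats and /proc/net/dev are readable. The function names, the log path and the choice of counters are illustrative only, not anything CloudStack or Ceph ships.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value}; sizes are reported in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])
    return info


def parse_diskstats(text):
    """Parse /proc/diskstats text into {device: (sectors_read, sectors_written)}."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 10:
            stats[f[2]] = (int(f[5]), int(f[9]))
    return stats


def parse_net_dev(text):
    """Parse /proc/net/dev text into {interface: (rx_bytes, tx_bytes)}."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        name, sep, rest = line.partition(":")
        f = rest.split()
        if sep and len(f) >= 9:
            stats[name.strip()] = (int(f[0]), int(f[8]))
    return stats


def read_text(path):
    with open(path) as fh:
        return fh.read()


def take_sample():
    """Collect one snapshot. Disk and network counters are cumulative, and the
    disk totals count partitions as well as whole disks, so only the change
    between consecutive samples is meaningful."""
    mem = parse_meminfo(read_text("/proc/meminfo"))
    disk = parse_diskstats(read_text("/proc/diskstats"))
    net = parse_net_dev(read_text("/proc/net/dev"))
    return {
        "time": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "mem_free_kb": mem.get("MemFree"),
        "sectors_read": sum(r for r, _ in disk.values()),
        "sectors_written": sum(w for _, w in disk.values()),
        "rx_bytes": sum(r for r, _ in net.values()),
        "tx_bytes": sum(t for _, t in net.values()),
    }


def run(log_path, interval=60, iterations=None):
    """Append one key=value line to log_path every `interval` seconds."""
    count = 0
    while iterations is None or count < iterations:
        sample = take_sample()
        with open(log_path, "a") as fh:
            fh.write(" ".join(f"{k}={v}" for k, v in sample.items()) + "\n")
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval)
```

Calling something like run("/var/tmp/host-sample.log") (a hypothetical path) from a screen session overnight would leave one line per minute to compare against the time of the crash.
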
>
> In the meantime, to reduce the effect of this problem, you could use a
> process monitor like Monit to watch the PID and restart
> cloudstack-agent if a failure is detected.
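
As a rough stand-in for what a process monitor such as Monit does here (this is not Monit itself), a watchdog could look like the sketch below. The pidfile path and the restart command are assumptions and would need to match the actual host.

```python
import os
import subprocess

# Hypothetical values; adjust to wherever the agent really writes its PID.
PIDFILE = "/var/run/cloudstack-agent.pid"
RESTART_CMD = ["service", "cloudstack-agent", "restart"]


def read_pid(pidfile):
    """Return the PID stored in pidfile, or None if it is missing or invalid."""
    try:
        with open(pidfile) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def pid_is_alive(pid):
    """Signal 0 performs the existence check without delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def check_and_restart(pidfile=PIDFILE, restart_cmd=RESTART_CMD,
                      runner=subprocess.call):
    """Restart the service if its PID is gone. Returns True if a restart ran."""
    pid = read_pid(pidfile)
    if pid is not None and pid_is_alive(pid):
        return False
    runner(restart_cmd)
    return True
```

Run from cron every minute or so, this only masks the symptom; it does not explain why the agent aborts.
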
>
> - Si
>
> ________________________________________
> From: Marcus <shadowsor@gmail.com>
> Sent: Monday, February 23, 2015 10:21 AM
> To: dev@cloudstack.apache.org
> Cc: users@cloudstack.apache.org
> Subject: Re: Agent dies every night/morning.... memory violation
>
> It doesn't really sound like an agent problem, but some other root
> problem that is causing issues for the agent. Perhaps it is specific
> to the host simply because there is a particular VM that always runs
> on that host and the VM itself is triggering the issue. Perhaps a
> heavy logrotate or cron job on the vm causes issues for librados. Just
> grasping at straws here. From the output provided it does seem that
> the libvirt bindings that include ceph code are terminating the agent
> execution.  My guess is that if you focus on "why this host" as
> opposed to "what's going on", you'll find the answer to both. Sorry, I
> know that's not much help.
>
> On Mon, Feb 23, 2015 at 7:29 AM, Andrija Panic <andrija.panic@gmail.com>
> wrote:
> > Anybody? Before I start to cry :(
> >
> > On 21 February 2015 at 21:18, Andrija Panic <andrija.panic@gmail.com>
> > wrote:
> >
> >> Hi Simon,
> >>
> >> SELinux is disabled, I have just double-checked.
> >>
> >> BTW, this is what I can see in the cloudstack-agent.err log. It looks like
> >> a Ceph-related issue, but I am not sure why the agent would die.
> >> If I recall correctly, this may have been happening since the Ceph update
> >> from 0.80.3(?) to 0.87, and this looks like a crash in librados.
> >>
> >>
> >> libust[1907/2046]: Warning: HOME environment variable not set. Disabling
> >> LTTng-UST per-user tracing. (in setup_local_apps() at
> >> lttng-ust-comm.c:305)
> >> libvirt:  error : name in virDomainLookupByName must not be NULL
> >> libvirt:  error : name in virDomainLookupByName must not be NULL
> >> libvirt:  error : name in virDomainLookupByName must not be NULL
> >> libvirt:  error : name in virDomainLookupByName must not be NULL
> >> libvirt: Storage Driver error : failed to remove volume
> >> 'cloudstack/bd751250-de35-4d2e-a4e3-3ee4b636c2a7': Device or resource
> >> busy
> >> ./log/SubsystemMap.h: In function 'bool
> >> ceph::log::SubsystemMap::should_gather(unsigned int, int)' thread
> >> 7f04427fc700 time 2015-02-21 06:39:38.839210
> >> ./log/SubsystemMap.h: 62: FAILED assert(sub < m_subsys.size())
> >>  ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
> >>  1: (()+0x1fe223) [0x7f060c932223]
> >>  2: (ObjectCacher::flusher_entry()+0x155) [0x7f060c9866e5]
> >>  3: (ObjectCacher::FlusherThread::entry()+0xd) [0x7f060c9976cd]
> >>  4: (()+0x79d1) [0x7f06605ee9d1]
> >>  5: (clone()+0x6d) [0x7f066033bb5d]
> >>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >> needed to interpret this.
> >> terminate called after throwing an instance of 'ceph::FailedAssertion'
> >> 21/02/2015 06:39:38 1905 jsvc.exec error: Service did not exit cleanly
> >>
> >> On 20 February 2015 at 21:56, Simon Weller <sweller@ena.com> wrote:
> >>
> >>> Andrija,
> >>>
> >>> What is SELinux set to on this host?
> >>>
> >>>
> >>> - SI
> >>>
> >>>
> >>> ________________________________________
> >>> From: Andrija Panic <andrija.panic@gmail.com>
> >>> Sent: Friday, February 20, 2015 6:06 AM
> >>> To: dev@cloudstack.apache.org; users@cloudstack.apache.org
> >>> Subject: Agent dies every night/morning.... memory violation
> >>>
> >>> Hi,
> >>>
> >>> I have a misbehaving agent on one of the hosts that is being killed
> >>> each morning, and I found this in /var/log/audit.log:
> >>>
> >>> type=ANOM_ABEND msg=audit(1424321463.930:430678): auid=0 uid=0 gid=0
> >>> ses=68891 pid=10831 comm="jsvc" reason="memory violation" sig=6
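
To line this record up with other logs, the epoch timestamp inside audit(...) can be converted to a readable time. The sketch below is a hedged example written for this one record; the regular expression and the function name are illustrative and do not cover the full audit log format. Signal 6 is SIGABRT, which fits a process that calls abort().

```python
import re
from datetime import datetime, timezone

# Matches the leading "type=... msg=audit(<epoch>:<serial>):" part of a record.
AUDIT_RE = re.compile(
    r"type=(?P<type>\S+) msg=audit\((?P<epoch>\d+\.\d+):(?P<serial>\d+)\):(?P<rest>.*)"
)


def parse_audit_record(line):
    """Return the key=value fields of one audit record plus a UTC timestamp,
    or None if the line does not look like an audit record."""
    m = AUDIT_RE.match(line.strip())
    if m is None:
        return None
    pairs = re.findall(r'(\w+)=("[^"]*"|\S+)', m.group("rest"))
    fields = {key: value.strip('"') for key, value in pairs}
    fields["type"] = m.group("type")
    fields["time_utc"] = datetime.fromtimestamp(
        float(m.group("epoch")), tz=timezone.utc
    )
    return fields


# The record from this message, joined back onto a single line.
record = parse_audit_record(
    'type=ANOM_ABEND msg=audit(1424321463.930:430678): auid=0 uid=0 gid=0 '
    'ses=68891 pid=10831 comm="jsvc" reason="memory violation" sig=6'
)
print(record["time_utc"].isoformat(), record["comm"], record["sig"])
```

The printed time is in UTC, so it has to be shifted to the host's local time zone before comparing it with cron schedules or the agent log.
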
> >>>
> >>> I don't remember changing anything on the system, but this keeps
> >>> happening each morning around the same time, 5:20-5:40 am.
> >>>
> >>> I'm wondering what the heck is happening. Any suggestions on where to
> >>> start troubleshooting?
> >>> I will check the logs in detail anyway...
> >>>
> >>> --
> >>>
> >>> Andrija Panić
> >>>
> >>
> >>
> >>
> >> --
> >>
> >> Andrija Panić
> >>
> >
> >
> >
> > --
> >
> > Andrija Panić
>



-- 

Andrija Panić
