Return-Path: X-Original-To: apmail-cloudstack-dev-archive@www.apache.org Delivered-To: apmail-cloudstack-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 01F3510F30 for ; Mon, 24 Mar 2014 13:32:31 +0000 (UTC) Received: (qmail 93581 invoked by uid 500); 24 Mar 2014 13:32:30 -0000 Delivered-To: apmail-cloudstack-dev-archive@cloudstack.apache.org Received: (qmail 92952 invoked by uid 500); 24 Mar 2014 13:32:28 -0000 Mailing-List: contact dev-help@cloudstack.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cloudstack.apache.org Delivered-To: mailing list dev@cloudstack.apache.org Received: (qmail 92930 invoked by uid 99); 24 Mar 2014 13:32:26 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 24 Mar 2014 13:32:26 +0000 X-ASF-Spam-Status: No, hits=-0.1 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_MED,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: local policy includes SPF record at spf.trusted-forwarder.org) Received: from [74.125.149.75] (HELO na3sys009aog105.obsmtp.com) (74.125.149.75) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 24 Mar 2014 13:32:22 +0000 Received: from mail-ob0-f173.google.com ([209.85.214.173]) (using TLSv1) by na3sys009aob105.postini.com ([74.125.148.12]) with SMTP ID DSNKUzAz0PTU0s9ymFYGhrni154bLdgxyLB/@postini.com; Mon, 24 Mar 2014 06:32:02 PDT Received: by mail-ob0-f173.google.com with SMTP id gq1so5679678obb.18 for ; Mon, 24 Mar 2014 06:32:00 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=jijXstHwrB3unoyEenPdwqQYLBfg0ENAzbRGKtDNcgk=; b=Lzal0GLMKC3dxupKCph9+vWpxgYuVGFxo8EdqcqPekjztS+t0o3v8SoOvZYm2VO8JN 1LojHpZWa62Q8ropRWOB9cqz9XQGxvfDotZlvFHS0dAAqQ4Bo7jfTifSYKzQxY4YBM8z 6hP5xXFBzmDIfWL1LrSElQ/0krxYx9fJJrh+TtZStfYl0/C+yHEoVsMYcPpU+NM9HVhG 8lCKE9tCpayBMoScMgWq3B1rAPPA+V1ATt0KUXuZdG3XHsJSHerARRP/0ZSO+RLARm9V wZ1vtvPpA7Bg97d/VkOlpiu3Ge2wX/UqhCkGCIhndxZf/VXlXdYX4eR5c1xbsPEGoWEr FfAg== X-Gm-Message-State: ALoCoQnIHioyrPQkV4yBa5gsYQ5nVrgJ53PLcvmkomjgVTwVxBaGFnpleJfv4tM5uTIceXEprCBZXrv9az/4lid1gH4waa1oKCDQfqh2Exu1KS0jfKCi4GU6Ji28hgzTCqCK/GSzyF5s6ANVsMx1gtbIW2PntryfeibRC35l0Awox+a/F3h19l8= X-Received: by 10.60.42.194 with SMTP id q2mr1037025oel.66.1395667920444; Mon, 24 Mar 2014 06:32:00 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.60.42.194 with SMTP id q2mr1037005oel.66.1395667920270; Mon, 24 Mar 2014 06:32:00 -0700 (PDT) Received: by 10.60.84.230 with HTTP; Mon, 24 Mar 2014 06:32:00 -0700 (PDT) In-Reply-To: References: <8CCE9859D2CAFD45948DBF7145AFB98C06756434@SINPEX01CL01.citrite.net> Date: Mon, 24 Mar 2014 19:02:00 +0530 Message-ID: Subject: Re: system vm disk space issue in ACS 4.3 From: Saurav Lahiri To: "dev@cloudstack.apache.org" Content-Type: multipart/alternative; boundary=001a11c20e2a9cc2da04f55a42eb X-Virus-Checked: Checked by ClamAV on apache.org --001a11c20e2a9cc2da04f55a42eb Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable So if you look at the code for run.sh you will see that it is a while loop that calls _run.sh. If _run.sh gets terminated and the java process is still running, future runs of _run.sh will attempt to start the java process and will fail. But as it fails a whole lot of log message will get logged to cloud.out by the /etc/init.d/cloud script. I guess the fix I am working on should also address( by preventing scripts from logging to cloud.out) this issue. Thanks Saurav On Thu, Mar 20, 2014 at 10:29 PM, Sunil Bakhru wrote: > The fix for 6258 will contain the log size and prevent it from hogging th= e > root partition. > > However, we do see the following 'ERROR' log which fills up our system VM > log file. Has anybody seen this? > Any pointers on what could possibly be wrong? > > > 2014-03-20 08:40:54,395 ERROR [cloud.agent.AgentShell] (main:null) Unable > to start agent: Java process is being started twice. If this is not true= , > remove /var/run/agent.SecStorage.pid > 2014-03-20 08:41:05,359 INFO [cloud.agent.AgentShell] (main:null) Agent > started > 2014-03-20 08:41:05,364 INFO [cloud.agent.AgentShell] (main:null) > Implementation Version is 4.3.0 > 2014-03-20 08:41:05,364 INFO [cloud.agent.AgentShell] (main:null) > agent.properties found at /usr/local/cloud/systemvm/conf/agent.properties > 2014-03-20 08:41:05,370 DEBUG [cloud.agent.AgentShell] (main:null) Found > property: instance > 2014-03-20 08:41:05,370 DEBUG [cloud.agent.AgentShell] (main:null) Found > property: NfsSecondaryStorageResource.id > 2014-03-20 08:41:05,370 DEBUG [cloud.agent.AgentShell] (main:null) Found > property: resource > 2014-03-20 08:41:05,370 INFO [cloud.agent.AgentShell] (main:null) > Defaulting to using properties file for storage > 2014-03-20 08:41:05,371 INFO [cloud.agent.AgentShell] (main:null) > Defaulting to the constant time backoff algorithm > 2014-03-20 08:41:05,380 INFO [cloud.utils.LogUtils] (main:null) log4j > configuration found at /usr/local/cloud/systemvm/conf/log4j-cloud.xml > 2014-03-20 08:41:05,393 DEBUG [cloud.agent.AgentShell] (main:null) > Checking to see if agent.SecStorage.pid exists. > 2014-03-20 08:41:05,396 DEBUG [cloud.utils.ProcessUtil] (main:null) > environment.properties could not be opened > 2014-03-20 08:41:05,403 DEBUG [cloud.utils.ProcessUtil] (main:null) > Executing: bash -c ps -p 3469 > 2014-03-20 08:41:05,411 DEBUG [cloud.utils.ProcessUtil] (main:null) > Execution is successful. > 2014-03-20 08:41:05,411 DEBUG [cloud.utils.ProcessUtil] (main:null) PID > TTY TIME CMD > 3469 ? 00:00:36 java > > 2014-03-20 08:41:05,411 ERROR [cloud.agent.AgentShell] (main:null) Unable > to start agent: Java process is being started twice. If this is not true= , > remove /var/run/agent.SecStorage.pid > > > > Appreciate the help. > > Thanks, > Sunil > > > > > > On 3/19/14 8:40 AM, "Rajesh Battala" wrote: > > >Great. Post your patch at reviews.apache.org > > > >Thanks > >Rajesh Battala > > > >-----Original Message----- > >From: Saurav Lahiri [mailto:saurav.lahiri@sungard.com] > >Sent: Wednesday, March 19, 2014 8:49 PM > >To: dev@cloudstack.apache.org > >Subject: Re: system vm disk space issue in ACS 4.3 > > > >Thanks Rajesh. I have created a jira ticket for this > >https://issues.apache.org/jira/browse/CLOUDSTACK-6258. Will send in the > >fix for review in a couple of days. > > > >Thanks > >Saurav > > > > > >On Wed, Mar 19, 2014 at 8:03 PM, Rajesh Battala > >wrote: > > > >> Can you please file a bug and send your fix for review. > >> > >> Thanks > >> Rajesh Battala > >> > >> -----Original Message----- > >> From: Saurav Lahiri [mailto:saurav.lahiri@sungard.com] > >> Sent: Wednesday, March 19, 2014 7:20 PM > >> To: dev@cloudstack.apache.org > >> Subject: Re: system vm disk space issue in ACS 4.3 > >> > >> The problem appears to be the start function in the /etc/init.d/cloud > >> service for console proxy. > >> More specifically the following line also writes to /var/log/cloud.out > >> > >> > >> ---------------------------------------------------------------------- > >> ------------------------------------------------------------ > >> (cd $CLOUD_COM_HOME/systemvm; nohup ./run.sh > > >> /var/log/cloud/cloud.out > >> 2>&1 & ) > >> > >> ---------------------------------------------------------------------- > >> ------------------------------------------------------------ > >> > >> since run.sh calls _run.sh and both has "set -x" enabled, in certain > >> situations they can keep logging messages to cloud.out without being > >> aware of the settings in log4j-cloud.xml > >> > >> > >> One way to fix that could be that run.sh and _run.sh would log to > >> cloud.out only if a debug flag was set to true, otherwise only the > >> java process would write to cloud.out and log4j would respect the > >> settings in log4j-cloud.xml > >> > >> > >> Thanks > >> Saurav > >> > >> > >> > >> On Mon, Mar 17, 2014 at 8:47 PM, Saurav Lahiri > >> >> >wrote: > >> > >> > Could it have something to do with the RollingFileAppender that is > >> > being used. > >> > The following > >> > rollingfileappender >> > Fi leAppender-not-working-consistently-td8582.html> link appears to > >> > be a > >> bit outdated but they more or less describe a similar problem that we > >> are seeing? > >> > > >> > > >> > On our environment that is what we have seeing for sometime on > >> > console proxy. The root filesystem goes full with the cloud.out.* > >> > occupying all the space. This happens pretty frequently and we have > >> > to regularly recycle the console proxy to resolve this issue. > >> > > >> > > >> > As seen below, cloud.out.2 should not have exceeded 10MB but it > >> > stands at 217MB now. > >> > > >> > drwxr-xr-x 2 root root 4.0K Mar 17 14:57 . > >> > drwxr-xr-x 8 root root 4.0K Mar 17 15:01 .. > >> > -rw-r--r-- 1 root root 0 Mar 12 18:18 api-server.log > >> > -rw-r--r-- 1 root root 357K Mar 17 15:06 cloud.out > >> > -rw-r--r-- 1 root root 2.1M Mar 17 14:56 cloud.out.1 > >> > -rw-r--r-- 1 root root 217M Mar 17 15:06 cloud.out.2 > >> > > >> > root@v-zzzz-VM:/var/log/cloud# lsof | grep cloud.out > >> > sleep 649 root 1w REG 202,1 226122291 181737 > >> > /var/log/cloud/cloud.out.2 > >> > sleep 649 root 2w REG 202,1 226122291 181737 > >> > /var/log/cloud/cloud.out.2 > >> > bash 2312 root 1w REG 202,1 226122291 181737 > >> > /var/log/cloud/cloud.out.2 > >> > bash 2312 root 2w REG 202,1 226122291 181737 > >> > /var/log/cloud/cloud.out.2 > >> > bash 2339 root 1w REG 202,1 226122291 181737 > >> > /var/log/cloud/cloud.out.2 > >> > bash 2339 root 2w REG 202,1 226122291 181737 > >> > /var/log/cloud/cloud.out.2 > >> > bash 2786 root 1w REG 202,1 226122291 181737 > >> > /var/log/cloud/cloud.out.2 > >> > bash 2786 root 2w REG 202,1 226122291 181737 > >> > /var/log/cloud/cloud.out.2 > >> > java 2805 root 1w REG 202,1 226122291 181737 > >> > /var/log/cloud/cloud.out.2 > >> > java 2805 root 2w REG 202,1 226122291 181737 > >> > /var/log/cloud/cloud.out.2 > >> > java 2805 root 116w REG 202,1 319382 181769 > >> > /var/log/cloud/cloud.out > >> > root@v-zzzz-VM:/var/log/cloud# ls -alh > >> > > >> > Thanks > >> > Saurav > >> > > >> > > >> > On Tue, Mar 11, 2014 at 7:58 AM, Chiradeep Vittal < > >> > Chiradeep.Vittal@citrix.com> wrote: > >> > > >> >> Yes, it was deliberate. I can=B9t find the discussion, but it > >> >> revolved around a security best practice of having separate > >> >> partitions for /, /swap, home directories > >> >> > >> >> > >> >> On 3/10/14, 11:35 AM, "Marcus" wrote: > >> >> > >> >> >There have been several raised, actually regarding /var/log. As > >> >> >for the system vm partitioning, it was explicitly changed from > >> >> >single to multiple partitions last year. I have no idea why, but I > >> >> >generally don't file bugs without community discussion on things > >> >> >that seem deliberate. > >> >> > > >> >> >On Sat, Mar 8, 2014 at 11:32 AM, Marcus > wrote: > >> >> >> Yeah, I've just seen on busy systems where even with log > >> >> >>rotation working properly the little space left in var after OS > >> >> >>files is barely enough, for example the conntrackd log on a busy > >> >> >>VPC. We actually ended up rolling our own system vm, the > >> >> >>existing image has plenty of space, its just locked up in other > >>partitions. > >> >> >> > >> >> >> On Mar 8, 2014 8:58 AM, "Rajesh Battala" > >> >> >> > >> >> >>wrote: > >> >> >>> > >> >> >>> Yes, only 435MB is available for /var . we can increase the > >> >> >>> space > >> >> also. > >> >> >>> But we need to find out the root cause which services are > >> >> >>>causing the /var to fill up. > >> >> >>> Can you please find out and post which log files are taking up > >> >> >>>more space in /var > >> >> >>> > >> >> >>> Thanks > >> >> >>> Rajesh Battala > >> >> >>> > >> >> >>> -----Original Message----- > >> >> >>> From: Marcus [mailto:shadowsor@gmail.com] > >> >> >>> Sent: Saturday, March 8, 2014 8:19 PM > >> >> >>> To: dev@cloudstack.apache.org > >> >> >>> Subject: RE: system vm disk space issue in ACS 4.3 > >> >> >>> > >> >> >>> Perhaps there's a new service. I know in the past we've seen > >> >> >>>issues with this , specifically the conntrackd log. I think the > >> >> >>>cloud logs weren't getting rolled either, but I thought it was > >> >> >>>all fixed. > >> >> >>> > >> >> >>> There's also simply not a ton of space on /var, I wish we would > >> >> >>>go back to just having one partition because it orphans lots of > >> >> >>>free space in other filesystems. > >> >> >>> On Mar 8, 2014 12:37 AM, "Rajesh Battala" > >> >> >>> > >> >> >>> wrote: > >> >> >>> > >> >> >>> > AFAIK, log roation is enabled in the systemvm. > >> >> >>> > Can you check whether the logs are getting zipped .? > >> >> >>> > > >> >> >>> > -----Original Message----- > >> >> >>> > From: Anirban Chakraborty [mailto:abchak@juniper.net] > >> >> >>> > Sent: Saturday, March 8, 2014 12:46 PM > >> >> >>> > To: dev@cloudstack.apache.org > >> >> >>> > Subject: system vm disk space issue in ACS 4.3 > >> >> >>> > > >> >> >>> > Hi All, > >> >> >>> > > >> >> >>> > I am seeing system vm disk has no space left after running > >> >> >>> > for few > >> >> >>>days. > >> >> >>> > Cloudstack UI shows the agent in v-2-VM in alert state, while > >> >> >>> > agent state of s-1-VM shows blank (hyphen in the UI). > >> >> >>> > Both the system vms are running and ssh-able from the host. > >> >> >>> > The log > >> >> >>>in > >> >> >>> > s-1-Vm shows following errors: > >> >> >>> > > >> >> >>> > root@s-1-VM:~# grep 'Exception' /var/log/cloud/*.* > >> >> >>> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left > >> >> >>> > on device > >> >> >>> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left > >> >> >>> > on device > >> >> >>> > > >> >> >>> > whereas logs in v-1-VM shows > >> >> >>> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left > >> >> >>> > on device > >> >> >>> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left > >> >> >>> > on device > >> >> >>> > /var/log/cloud/cloud.out.3:07:18:00,547 INFO > >> >> CSExceptionErrorCode:87 > >> >> >>> > - Could not find exception: > >> >> >>> > com.cloud.exception.AgentControlChannelException > >> >> >>> > in error code list for exceptions > >> >> >>> > > >> >> >>> > > >> >> > >> >> >>>/var/log/cloud/cloud.out.3:com.cloud.exception.AgentControlChann > >> >> >>>el > >> >> >>>Except > >> >> >>>ion: > >> >> >>> > Unable to post agent control request as link is not available > >> >> >>> > > >> >> >>> > Looks like cloud agent is filling up the log, which is > >> >> >>> > leading to > >> >> the > >> >> >>> > disk full state. > >> >> >>> > > >> >> >>> > Is this a known issue? Thanks. > >> >> >>> > > >> >> >>> > Anirban > >> >> >>> > > >> >> > >> >> > >> >> > >> > > >> > >> > > > > > > --001a11c20e2a9cc2da04f55a42eb--