From: Rajesh Battala
To: "dev@cloudstack.apache.org"
Subject: RE: system vm disk space issue in ACS 4.3
Date: Wed, 19 Mar 2014 14:33:02 +0000
Message-ID: <8CCE9859D2CAFD45948DBF7145AFB98C067512FA@SINPEX01CL02.citrite.net>

Can you please file a bug and send your fix for review?

Thanks
Rajesh Battala

-----Original Message-----
From: Saurav Lahiri [mailto:saurav.lahiri@sungard.com]
Sent: Wednesday, March 19, 2014 7:20 PM
To: dev@cloudstack.apache.org
Subject: Re: system vm disk space issue in ACS 4.3

The problem appears to be the start function in the /etc/init.d/cloud service for the console proxy. More specifically, the following line also writes to /var/log/cloud/cloud.out:

----------------------------------------------------------------------
(cd $CLOUD_COM_HOME/systemvm; nohup ./run.sh > /var/log/cloud/cloud.out 2>&1 & )
----------------------------------------------------------------------

Since run.sh calls _run.sh and both have "set -x" enabled, in certain situations they can keep logging messages to cloud.out, bypassing the settings in log4j-cloud.xml.

One way to fix that would be for run.sh and _run.sh to log to cloud.out only if a debug flag is set to true. Otherwise only the Java process would write to cloud.out, and log4j would respect the settings in log4j-cloud.xml.

Thanks
Saurav

On Mon, Mar 17, 2014 at 8:47 PM, Saurav Lahiri wrote:

> Could it have something to do with the RollingFileAppender that is being used?
> The following rollingfileappender leAppender-not-working-consistently-td8582.html> link appears to be a bit outdated, but it describes more or less the same problem that we are seeing.
>
> In our environment that is what we have been seeing for some time on the console proxy. The root filesystem fills up, with the cloud.out.* files occupying all the space. This happens pretty frequently, and we have to regularly recycle the console proxy to resolve the issue.
>
> As seen below, cloud.out.2 should not have exceeded 10MB, but it stands at 217MB now.
>
> drwxr-xr-x 2 root root 4.0K Mar 17 14:57 .
> drwxr-xr-x 8 root root 4.0K Mar 17 15:01 ..
> -rw-r--r-- 1 root root    0 Mar 12 18:18 api-server.log
> -rw-r--r-- 1 root root 357K Mar 17 15:06 cloud.out
> -rw-r--r-- 1 root root 2.1M Mar 17 14:56 cloud.out.1
> -rw-r--r-- 1 root root 217M Mar 17 15:06 cloud.out.2
>
> root@v-zzzz-VM:/var/log/cloud# lsof | grep cloud.out
> sleep  649 root   1w REG 202,1 226122291 181737 /var/log/cloud/cloud.out.2
> sleep  649 root   2w REG 202,1 226122291 181737 /var/log/cloud/cloud.out.2
> bash  2312 root   1w REG 202,1 226122291 181737 /var/log/cloud/cloud.out.2
> bash  2312 root   2w REG 202,1 226122291 181737 /var/log/cloud/cloud.out.2
> bash  2339 root   1w REG 202,1 226122291 181737 /var/log/cloud/cloud.out.2
> bash  2339 root   2w REG 202,1 226122291 181737 /var/log/cloud/cloud.out.2
> bash  2786 root   1w REG 202,1 226122291 181737 /var/log/cloud/cloud.out.2
> bash  2786 root   2w REG 202,1 226122291 181737 /var/log/cloud/cloud.out.2
> java  2805 root   1w REG 202,1 226122291 181737 /var/log/cloud/cloud.out.2
> java  2805 root   2w REG 202,1 226122291 181737 /var/log/cloud/cloud.out.2
> java  2805 root 116w REG 202,1    319382 181769 /var/log/cloud/cloud.out
> root@v-zzzz-VM:/var/log/cloud# ls -alh
>
> Thanks
> Saurav
>
>
> On Tue, Mar 11, 2014 at 7:58 AM, Chiradeep Vittal <Chiradeep.Vittal@citrix.com> wrote:
>
>> Yes, it was deliberate.
>> I can't find the discussion, but it revolved around a security best practice of having separate partitions for /, /swap, home directories
>>
>>
>> On 3/10/14, 11:35 AM, "Marcus" wrote:
>>
>> >There have been several raised, actually, regarding /var/log. As for the system vm partitioning, it was explicitly changed from single to multiple partitions last year. I have no idea why, but I generally don't file bugs without community discussion on things that seem deliberate.
>> >
>> >On Sat, Mar 8, 2014 at 11:32 AM, Marcus wrote:
>> >> Yeah, I've just seen on busy systems where even with log rotation working properly the little space left in var after OS files is barely enough, for example the conntrackd log on a busy VPC. We actually ended up rolling our own system vm; the existing image has plenty of space, it's just locked up in other partitions.
>> >>
>> >> On Mar 8, 2014 8:58 AM, "Rajesh Battala" wrote:
>> >>>
>> >>> Yes, only 435MB is available for /var. We can increase the space also. But we need to find out the root cause: which services are causing /var to fill up.
>> >>> Can you please find out and post which log files are taking up more space in /var?
>> >>>
>> >>> Thanks
>> >>> Rajesh Battala
>> >>>
>> >>> -----Original Message-----
>> >>> From: Marcus [mailto:shadowsor@gmail.com]
>> >>> Sent: Saturday, March 8, 2014 8:19 PM
>> >>> To: dev@cloudstack.apache.org
>> >>> Subject: RE: system vm disk space issue in ACS 4.3
>> >>>
>> >>> Perhaps there's a new service. I know in the past we've seen issues with this, specifically the conntrackd log. I think the cloud logs weren't getting rolled either, but I thought it was all fixed.
>> >>>
>> >>> There's also simply not a ton of space on /var. I wish we would go back to just having one partition, because it orphans lots of free space in other filesystems.
>> >>> On Mar 8, 2014 12:37 AM, "Rajesh Battala" wrote:
>> >>>
>> >>> > AFAIK, log rotation is enabled in the systemvm.
>> >>> > Can you check whether the logs are getting zipped?
>> >>> >
>> >>> > -----Original Message-----
>> >>> > From: Anirban Chakraborty [mailto:abchak@juniper.net]
>> >>> > Sent: Saturday, March 8, 2014 12:46 PM
>> >>> > To: dev@cloudstack.apache.org
>> >>> > Subject: system vm disk space issue in ACS 4.3
>> >>> >
>> >>> > Hi All,
>> >>> >
>> >>> > I am seeing that the system vm disk has no space left after running for a few days. The CloudStack UI shows the agent in v-2-VM in alert state, while the agent state of s-1-VM shows blank (a hyphen in the UI). Both system vms are running and ssh-able from the host. The log in s-1-VM shows the following errors:
>> >>> >
>> >>> > root@s-1-VM:~# grep 'Exception' /var/log/cloud/*.*
>> >>> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left on device
>> >>> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left on device
>> >>> >
>> >>> > whereas the logs in v-1-VM show
>> >>> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left on device
>> >>> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left on device
>> >>> > /var/log/cloud/cloud.out.3:07:18:00,547 INFO CSExceptionErrorCode:87 - Could not find exception: com.cloud.exception.AgentControlChannelException in error code list for exceptions
>> >>> > /var/log/cloud/cloud.out.3:com.cloud.exception.AgentControlChannelException: Unable to post agent control request as link is not available
>> >>> >
>> >>> > Looks like the cloud agent is filling up the log, which is leading to the disk-full state.
>> >>> >
>> >>> > Is this a known issue? Thanks.
>> >>> >
>> >>> > Anirban
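
Saurav's suggestion in this thread (have run.sh and _run.sh trace to cloud.out only when a debug flag is set) could look roughly like the sketch below. This is not the actual CloudStack change: the variable names SYSTEMVM_DEBUG and CLOUD_OUT and both helper functions are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Sketch only. SYSTEMVM_DEBUG, CLOUD_OUT and both function names are
# hypothetical; they do not come from the CloudStack scripts.

# Turn on shell command tracing ("set -x") only when the debug flag is "true".
enable_trace_if_debug() {
    if [ "${SYSTEMVM_DEBUG:-false}" = "true" ]; then
        set -x
    fi
}

# Print the file that the wrapper's stdout/stderr should be redirected to:
# the log file when debugging, /dev/null otherwise.
wrapper_output_target() {
    if [ "${SYSTEMVM_DEBUG:-false}" = "true" ]; then
        printf '%s\n' "${CLOUD_OUT:-/var/log/cloud/cloud.out}"
    else
        printf '%s\n' "/dev/null"
    fi
}
```

Under these assumptions, run.sh and _run.sh would call enable_trace_if_debug in place of an unconditional "set -x", and the init script's start line would redirect to "$(wrapper_output_target)" rather than a fixed path. One trade-off to keep in mind: with the flag off, anything the Java process prints directly to stdout or stderr is discarded as well, so only output that goes through log4j would reach cloud.out.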