Mailing-List: contact dev-help@cloudstack.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cloudstack.apache.org
Received-SPF: pass (athena.apache.org: domain of rajesh.battala@citrix.com
 designates 103.14.252.240 as permitted sender)
From: Rajesh Battala <rajesh.battala@citrix.com>
To: "dev@cloudstack.apache.org" <dev@cloudstack.apache.org>
Subject: RE: system vm disk space issue in ACS 4.3
Thread-Topic: system vm disk space issue in ACS 4.3
Thread-Index: 
 AQHPOp5QaEPJodBYGE6fg7m1e/UB6JrWzL5A///zGgCAAJjAIP//pY4AgAMlbQCAAIREAIAKRMMAgAMMaICAAJGacA==
Date: Wed, 19 Mar 2014 14:33:02 +0000
Message-ID: 
 <8CCE9859D2CAFD45948DBF7145AFB98C067512FA@SINPEX01CL02.citrite.net>
References: 
 <CALFpzo4ahGKJssthdjPj2M1TDLJH6pXdz6X9mfVB1=3kVZGLKw@mail.gmail.com>
	<5187E1BA-E421-4771-AEA3-0EEBFB098155@juniper.net>
	<8CCE9859D2CAFD45948DBF7145AFB98C06726A7A@SINPEX01CL01.citrite.net>
	<CALFpzo61OQtwxwd0kEEJ3C+yHon6j-_K4MmkrGmfQ5Mv1u_uxQ@mail.gmail.com>
	<8CCE9859D2CAFD45948DBF7145AFB98C06727DB3@SINPEX01CL01.citrite.net>
	<CALFpzo7XocmkSfrjzXFJa4D0CS5-Q_13eT1v8egU6nvUSeAaDA@mail.gmail.com>
	<CALFpzo4sxKBDfgF7B_rymmYcoemBPUwiL=yW9sAiJMh4EFUN7A@mail.gmail.com>
	<CF43C166.3E38F%chiradeep.vittal@citrix.com>
	<CAAwm4jsr+2ncxeOeJc8U4tMfPZ78EJqT-256bc1sWVpK2T3j_w@mail.gmail.com>
 <CAAwm4jvZJ36WJ5XZK4r0Jbh7Z2Tv-VbePKPQS_Cd9YNV6L7b-w@mail.gmail.com>
In-Reply-To: 
 <CAAwm4jvZJ36WJ5XZK4r0Jbh7Z2Tv-VbePKPQS_Cd9YNV6L7b-w@mail.gmail.com>
Accept-Language: en-GB, en-US
Content-Language: en-US
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

Can you please file a bug and send your fix for review?

Thanks
Rajesh Battala

-----Original Message-----
From: Saurav Lahiri [mailto:saurav.lahiri@sungard.com]
Sent: Wednesday, March 19, 2014 7:20 PM
To: dev@cloudstack.apache.org
Subject: Re: system vm disk space issue in ACS 4.3

The problem appears to be the start function in the /etc/init.d/cloud service
for the console proxy.
More specifically, the following line also writes to /var/log/cloud/cloud.out:

---------------------------------------------------------------------------
(cd $CLOUD_COM_HOME/systemvm; nohup ./run.sh > /var/log/cloud/cloud.out 2>&1 & )
---------------------------------------------------------------------------

Since run.sh calls _run.sh and both have "set -x" enabled, in certain
situations they can keep writing trace messages to cloud.out regardless of
the settings in log4j-cloud.xml.
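
The effect of "set -x" combined with "2>&1" can be reproduced outside the
system VM. The following is only a rough sketch: traced.sh is a hypothetical
stand-in for run.sh, and the temporary directory stands in for
/var/log/cloud. It assumes bash is installed and the default trace prefix
("+ ") is in use.

```shell
# Sketch only: traced.sh is a made-up stand-in for run.sh, not the real script.
trace_demo() {
    local dir
    dir=$(mktemp -d)

    # A tiny script with tracing enabled, like run.sh and _run.sh.
    printf '%s\n' '#!/bin/bash' 'set -x' 'sleep 0' 'echo "real output"' \
        > "$dir/traced.sh"

    # Same redirection shape as the init script: stdout and stderr (where the
    # shell writes its trace lines) both land in one file.
    bash "$dir/traced.sh" > "$dir/cloud.out" 2>&1

    # The file now holds the trace lines as well as the script's own output.
    cat "$dir/cloud.out"
    rm -rf "$dir"
}

trace_demo
```

Every traced command ends up in the file next to the real output, and none
of it passes through log4j, so no appender setting can limit it.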


One way to fix this would be for run.sh and _run.sh to log to cloud.out only
if a debug flag is set to true. Otherwise only the Java process would write
to cloud.out, and log4j would respect the settings in log4j-cloud.xml.
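
A minimal sketch of what such a debug flag might look like. The name
CLOUD_DEBUG and both helper functions are hypothetical; nothing like them
exists in the current scripts, and the actual fix may well look different.

```shell
# CLOUD_DEBUG is a hypothetical flag name, assumed here for illustration only.

# Where the shell scripts' stdout/stderr should go: cloud.out only when
# debugging, otherwise discarded.
script_log_target() {
    if [ "${CLOUD_DEBUG:-false}" = "true" ]; then
        echo "/var/log/cloud/cloud.out"
    else
        echo "/dev/null"
    fi
}

# What run.sh and _run.sh could call in place of an unconditional "set -x".
maybe_trace() {
    if [ "${CLOUD_DEBUG:-false}" = "true" ]; then
        set -x
    fi
}

# The start function's redirect could then read (left as a comment because it
# needs the real systemvm layout to run):
# (cd $CLOUD_COM_HOME/systemvm; nohup ./run.sh > "$(script_log_target)" 2>&1 & )
```

With the flag unset, the scripts' output is discarded and only what log4j
writes through its own appender reaches cloud.out. This assumes nothing
important is printed directly to the Java process's stdout or stderr.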


Thanks
Saurav


On Mon, Mar 17, 2014 at 8:47 PM, Saurav Lahiri <saurav.lahiri@sungard.com> wrote:

> Could it have something to do with the RollingFileAppender that is
> being used?
> The following RollingFileAppender link appears to be a bit outdated, but
> it describes more or less the same problem that we are seeing:
> http://apache-logging.6191.n7.nabble.com/RollingFileAppender-not-working-consistently-td8582.html
>
>
> In our environment that is what we have been seeing for some time on the
> console proxy. The root filesystem fills up, with the cloud.out.* files
> occupying all the space. This happens pretty frequently and we have to
> regularly recycle the console proxy to resolve this issue.
>
>
> As seen below, cloud.out.2 should not have exceeded 10MB but it stands
> at 217MB now.
>
> drwxr-xr-x 2 root root 4.0K Mar 17 14:57 .
> drwxr-xr-x 8 root root 4.0K Mar 17 15:01 ..
> -rw-r--r-- 1 root root    0 Mar 12 18:18 api-server.log
> -rw-r--r-- 1 root root 357K Mar 17 15:06 cloud.out
> -rw-r--r-- 1 root root 2.1M Mar 17 14:56 cloud.out.1
> -rw-r--r-- 1 root root 217M Mar 17 15:06 cloud.out.2
>
> root@v-zzzz-VM:/var/log/cloud# lsof | grep cloud.out
> sleep       649 root    1w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
> sleep       649 root    2w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
> bash       2312 root    1w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
> bash       2312 root    2w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
> bash       2339 root    1w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
> bash       2339 root    2w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
> bash       2786 root    1w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
> bash       2786 root    2w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
> java       2805 root    1w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
> java       2805 root    2w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
> java       2805 root  116w      REG      202,1    319382     181769 /var/log/cloud/cloud.out
> root@v-zzzz-VM:/var/log/cloud# ls -alh
>
> Thanks
> Saurav
>
>
> On Tue, Mar 11, 2014 at 7:58 AM, Chiradeep Vittal <Chiradeep.Vittal@citrix.com> wrote:
>
>> Yes, it was deliberate. I can't find the discussion, but it revolved
>> around a security best practice of having separate partitions for /,
>> /swap, home directories
>>
>>
>> On 3/10/14, 11:35 AM, "Marcus" <shadowsor@gmail.com> wrote:
>>
>> >There have been several raised, actually regarding /var/log.  As for
>> >the system vm partitioning, it was explicitly changed from single to
>> >multiple partitions last year. I have no idea why, but I generally
>> >don't file bugs without community discussion on things that seem
>> >deliberate.
>> >
>> >On Sat, Mar 8, 2014 at 11:32 AM, Marcus <shadowsor@gmail.com> wrote:
>> >> Yeah, I've just seen on busy systems that even with log rotation
>> >> working properly, the little space left in /var after OS files is
>> >> barely enough, for example for the conntrackd log on a busy VPC. We
>> >> actually ended up rolling our own system vm; the existing image
>> >> has plenty of space, it's just locked up in other partitions.
>> >>
>> >> On Mar 8, 2014 8:58 AM, "Rajesh Battala" <rajesh.battala@citrix.com> wrote:
>> >>>
>> >>> Yes, only 435MB is available for /var. We can increase the space also.
>> >>> But we need to find the root cause: which services are causing
>> >>> /var to fill up.
>> >>> Can you please find out and post which log files are taking up
>> >>> the most space in /var?
>> >>>
>> >>> Thanks
>> >>> Rajesh Battala
>> >>>
>> >>> -----Original Message-----
>> >>> From: Marcus [mailto:shadowsor@gmail.com]
>> >>> Sent: Saturday, March 8, 2014 8:19 PM
>> >>> To: dev@cloudstack.apache.org
>> >>> Subject: RE: system vm disk space issue in ACS 4.3
>> >>>
>> >>> Perhaps there's a new service. I know in the past we've seen
>> >>> issues with this, specifically the conntrackd log. I think the
>> >>> cloud logs weren't getting rolled either, but I thought it was
>> >>> all fixed.
>> >>>
>> >>> There's also simply not a ton of space on /var. I wish we would
>> >>> go back to just having one partition, because the multi-partition
>> >>> layout orphans lots of free space in other filesystems.
>> >>> On Mar 8, 2014 12:37 AM, "Rajesh Battala" <rajesh.battala@citrix.com> wrote:
>> >>>
>> >>> > AFAIK, log rotation is enabled in the systemvm.
>> >>> > Can you check whether the logs are getting zipped?
>> >>> >
>> >>> > -----Original Message-----
>> >>> > From: Anirban Chakraborty [mailto:abchak@juniper.net]
>> >>> > Sent: Saturday, March 8, 2014 12:46 PM
>> >>> > To: dev@cloudstack.apache.org
>> >>> > Subject: system vm disk space issue in ACS 4.3
>> >>> >
>> >>> > Hi All,
>> >>> >
>> >>> > I am seeing that the system vm disk has no space left after running
>> >>> > for a few days.
>> >>> > The CloudStack UI shows the agent in v-2-VM in alert state, while the
>> >>> > agent state of s-1-VM shows blank (hyphen in the UI).
>> >>> > Both system vms are running and ssh-able from the host. The log in
>> >>> > s-1-VM shows the following errors:
>> >>> >
>> >>> > root@s-1-VM:~# grep 'Exception' /var/log/cloud/*.*
>> >>> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left on device
>> >>> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left on device
>> >>> >
>> >>> > whereas the logs in v-1-VM show
>> >>> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left on device
>> >>> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left on device
>> >>> > /var/log/cloud/cloud.out.3:07:18:00,547  INFO CSExceptionErrorCode:87 - Could not find exception: com.cloud.exception.AgentControlChannelException in error code list for exceptions
>> >>> >
>> >>> >
>> >>> > /var/log/cloud/cloud.out.3:com.cloud.exception.AgentControlChannelException: Unable to post agent control request as link is not available
>> >>> >
>> >>> > Looks like the cloud agent is filling up the log, which is leading
>> >>> > to the disk-full state.
>> >>> >
>> >>> > Is this a known issue? Thanks.
>> >>> >
>> >>> > Anirban
>> >>> >
>>
>>
>>
>