geronimo-dev mailing list archives

From Jason Dillon <jason.dil...@gmail.com>
Subject Re: Continuous TCK Testing
Date Sat, 18 Oct 2008 08:25:20 GMT
Before, when I had those two build machines running in my apartment in
Berkeley, I set up one Xen domain specifically for running monitoring
tools, installed Cacti on it, and then set up snmpd on each of the other
machines, configured to allow access from the Xen monitoring domain.
This provided a very detailed, easy-to-grok monitoring console for the
build agents.
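
For the snmpd side, a minimal sketch looks something like this (the
community string and the monitoring domain's address below are
placeholders, not the values actually used):

  # on each build agent (assuming a Debian-ish box)
  sudo apt-get install snmpd

  # add to /etc/snmp/snmpd.conf: read-only access from the monitoring domain
  rocommunity buildmon 10.0.0.10

  sudo /etc/init.d/snmpd restart

Cacti on the monitoring domain then just polls each agent over SNMP and
graphs the usual CPU, memory, network, and disk data.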

--jason


On Oct 18, 2008, at 5:58 AM, Jay D. McHugh wrote:

> Hey Kevan,
>
> Regarding monitoring...
>
> I managed to run across xenmon.py.
>
> It appears to log the system utilization for the whole box, as well as
> each VM, to log files in 'your' home directory if you specify the '-n'
> flag.
>
> Here is the help page for xenmon.py:
> jaydm@phoebe:~$ sudo python /usr/sbin/xenmon.py -h
> Usage: xenmon.py [options]
>
> Options:
>  -h, --help            show this help message and exit
>  -l, --live            show the ncurses live monitoring frontend (default)
>  -n, --notlive         write to file instead of live monitoring
>  -p PREFIX, --prefix=PREFIX
>                        prefix to use for output files
>  -t DURATION, --time=DURATION
>                        stop logging to file after this much time has
>                        elapsed (in seconds). set to 0 to keep logging
>                        indefinitely
>  -i INTERVAL, --interval=INTERVAL
>                        interval for logging (in ms)
>  --ms_per_sample=MSPERSAMPLE
>                        determines how many ms worth of data goes in a sample
>  --cpu=CPU             specifies which cpu to display data for
>  --allocated           Display allocated time for each domain
>  --noallocated         Don't display allocated time for each domain
>  --blocked             Display blocked time for each domain
>  --noblocked           Don't display blocked time for each domain
>  --waited              Display waiting time for each domain
>  --nowaited            Don't display waiting time for each domain
>  --excount             Display execution count for each domain
>  --noexcount           Don't display execution count for each domain
>  --iocount             Display I/O count for each domain
>  --noiocount           Don't display I/O count for each domain
>
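> For example, based on the options above, a short non-live run that writes
> per-domain log files would look something like this (the prefix and
> duration here are just illustrative):
>
>   sudo python /usr/sbin/xenmon.py -n -p log -t 3
>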
> And here is some sample output:
>
> jaydm@phoebe:~$ cat log-dom0.log
> # passed cpu dom cpu(tot) cpu(%) cpu/ex allocated/ex blocked(tot) blocked(%) blocked/io waited(tot) waited(%) waited/ex ex/s io(tot) io/ex
> 0.000 0 0 2.086 0.000 38863.798 30000000.000 154.177 0.000 0.000 0.504 0.000 9383.278 0.000 0.000 0.000
> 2.750 1 0 2.512 0.000 53804.925 30000000.000 153.217 0.000 0.000 0.316 0.000 6774.813 0.000 0.000 0.000
> 4.063 2 0 2.625 0.000 59959.942 30000000.000 153.886 0.000 0.000 0.173 0.000 3939.987 0.000 0.000 0.000
> 5.203 3 0 3.020 0.000 47522.430 30000000.000 171.834 0.000 0.000 0.701 0.000 11031.759 0.000 0.000 0.000
> 6.403 4 0 2.130 0.000 39256.871 30000000.000 171.870 0.000 0.000 0.617 0.000 11378.014 0.000 0.000 0.000
> 9.230 6 0 0.836 0.000 53962.875 30000000.000 57.287 0.000 0.000 0.038 0.000 2450.488 0.000 0.000 0.000
> 10.305 7 0 2.171 0.000 46119.247 30000000.000 154.008 0.000 0.000 0.367 0.000 7804.444 0.000 0.000 0.000
> 11.518 0 0 15931680.822 1.593 54019.023 30000000.000 889706824.191 88.971 0.000 2630292.436 0.263 8918.446 294.927 0.000 0.000
> 1009.216 1 0 7687035.544 0.769 53822.548 30000000.000 473101345.004 47.310 0.000 864964.568 0.086 6056.248 142.822 0.000 0.000
> 1010.199 2 0 20502235.224 2.050 61655.293 30000000.000 979188763.754 97.919 0.000 4279443600.516 427.944 12869345.608 332.530 0.000 0.000
> 1011.239 3 0 13634865.766 1.363 45934.870 30000000.000 985479796.363 98.548 0.000 1593248.596 0.159 5367.538 296.830 0.000 0.000
> 1012.312 4 0 18228049.181 1.823 61242.790 30000000.000 979822521.396 97.982 0.000 2593364.560 0.259 8713.213 297.636 0.000 0.000
> 1013.338 5 0 9891757.872 0.989 65386.046 30000000.000 571275802.794 57.128 0.000 357431.539 0.036 2362.678 151.282 0.000 0.000
>
> We could probably add a cron job to grab a single sample every X minutes
> and append them together to build up a utilization history (rather than
> simply running it all of the time).
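>
> Something along these lines might do it (completely untested; the paths,
> interval, and sample length below are just placeholders):
>
>   # /etc/cron.d/xenmon-sample -- grab one short xenmon sample every 15 min
>   */15 * * * * root /usr/local/bin/xenmon-sample.sh
>
>   # /usr/local/bin/xenmon-sample.sh
>   #!/bin/sh
>   # take a ~3 second non-live sample, then fold it into per-domain history
>   cd /var/log/xenmon || exit 1
>   python /usr/sbin/xenmon.py -n -t 3 -p sample
>   for f in sample-dom*.log; do
>       [ -e "$f" ] || continue
>       cat "$f" >> "history-${f#sample-}"
>       rm -f "$f"
>   done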
>
> I just tried to get a single sample, and the smallest run I could get was
> about three seconds, with four samples taken.
>
> Alternatively, I also tried xentop in batch mode:
>
> jaydm@phoebe:~$ sudo xentop -b -i 1
>       NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)  MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO  VBD_RD  VBD_WR       SSID
>   Domain-0 -----r     430567    0.0    3939328   23.5   no limit       n/a     8    4        0        0    0      0       0       0 2149631536
>      tck01 --b---     750449    0.0    3145728   18.8    3145728      18.8     2    1   483054  1855493    1     15  655667 8445829 2149631536
>      tck02 --b---    1101273    0.0    3145728   18.8    3145728      18.8     2    1   367792  1773407    1     83 1131709 9030663 2149631536
>      tck03 -----r     144552    0.0    3145728   18.8    3145728      18.8     2    1   188115  2370069    1      6  370431 1290683 2149631536
>      tck04 --b---     103742    0.0    3145728   18.8    3145728      18.8     2    1   286936  2341941    1      7  381523 1484476 2149631536
>
> It looks to me like having a cron job that periodically ran xentop and
> built up a history would be the best option (without digging through
> a ton of different specialized monitoring packages).
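>
> If we go that route, the crontab entry could be as simple as something
> like this (again untested; the interval and log path are placeholders):
>
>   # append one timestamped xentop snapshot every 5 minutes
>   */5 * * * * root (date '+\%F \%T'; /usr/sbin/xentop -b -i 1) >> /var/log/xentop-history.log
>
> (The % signs are escaped because cron treats a bare % as a newline.)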
>
>
> Jay
>
> Kevan Miller wrote:
>>
>> On Oct 10, 2008, at 11:29 AM, Kevan Miller wrote:
>>
>>>
>>> On Oct 10, 2008, at 11:25 AM, Kevan Miller wrote:
>>>
>>>>
>>>> On Oct 8, 2008, at 11:56 PM, Kevan Miller wrote:
>>>>
>>>>>
>>>>> On Oct 8, 2008, at 4:31 PM, Jason Warner wrote:
>>>>>
>>>>>> We had some suggestions earlier for some alternate means of
>>>>>> implementing this (Hudson, Continuum, etc.). Now that we've had
>>>>>> Jason Dillon provide an overview of what we had in place before,
>>>>>> does anyone have thoughts on what we should go with? I'm thinking
>>>>>> we should stick with the AHP-based solution. It will most likely
>>>>>> need to be updated, but it's been tried and tested and shown to
>>>>>> meet our needs. I'm wondering, though, why we stopped using it
>>>>>> before. Was there a specific issue we're going to have to deal
>>>>>> with again?
>>>>>
>>>>> IIRC, the overwhelming reason we stopped using it before was because
>>>>> of hosting issues -- spotty networking, hardware failures, poor colo
>>>>> support, etc. We shouldn't have any of these problems now. If we do
>>>>> run into problems, they should now be fixable. I have no reason to
>>>>> favor Hudson/Continuum over AHP. So, if we can get AHP running
>>>>> easily, I'm all for it. There's only one potential issue that I'm
>>>>> aware of.
>>>>>
>>>>> We previously had an Open Source license issued for our use of
>>>>> Anthill. Here's some of the old discussion --
>>>>> http://www.nabble.com/Geronimo-build-automation-status-(longish)-tt7649902.html#a7649902
>>>>>
>>>>> Although the board was aware of our usage of AntHill, since we
>>>>> weren't running AntHill on ASF hardware, I'm not sure the license
>>>>> was fully vetted by Infra. I don't see any issues, but I'll want to
>>>>> run this by Infra.
>>>>>
>>>>> Jason D, will the existing license cover the version of AntHill that
>>>>> we'll want to use? I'll run the license by Infra and will also
>>>>> describe the issue for review by the Board in our quarterly report.
>>
>> Heh. Oops. Just noticed that I sent the following to myself and not  
>> the
>> dev list. I hate when I do that...
>>
>>>
>>> One more thing... from emails on infrastructure@apache.org, it looks
>>> like Infra is cool with us running Anthill on selene and phoebe.
>>>
>>> BTW, I am planning on installing monitoring software on selene and
>>> phoebe over the weekend. The board is interested in monitoring our
>>> usage...
>>
>>
>> Also, we now have a new AntHill license for our use. I've placed the
>> license in ~kevan/License2.txt on phoebe and selene. This license is
>> for Apache use only, so it should not be placed in a public location
>> (e.g. our public svn tree).
>>
>> Regarding monitoring software -- I haven't been able to get it to work
>> yet. vmstat/iostat don't work unless you run them on every virtual
>> machine. 'xm top' gathers data on all domains; however, it doesn't make
>> the data easy to tuck away in a log file or expose via snmp... Advice
>> welcome...
>>
>> --kevan
>>

