incubator-olio-user mailing list archives

From Vasileios Kontorinis <bkontori...@gmail.com>
Subject Re: Stuck with olio.
Date Sun, 17 Jan 2010 00:49:48 GMT
Akara hi again,
   Below I have comments on your suggestions and at the end some bonus
questions... Thanks again.

2010/1/13 Akara Sucharitakul <Akara.Sucharitakul@sun.com>

> With your permission, I'd like to copy the Olio and Faban user aliases
> going forward. I feel it will help a much wider audience. Please see below
> for answers/comments:
>
Sure. I cc'ed the olio user alias. I am not sure which is the right Faban list.


> Vasileios Kontorinis wrote:
>
>> Akara hi,
>>   I am a grad student at UCSD and I use Olio for a research project where
>> we want to measure olio performance under live virtual machine migration. We
>> use ubuntu 8.04 on nehalem servers.
>> I have checked out the latest version of olio from the online svn repository and
>> downloaded the last version of faban (faban-kit-101509.tar.gz <
>> http://faban.sunsource.net/nightly/faban-kit-101509.tar.gz>)
>>
>
> 101509 is fairly recent. But the latest on the web site is 111109 (Faban
> 1.0). There were just bug fixes between those releases.


I have upgraded to Faban 1.0, but I am still using Olio 1.0 (the release of
2.0 was announced; I will switch to it if I run into bugs that have been
fixed there).


>
>
>
>> So far, I employed a bunch of hacks to get most of it to work and I am
>> almost there. In the process I got a bunch of questions.
>>
>> Questions (some of them might be just faban related, not olio so bear with
>> me):
>> 1) Is there any way to deploy OlioDriver.jar through the command line?
>> Firefox through ssh forwarding is dead slow and I'd rather avoid it if I can.
>>
>
> Just drop the jar into faban/benchmarks/ and it will deploy itself. This is
> documented at
> http://faban.sunsource.net/1.0/docs/guide/harnessdev/deploybenchmark.html
> under "Alternate Deployment Methods."
>
>
>> 2) Should the services ApacheHttpdService, MemcachedService, and MySQLService
>> that come with Faban be deployed before running Olio?
>>    I was getting some very weird errors, e.g.:
>>
>
> Yes, you should. Olio will search for those.
>
Done.


>
>
>> 03:50:27:WARNING:UIDriverAgent[0]: Forcefully terminating benchmark run
>> 03:50:27:WARNING:UIDriverAgent[0]: 25 threads forcefully terminated.
>> java.lang.Throwable: Stack of non-terminating thread.
>>    at java.net.SocketInputStream.socketRead0 (null)
>>    at java.net.SocketInputStream.read (129)
>>    at java.io.FilterInputStream.read (116)
>>    at com.sun.faban.driver.transport.util.TimedInputStream.read (139)
>>    at java.io.BufferedInputStream.fill (218)
>>    at java.io.BufferedInputStream.read (237)
>>    at org.apache.commons.httpclient.HttpParser.readRawLine (78)
>>    at org.apache.commons.httpclient.HttpParser.readLine (106)
>>    at org.apache.commons.httpclient.HttpConnection.readLine (1116)
>>    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine (1973)
>>    at org.apache.commons.httpclient.HttpMethodBase.readResponse (1735)
>>    at org.apache.commons.httpclient.HttpMethodBase.execute (1098)
>>    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry
>> (398)
>>    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod (171)
>>    at org.apache.commons.httpclient.HttpClient.executeMethod (397)
>>    at org.apache.commons.httpclient.HttpClient.executeMethod (323)
>>    at com.sun.faban.driver.transport.hc3.ApacheHC3Transport.fetchURL (529)
>>    at com.sun.faban.driver.transport.hc3.ApacheHC3Transport.fetchURL (552)
>>    at org.apache.olio.workload.driver.UIDriver.doHomePage (355)
>>    at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
>>    at sun.reflect.NativeMethodAccessorImpl.invoke (39)
>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
>>    at java.lang.reflect.Method.invoke (597)
>>    at com.sun.faban.driver.engine.TimeThread.doRun (169)
>>    at com.sun.faban.driver.engine.AgentThread.run (202)
>>
>> and afterwards the master was waiting for threads to join forever... (I
>> attached gdb to verify that something was wrong) and hence I had to kill the
>> benchmark.
>>
>
> These threads are hanging reading the server responses, that never came.
>
>
Building the services from Faban probably fixes it.



>
>
>> In the Olio log there are WARNINGS  complaining about not deploying those.
>> After building those and manually copying them to /faban/services (ant
>> deploy did not place them there... :-(  )
>>
>
> Yes. But ant deploy should get them there. If not, can you please let me
> know the ant messages?


Ant was deploying them indeed. I had a mistake in build.properties.
I had:  faban.url=http://<hostname>:9980/   instead of
faban.url=http://localhost:9980/
After I changed that it started working...


>
>> it worked. (mostly worked)
>>
>> 3) I still have warnings like:
>> 01:38:08:INFO:Time difference to host olio-web is 269 ms. Attempting to
>> set clock.
>> 01:38:08:INFO:Time difference to host olio-db is 263 ms. Attempting to set
>> clock.
>>
>
> These two are OK. Just trying to do a clock sync between the systems.
>
>
>> 01:38:08:WARNING:olio-web wakeup-before time reached 700ms limit. System
>> is too busy. Giving up.
>>
>
> This is one of Faban's clock-setting calibrations. If the system is too
> busy, or you run on some virtualization architectures, the lag between the
> intended end of a sleep and the time the thread actually wakes up (gets
> scheduled/executed) can be too high, and the calibration will fail.
>
>
>> 01:38:08:INFO:Time difference to host olio-mem is 262 ms. Attempting to
>> set clock.
>> 01:38:10:WARNING:olio-db wakeup-before time reached 700ms limit. System is
>> too busy. Giving up.
>> 09:38:09:WARNING:[date, -u, 011309382010.10]
>> stderr:
>> date: cannot set date: Operation not permitted
>> 09:38:09:WARNING:Error on "[date, -u, 011309382010.10]" command trying to
>> set the date. Exit value: 1
>> 09:38:10:WARNING:[date, -u, 011309382010.11]
>> stderr:
>> date: cannot set date: Operation not permitted
>> 09:38:10:WARNING:Error on "[date, -u, 011309382010.11]" command trying to
>> set the date. Exit value: 1
>> 09:38:09:WARNING:[date, -u, 011309382010.10]
>> stderr:
>> date: cannot set date: Operation not permitted
>> 09:38:09:WARNING:Error on "[date, -u, 011309382010.10]" command trying to
>> set the date. Exit value: 1
>> 09:38:10:WARNING:[date, -u, 011309382010.11]
>> stderr:
>> date: cannot set date: Operation not permitted
>>
>> Letting Faban change the VM clock sounds like a bad idea from the beginning.
>>
>
> OK. So it is xen. Yes, this is what Faban is trying to solve. You can
> certainly turn it off. Please see:
>   http://faban.sunsource.net/1.0/docs/howdoi/physclocksync.html
>
>
I added <fh:timeSync>false</fh:timeSync> to my run.xml file, and that made
the warnings go away. (Btw, there is a mistake in the link above: it shows
<fh:timeSync>false<fh:timeSync>, but the second tag needs to be a closing
tag; the "/" is missing.)
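
A quick way to catch this kind of tag mistake before starting a run is to
check that the fragment parses as XML. The sketch below uses Python's standard
xml.etree.ElementTree. The namespace URI bound to the fh: prefix is an
assumption for illustration only; use whatever your run.xml already declares.

```python
import xml.etree.ElementTree as ET

# Assumption: illustrative namespace binding for the fh: prefix.
# Replace it with the declaration found in your own run.xml.
NS = 'xmlns:fh="http://faban.sunsource.net/ns/fabanharness"'


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses once wrapped in a root element."""
    try:
        ET.fromstring(f"<root {NS}>{fragment}</root>")
        return True
    except ET.ParseError:
        return False


good = "<fh:timeSync>false</fh:timeSync>"   # closing tag present
bad = "<fh:timeSync>false<fh:timeSync>"     # the "/" is missing

print(is_well_formed(good))
print(is_well_formed(bad))
```

This only checks well-formedness, not whether Faban accepts the element in
that position.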



>
>> Unfortunately, xen is really bad at maintaining an accurate clock. As a
>> result there is usually a time difference of more than 10ms between the
>> different virtual machines.
>> I went over the setTime function in the Faban source
>> (/faban/com/sun/faban/harness/agent/CmdAgentImpl.java); it's big and ugly
>> (very ugly)
>>
>
> Thanks for the compliments! I think you mean CmdService.setClockTask. Time
> sensitive code ain't pretty. That comes from the complexities of dealing
> with the clock and trying to achieve good accuracy. If you think you can
> simplify this, I'm listening (without losing the accuracy, of course). In
> comparison, CmdAgentImpl has nothing.
>
>
Yes, you're right, it is CmdService.setClockTask. The previous email was
composed at 3am... :-)
I am still a little confused. setClockTask is used to set the clock so
that all the machines are synchronized with the master. From what you
mentioned, the physical clock sync is only used for the logs.
Why do we need to do that, given that 1) it requires root privileges (which
might not always be available) and 2) I could imagine an alternative that
uses deltas from the actual physical clock without having to set it?
(I am probably missing something... :-)
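
To make the delta idea concrete, here is a minimal sketch. It is not Faban
code and not how setClockTask works; it is a plain round-trip (Cristian-style)
offset estimate, and every name in it is hypothetical. The fake clock only
exists to make the example deterministic.

```python
from typing import Callable


def estimate_offset_ms(remote_now_ms: Callable[[], float],
                       local_now_ms: Callable[[], float],
                       samples: int = 5) -> float:
    """Estimate (remote clock - local clock) in ms without setting any clock.

    Each sample brackets one remote reading between two local readings and
    assumes the remote reading was taken at the midpoint. The sample with
    the shortest round trip is kept, since it has the least uncertainty.
    """
    best_rtt = float("inf")
    best_offset = 0.0
    for _ in range(samples):
        t0 = local_now_ms()
        remote = remote_now_ms()
        t1 = local_now_ms()
        rtt = t1 - t0
        if rtt < best_rtt:
            best_rtt = rtt
            best_offset = remote - (t0 + t1) / 2.0
    return best_offset


def to_master_time(local_ts_ms: float, offset_ms: float) -> float:
    """Translate a local timestamp into the master's timeline."""
    return local_ts_ms + offset_ms


class FakeClock:
    """Deterministic stand-in for a real clock: advances 2 ms per reading."""

    def __init__(self, start_ms: float) -> None:
        self.t = start_ms

    def now(self) -> float:
        self.t += 2.0
        return self.t


local = FakeClock(1000.0)
TRUE_OFFSET_MS = 265.0  # hypothetical skew, similar to the values in my log


def fake_master_now() -> float:
    # Pretend the master is read 1 ms after the previous local reading.
    return local.t + 1.0 + TRUE_OFFSET_MS


offset = estimate_offset_ms(fake_master_now, local.now)
print(offset)
print(to_master_time(5000.0, offset))
```

With something like this, log timestamps could be shifted by the estimated
offset when the logs are collected, and no root privileges would be needed.
The estimate is only as good as the round-trip symmetry, which may be poor
on a busy VM.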



>> Why is there this strict requirement for a 10ms difference? Any ideas?
>>
>
> It is easily achievable in most cases. May not be true for VMs.
>
> On some VM architectures, the OS however does not get scheduled till way
> after that, thus causing problems. You may be able to measure performance on
> those VMs. But you don't want to use such VMs to be a driver. Your response
> time measurements will be way off.
>
> The physical clock sync is not really rigorous. And you can turn it off. It
> is more to keep the systems in good time sync. If your VM stands in the way,
> just turn it off. The driver's virtual clock sync is much more picky in
> comparison. This is because the start time for the steady state should be
> the same (with a very small tolerance) no matter how many drivers are
> driving. Otherwise the measurement period won't be the same when viewed from
> different drivers and the results won't be reliable.
>
>
>> Even with ntp it's hard to provide the 10ms guarantee.
>>
>
> That's why we don't use ntp ;-)


Just out of curiosity: the physical clocks are set only once at the
beginning (right?), so for long runs the 10ms difference will not be
guaranteed. No? Especially under VMs, I've seen significant clock
differences within a few minutes.
At least ntp can periodically resync (of course doing so might screw up the
logs with time going backwards etc.)


>
>
>> I am thinking of modifying this function to always return that the time
>> difference is less than 10ms (so that I do not have to wait all the time for
>> the timeouts.)
>>
>
> Why bother. Don't like it, just turn it off. It has good use in most
> configurations we're dealing with. And, it avoids ntp inaccuracies.
>
>
>> Will this break anything in Olio?
>>
>
> Nope. Except the times in your logs will appear out of sequence. They rely
> on the local time on the originating systems.
>
>
>> 4) Warnings like:
>> 09:39:48:WARNING:Image at
>> http://olio-web:80/fileService.php?cache=false&file=e168t.jpg size of
>> 249 bytes is too small. Image may not exist
>> can be ignored, right?
>>
>
> Well, something is wrong. We don't have images that small. Check whether
> e168t.jpg is really that small. That's why we have that warning.
>
>

It's kind of funny: my problem was that I had the olio webkit version
installed and then I downloaded the version from the online svn repository.
I built the driver but forgot to update the web pages on my apache server,
which, as expected, was the source of many of my issues.




>
>
>> 5) Last and most important.
>> I can run the benchmark and all the operations succeed except for login.
>> I get a bunch of:
>>
>> 09:39:25:WARNING:UIDriverAgent[0].2.doLogin: Found login prompt at index
>> 2926, Login as at786o08x, 2178 failed.
>> Note: Error not counted in result.
>> Either transaction start or end time is not within steady state.
>> java.lang.RuntimeException: Found login prompt at index 2926, Login as
>> at786o08x, 2178 failed.
>>    at org.apache.olio.workload.driver.UIDriver.doLogin (404)
>>    at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
>>    at sun.reflect.NativeMethodAccessorImpl.invoke (39)
>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
>>    at java.lang.reflect.Method.invoke (597)
>>    at com.sun.faban.driver.engine.TimeThread.doRun (169)
>>    at com.sun.faban.driver.engine.AgentThread.run (202)
>>
>> Any ideas? I do get
>>
>
> You likely have cookie issues. It can't seem to hold on to a session.
>
>
>
Well, there was a permission issue with the http_session dir. I could not
write to it. Running chmod 777 on it fixed this.


>
>
>
>> (I've found online:
>> http://www.mail-archive.com/olio-dev@incubator.apache.org/msg00647.html
>> which is similar, but when I added
>>
>> com.sun.faban.driver.transport.sunhttp.ThreadCookieHandler.level=FINER  in
>> build.properties
>> I did not see any cookie related warnings. Those should appear in the olio
>> run log or the apache log, right? Am I just looking in the wrong place?)
>>
>
> Yes, that's applicable only to the Sun Http Transport. The version of Olio
> you're using is based on the Apache Http Transport (Apache HttpClient 3.1).
> The ThreadCookieHandler is not used for the Apache transport and that's why
> you don't see any logs. Try upgrading to Faban 1.0 before looking at other
> things.
>
>
>>
>> It's a long email I know. Your feedback would be most appreciated.
>>
>> -Regards
>> -------------------------------------------------------------------
>> Kontorinis Vasileios
>> Phd student, University of California San Diego
>> San Diego, CA 92122
>> Cell. phone: (858) 717 6899
>> bkontorinis@gmail.com <mailto:bkontorinis@gmail.com>, vkontori@ucsd.edu<mailto:
>> vkontori@ucsd.edu>
>> -------------------------------------------------------------------
>>
>
> Thanks for all the questions/comments.
>
> -Akara
>
>

And now some more questions/comments:
1) I get the following error:

15:13:05:SEVERE:CmdService: Getting - exception reading
/usr/data/olio-db.err
java.io.FileNotFoundException: File /usr/data/olio-db.err does not exist.
    at com.sun.faban.common.FileTransfer.<init> (70)
    at com.sun.faban.harness.agent.FileAgentImpl.get (315)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
    at sun.reflect.NativeMethodAccessorImpl.invoke (39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
    at java.lang.reflect.Method.invoke (597)
    at sun.rmi.server.UnicastServerRef.dispatch (305)
    at sun.rmi.transport.Transport$1.run (159)
    at java.security.AccessController.doPrivileged (null)
    at sun.rmi.transport.Transport.serviceCall (155)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages (535)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0 (790)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run (649)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask (885)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run (907)
    at java.lang.Thread.run (619)
    at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer (255)
    at sun.rmi.transport.StreamRemoteCall.executeCall (233)
    at sun.rmi.server.UnicastRef.invoke (142)
    at com.sun.faban.harness.agent.FileAgentImpl_Stub.get (null)
    at com.sun.faban.harness.engine.CmdService.get (1334)
    at com.sun.faban.harness.RunContext.getFile (346)
    at com.sun.services.MySQLService.getLogs (197)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
    at sun.reflect.NativeMethodAccessorImpl.invoke (39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
    at java.lang.reflect.Method.invoke (597)
    at com.sun.faban.harness.util.Invoker.invoke (98)
    at com.sun.faban.harness.services.ServiceWrapper.getLogs (200)
    at com.sun.faban.harness.services.ServiceManager.getLogs (642)
    at com.sun.faban.harness.engine.GenericBenchmark.start (323)
    at com.sun.faban.harness.engine.RunDaemon.run (338)
    at java.lang.Thread.run (619)
15:13:05:WARNING:Could not copy /usr/data/olio-db.err to
/home/gdhiman/faban.1.0/faban/output/OlioDriver.2D/mysql_err.log.olio-db

Apparently something is misconfigured in my db-server. Any ideas?

2) I get the following error:
15:13:16:WARNING:[/home/gdhiman/faban.1.0/faban/bin/fenxi, process,
/home/gdhiman/faban.1.0/faban/output/OlioDriver.2D/,
/home/gdhiman/faban.1.0/faban/output/OlioDriver.2D//post/, OlioDriver.2D]
stderr:
Error in executing perl
/home/gdhiman/faban.1.0/faban/master/webapps/fenxi/txt2db/mpstat.pl
Error in executing perl
/home/gdhiman/faban.1.0/faban/master/webapps/fenxi/txt2db/mpstat.pl

Actually I traced this one back. The problem is the difference in output
format between Sun's (Solaris) mpstat and the default Linux mpstat from the
sysstat package.
This is the output of my mpstat:

gdhiman@olio-client00:~/faban.1.0/faban/output/OlioDriver.2D$ mpstat 1
Linux 2.6.18.8-xen (olio-client00)     01/16/10

16:25:06     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
16:25:07     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     52.48
16:25:08     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     50.50
16:25:09     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     79.21
16:25:10     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     45.54
16:25:11     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     55.45

The first line, as well as the time at the beginning of each entry, is
messing up the parsing in mpstat.pl (the fields are also different). Any
plans to support this?
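
For reference, here is a minimal sketch of how this Linux layout could be
parsed. It is not a patch for mpstat.pl; the function name is hypothetical,
and it assumes a 24-hour timestamp in the first column as in my output above
(some locales add a separate AM/PM field, which this does not handle).

```python
from typing import Dict, List, Union

SAMPLE = """\
Linux 2.6.18.8-xen (olio-client00)     01/16/10

16:25:06     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
16:25:07     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     52.48
16:25:08     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     50.50
"""


def parse_mpstat(text: str) -> List[Dict[str, Union[str, float]]]:
    """Parse Linux mpstat interval output into one dict per sample row."""
    rows: List[Dict[str, Union[str, float]]] = []
    header: List[str] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or too short to be a data row
        if fields[1] == "CPU":
            # Header row: timestamp, then the column names.
            header = fields[1:]
            continue
        if not header or len(fields) != len(header) + 1:
            continue  # banner line, or a row that does not match the header
        row: Dict[str, Union[str, float]] = {"time": fields[0], "CPU": fields[1]}
        for name, value in zip(header[1:], fields[2:]):
            row[name] = float(value)
        rows.append(row)
    return rows


rows = parse_mpstat(SAMPLE)
for r in rows:
    print(r["time"], r["CPU"], r["%idle"], r["intr/s"])
```

Taking the column names from the header row, instead of hard-coding
positions, means a different field set does not break the parsing.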

3) Scaling questions.
- So far I have not had a single experiment pass. Some are pretty close,
with only one metric check failing:

Average images loaded per Home Page    2.79    >= 3    FAILED

Any ideas? Could it be that the disk is not fast enough? I am just using
the local filesystem for the filestore.

- As I double the number of concurrent users I observe linear scaling in the
throughput.

Con Users    Throughput
   25          4.967
   50         10.06
  100         19.375
  200         40.21
  400         75.818
  800          0.383
 1000          0.483

The linear scaling stops at 400 concurrent users (only one agent).
Actually it would be exactly linear (a value of ~80), but almost half of the
login operations failed. I am looking into it.
Any insights on what might be the first thing to fail?

For the 800 and 1000 experiments there are no failed operations logged. It
looks like those are being discarded... (?)
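
To put numbers on the scaling, the small sketch below divides each throughput
by the user count and compares it with the 25-user run. The data is copied
from my table above; the function name is hypothetical.

```python
from typing import List, Tuple

# (concurrent users, reported throughput), copied from the table above.
RUNS: List[Tuple[int, float]] = [
    (25, 4.967),
    (50, 10.06),
    (100, 19.375),
    (200, 40.21),
    (400, 75.818),
    (800, 0.383),
    (1000, 0.483),
]


def scaling_efficiency(runs: List[Tuple[int, float]]) -> List[float]:
    """Per-user throughput of each run relative to the first run (1.0 = linear)."""
    base_users, base_tput = runs[0]
    baseline = base_tput / base_users
    return [(tput / users) / baseline for users, tput in runs]


efficiency = scaling_efficiency(RUNS)
for (users, tput), eff in zip(RUNS, efficiency):
    print(f"{users:5d} users  throughput {tput:8.3f}  efficiency {eff:6.3f}")
```

The 400-user run still lands within a few percent of linear, while the 800-
and 1000-user runs are not a gradual degradation at all, which is why I
suspect operations are being discarded rather than just slowing down.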

Bonus question:
In the runtime statistics
<runtimeStats enabled="true">
         <interval>30</interval>
 </runtimeStats>

only the 90th percentile response time is reported. Is there an easy way to
also report the 99th percentile? (Or do I need to add code for that?)
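
In case it helps the discussion, this is what I mean by the two numbers. The
sketch uses the simple nearest-rank definition over raw samples; Faban's own
computation may well differ, and the sample values are made up.

```python
from typing import List, Sequence


def percentile(samples: Sequence[int], pct: int) -> int:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in 1..100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical response times in milliseconds: 100, 99, ..., 1.
response_times_ms: List[int] = list(range(100, 0, -1))

p90 = percentile(response_times_ms, 90)
p99 = percentile(response_times_ms, 99)
print(p90, p99)
```

The 99th percentile needs many more samples per interval than the 90th to be
stable, so with a 30-second interval it may be noisy at low load.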


Thanks a lot again in advance.
-VK
