incubator-olio-user mailing list archives

From Shanti Subramanyam <shanti.subraman...@gmail.com>
Subject Re: Stuck with olio.
Date Sun, 17 Jan 2010 03:32:29 GMT
I would really recommend that you migrate to Olio 0.2. In addition to bug
fixes, it includes some major feature changes. See "Olio 0.2 released":
<http://perfwork.wordpress.com/2010/01/13/olio-0-2-relesed/>


Shanti

On Sat, Jan 16, 2010 at 4:49 PM, Vasileios Kontorinis <bkontorinis@gmail.com> wrote:

> Akara hi again,
>    Below I have comments on your suggestions and at the end some bonus
> questions... Thanks again.
>
> 2010/1/13 Akara Sucharitakul <Akara.Sucharitakul@sun.com>
>
>> With your permission, I'd like to copy the Olio and Faban user aliases
>> going forward. I feel it will help a much wider audience. Please see below
>> for answers/comments:
>>
> Sure. I cc'ed the Olio user alias. I am not sure which is the right Faban list.
>
>
>> Vasileios Kontorinis wrote:
>>
>>> Akara hi,
>>>   I am a grad student at UCSD and I use Olio for a research project where
>>> we want to measure Olio performance under live virtual machine migration. We
>>> use Ubuntu 8.04 on Nehalem servers.
>>> I have checked out the latest version of Olio from the online svn repository
>>> and downloaded the latest version of Faban (faban-kit-101509.tar.gz <
>>> http://faban.sunsource.net/nightly/faban-kit-101509.tar.gz>)
>>>
>>
>> 101509 is fairly recent. But the latest on the web site is 111109 (Faban
>> 1.0). There were just bug fixes between those releases.
>
>
> I have upgraded to Faban 1.0, but I am still using Olio 0.1 (I saw that the
> release of 0.2 was announced; I will switch to it if I run into bugs that
> have been fixed there).
>
>
>>
>>
>>
>>> So far, I have employed a bunch of hacks to get most of it working and I am
>>> almost there. In the process I came up with a bunch of questions.
>>>
>>> Questions (some of them might be just Faban related, not Olio, so bear
>>> with me):
>>> 1) Is there any way to deploy OlioDriver.jar through the command line?
>>> Firefox through SSH forwarding is dead slow and I'd rather avoid it if I can.
>>>
>>
>> Just drop the jar into faban/benchmarks/ and it will deploy itself. This
>> is documented at
>> http://faban.sunsource.net/1.0/docs/guide/harnessdev/deploybenchmark.html
>> under "Alternate Deployment Methods."
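The drop-in deployment Akara describes amounts to copying the jar into the benchmarks directory of the Faban installation, which avoids the browser entirely. A minimal Java sketch, assuming the jar and Faban locations used in `main` (both are hypothetical paths, to be adjusted to the actual build and install directories):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class DropInDeploy {
    /**
     * Copies a benchmark jar into <fabanHome>/benchmarks, where (per the reply above)
     * the Faban harness picks it up and deploys it.
     */
    static Path deploy(Path benchmarkJar, Path fabanHome) throws IOException {
        Path benchmarksDir = fabanHome.resolve("benchmarks");
        if (!Files.isDirectory(benchmarksDir)) {
            // Fail loudly instead of creating the directory: a missing one usually means a wrong fabanHome.
            throw new IOException("No benchmarks directory under " + fabanHome);
        }
        Path target = benchmarksDir.resolve(benchmarkJar.getFileName());
        // REPLACE_EXISTING so that a rebuilt OlioDriver.jar overwrites the previous copy.
        return Files.copy(benchmarkJar, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical defaults; pass real paths as arguments.
        Path jar = Paths.get(args.length > 0 ? args[0] : "build/OlioDriver.jar");
        Path faban = Paths.get(args.length > 1 ? args[1] : "faban");
        System.out.println("Copied to " + deploy(jar, faban));
    }
}
```

Over an SSH session, a plain copy of the jar into the same directory does exactly the same thing.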
>>
>>
>>> 2) Should the services ApacheHttpdService, MemcachedService and MySQLService
>>> that come with Faban be deployed before running Olio?
>>>    I was getting some very weird errors, e.g.:
>>>
>>
>> Yes, you should. Olio will search for those.
>>
> Done.
>
>
>>
>>
>>> 03:50:27:WARNING:UIDriverAgent[0]: Forcefully terminating benchmark run
>>> 03:50:27:WARNING:UIDriverAgent[0]: 25 threads forcefully terminated.
>>> java.lang.Throwable: Stack of non-terminating thread.
>>>    at java.net.SocketInputStream.socketRead0 (null)
>>>    at java.net.SocketInputStream.read (129)
>>>    at java.io.FilterInputStream.read (116)
>>>    at com.sun.faban.driver.transport.util.TimedInputStream.read (139)
>>>    at java.io.BufferedInputStream.fill (218)
>>>    at java.io.BufferedInputStream.read (237)
>>>    at org.apache.commons.httpclient.HttpParser.readRawLine (78)
>>>    at org.apache.commons.httpclient.HttpParser.readLine (106)
>>>    at org.apache.commons.httpclient.HttpConnection.readLine (1116)
>>>    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine (1973)
>>>    at org.apache.commons.httpclient.HttpMethodBase.readResponse (1735)
>>>    at org.apache.commons.httpclient.HttpMethodBase.execute (1098)
>>>    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry (398)
>>>    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod (171)
>>>    at org.apache.commons.httpclient.HttpClient.executeMethod (397)
>>>    at org.apache.commons.httpclient.HttpClient.executeMethod (323)
>>>    at com.sun.faban.driver.transport.hc3.ApacheHC3Transport.fetchURL (529)
>>>    at com.sun.faban.driver.transport.hc3.ApacheHC3Transport.fetchURL (552)
>>>    at org.apache.olio.workload.driver.UIDriver.doHomePage (355)
>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
>>>    at sun.reflect.NativeMethodAccessorImpl.invoke (39)
>>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
>>>    at java.lang.reflect.Method.invoke (597)
>>>    at com.sun.faban.driver.engine.TimeThread.doRun (169)
>>>    at com.sun.faban.driver.engine.AgentThread.run (202)
>>>
>>> and afterwards the master waited forever for the threads to join (I
>>> attached gdb to verify that something was wrong), so I had to kill the
>>> benchmark.
>>>
>>
>> These threads are hanging while reading server responses that never came.
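The hung threads in the trace above are all blocked in a socket read. A minimal sketch of that failure mode with plain JDK sockets (not Faban's or HttpClient 3.1's own code): a local server accepts the connection but never replies, so a read without a timeout would block forever, while a read with a socket timeout (SO_TIMEOUT) gives up with a SocketTimeoutException. The class name and the 200 ms value are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ReadTimeoutDemo {
    /**
     * Connects to a local server that accepts the connection but never sends anything,
     * then reads with the given timeout. Returns true if the read timed out.
     * Do not pass 0: that means "no timeout", and the read would block forever,
     * just like the threads in the stack trace.
     */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silentServer.getLocalPort())) {
            client.setSoTimeout(timeoutMillis);
            try (Socket accepted = silentServer.accept()) { // accepted, but never written to
                InputStream in = client.getInputStream();
                in.read(); // blocks until data arrives or the timeout expires
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```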
>>
>>
> Building the services from Faban probably fixes it.
>
>
>
>>
>>
>>> The Olio log has WARNINGs complaining that those are not deployed. After
>>> building them and manually copying them to /faban/services
>>> (ant deploy did not place them there... :-( )
>>>
>>
>> Yes. But ant deploy should get them there. If not, can you please let me
>> know the ant messages?
>
>
> Ant was indeed deploying them. I had a mistake in build.properties.
> I had faban.url=http://<hostname>:9980/ instead of
> faban.url=http://localhost:9980/
> After I changed that, it started working...
>
>
>>
>>> it worked (mostly).
>>>
>>> 3) I still have warnings like:
>>> 01:38:08:INFO:Time difference to host olio-web is 269 ms. Attempting to set clock.
>>> 01:38:08:INFO:Time difference to host olio-db is 263 ms. Attempting to set clock.
>>>
>>
>> These two are OK. Just trying to do a clock sync between the systems.
>>
>>
>>> 01:38:08:WARNING:olio-web wakeup-before time reached 700ms limit. System is too busy. Giving up.
>>>
>>
>> This is one of Faban's clock-setting calibrations. If the system is too
>> busy, or you run on some virtualization architectures, the lag between
>> the intended end of a sleep and the time the thread actually wakes up
>> (gets scheduled/executed) becomes too high, and the calibration fails.
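The lag Akara describes can be measured directly: note when a thread should wake up, sleep, and see how late it actually resumes. A rough sketch using the monotonic System.nanoTime(); this is not Faban's calibration algorithm, and the 700 ms limit is only borrowed from the warning text above.

```java
class WakeupLag {
    /**
     * Sleeps for sleepMillis and returns how much later than intended the thread
     * resumed, in nanoseconds. Uses System.nanoTime(), which is monotonic and is not
     * affected by the wall clock that Faban tries to set.
     */
    static long measureLagNanos(long sleepMillis) throws InterruptedException {
        long intendedWake = System.nanoTime() + sleepMillis * 1_000_000L;
        Thread.sleep(sleepMillis);
        return System.nanoTime() - intendedWake;
    }

    /** True when the observed lag reaches the given limit (700 ms in the warning above). */
    static boolean tooBusy(long lagNanos, long limitMillis) {
        return lagNanos >= limitMillis * 1_000_000L;
    }

    public static void main(String[] args) throws InterruptedException {
        long worst = 0;
        for (int i = 0; i < 20; i++) {
            worst = Math.max(worst, measureLagNanos(10));
        }
        System.out.printf("worst wake-up lag: %.3f ms, too busy: %b%n",
                worst / 1e6, tooBusy(worst, 700));
    }
}
```

Running this inside a guest VM and on bare metal gives a quick feel for how much scheduling delay the VM adds.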
>>
>>
>>> 01:38:08:INFO:Time difference to host olio-mem is 262 ms. Attempting to set clock.
>>> 01:38:10:WARNING:olio-db wakeup-before time reached 700ms limit. System is too busy. Giving up.
>>> 09:38:09:WARNING:[date, -u, 011309382010.10]
>>> stderr:
>>> date: cannot set date: Operation not permitted
>>> 09:38:09:WARNING:Error on "[date, -u, 011309382010.10]" command trying to set the date. Exit value: 1
>>> 09:38:10:WARNING:[date, -u, 011309382010.11]
>>> stderr:
>>> date: cannot set date: Operation not permitted
>>> 09:38:10:WARNING:Error on "[date, -u, 011309382010.11]" command trying to set the date. Exit value: 1
>>> 09:38:09:WARNING:[date, -u, 011309382010.10]
>>> stderr:
>>> date: cannot set date: Operation not permitted
>>> 09:38:09:WARNING:Error on "[date, -u, 011309382010.10]" command trying to set the date. Exit value: 1
>>> 09:38:10:WARNING:[date, -u, 011309382010.11]
>>> stderr:
>>> date: cannot set date: Operation not permitted
>>>
>>> Letting Faban change the VM clock sounded like a bad idea from the beginning.
>>>
>>
>> OK. So it is Xen. Yes, this is what Faban is trying to solve. You can
>> certainly turn it off. Please see:
>>   http://faban.sunsource.net/1.0/docs/howdoi/physclocksync.html
>>
>>
> I added <fh:timeSync>false</fh:timeSync> to my run.xml file, and that made
> the warnings go away. (BTW, there is a mistake in the page linked above: it
> shows <fh:timeSync>false<fh:timeSync>, but the second tag must be a closing
> tag, </fh:timeSync>; the "/" is missing.)
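The missing "/" makes the snippet malformed XML, which a parser rejects outright. A small sketch using the JDK's DOM parser to check a fragment before pasting it into run.xml; the `<run>` wrapper element in the usage below is a hypothetical stand-in, not the real structure of a Faban run.xml.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class RunXmlCheck {
    /** Returns true if the XML text parses, i.e. every element is properly closed. */
    static boolean isWellFormed(String xml) throws ParserConfigurationException, IOException {
        // The factory is not namespace-aware by default, so an undeclared "fh:" prefix is accepted.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing "[Fatal Error]" to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<run><fh:timeSync>false</fh:timeSync></run>"));
        System.out.println(isWellFormed("<run><fh:timeSync>false<fh:timeSync></run>"));
    }
}
```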
>
>
>
>>
>>> Unfortunately, Xen is really bad at maintaining an accurate clock. As a
>>> result, there is usually a time difference of more than 10ms between the
>>> different virtual machines. I went over the setTime function in the Faban
>>> source (/faban/com/sun/faban/harness/agent/CmdAgentImpl.java); it's big and
>>> ugly (very ugly).
>>>
>>
>> Thanks for the compliments! I think you mean CmdService.setClockTask.
>> Time-sensitive code ain't pretty. The complexity comes from dealing with the
>> clock and trying to achieve good accuracy. If you think you can simplify
>> this, I'm listening (without losing the accuracy, of course). In comparison,
>> CmdAgentImpl has nothing of the sort.
>>
>>
> Yes, you're right, it is CmdService.setClockTask. The previous email was
> composed at 3am... :-)
> I am still a little confused. setClockTask is used to set the clocks so
> that all the machines are synchronized with the master. From what you
> mentioned, the physical clock sync is only used for the logs.
> Why do we need to do that, given that 1) it requires root privileges (which
> might not always be available), and 2) I could imagine an alternative that
> uses deltas from the actual physical clock without having to set it?
> (I am probably missing something... :-)
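The delta-based alternative suggested here could look like this: estimate each host's offset from the master once, instead of setting its clock, and translate timestamps with it. This sketch uses Cristian's algorithm (a textbook technique swapped in for illustration, not what CmdService.setClockTask does); the class and method names are hypothetical.

```java
class ClockOffset {
    /**
     * Estimates how far the remote clock is ahead of the local one, without setting either.
     * localSendMillis and localReceiveMillis bracket one request on the local clock;
     * remoteMillis is the time the remote host reported. The estimate assumes the
     * remote timestamp was taken halfway through the round trip.
     */
    static double offsetMillis(long localSendMillis, long remoteMillis, long localReceiveMillis) {
        double midpoint = (localSendMillis + localReceiveMillis) / 2.0;
        return remoteMillis - midpoint;
    }

    /** Worst-case error of that estimate: half the round-trip time. */
    static double uncertaintyMillis(long localSendMillis, long localReceiveMillis) {
        return (localReceiveMillis - localSendMillis) / 2.0;
    }

    /** Translates a timestamp taken on the remote host into local-clock time. */
    static double toLocal(long remoteTimestampMillis, double offsetMillis) {
        return remoteTimestampMillis - offsetMillis;
    }

    public static void main(String[] args) {
        double offset = offsetMillis(1000, 1300, 1020); // remote is about 290 ms ahead, +/- 10 ms
        System.out.println(offset + " +/- " + uncertaintyMillis(1000, 1020));
        System.out.println(toLocal(1500, offset));      // remote 1500 is local 1210
    }
}
```

The trade-off is that every consumer of remote timestamps has to apply the offset, whereas setting the clock once makes raw log timestamps directly comparable.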
>
>
>
>>> Why is there this strict 10ms requirement? Any ideas?
>>>
>>
>> It is easily achievable in most cases. That may not be true for VMs.
>>
>> On some VM architectures, however, the OS does not get scheduled until well
>> past that point, which causes problems. You may be able to measure
>> performance on those VMs, but you don't want to use such VMs as drivers:
>> your response time measurements will be way off.
>>
>> The physical clock sync is not really rigorous. And you can turn it off.
>> It is more to keep the systems in good time sync. If your VM stands in the
>> way, just turn it off. The driver's virtual clock sync is much more picky in
>> comparison. This is because the start time for the steady state should be
>> the same (with a very small tolerance) no matter how many drivers are
>> driving. Otherwise the measurement period won't be the same when viewed from
>> different drivers and the results won't be reliable.
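To make the steady-state point concrete: all drivers must agree on one measurement window, and a transaction only counts if it both starts and ends inside it (compare the "Either transaction start or end time is not within steady state" note in question 5 below). A sketch assuming each driver knows how far the master's clock is ahead of its own; the class and parameter names are hypothetical, and this is not Faban's implementation.

```java
class SteadyState {
    final long startMillis; // steady-state start, translated to this driver's local clock
    final long endMillis;   // steady-state end, translated to this driver's local clock

    /**
     * masterStartMillis/masterEndMillis: the window on the master's clock.
     * driverOffsetMillis: how far the master's clock is ahead of this driver's clock.
     */
    SteadyState(long masterStartMillis, long masterEndMillis, long driverOffsetMillis) {
        this.startMillis = masterStartMillis - driverOffsetMillis;
        this.endMillis = masterEndMillis - driverOffsetMillis;
    }

    /** A transaction is counted only if it starts and ends inside the window. */
    boolean counts(long txStartMillis, long txEndMillis) {
        return txStartMillis >= startMillis && txEndMillis <= endMillis;
    }

    public static void main(String[] args) {
        SteadyState window = new SteadyState(10_000, 70_000, 250); // local window: 9750..69750
        System.out.println(window.counts(9_800, 9_900));   // true
        System.out.println(window.counts(9_700, 9_800));   // false: starts before steady state
        System.out.println(window.counts(69_700, 69_800)); // false: ends after steady state
    }
}
```

If a driver's offset were off by, say, 50 ms, its window would shift by 50 ms relative to the others, which is why this sync has to be much tighter than the physical clock sync.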
>>
>>
>>> Even with NTP it's hard to provide the 10ms guarantee.
>>>
>>
>> That's why we don't use ntp ;-)
>
>
> Just out of curiosity: the physical clocks are set only once at the
> beginning (right?), so for long runs the 10ms bound will not be
> guaranteed, will it? Especially under VMs, I've seen significant clock
> differences within a few minutes.
> At least NTP can resync periodically (although doing so might screw up
> the logs, with time going backwards etc.).
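How long a one-time sync stays within 10ms depends only on the relative drift rate between the two clocks: time until the bound is exceeded = tolerance / drift rate. A small sketch; the drift rates below are illustrative assumptions, not measurements of Xen guests.

```java
class ClockDrift {
    /**
     * Seconds until two clocks that start perfectly synchronized drift apart by
     * toleranceMillis, given their relative drift in parts per million (ppm).
     */
    static double secondsUntilExceeded(double toleranceMillis, double relativeDriftPpm) {
        double driftSecondsPerSecond = relativeDriftPpm * 1e-6;
        return (toleranceMillis / 1000.0) / driftSecondsPerSecond;
    }

    public static void main(String[] args) {
        for (double ppm : new double[] {10, 100, 1000}) {
            System.out.printf("%6.0f ppm: 10 ms exceeded after %.1f s%n",
                    ppm, secondsUntilExceeded(10, ppm));
        }
    }
}
```

At 100 ppm the 10ms bound lasts about 100 seconds, so for runs much longer than that the clocks only stay within the bound if the relative drift is small.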
>
>
>>
>>
>>> I am thinking of modifying this function to always report that the time
>>> difference is less than 10ms (so that I do not have to wait for the
>>> timeouts all the time).
>>>
>>
>> Why bother? If you don't like it, just turn it off. It is useful in most
>> of the configurations we deal with, and it avoids NTP inaccuracies.
>>
>>
>>> Will this break anything in Olio?
>>>
>>
>> Nope, except that the times in your logs will appear out of sequence, since
>> they rely on the local time of the originating systems.
>>
>>
>>> 4) Warnings like:
>>> 09:39:48:WARNING:Image at
>>> http://olio-web:80/fileService.php?cache=false&file=e168t.jpg size of
>>> 249 bytes is too small. Image may not exist
>>> can be ignored, right?
>>>
>>
>> Well, something is wrong. We don't have images that small. Check whether
>> e168t.jpg is really that small. That's why we have that warning.
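To follow Akara's suggestion for the whole filestore rather than one file, a sketch that lists JPEG files below a size threshold. The 1024-byte threshold and the default filestore path are assumptions; the thread does not say what limit Faban's check uses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SmallImageFinder {
    /** Lists *.jpg files under filestore that are smaller than minBytes, sorted by path. */
    static List<Path> find(Path filestore, long minBytes) throws IOException {
        try (Stream<Path> files = Files.walk(filestore)) {
            return files.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".jpg"))
                    .filter(p -> sizeOf(p) < minBytes)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    private static long sizeOf(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // stream lambdas cannot throw checked exceptions
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical filestore location and threshold.
        Path filestore = Paths.get(args.length > 0 ? args[0] : "/filestore");
        for (Path p : find(filestore, 1024)) {
            System.out.println(p + ": " + Files.size(p) + " bytes");
        }
    }
}
```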
>>
>>
>
> It's kind of funny: my problem was that I had the Olio web kit version
> installed and then I downloaded the version from the online svn repository.
> I built the driver but forgot to update the web pages on my Apache server,
> which, as expected, was the source of many of my issues.
>
>
>
>
>>
>>
>>> 5) Last and most important:
>>> I can run the benchmark and all the operations succeed except login.
>>> I get a bunch of:
>>>
>>> 09:39:25:WARNING:UIDriverAgent[0].2.doLogin: Found login prompt at index 2926, Login as at786o08x, 2178 failed.
>>> Note: Error not counted in result.
>>> Either transaction start or end time is not within steady state.
>>> java.lang.RuntimeException: Found login prompt at index 2926, Login as at786o08x, 2178 failed.
>>>    at org.apache.olio.workload.driver.UIDriver.doLogin (404)
>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
>>>    at sun.reflect.NativeMethodAccessorImpl.invoke (39)
>>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
>>>    at java.lang.reflect.Method.invoke (597)
>>>    at com.sun.faban.driver.engine.TimeThread.doRun (169)
>>>    at com.sun.faban.driver.engine.AgentThread.run (202)
>>>
>>> Any ideas? I do get
>>>
>>
>> You likely have cookie issues. It can't seem to hold on to a session.
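The "Found login prompt" failures are what losing the session cookie looks like from the client side: every request arrives as a new visitor. A self-contained illustration using the JDK's built-in HTTP server and HttpURLConnection with java.net.CookieManager. Olio's driver uses Apache HttpClient 3.1 (see Akara's reply below), so this only demonstrates the mechanism; the SESSID cookie and the page bodies are made up.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class SessionCookieDemo {
    /**
     * Starts a throwaway local server that sets a session cookie on every response and
     * returns "logged-in" only when that cookie is sent back. Returns the second response body.
     */
    static String secondResponse(boolean keepCookies) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String cookie = exchange.getRequestHeaders().getFirst("Cookie");
            boolean hasSession = cookie != null && cookie.contains("SESSID=demo");
            byte[] body = (hasSession ? "logged-in" : "login-prompt").getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Set-Cookie", "SESSID=demo; Path=/");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        CookieHandler previous = CookieHandler.getDefault();
        try {
            // With a CookieManager installed, HttpURLConnection stores and resends cookies;
            // with none, the Set-Cookie header is simply ignored.
            CookieHandler.setDefault(keepCookies ? new CookieManager(null, CookiePolicy.ACCEPT_ALL) : null);
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/").toURL();
            fetch(url);        // first request: the server hands out the session cookie
            return fetch(url); // second request: is the session still there?
        } finally {
            CookieHandler.setDefault(previous);
            server.stop(0);
        }
    }

    private static String fetch(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            for (int n; (n = in.read(chunk)) != -1; ) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("without cookies: " + secondResponse(false));
        System.out.println("with cookies:    " + secondResponse(true));
    }
}
```

On the server side the same symptom appears when the session store cannot be written, since the cookie then points at a session that was never saved.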
>>
>>
>>
> Well, there was a permission issue with the http_session dir: I could not
> write to it. Running chmod 777 on it fixed this.
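A quick pre-run check for this kind of problem is to verify that the session directory exists and is writable. A sketch; the default path is hypothetical, and the result is only meaningful when the check runs as the same user as the web server, because writability is evaluated for the JVM's user.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class SessionDirCheck {
    /** Returns null if dir is an existing, writable directory, otherwise a short description of the problem. */
    static String problemWith(Path dir) {
        if (!Files.exists(dir)) {
            return "does not exist";
        }
        if (!Files.isDirectory(dir)) {
            return "is not a directory";
        }
        if (!Files.isWritable(dir)) {
            return "is not writable by " + System.getProperty("user.name");
        }
        return null;
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/tmp/http_sessions"); // hypothetical path
        String problem = problemWith(dir);
        System.out.println(dir + (problem == null ? ": OK" : ": " + problem));
    }
}
```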
>
>
>>
>>
>>
>>> (I've found
>>> http://www.mail-archive.com/olio-dev@incubator.apache.org/msg00647.html
>>> online, which is similar, but when I added
>>>
>>> com.sun.faban.driver.transport.sunhttp.ThreadCookieHandler.level=FINER
>>>  in build.properties
>>> I did not see any cookie-related warnings. Those should appear in the
>>> Olio run log or the Apache log, right? Am I just looking in the wrong place?
>>> )
>>>
>>
>> Yes, that's applicable only to the Sun Http Transport. The version of Olio
>> you're using is based on the Apache Http Transport (Apache HttpClient 3.1).
>> The ThreadCookieHandler is not used for the Apache transport and that's why
>> you don't see any logs. Try upgrading to Faban 1.0 before looking at other
>> things.
>>
>>
>>>
>>> It's a long email I know. Your feedback would be most appreciated.
>>>
>>> -Regards
>>> -------------------------------------------------------------------
>>> Kontorinis Vasileios
>>> Phd student, University of California San Diego
>>> San Diego, CA 92122
>>> Cell. phone: (858) 717 6899
>>> bkontorinis@gmail.com, vkontori@ucsd.edu
>>> -------------------------------------------------------------------\
>>>
>>
>> Thanks for all the questions/comments.
>>
>> -Akara
>>
>>
>
> And now some more questions/comments:
> 1) I get the following error:
>
> 15:13:05:SEVERE:CmdService: Getting - exception reading
> /usr/data/olio-db.err
> java.io.FileNotFoundException: File /usr/data/olio-db.err does not exist.
>     at com.sun.faban.common.FileTransfer.<init> (70)
>     at com.sun.faban.harness.agent.FileAgentImpl.get (315)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
>     at sun.reflect.NativeMethodAccessorImpl.invoke (39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
>     at java.lang.reflect.Method.invoke (597)
>     at sun.rmi.server.UnicastServerRef.dispatch (305)
>     at sun.rmi.transport.Transport$1.run (159)
>     at java.security.AccessController.doPrivileged (null)
>     at sun.rmi.transport.Transport.serviceCall (155)
>     at sun.rmi.transport.tcp.TCPTransport.handleMessages (535)
>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0 (790)
>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run (649)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask (885)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run (907)
>     at java.lang.Thread.run (619)
>     at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer (255)
>     at sun.rmi.transport.StreamRemoteCall.executeCall (233)
>     at sun.rmi.server.UnicastRef.invoke (142)
>     at com.sun.faban.harness.agent.FileAgentImpl_Stub.get (null)
>     at com.sun.faban.harness.engine.CmdService.get (1334)
>     at com.sun.faban.harness.RunContext.getFile (346)
>     at com.sun.services.MySQLService.getLogs (197)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
>     at sun.reflect.NativeMethodAccessorImpl.invoke (39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
>     at java.lang.reflect.Method.invoke (597)
>     at com.sun.faban.harness.util.Invoker.invoke (98)
>     at com.sun.faban.harness.services.ServiceWrapper.getLogs (200)
>     at com.sun.faban.harness.services.ServiceManager.getLogs (642)
>     at com.sun.faban.harness.engine.GenericBenchmark.start (323)
>     at com.sun.faban.harness.engine.RunDaemon.run (338)
>     at java.lang.Thread.run (619)
> 15:13:05:WARNING:Could not copy /usr/data/olio-db.err to
> /home/gdhiman/faban.1.0/faban/output/OlioDriver.2D/mysql_err.log.olio-db
>
> Apparently something is misconfigured in my db-server. Any ideas?
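The path in the error matches MySQL's conventional error-log name, `<datadir>/<hostname>.err`, which MySQL uses when no explicit error-log file is configured. Assuming that is the file MySQLService is looking for (an inference from the path, not something the thread confirms), a sketch for checking it on the DB host; if the server's `datadir` or `log-error` setting in my.cnf, or the machine's actual host name, differs from what the path assumes, the file will be elsewhere.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class MySqlErrLog {
    /** <dataDir>/<hostName>.err, the conventional MySQL error-log location. */
    static Path conventionalErrLog(String dataDir, String hostName) {
        return Paths.get(dataDir, hostName + ".err");
    }

    public static void main(String[] args) {
        // Values taken from the error above; run this on the DB host itself.
        Path expected = conventionalErrLog("/usr/data", "olio-db");
        System.out.println(expected + (Files.exists(expected) ? " exists" : " is missing"));
    }
}
```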
>
> 2) I get the following error:
> 15:13:16:WARNING:[/home/gdhiman/faban.1.0/faban/bin/fenxi, process,
> /home/gdhiman/faban.1.0/faban/output/OlioDriver.2D/,
> /home/gdhiman/faban.1.0/faban/output/OlioDriver.2D//post/, OlioDriver.2D]
> stderr:
> Error in executing perl
> /home/gdhiman/faban.1.0/faban/master/webapps/fenxi/txt2db/mpstat.pl
> Error in executing perl
> /home/gdhiman/faban.1.0/faban/master/webapps/fenxi/txt2db/mpstat.pl
>
> Actually, I traced this one back. The problem is the difference in output
> format between Sun's mpstat and the default GNU mpstat.
> This is the output of my mpstat:
>
> gdhiman@olio-client00:~/faban.1.0/faban/output/OlioDriver.2D$ mpstat 1
> Linux 2.6.18.8-xen (olio-client00)     01/16/10
>
> 16:25:06     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
> 16:25:07     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     52.48
> 16:25:08     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     50.50
> 16:25:09     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     79.21
> 16:25:10     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     45.54
> 16:25:11     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     55.45
>
> The first line, as well as the time at the beginning of each entry, messes
> up the parsing in mpstat.pl (the fields are also different). Any plans
> to support this?
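One way to make parsing independent of both quirks is to take the column names from the header line and align each row's values from the right, so the leading timestamp (one token, or two with AM/PM) is ignored and different column sets are handled by name. A Java sketch of that idea; it is not the fenxi mpstat.pl code, and it only handles per-interval rows (the banner and "Average:" lines are skipped).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GnuMpstatParser {
    /** Parses GNU sysstat "mpstat <interval>" output into one column-name -> value map per row. */
    static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> rows = new ArrayList<>();
        String[] columns = null; // names from the latest header line, timestamp excluded
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines, the "Linux <kernel> (<host>) <date>" banner and the summary.
            if (trimmed.isEmpty() || trimmed.startsWith("Linux") || trimmed.startsWith("Average")) {
                continue;
            }
            String[] tokens = trimmed.split("\\s+");
            int cpuIdx = Arrays.asList(tokens).indexOf("CPU");
            if (cpuIdx >= 0) {
                // Header line: everything from "CPU" on is a column name.
                columns = Arrays.copyOfRange(tokens, cpuIdx, tokens.length);
                continue;
            }
            if (columns == null || tokens.length < columns.length) {
                continue; // no header seen yet, or a truncated row
            }
            // Align from the right so one- or two-token timestamps both work.
            int offset = tokens.length - columns.length;
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < columns.length; i++) {
                row.put(columns[i], tokens[offset + i]);
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Because values are looked up by column name, a consumer can ask for "%idle" or "%sys" regardless of which extra columns a given sysstat version prints.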
>
> 3) Scaling questions.
> - So far not a single experiment has passed. Some are pretty close, with
> only one metric check failing:
>
> Average images loaded per Home Page    2.79    >= 3    FAILED
>
> Any ideas? Could it be that the disk is not fast enough? I am just using
> the local filesystem for the filestore.
>
> - As I double the number of concurrent users, I observe linear scaling in
> the throughput:
>
> Concurrent users    Throughput
>               25         4.967
>               50        10.06
>              100        19.375
>              200        40.21
>              400        75.818
>              800         0.383
>             1000         0.483
>
> The linear scaling stops at 400 concurrent users (with only one agent).
> It would actually be exactly linear (a value of ~80), but almost half of
> the login operations failed. I am looking into it.
> Any insights on what is likely to fail first?
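One way to quantify how far each run is from linear scaling is to compare throughput per user against the 25-user baseline. With the numbers in the table, the 400-user run reaches about 95% of the baseline per-user throughput, and the 800-user run about 0.2%. A small sketch (the class name is hypothetical):

```java
class ScalingEfficiency {
    /**
     * Throughput per user at `users`, relative to throughput per user at the baseline load.
     * 1.0 means perfectly linear scaling; lower values mean throughput is falling behind.
     */
    static double efficiency(int baseUsers, double baseThroughput, int users, double throughput) {
        double basePerUser = baseThroughput / baseUsers;
        return (throughput / users) / basePerUser;
    }

    public static void main(String[] args) {
        int[] users = {25, 50, 100, 200, 400, 800, 1000};
        double[] tput = {4.967, 10.06, 19.375, 40.21, 75.818, 0.383, 0.483}; // from the table above
        for (int i = 0; i < users.length; i++) {
            System.out.printf("%5d users: efficiency %.3f%n",
                    users[i], efficiency(users[0], tput[0], users[i], tput[i]));
        }
    }
}
```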
>
> For the 800 and 1000 experiments there are no failed operations logged. It
> looks like those are being discarded... (?)
>
> Bonus question:
> In the runtime statistics
> <runtimeStats enabled="true">
>     <interval>30</interval>
> </runtimeStats>
>
> only the 90th percentile response time is reported. Is there an easy way to
> also report the 99th percentile (or do I need to add code for that)?
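Whether runtimeStats can report the 99th percentile out of the box is not answered in this thread. If code has to be added, computing an extra percentile from collected response-time samples could look like this sketch, which uses the nearest-rank definition; Faban itself may well compute percentiles differently (for example from response-time histograms), so treat this as illustrative only.

```java
import java.util.Arrays;

class Percentiles {
    /** Nearest-rank percentile (0 < p <= 100) of the samples; the input array is not modified. */
    static double nearestRank(double[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        // Multiply before dividing so that, e.g., p = 99 with 100 samples gives exactly 99.0.
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        double[] responseTimes = {0.12, 0.31, 0.18, 0.95, 0.22, 0.27, 0.15, 0.40, 0.19, 2.10};
        System.out.println("90th: " + nearestRank(responseTimes, 90));
        System.out.println("99th: " + nearestRank(responseTimes, 99));
    }
}
```

With only a handful of samples per interval, the 99th percentile is simply the slowest response, so it becomes meaningful only over intervals with at least a few hundred transactions.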
>
>
> Thanks a lot again in advance.
> -VK
>
