incubator-olio-user mailing list archives

From Vasileios Kontorinis <bkontori...@gmail.com>
Subject Re: Stuck with olio.
Date Wed, 27 Jan 2010 09:34:10 GMT
Shanti hi again,
   I checked my Apache logs and there were a bunch of errors.
It looks like there are some issues with
webapp/php/trunk/classes/ImageUtil.php in the latest release of Olio. (I
downloaded
http://www.alliedquotes.com/mirrors/apache/incubator/olio/0.2/apache-olio-php-src-0.2.tar.gz)
1) There is a line that needs to be commented out. PHP complains ("1.5. Must be
greater than zero.").
2) Then it complained that it could not find the function
fastimagecopyresampled. To work around that, I moved
fastimagecopyresampled above createThumb (this might not be required) and
declared it static.
    Finally, I call the function from createThumb as
self::fastimagecopyresampled.
3) Then it started complaining because it could not write to the filestore.
The problem is that Apache wants to write the new images as www-data,
while the filestore does not have write permission for others.
Manually
    giving access solves the problem (chmod -R o+w <path>/filestore), but
since the directories in the filestore are generated automatically, maybe the
chmod command should be added to fileloader.sh.
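
For anyone who prefers to script the permission fix in item 3, here is a minimal sketch in Python of what chmod -R o+w does. The function name add_others_write is hypothetical (it is not part of Olio or fileloader.sh), and the demo runs on a throwaway temporary directory rather than a real filestore.

```python
import os
import stat
import tempfile


def add_others_write(root):
    """Add the o+w bit to root and everything below it (like chmod -R o+w).

    Returns the number of paths whose mode was changed.
    """
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        # os.walk visits every directory as dirpath, so directories are
        # covered here together with the files they contain.
        paths = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        for path in paths:
            mode = stat.S_IMODE(os.stat(path).st_mode)
            if not mode & stat.S_IWOTH:
                os.chmod(path, mode | stat.S_IWOTH)
                changed += 1
    return changed


if __name__ == "__main__":
    # Throwaway directory standing in for <path>/filestore (assumption).
    demo = tempfile.mkdtemp()
    os.makedirs(os.path.join(demo, "events"))
    print("paths changed:", add_others_write(demo))
```

Making the tree world-writable is the quick fix; giving the www-data group write access instead would probably be tighter, but I have not tried that with Olio.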

Funnily enough, after fixing those issues, I still cannot pass the:
Average images loaded per Home Page 2.65   >=3       FAILED

and on top of that I also have:
Response Times (secs)
AddPerson     5.190  13.194  3.387 8.800     3.000 FAILED
AddEvent       5.904  16.784  3.159 10.400   4.000 FAILED

Think times for AddPerson and AddEvent fail as well.

Any insights are welcome .... :-(

-------------------------------------------------------------------
Kontorinis Vasileios
Phd student, University of California San Diego
San Diego, CA 92122
Cell. phone: (858) 717 6899
bkontorinis@gmail.com, vkontori@ucsd.edu
-------------------------------------------------------------------


2010/1/26 Shanti Subramanyam <shanti.subramanyam@gmail.com>

> Yes - 0.2 requires a lot more disk space as we changed the ratio of
> concurrent users to registered users to 1:100. If you haven't already,
> please check out our published Blueprints for detailed performance
> characteristics of the workload:
> Deploying Web 2.0 Applications on Sun Servers and the OpenSolaris Operating
> System
> <http://wikis.sun.com/display/BluePrints/Deploying+Web+2.0+Applications+on+Sun+Servers+and+the+OpenSolaris+Operating+System>
> If you run for long enough, you should get passing runs. Have you verified
> that there are no errors in the run logs when you see the 'Avg. images
> loaded per home page' fail ?
>
> On to your open files error  - you may have to tune your networking tier
> and/or #open file descriptors. I don't believe we have ever seen as many
> files open as you are seeing. Can you determine whether these are from the
> file store or network ? We also typically run the filestore on a different
> system and nfs-mount it on the webserver box.
> You will have to tune your system to ensure good performance since you will
> need memory for both apache and files.
>
> Shanti
>
>
> On Mon, Jan 25, 2010 at 5:06 PM, Vasileios Kontorinis <
> bkontorinis@gmail.com> wrote:
>
>> Akara and Shanti hi,
>>    I did migrate to Olio 0.2. With the last version of Olio I came across
>> some new interesting things.
>>
>> Scaling issues:
>>   - I am still getting:
>> Average images loaded per Home Page   2.55   >= 3   FAILED
>>  - Additionally, when I scale the concurrent users to 800 I run out of
>> disk space, since my filestore occupies more than 62GB.
>> For 600 users it already occupies 50GB. I was curious whether that makes
>> sense. How much space will I need to reach 1000 users?
>> php_setup.html suggests that we will need 50GB, but apparently we
>> need far more for a large number of users.
>>
>>  - Finally and most importantly, for 600 users many of the operations fail
>> with the exception:
>> Message: java.net.SocketException: Too many open files
>> Stack Trace:
>>  Class Method Line
>>  java.net.PlainSocketImpl socketAccept
>>  java.net.PlainSocketImpl accept 390
>>  java.net.ServerSocket implAccept 453
>>  java.net.ServerSocket accept 421
>>  sun.rmi.transport.tcp.TCPTransport$AcceptLoop executeAcceptLoop 369
>>  sun.rmi.transport.tcp.TCPTransport$AcceptLoop run 341
>>  java.lang.Thread run 619
>> or
>>
>> java.net.SocketException: Too many open files
>> Stack Trace:
>>  Class Method Line
>>  java.net.Socket createImpl 394
>>  java.net.Socket getImpl 457
>>  java.net.Socket bind 571
>>  com.sun.faban.driver.transport.hc3.ProtocolTimedSocketFactory createSocket 60
>>  org.apache.commons.httpclient.HttpConnection open 707
>>  org.apache.commons.httpclient.HttpMethodDirector executeWithRetry 387
>>  org.apache.commons.httpclient.HttpMethodDirector executeMethod 171
>>  org.apache.commons.httpclient.HttpClient executeMethod 397
>>  org.apache.commons.httpclient.HttpClient executeMethod 323
>>  com.sun.faban.driver.transport.hc3.ApacheHC3Transport readURL 274
>>  org.apache.olio.workload.driver.UIDriver doLogin 398
>>  org.apache.olio.workload.driver.UIDriver doLogin 424
>>  sun.reflect.GeneratedMethodAccessor8 invoke
>>  sun.reflect.DelegatingMethodAccessorImpl invoke 25
>>  java.lang.reflect.Method invoke 597
>>  com.sun.faban.driver.engine.TimeThread doRun 169
>>  com.sun.faban.driver.engine.AgentThrea
>>
>> I am monitoring the number of open files on the web server with
>> `watch "lsof | wc"`, and Olio starts failing when around 65,000-70,000 files
>> are open. lsof shows around 100 open files for each apache2 thread, so
>> around 650-700 apache2 threads account for the bulk of those open file
>> descriptors.
>> The soft and hard limits are set to 403238, which means that many more
>> files should be open before it starts failing.
>> (I verified the limit by opening a bunch of files with a Python
>> script, and it does reach the limit of 403238.)
>> Any insights? Is there any chance that file descriptors take more time
>> than usual to be reclaimed after being closed in the Xen VM I use for my
>> web server? Does it make sense in the first place for Olio to have so many
>> files open at the same time?
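
The limit check described above can be sketched roughly as follows. This is not the script that was actually used; it is a minimal illustration that assumes a Unix system, and it temporarily lowers the soft limit (the temp_soft_limit parameter is made up for this sketch) so that the probe finishes quickly instead of opening hundreds of thousands of files. One thing worth keeping in mind when comparing numbers: "Too many open files" (EMFILE) is raised against the limit of a single process, while `lsof | wc` counts entries across the whole system.

```python
import errno
import os
import resource


def probe_fd_limit(temp_soft_limit=64):
    """Open descriptors until EMFILE and report how many could be opened.

    The soft RLIMIT_NOFILE is lowered to temp_soft_limit for the duration
    of the probe and restored afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        temp_soft_limit = min(temp_soft_limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (temp_soft_limit, hard))
    fds = []
    try:
        # The bound on len(fds) only guards against an endless loop.
        while len(fds) <= temp_soft_limit:
            try:
                fds.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as exc:
                if exc.errno != errno.EMFILE:
                    raise
                break  # per-process limit reached
    finally:
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return len(fds), (soft, hard)


if __name__ == "__main__":
    opened, limits = probe_fd_limit()
    print("opened %d descriptors before EMFILE; original limits %s"
          % (opened, limits))
```

The count comes out slightly below the temporary limit because stdin, stdout and stderr (and anything else the interpreter holds open) already use descriptors in the same process.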
>>
>> Thanks again.
>>
>>
>>
>>
>> 2010/1/16 Shanti Subramanyam <shanti.subramanyam@gmail.com>
>>
>>> I would really recommend that you migrate to Olio 0.2. In addition to bug
>>> fixes, there are some major feature changes in it. See "Olio
>>> 0.2 released" <http://perfwork.wordpress.com/2010/01/13/olio-0-2-relesed/>
>>>
>>>
>>> Shanti
>>>
>>>
>>> On Sat, Jan 16, 2010 at 4:49 PM, Vasileios Kontorinis <
>>> bkontorinis@gmail.com> wrote:
>>>
>>>> Akara hi again,
>>>>    Below I have comments on your suggestions and at the end some bonus
>>>> questions... Thanks again.
>>>>
>>>> 2010/1/13 Akara Sucharitakul <Akara.Sucharitakul@sun.com>
>>>>
>>>>> With your permission, I'd like to copy the Olio and Faban user aliases
>>>>> going forward. I feel it will help a much wider audience. Please see
>>>>> below for answers/comments:
>>>>>
>>>> Sure. I cc'ed the Olio user alias. I am not sure which is the right Faban
>>>> list.
>>>>
>>>>
>>>>> Vasileios Kontorinis wrote:
>>>>>
>>>>>> Akara hi,
>>>>>>   I am a grad student at UCSD and I use Olio for a research project
>>>>>> where we want to measure Olio performance under live virtual machine
>>>>>> migration. We use Ubuntu 8.04 on Nehalem servers.
>>>>>> I have checked out the latest version of Olio from the online svn
>>>>>> repository and downloaded the latest version of Faban
>>>>>> (faban-kit-101509.tar.gz
>>>>>> <http://faban.sunsource.net/nightly/faban-kit-101509.tar.gz>)
>>>>>>
>>>>>
>>>>> 101509 is fairly recent. But the latest on the web site is 111109
>>>>> (Faban 1.0). There were just bug fixes between those releases.
>>>>
>>>>
>>>> I have upgraded to Faban 1.0, still using olio1.0 though ( the release
>>>> of 2.0 was announced, will switch to it if I run into bugs that have been
>>>> fixed)
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>>> So far, I have employed a bunch of hacks to get most of it to work and
>>>>>> I am almost there. In the process I came up with a bunch of questions.
>>>>>>
>>>>>> Questions (some of them might be purely Faban related rather than Olio,
>>>>>> so bear with me):
>>>>>> 1) Is there any way to deploy OlioDriver.jar through the command line?
>>>>>> Firefox through ssh forwarding is dead slow and I'd rather avoid it if
>>>>>> I can.
>>>>>>
>>>>>
>>>>> Just drop the jar into faban/benchmarks/ and it will deploy itself.
>>>>> This is documented at
>>>>> http://faban.sunsource.net/1.0/docs/guide/harnessdev/deploybenchmark.html
>>>>> under "Alternate Deployment Methods."
>>>>>
>>>>>
>>>>>> 2) Should the services ApacheHttpdService, MemcachedService, MySQLService
>>>>>> that come with Faban be deployed before running Olio?
>>>>>>    I was getting some very weird errors, e.g.
>>>>>>
>>>>>
>>>>> Yes, you should. Olio will search for those.
>>>>>
>>>> Done.
>>>>
>>>>
>>>>>
>>>>>
>>>>>> 03:50:27:WARNING:UIDriverAgent[0]: Forcefully terminating benchmark
>>>>>> run
>>>>>> 03:50:27:WARNING:UIDriverAgent[0]: 25 threads forcefully terminated.
>>>>>> java.lang.Throwable: Stack of non-terminating thread.
>>>>>>    at java.net.SocketInputStream.socketRead0 (null)
>>>>>>    at java.net.SocketInputStream.read (129)
>>>>>>    at java.io.FilterInputStream.read (116)
>>>>>>    at com.sun.faban.driver.transport.util.TimedInputStream.read (139)
>>>>>>    at java.io.BufferedInputStream.fill (218)
>>>>>>    at java.io.BufferedInputStream.read (237)
>>>>>>    at org.apache.commons.httpclient.HttpParser.readRawLine (78)
>>>>>>    at org.apache.commons.httpclient.HttpParser.readLine (106)
>>>>>>    at org.apache.commons.httpclient.HttpConnection.readLine (1116)
>>>>>>    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine (1973)
>>>>>>    at org.apache.commons.httpclient.HttpMethodBase.readResponse (1735)
>>>>>>    at org.apache.commons.httpclient.HttpMethodBase.execute (1098)
>>>>>>    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry (398)
>>>>>>    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod (171)
>>>>>>    at org.apache.commons.httpclient.HttpClient.executeMethod (397)
>>>>>>    at org.apache.commons.httpclient.HttpClient.executeMethod (323)
>>>>>>    at com.sun.faban.driver.transport.hc3.ApacheHC3Transport.fetchURL (529)
>>>>>>    at com.sun.faban.driver.transport.hc3.ApacheHC3Transport.fetchURL (552)
>>>>>>    at org.apache.olio.workload.driver.UIDriver.doHomePage (355)
>>>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
>>>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke (39)
>>>>>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
>>>>>>    at java.lang.reflect.Method.invoke (597)
>>>>>>    at com.sun.faban.driver.engine.TimeThread.doRun (169)
>>>>>>    at com.sun.faban.driver.engine.AgentThread.run (202)
>>>>>>
>>>>>> and afterwards the master was waiting forever for threads to join
>>>>>> (I attached gdb to verify that something was wrong), so I had to kill
>>>>>> the benchmark.
>>>>>>
>>>>>
>>>>> These threads are hanging reading the server responses, that never
>>>>> came.
>>>>>
>>>>>
>>>> Building the services from Faban probably fixes it.
>>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>>> In the Olio log there are WARNINGS  complaining about not deploying
>>>>>> those. After building those and manually copying them to /faban/services
>>>>>> (ant deploy did not place them there... :-(  )
>>>>>>
>>>>>
>>>>> Yes. But ant deploy should get them there. If not, can you please let
>>>>> me know the ant messages?
>>>>
>>>>
>>>> Ant was indeed deploying them. I had a mistake in build.properties.
>>>> I had:  faban.url=http://<hostname>:9980/   instead of
>>>> faban.url=http://localhost:9980/
>>>> After I changed that it started working...
>>>>
>>>>
>>>>>
>>>>>> it worked. (mostly worked)
>>>>>>
>>>>>> 3) I still have warnings like:
>>>>>> 01:38:08:INFO:Time difference to host olio-web is 269 ms. Attempting
>>>>>> to set clock.
>>>>>> 01:38:08:INFO:Time difference to host olio-db is 263 ms. Attempting
>>>>>> to set clock.
>>>>>>
>>>>>
>>>>> These two are OK. Just trying to do a clock sync between the systems.
>>>>>
>>>>>
>>>>>> 01:38:08:WARNING:olio-web wakeup-before time reached 700ms limit.
>>>>>> System is too busy. Giving up.
>>>>>>
>>>>>
>>>>> This is one of Faban's clock-setting calibrations. If the system is too
>>>>> busy, or you run on some virtualization architectures, the lag between
>>>>> the intended end of a sleep and the time the thread actually wakes up
>>>>> (gets scheduled/executed) can be too high, and the calibration will fail.
>>>>>
>>>>>
>>>>>> 01:38:08:INFO:Time difference to host olio-mem is 262 ms. Attempting
>>>>>> to set clock.
>>>>>> 01:38:10:WARNING:olio-db wakeup-before time reached 700ms limit.
>>>>>> System is too busy. Giving up.
>>>>>> 09:38:09:WARNING:[date, -u, 011309382010.10]
>>>>>> stderr:
>>>>>> date: cannot set date: Operation not permitted
>>>>>> 09:38:09:WARNING:Error on "[date, -u, 011309382010.10]" command trying
>>>>>> to set the date. Exit value: 1
>>>>>> 09:38:10:WARNING:[date, -u, 011309382010.11]
>>>>>> stderr:
>>>>>> date: cannot set date: Operation not permitted
>>>>>> 09:38:10:WARNING:Error on "[date, -u, 011309382010.11]" command trying
>>>>>> to set the date. Exit value: 1
>>>>>> 09:38:09:WARNING:[date, -u, 011309382010.10]
>>>>>> stderr:
>>>>>> date: cannot set date: Operation not permitted
>>>>>> 09:38:09:WARNING:Error on "[date, -u, 011309382010.10]" command trying
>>>>>> to set the date. Exit value: 1
>>>>>> 09:38:10:WARNING:[date, -u, 011309382010.11]
>>>>>> stderr:
>>>>>> date: cannot set date: Operation not permitted
>>>>>>
>>>>>> Letting Faban change the VM clock sounds like a bad idea from the
>>>>>> start.
>>>>>>
>>>>>
>>>>> OK. So it is xen. Yes, this is what Faban is trying to solve. You can
>>>>> certainly turn it off. Please see:
>>>>>   http://faban.sunsource.net/1.0/docs/howdoi/physclocksync.html
>>>>>
>>>>>
>>>> I added <fh:timeSync>false</fh:timeSync> to my run.xml file and
>>>> that made the warnings go away. (BTW, there is a mistake in the link
>>>> above: it shows <fh:timeSync>false<fh:timeSync>, where the second tag
>>>> is missing the "/" of a closing tag; <fh:timeSync>false</fh:timeSync>
>>>> is correct.)
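
A quick way to see the difference between the two timeSync spellings is to feed both to an XML parser. This is a minimal sketch using Python's standard library; the wrapper element <run> and the namespace URI urn:example:faban-harness are placeholders I made up so the fh: prefix resolves, not the real contents of run.xml.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace declaration (assumption, not Faban's real URI).
NS_DECL = 'xmlns:fh="urn:example:faban-harness"'


def is_well_formed(fragment):
    """Return True if the fragment parses as XML inside a wrapper element."""
    doc = "<run %s>%s</run>" % (NS_DECL, fragment)
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for candidate in ("<fh:timeSync>false</fh:timeSync>",
                      "<fh:timeSync>false<fh:timeSync>"):
        print(candidate, "->", is_well_formed(candidate))
```

The version without the "/" opens a second fh:timeSync element that is never closed, so the parser rejects the document.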
>>>>
>>>>
>>>>
>>>>>
>>>>>> Unfortunately, Xen is really bad at maintaining an accurate clock. As
>>>>>> a result there is usually a time difference of more than 10ms between
>>>>>> the different virtual machines.
>>>>>> I went over the setTime function in the Faban source
>>>>>> (/faban/com/sun/faban/harness/agent/CmdAgentImpl.java); it's big and
>>>>>> ugly (very ugly)
>>>>>>
>>>>>
>>>>> Thanks for the compliments! I think you mean CmdService.setClockTask.
>>>>> Time-sensitive code ain't pretty. The complexity comes from dealing with
>>>>> the clock and trying to achieve good accuracy. If you think you can
>>>>> simplify this (without losing the accuracy, of course), I'm listening. In
>>>>> comparison, CmdAgentImpl has nothing.
>>>>>
>>>>>
>>>> Yes, you're right, it is CmdService.setClockTask. The previous email was
>>>> composed at 3am ... :-)
>>>> I am still a little confused. setClockTask is used to set the clock
>>>> so that all the machines are synchronized with the master. From what you
>>>> mentioned, the physical clock sync is only used for the logs.
>>>> Why do we need to do that, given that 1) it requires root privileges (which
>>>> might not always be available) and 2) I could imagine an alternative that
>>>> uses deltas from the actual physical clock without having to set it?
>>>> (I am probably missing something... :-)
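
To make the "deltas" idea concrete, here is a minimal sketch of estimating a clock offset from one request/response round trip and applying it to local timestamps, in the style of Cristian's algorithm. It is not what Faban's setClockTask does; the function names are hypothetical, and it assumes the network delay is roughly symmetric.

```python
def estimate_offset(local_send, remote_time, local_recv):
    """Estimate (remote clock - local clock) from a single round trip.

    local_send / local_recv are local timestamps taken just before the
    request and just after the reply; remote_time is the remote clock
    reading carried in the reply. Assumes symmetric network delay.
    """
    midpoint = (local_send + local_recv) / 2.0
    return remote_time - midpoint


def to_master_time(local_timestamp, offset):
    """Translate a local timestamp into the master's timeline."""
    return local_timestamp + offset


if __name__ == "__main__":
    # Made-up timestamps in seconds, purely for illustration.
    offset = estimate_offset(100.0, 100.3, 100.2)
    print("estimated offset: %.3f s" % offset)
    print("local 50.0 on master timeline: %.3f" % to_master_time(50.0, offset))
```

With an offset like this, log timestamps could be corrected after the fact without root privileges, at the cost of having to re-estimate the offset whenever the clocks drift.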
>>>>
>>>>
>>>>
>>>>>> Why is there this strict requirement for a 10ms difference? Any ideas?
>>>>>>
>>>>>
>>>>> It is easily achievable in most cases. May not be true for VMs.
>>>>>
>>>>> On some VM architectures, however, the OS does not get scheduled until
>>>>> way after that, thus causing problems. You may be able to measure
>>>>> performance on those VMs, but you don't want to use such VMs as a
>>>>> driver. Your response time measurements will be way off.
>>>>>
>>>>> The physical clock sync is not really rigorous, and you can turn it
>>>>> off. It is more to keep the systems in good time sync. If your VM stands
>>>>> in the way, just turn it off. The driver's virtual clock sync is much
>>>>> more picky in comparison. This is because the start time for the steady
>>>>> state should be the same (with a very small tolerance) no matter how many
>>>>> drivers are driving. Otherwise the measurement period won't be the same
>>>>> when viewed from different drivers and the results won't be reliable.
>>>>>
>>>>>
>>>>>> Even with ntp it's hard to provide the 10ms guarantee.
>>>>>>
>>>>>
>>>>> That's why we don't use ntp ;-)
>>>>
>>>>
>>>> Just out of curiosity: the physical clocks are set only once at the
>>>> beginning (right?), so for long runs the 10ms difference will not be
>>>> guaranteed, no? Especially under VMs I've seen significant clock
>>>> differences within a few minutes.
>>>> At least ntp can periodically resync (of course, doing so might screw up
>>>> the logs with time going backwards, etc.)
>>>>
>>>>
>>>>>
>>>>>
>>>>>> I am thinking of modifying this function to always return that the
>>>>>> time difference is less than 10ms (so that I do not have to wait for
>>>>>> the timeouts all the time.)
>>>>>>
>>>>>
>>>>> Why bother? If you don't like it, just turn it off. It has good use in
>>>>> most configurations we're dealing with. And it avoids ntp inaccuracies.
>>>>>
>>>>>
>>>>>> Will this break anything in Olio?
>>>>>>
>>>>>
>>>>> Nope. Except the times in your logs will appear out of sequence. They
>>>>> rely on the local time on the originating systems.
>>>>>
>>>>>
>>>>>> 4) Warnings like:
>>>>>> 09:39:48:WARNING:Image at
>>>>>> http://olio-web:80/fileService.php?cache=false&file=e168t.jpg size
>>>>>> of 249 bytes is too small. Image may not exist
>>>>>> can be ignored, right?
>>>>>>
>>>>>
>>>>> Well, something is wrong. We don't have images that small. Check
>>>>> whether e168t.jpg is really that small. That's why we have that warning.
>>>>>
>>>>>
>>>>
>>>> It's kind of funny: my problem was that I had the Olio webkit version
>>>> installed and then I downloaded the version from the online svn repository.
>>>> I built the driver but forgot to update the web application on my Apache
>>>> server, which, as expected, was the source of many of my issues.
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>>> 5) Last and most important.
>>>>>> I can run the benchmark and all the operations succeed except for login.
>>>>>> I get a bunch of:
>>>>>>
>>>>>> 09:39:25:WARNING:UIDriverAgent[0].2.doLogin: Found login prompt at
>>>>>> index 2926, Login as at786o08x, 2178 failed.
>>>>>> Note: Error not counted in result.
>>>>>> Either transaction start or end time is not within steady state.
>>>>>> java.lang.RuntimeException: Found login prompt at index 2926, Login
>>>>>> as at786o08x, 2178 failed.
>>>>>>    at org.apache.olio.workload.driver.UIDriver.doLogin (404)
>>>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
>>>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke (39)
>>>>>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
>>>>>>    at java.lang.reflect.Method.invoke (597)
>>>>>>    at com.sun.faban.driver.engine.TimeThread.doRun (169)
>>>>>>    at com.sun.faban.driver.engine.AgentThread.run (202)
>>>>>>
>>>>>> Any ideas? I do get
>>>>>>
>>>>>
>>>>> You likely have cookie issues. It can't seem to hold on to a session.
>>>>>
>>>>>
>>>>>
>>>> Well, there was a permission issue with the http_session dir. I could not
>>>> write to it. Running chmod 777 on it fixed this.
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>>> (I've found online:
>>>>>> http://www.mail-archive.com/olio-dev@incubator.apache.org/msg00647.html
>>>>>> which is similar, but when I added
>>>>>>
>>>>>> com.sun.faban.driver.transport.sunhttp.ThreadCookieHandler.level=FINER
>>>>>>  in build.properties
>>>>>> I did not see any cookie-related warnings. Those should appear in the
>>>>>> Olio run log or the Apache log, right? Am I just looking in the wrong
>>>>>> place?)
>>>>>>
>>>>>
>>>>> Yes, that's applicable only to the Sun Http Transport. The version of
>>>>> Olio you're using is based on the Apache Http Transport (Apache HttpClient
>>>>> 3.1). The ThreadCookieHandler is not used for the Apache transport and
>>>>> that's why you don't see any logs. Try upgrading to Faban 1.0 before
>>>>> looking at other things.
>>>>>
>>>>>
>>>>>>
>>>>>> It's a long email I know. Your feedback would be most appreciated.
>>>>>>
>>>>>> -Regards
>>>>>>
>>>>>
>>>>> Thanks for all the questions/comments.
>>>>>
>>>>> -Akara
>>>>>
>>>>>
>>>>
>>>> And now some more questions/ comments:
>>>> 1) I get the following error:
>>>>
>>>> 15:13:05:SEVERE:CmdService: Getting - exception reading
>>>> /usr/data/olio-db.err
>>>> java.io.FileNotFoundException: File /usr/data/olio-db.err does not
>>>> exist.
>>>>     at com.sun.faban.common.FileTransfer.<init> (70)
>>>>     at com.sun.faban.harness.agent.FileAgentImpl.get (315)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke (39)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
>>>>     at java.lang.reflect.Method.invoke (597)
>>>>     at sun.rmi.server.UnicastServerRef.dispatch (305)
>>>>     at sun.rmi.transport.Transport$1.run (159)
>>>>     at java.security.AccessController.doPrivileged (null)
>>>>     at sun.rmi.transport.Transport.serviceCall (155)
>>>>     at sun.rmi.transport.tcp.TCPTransport.handleMessages (535)
>>>>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0 (790)
>>>>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run (649)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask (885)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run (907)
>>>>     at java.lang.Thread.run (619)
>>>>     at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer (255)
>>>>     at sun.rmi.transport.StreamRemoteCall.executeCall (233)
>>>>     at sun.rmi.server.UnicastRef.invoke (142)
>>>>     at com.sun.faban.harness.agent.FileAgentImpl_Stub.get (null)
>>>>     at com.sun.faban.harness.engine.CmdService.get (1334)
>>>>     at com.sun.faban.harness.RunContext.getFile (346)
>>>>     at com.sun.services.MySQLService.getLogs (197)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke (39)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
>>>>     at java.lang.reflect.Method.invoke (597)
>>>>     at com.sun.faban.harness.util.Invoker.invoke (98)
>>>>     at com.sun.faban.harness.services.ServiceWrapper.getLogs (200)
>>>>     at com.sun.faban.harness.services.ServiceManager.getLogs (642)
>>>>     at com.sun.faban.harness.engine.GenericBenchmark.start (323)
>>>>     at com.sun.faban.harness.engine.RunDaemon.run (338)
>>>>     at java.lang.Thread.run (619)
>>>> 15:13:05:WARNING:Could not copy /usr/data/olio-db.err to
>>>> /home/gdhiman/faban.1.0/faban/output/OlioDriver.2D/mysql_err.log.olio-db
>>>>
>>>> Apparently something is misconfigured in my db-server. Any ideas?
>>>>
>>>> 2) I get the following error:
>>>> 15:13:16:WARNING:[/home/gdhiman/faban.1.0/faban/bin/fenxi, process,
>>>> /home/gdhiman/faban.1.0/faban/output/OlioDriver.2D/,
>>>> /home/gdhiman/faban.1.0/faban/output/OlioDriver.2D//post/, OlioDriver.2D]
>>>> stderr:
>>>> Error in executing perl
>>>> /home/gdhiman/faban.1.0/faban/master/webapps/fenxi/txt2db/mpstat.pl
>>>> Error in executing perl
>>>> /home/gdhiman/faban.1.0/faban/master/webapps/fenxi/txt2db/mpstat.pl
>>>>
>>>> Actually I traced this one back. The problem is the difference in output
>>>> format between Sun's mpstat and the default Linux mpstat.
>>>> This is the output of my mpstat:
>>>>
>>>> gdhiman@olio-client00:~/faban.1.0/faban/output/OlioDriver.2D$ mpstat 1
>>>> Linux 2.6.18.8-xen (olio-client00)     01/16/10
>>>>
>>>> 16:25:06     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
>>>> 16:25:07     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     52.48
>>>> 16:25:08     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     50.50
>>>> 16:25:09     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     79.21
>>>> 16:25:10     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     45.54
>>>> 16:25:11     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     55.45
>>>>
>>>> The first line, as well as the time at the beginning of each entry,
>>>> messes up the parsing in mpstat.pl. (The fields are also different.)
>>>> Any plans to support this?
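
Until mpstat.pl handles this format, something along these lines could pre-process the Linux output. It is a minimal sketch in Python, not a patch for the fenxi scripts: the function name parse_linux_mpstat is hypothetical, it assumes each sample sits on one unwrapped line with a 24-hour HH:MM:SS timestamp as in the output above, and it does not handle locales that print AM/PM.

```python
import re

TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}$")


def parse_linux_mpstat(text):
    """Parse Linux mpstat interval output into a list of dicts.

    The banner line and blank lines are skipped, the leading timestamp is
    dropped, and the row starting with 'CPU' is used as the column header.
    """
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or not TIMESTAMP.match(fields[0]):
            continue  # banner ("Linux ...") or empty line
        fields = fields[1:]  # drop the timestamp
        if not fields:
            continue
        if fields[0] == "CPU":
            header = fields
        elif header is not None:
            row = dict(zip(header, fields))
            for name in header[1:]:
                row[name] = float(row[name])  # numeric columns
            rows.append(row)
    return rows


if __name__ == "__main__":
    sample = (
        "Linux 2.6.18.8-xen (olio-client00)     01/16/10\n"
        "\n"
        "16:25:06     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s\n"
        "16:25:07     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     52.48\n"
    )
    for row in parse_linux_mpstat(sample):
        print(row["CPU"], row["%idle"], row["intr/s"])
```

Mapping the resulting columns onto whatever the Solaris-oriented parser expects would still be a separate step.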
>>>>
>>>> 3) Scaling questions.
>>>> - So far I have not had a single experiment pass. Some are pretty
>>>> close, with only one metric check failing:
>>>>
>>>> Average images loaded per Home Page   2.79   >= 3   FAILED
>>>> Any ideas? Is it the case that the disk is not fast enough? I am just
>>>> using the local filesystem for the filestore.
>>>>
>>>> - As I double the number of concurrent users I observe linear scaling in
>>>> the throughput.
>>>> Con Users    Throughput
>>>>   25           4.967
>>>>   50          10.06
>>>>  100          19.375
>>>>  200          40.21
>>>>  400          75.818
>>>>  800           0.383
>>>> 1000           0.483
>>>>
>>>> The linear scaling stops at 400 concurrent users (only one agent).
>>>> It would actually be exactly linear (a value of ~80), but almost half of
>>>> the login operations failed. I am looking into it.
>>>> Any insights on what might be the first thing to fail?
>>>>
>>>> For the 800 and 1000 experiments there are no failed operations logged.
>>>> It looks like those are being discarded... (?)
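
The scaling observation can be checked with a few lines of arithmetic. This is a minimal sketch that divides each throughput from the table above by its user count and flags runs that fall below half of the 25-user baseline; the 0.5 tolerance and the function name are arbitrary choices for illustration.

```python
# (concurrent users, throughput) pairs copied from the table above.
RESULTS = [
    (25, 4.967),
    (50, 10.06),
    (100, 19.375),
    (200, 40.21),
    (400, 75.818),
    (800, 0.383),
    (1000, 0.483),
]


def flag_scaling_breaks(results, tolerance=0.5):
    """Return the user counts whose per-user throughput drops below
    tolerance times the per-user throughput of the first (smallest) run."""
    base_users, base_throughput = results[0]
    baseline = base_throughput / base_users
    flagged = []
    for users, throughput in results:
        if throughput / users < tolerance * baseline:
            flagged.append(users)
    return flagged


if __name__ == "__main__":
    for users, throughput in RESULTS:
        print("%5d users: %.5f per user" % (users, throughput / users))
    print("runs far below linear scaling:", flag_scaling_breaks(RESULTS))
```

By this measure the 400-user run still reaches roughly 95% of the baseline per-user throughput, while the 800 and 1000 runs collapse by more than two orders of magnitude, which looks like the runs breaking rather than the server saturating gradually.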
>>>>
>>>> Bonus question:
>>>> In the runtime statistics
>>>> <runtimeStats enabled="true">
>>>>          <interval>30</interval>
>>>>  </runtimeStats>
>>>>
>>>> only the 90% response time is reported. Is there an easy way to also
>>>> report the 99%? (Or do I need to add code for that?)
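
If the raw response times are available, the 99th percentile can be computed offline. The following is a minimal sketch using the nearest-rank definition of a percentile; it makes no claim about how Faban computes its 90% figure, and the sample data in the demo is synthetic.

```python
import math


def nearest_rank_percentile(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Synthetic response times in seconds, for illustration only.
    response_times = [0.1 * i for i in range(1, 101)]
    print("90th:", nearest_rank_percentile(response_times, 90))
    print("99th:", nearest_rank_percentile(response_times, 99))
```

Other percentile definitions interpolate between neighbouring samples, so small differences from a tool's reported value would not necessarily indicate a bug.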
>>>>
>>>>
>>>> Thanks a lot again in advance.
>>>> -VK
>>>>
>>>
>>>
>>
>
