incubator-olio-user mailing list archives

From Vasileios Kontorinis <bkontori...@gmail.com>
Subject Olio Scaling
Date Mon, 08 Feb 2010 09:18:55 GMT
Akara and Shanti,
  I managed to fix a very subtle issue with Xen: a checksum problem was
reducing network throughput from 1 Gb/s to 1 Mb/s. Once that was fixed, I
managed to scale to 1800 concurrent users.
However, the only metric still failing is:

Average images loaded per Home Page 2.65   >=3       FAILED

I did manage to get a passing result with 25 users.

The logs seem clean. The only entries I get are:
[Mon Feb 08 07:56:13 2010] [error] [client 10.17.255.250] index.php waiting for cache

and

[Thu Feb 04 00:10:51 2010] [error] [client 10.17.255.250] olio-local-web:21355 obtained HomeUpdateLock
[Thu Feb 04 00:10:51 2010] [error] [client 10.17.255.250] olio-local-web:21355 released HomeUpdateLock

Any idea how to debug the failing metric?

I also have some questions about memcached. In the MemcachedStats output
log I get:

Server          Time      items  cache_MB  conns  sets/s  gets/s  get_hits/s  get_misses/s  evicts/s  rB/s    wB/s
--------------  --------  -----  --------  -----  ------  ------  ----------  ------------  --------  ----  ------
olio-mem:11211  04:20:47      3      0.05     34    1.70   13.10       10.30          2.80         0  5709  216402
olio-mem:11211  04:20:47      3      0.05     34    0.00    0.00        0.00          0.00         0     0      48
olio-mem:11211  04:20:47      3      0.05     34    0.00    0.00        0.00          0.00         0     0      48


Does this mean that I am using only 0.05 MB of memcached memory?
I am fairly sure the memcached command line includes -m 256, so I expected
usage to get close to 256 MB when running with a high number of users.
Or is cache_MB measuring something different?
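One way to check this is to ask memcached directly. The sketch below assumes the server is reachable as olio-mem:11211 and uses the standard memcached text protocol: the `stats` command reports `bytes` (memory currently used to store items) and `limit_maxbytes` (the ceiling set by `-m`). As far as I know, `-m 256` is an upper bound rather than a reservation, so with only 3 items stored, a figure of about 0.05 MB would not be surprising. Whether MemcachedStats derives `cache_MB` from `bytes` is an assumption.

```python
import socket


def fetch_stats(host, port=11211, timeout=5.0):
    """Send the text-protocol 'stats' command and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        reply = b""
        # The reply is a series of "STAT <name> <value>" lines ending in "END".
        while not reply.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
    return reply.decode("ascii", errors="replace")


def parse_stats(reply):
    """Turn 'STAT name value' lines into a {name: value} dict of strings."""
    stats = {}
    for line in reply.splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def memory_usage_mb(stats):
    """Return (MB currently used for items, MB allowed by -m)."""
    used = int(stats["bytes"]) / (1024 * 1024)
    limit = int(stats["limit_maxbytes"]) / (1024 * 1024)
    return used, limit
```

Usage: `memory_usage_mb(parse_stats(fetch_stats("olio-mem")))` returns `(used_mb, limit_mb)`. Comparing `curr_items` and `bytes` over a run shows whether the cache is actually being populated.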

Thanks again
-------------------------------------------------------------------
Kontorinis Vasileios
Phd student, University of California San Diego
San Diego, CA 92122
Cell. phone: (858) 717 6899
bkontorinis@gmail.com, vkontori@ucsd.edu
-------------------------------------------------------------------


2010/1/27 Shanti Subramanyam <shanti.subramanyam@gmail.com>

> Yes - these are problems that I'm already aware of.
> The best solution to the filestore issue is to change ownership of the
> directory to the same user/group as the apache process. We could have the
> fileloader.sh change write access I guess, but since that's a big security
> hole, we may not want to do that automatically without letting the user know
> about it.
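Shanti's suggestion amounts to running `chown -R` on the filestore so that it belongs to the Apache user and group. Below is a rough Python equivalent. It assumes Apache runs as `www-data` (the user named in the earlier message) and that the script has enough privilege to change ownership, which normally means root:

```python
import os
import shutil


def chown_tree(root, user, group):
    """Give `root` and everything below it to user:group, like `chown -R`.

    `user` and `group` may be names (e.g. "www-data") or numeric ids.
    Returns the number of paths whose ownership was set.
    """
    shutil.chown(root, user=user, group=group)
    changed = 1
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # Like plain `chown`, this follows symlinks to their targets.
            shutil.chown(os.path.join(dirpath, name), user=user, group=group)
            changed += 1
    return changed
```

Usage: `chown_tree("<path>/filestore", "www-data", "www-data")`, with `<path>` being wherever your filestore lives. Unlike `chmod -R o+w`, this does not open the directory to every user on the box.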
>
> The fact that your response times are so high indicates that you're running
> a far larger load than the system can handle and/or that you still need some
> tuning.
> I suggest you start over from say 100 users and see at what point your
> response times start getting really large. The apache error log should be
> pulled in as part of the 'Statistics' tab, so do keep monitoring that.
>
> Shanti
>
>
> On Wed, Jan 27, 2010 at 1:34 AM, Vasileios Kontorinis <
> bkontorinis@gmail.com> wrote:
>
>> Shanti hi again,
>>    I checked my Apache logs and there were a bunch of errors.
>> It looks like there are some issues with
>> webapp/php/trunk/classes/ImageUtil.php in the latest release of Olio (I
>> downloaded
>> http://www.alliedquotes.com/mirrors/apache/incubator/olio/0.2/apache-olio-php-src-0.2.tar.gz):
>> 1) There is a line that needs to be commented out; PHP complains ("1.5. Must
>> be greater than zero.").
>> 2) Then it complained that it could not find the function
>> fastimagecopyresampled. To work around that, I moved
>> fastimagecopyresampled above createThumb (this might not be required) and
>> declared it static.
>>     Finally, I call the function from createThumb with
>> self::fastimagecopyresampled.
>> 3) Then it complained because it could not write to the filestore. The
>> problem is that Apache writes the new images as www-data, while the
>> filestore does not grant write permission to others. Granting access
>> manually solves the problem (chmod -R o+w <path>/filestore), but since the
>> directories in filestore are generated automatically, maybe the chmod
>> command should be added to fileloader.sh.
>>
>> Funnily enough, after fixing those issues, I still fail:
>> Average images loaded per Home Page  2.65  >= 3  FAILED
>>
>> and on top of that I also have:
>> Response Times (secs)
>> AddPerson   5.190   13.194   3.387    8.800   3.000   FAILED
>> AddEvent    5.904   16.784   3.159   10.400   4.000   FAILED
>>
>> The think times for AddPerson and AddEvent fail as well.
>>
>> Any insights are welcome .... :-(
>>
>>
>>
>> 2010/1/26 Shanti Subramanyam <shanti.subramanyam@gmail.com>
>>
>>> Yes - 0.2 requires a lot more disk space as we changed the ratio of
>>> concurrent users to registered users to 1:100. If you haven't already,
>>> please check out our published Blueprints for detailed performance
>>> characteristics of the workload:
>>> Deploying Web 2.0 Applications on Sun Servers and the OpenSolaris
>>> Operating System<http://wikis.sun.com/display/BluePrints/Deploying+Web+2.0+Applications+on+Sun+Servers+and+the+OpenSolaris+Operating+System>
>>> If you run for long enough, you should get passing runs. Have you
>>> verified that there are no errors in the run logs when you see the 'Avg.
>>> images loaded per home page' check fail?
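Here is a back-of-the-envelope answer to the disk-space question. It extrapolates from the one measured point (about 50 GB at 600 concurrent users) and assumes filestore size grows linearly with concurrent users. That seems plausible given the fixed 1:100 ratio of concurrent to registered users, but it is an assumption, not a documented sizing rule:

```python
def estimate_filestore_gb(users, ref_users=600, ref_gb=50.0):
    """Scale a measured filestore size linearly to another concurrent-user count."""
    gb_per_user = ref_gb / ref_users  # ~0.083 GB (about 83 MB) per concurrent user
    return gb_per_user * users


for users in (800, 1000, 1800):
    print(f"{users:5d} users -> ~{estimate_filestore_gb(users):.0f} GB")
```

Under that assumption, 800 users would need roughly 67 GB, which is consistent with running out of space above 62 GB. By the same estimate, 1000 users would need roughly 83 GB.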
>>>
>>> On to your open files error: you may have to tune your networking tier
>>> and/or the number of open file descriptors. I don't believe we have ever
>>> seen as many files open as you are seeing. Can you determine whether these
>>> are from the file store or the network? We also typically run the filestore
>>> on a different system and NFS-mount it on the webserver box.
>>> You will have to tune your system to ensure good performance, since you
>>> will need memory for both Apache and files.
>>>
>>> Shanti
>>>
>>>
>>> On Mon, Jan 25, 2010 at 5:06 PM, Vasileios Kontorinis <
>>> bkontorinis@gmail.com> wrote:
>>>
>>>> Akara and Shanti hi,
>>>>    I migrated to Olio 0.2. With the latest version of Olio I came
>>>> across some interesting new things.
>>>>
>>>> Scaling issues:
>>>>   - I am still getting:
>>>> Average images loaded per Home Page  2.55  >= 3  FAILED
>>>>  - Additionally, when I scale to 800 concurrent users I run out of
>>>> disk space, since my filestore occupies more than 62 GB.
>>>> For 600 users it occupies 50 GB. I was curious whether that makes
>>>> sense. How much space will I need to reach 1000 users?
>>>> php_setup.html suggests that we will need 50 GB, but apparently
>>>> we need far more for a large number of users.
>>>>
>>>>  - Finally and most importantly, for 600 users many of the operations
>>>> fail with the exception:
>>>> Message: java.net.SocketException: Too many open files
>>>> Stack Trace:
>>>>  Class Method Line
>>>>  java.net.PlainSocketImpl socketAccept
>>>>  java.net.PlainSocketImpl accept 390
>>>>  java.net.ServerSocket implAccept 453
>>>>  java.net.ServerSocket accept 421
>>>>  sun.rmi.transport.tcp.TCPTransport$AcceptLoop executeAcceptLoop 369
>>>>  sun.rmi.transport.tcp.TCPTransport$AcceptLoop run 341
>>>>  java.lang.Thread run 619
>>>> or
>>>>
>>>> java.net.SocketException: Too many open files
>>>> Stack Trace:
>>>>  Class Method Line
>>>>  java.net.Socket createImpl 394
>>>>  java.net.Socket getImpl 457
>>>>  java.net.Socket bind 571
>>>>  com.sun.faban.driver.transport.hc3.ProtocolTimedSocketFactory createSocket 60
>>>>  org.apache.commons.httpclient.HttpConnection open 707
>>>>  org.apache.commons.httpclient.HttpMethodDirector executeWithRetry 387
>>>>  org.apache.commons.httpclient.HttpMethodDirector executeMethod 171
>>>>  org.apache.commons.httpclient.HttpClient executeMethod 397
>>>>  org.apache.commons.httpclient.HttpClient executeMethod 323
>>>>  com.sun.faban.driver.transport.hc3.ApacheHC3Transport readURL 274
>>>>  org.apache.olio.workload.driver.UIDriver doLogin 398
>>>>  org.apache.olio.workload.driver.UIDriver doLogin 424
>>>>  sun.reflect.GeneratedMethodAccessor8 invoke
>>>>  sun.reflect.DelegatingMethodAccessorImpl invoke 25
>>>>  java.lang.reflect.Method invoke 597
>>>>  com.sun.faban.driver.engine.TimeThread doRun 169
>>>>  com.sun.faban.driver.engine.AgentThrea
>>>>
>>>> I am monitoring the number of open files on the web server with `watch
>>>> "lsof | wc"`, and Olio starts failing when around 65,000-70,000 files are
>>>> open. lsof shows around 100 open files for each apache2 thread, so
>>>> around 650-700 different apache2 threads create the bulk of those open
>>>> file descriptors.
>>>> The soft and hard limits are set to 403238, which means there should
>>>> be many more open files before it starts failing.
>>>> (I verified the limit by opening a bunch of files with a python
>>>> script, and it does reach the limit of 403238.)
>>>> Any insights? Is there any chance that file descriptors take longer
>>>> than usual to be reclaimed after being closed in the Xen VM I use for
>>>> my web server? Does it make sense for Olio to have so many files open
>>>> at the same time in the first place?
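On Linux, one way to answer Shanti's question about where the descriptors come from is to look at `/proc/<pid>/fd` for each process. Each entry there can be classified as a socket, pipe, or file. Two caveats apply to the `lsof | wc` approach. First, lsof also lists entries that are not descriptors (memory-mapped libraries, working directory, executable), so the total overstates the count. Second, RLIMIT_NOFILE is a per-process limit, so the process that matters is the one raising the exception. The frames in the stack traces above are in `com.sun.faban.driver` and `sun.rmi`, which places them in the Faban driver JVM rather than Apache. The sketch is Linux-specific and needs permission to read other users' `/proc` entries:

```python
import os
import resource


def fd_breakdown(pid):
    """Classify the open descriptors of one process (Linux /proc only)."""
    counts = {"socket": 0, "pipe": 0, "file": 0, "other": 0}
    fd_dir = f"/proc/{pid}/fd"
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # the descriptor was closed while we were looking
        if target.startswith("socket:"):
            counts["socket"] += 1
        elif target.startswith("pipe:"):
            counts["pipe"] += 1
        elif target.startswith("/"):
            counts["file"] += 1
        else:
            counts["other"] += 1  # e.g. anon_inode:[eventfd]
    return counts


def max_open_files(pid):
    """Soft 'Max open files' limit of any process, read from /proc/<pid>/limits."""
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                return line.split()[3]  # columns: name words, soft, hard, units
    return None


def own_nofile_limits():
    """(soft, hard) RLIMIT_NOFILE for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Running `fd_breakdown(pid)` and `max_open_files(pid)` against the driver's Java process and a few apache2 processes lets you compare the two. If the JVM's limit turned out to be far below 403238 (for example, because the agent was started from a shell with a different ulimit), that could explain failures well below that figure.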
>>>>
>>>> Thanks again.
>>>>
>>>>
>>>>
>>>>
>>>> 2010/1/16 Shanti Subramanyam <shanti.subramanyam@gmail.com>
>>>>
>>>>> I would really recommend that you migrate to Olio 0.2. In addition to
>>>>> bug fixes, it includes some major feature changes. See Olio
>>>>> 0.2 released<http://perfwork.wordpress.com/2010/01/13/olio-0-2-relesed/>
>>>>>
>>>>>
>>>>> Shanti
>>>>>
>>>>>
>>>>> On Sat, Jan 16, 2010 at 4:49 PM, Vasileios Kontorinis <
>>>>> bkontorinis@gmail.com> wrote:
>>>>>
>>>>>> Akara hi again,
>>>>>>    Below I have comments on your suggestions and, at the end, some bonus
>>>>>> questions... Thanks again.
>>>>>>
>>>>>> 2010/1/13 Akara Sucharitakul <Akara.Sucharitakul@sun.com>
>>>>>>
>>>>>>> With your permission, I'd like to copy the Olio and Faban user
>>>>>>> aliases going forward. I feel it will help a much wider audience.
>>>>>>> Please see below for answers/comments:
>>>>>>>
>>>>>> Sure. I cc'ed the Olio user alias. I am not sure which is the right
>>>>>> Faban list.
>>>>>>
>>>>>>
>>>>>>> Vasileios Kontorinis wrote:
>>>>>>>
>>>>>>>> Akara hi,
>>>>>>>>   I am a grad student at UCSD and I use Olio for a research project
>>>>>>>> where we want to measure Olio performance under live virtual machine
>>>>>>>> migration. We use Ubuntu 8.04 on Nehalem servers.
>>>>>>>> I have checked out the latest version of Olio from the online svn
>>>>>>>> repository and downloaded the latest version of Faban
>>>>>>>> (faban-kit-101509.tar.gz <http://faban.sunsource.net/nightly/faban-kit-101509.tar.gz>)
>>>>>>>>
>>>>>>>
>>>>>>> 101509 is fairly recent. But the latest on the web site is 111109
>>>>>>> (Faban 1.0). There were just bug fixes between those releases.
>>>>>>
>>>>>>
>>>>>> I have upgraded to Faban 1.0, but I am still using olio1.0 (the release
>>>>>> of 2.0 was announced; I will switch to it if I run into bugs that have
>>>>>> been fixed).
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>       So far, I have employed a bunch of hacks to get most of it to work,
>>>>>>>> and I am almost there. In the process I ran into a bunch of questions.
>>>>>>>>
>>>>>>>>       Questions (some of them might be purely Faban related, not
>>>>>>>> Olio, so bear with me):
>>>>>>>> 1) Is there any way to deploy OlioDriver.jar through the command
>>>>>>>> line? Firefox through ssh forwarding is dead slow and I'd rather
>>>>>>>> avoid it if I can.
>>>>>>>>
>>>>>>>
>>>>>>> Just drop the jar into faban/benchmarks/ and it will deploy itself.
>>>>>>> This is documented at
>>>>>>> http://faban.sunsource.net/1.0/docs/guide/harnessdev/deploybenchmark.html
>>>>>>> under "Alternate Deployment Methods."
>>>>>>>
>>>>>>>
>>>>>>>> 2) Should the services ApacheHttpdService, MemcachedService, and
>>>>>>>> MySQLService that come with Faban be deployed before running Olio?
>>>>>>>>    I was getting some very weird errors, e.g.:
>>>>>>>>
>>>>>>>
>>>>>>> Yes, you should. Olio will search for those.
>>>>>>>
>>>>>> Done.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> 03:50:27:WARNING:UIDriverAgent[0]: Forcefully terminating benchmark run
>>>>>>>> 03:50:27:WARNING:UIDriverAgent[0]: 25 threads forcefully terminated.
>>>>>>>> java.lang.Throwable: Stack of non-terminating thread.
>>>>>>>>    at java.net.SocketInputStream.socketRead0 (null)
>>>>>>>>    at java.net.SocketInputStream.read (129)
>>>>>>>>    at java.io.FilterInputStream.read (116)
>>>>>>>>    at com.sun.faban.driver.transport.util.TimedInputStream.read (139)
>>>>>>>>    at java.io.BufferedInputStream.fill (218)
>>>>>>>>    at java.io.BufferedInputStream.read (237)
>>>>>>>>    at org.apache.commons.httpclient.HttpParser.readRawLine (78)
>>>>>>>>    at org.apache.commons.httpclient.HttpParser.readLine (106)
>>>>>>>>    at org.apache.commons.httpclient.HttpConnection.readLine (1116)
>>>>>>>>    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine (1973)
>>>>>>>>    at org.apache.commons.httpclient.HttpMethodBase.readResponse (1735)
>>>>>>>>    at org.apache.commons.httpclient.HttpMethodBase.execute (1098)
>>>>>>>>    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry (398)
>>>>>>>>    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod (171)
>>>>>>>>    at org.apache.commons.httpclient.HttpClient.executeMethod (397)
>>>>>>>>    at org.apache.commons.httpclient.HttpClient.executeMethod (323)
>>>>>>>>    at com.sun.faban.driver.transport.hc3.ApacheHC3Transport.fetchURL (529)
>>>>>>>>    at com.sun.faban.driver.transport.hc3.ApacheHC3Transport.fetchURL (552)
>>>>>>>>    at org.apache.olio.workload.driver.UIDriver.doHomePage (355)
>>>>>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
>>>>>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke (39)
>>>>>>>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
>>>>>>>>    at java.lang.reflect.Method.invoke (597)
>>>>>>>>    at com.sun.faban.driver.engine.TimeThread.doRun (169)
>>>>>>>>    at com.sun.faban.driver.engine.AgentThread.run (202)
>>>>>>>>
>>>>>>>> and afterwards the master waited forever for threads to join
>>>>>>>> (I attached gdb to verify that something was wrong), so I had
>>>>>>>> to kill the benchmark.
>>>>>>>>
>>>>>>>
>>>>>>> These threads are hung reading server responses that never
>>>>>>> came.
>>>>>>>
>>>>>>>
>>>>>> Building the services from Faban probably fixes it.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> In the Olio log there are WARNINGS complaining that those are not
>>>>>>>> deployed. After building them and manually copying them to
>>>>>>>> /faban/services (ant deploy did not place them there... :-( )
>>>>>>>>
>>>>>>>
>>>>>>> Yes. But ant deploy should get them there. If not, can you please let
>>>>>>> me know the ant messages?
>>>>>>
>>>>>>
>>>>>> Ant was indeed deploying them. I had a mistake in build.properties.
>>>>>> I had faban.url=http://<hostname>:9980/ instead of
>>>>>> faban.url=http://localhost:9980/
>>>>>> After I changed that, it started working...
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> it worked (mostly).
>>>>>>>>
>>>>>>>> 3) I still have warnings like:
>>>>>>>> 01:38:08:INFO:Time difference to host olio-web is 269 ms. Attempting to set clock.
>>>>>>>> 01:38:08:INFO:Time difference to host olio-db is 263 ms. Attempting to set clock.
>>>>>>>>
>>>>>>>
>>>>>>> These two are OK. Faban is just trying to do a clock sync between the
>>>>>>> systems.
>>>>>>>
>>>>>>>
>>>>>>>> 01:38:08:WARNING:olio-web wakeup-before time reached 700ms limit. System is too busy. Giving up.
>>>>>>>>
>>>>>>>
>>>>>>> This is one of Faban's clock-setting calibrations. If the system is
>>>>>>> too busy, or you run on some virtualization architectures, the lag
>>>>>>> between the intended end of a sleep and the time the thread actually
>>>>>>> wakes up (gets scheduled/executed) is too high, and calibration will
>>>>>>> fail.
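The lag Akara describes can be observed on a given VM with a rough stand-in (this is not Faban's calibration code). The sketch requests a short sleep repeatedly and records how late the thread actually resumes. The 10 ms sleep and the sample count are arbitrary choices:

```python
import time


def wakeup_lag_ms(sleep_ms=10.0, samples=20):
    """Worst observed delay (ms) between a sleep's requested and actual end."""
    worst = 0.0
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(sleep_ms / 1000.0)
        # Time actually spent asleep minus the time we asked for.
        overshoot = (time.perf_counter() - start) * 1000.0 - sleep_ms
        worst = max(worst, overshoot)
    return worst
```

Running `wakeup_lag_ms()` on bare metal and inside the Xen guest, both idle and under load, should show whether scheduling delay in the guest is what trips the calibration.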
>>>>>>>
>>>>>>>
>>>>>>>> 01:38:08:INFO:Time difference to host olio-mem is 262 ms. Attempting to set clock.
>>>>>>>> 01:38:10:WARNING:olio-db wakeup-before time reached 700ms limit. System is too busy. Giving up.
>>>>>>>> 09:38:09:WARNING:[date, -u, 011309382010.10]
>>>>>>>> stderr:
>>>>>>>> date: cannot set date: Operation not permitted
>>>>>>>> 09:38:09:WARNING:Error on "[date, -u, 011309382010.10]" command trying to set the date. Exit value: 1
>>>>>>>> 09:38:10:WARNING:[date, -u, 011309382010.11]
>>>>>>>> stderr:
>>>>>>>> date: cannot set date: Operation not permitted
>>>>>>>> 09:38:10:WARNING:Error on "[date, -u, 011309382010.11]" command trying to set the date. Exit value: 1
>>>>>>>> 09:38:09:WARNING:[date, -u, 011309382010.10]
>>>>>>>> stderr:
>>>>>>>> date: cannot set date: Operation not permitted
>>>>>>>> 09:38:09:WARNING:Error on "[date, -u, 011309382010.10]" command trying to set the date. Exit value: 1
>>>>>>>> 09:38:10:WARNING:[date, -u, 011309382010.11]
>>>>>>>> stderr:
>>>>>>>> date: cannot set date: Operation not permitted
>>>>>>>>
>>>>>>>>       Letting Faban change the VM clock sounded like a bad idea from the
>>>>>>>> beginning.
>>>>>>>>
>>>>>>>
>>>>>>> OK. So it is Xen. Yes, this is what Faban is trying to solve. You can
>>>>>>> certainly turn it off. Please see:
>>>>>>>   http://faban.sunsource.net/1.0/docs/howdoi/physclocksync.html
>>>>>>>
>>>>>>>
>>>>>> I added <fh:timeSync>false</fh:timeSync> to my run.xml file, and that
>>>>>> made the warnings go away. (By the way, the page linked above has a
>>>>>> mistake: it shows <fh:timeSync>false<fh:timeSync>, but the second tag
>>>>>> must be a closing tag; the "/" is missing.)
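A typo like this is easy to catch mechanically: with the "/" missing, the snippet is no longer well-formed XML, and any XML parser rejects it. The small check below assumes only that run.xml is well-formed XML with the `fh` prefix declared. It matches the element by local name, so the Faban namespace URI does not need to be hard-coded:

```python
import xml.etree.ElementTree as ET


def time_sync_setting(xml_text):
    """Return the text of the first timeSync element, or None if absent.

    Raises xml.etree.ElementTree.ParseError if the document is not
    well-formed (for example, a closing tag that is missing its "/").
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Namespaced tags look like "{uri}timeSync"; compare the local name only.
        if isinstance(elem.tag, str) and elem.tag.rsplit("}", 1)[-1] == "timeSync":
            return (elem.text or "").strip()
    return None
```

Usage: `time_sync_setting(open("run.xml").read())` should return `"false"` once the setting is in place.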
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> Unfortunately, Xen is really bad at maintaining an accurate clock.
>>>>>>>> As a result, there is usually a time difference of more than 10 ms
>>>>>>>> between the different virtual machines. I went over the setTime
>>>>>>>> function in the Faban source
>>>>>>>> (/faban/com/sun/faban/harness/agent/CmdAgentImpl.java); it's big and
>>>>>>>> ugly (very ugly).
>>>>>>>>
>>>>>>>
>>>>>>> Thanks for the compliments! I think you mean CmdService.setClockTask.
>>>>>>> Time-sensitive code ain't pretty. The complexity comes from dealing with
>>>>>>> the clock while trying to achieve good accuracy. If you think you can
>>>>>>> simplify this, I'm listening (without losing the accuracy, of course).
>>>>>>> In comparison, CmdAgentImpl has nothing.
>>>>>>>
>>>>>>>
>>>>>> Yes, you're right, it is CmdService.setClockTask. The previous email was
>>>>>> composed at 3am ... :-)
>>>>>> I am still a little confused. setClockTask is used to set the clock so
>>>>>> that all the machines are synchronized with the master. From what you
>>>>>> mentioned, the physical clock sync is only used for the logs.
>>>>>> Why do we need to do that, since 1) it requires root privileges (which
>>>>>> might not always be available), and 2) I can imagine an alternative that
>>>>>> uses deltas from the actual physical clock without having to set it?
>>>>>> (I am probably missing something... :-)
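The delta-based alternative described here is essentially how NTP estimates clock offset without touching the clock. Below is a minimal sketch of that standard NTP-style calculation. It is a swapped-in technique, not Faban's code. The master records t0 when it sends a request and t3 when the reply arrives; the agent stamps t1 on receipt and t2 on reply, each side using its own clock. The calculation assumes network delay is roughly symmetric:

```python
from typing import NamedTuple


class ClockSample(NamedTuple):
    t0: float  # master clock: request sent
    t1: float  # agent clock: request received
    t2: float  # agent clock: reply sent
    t3: float  # master clock: reply received


def offset_and_delay(s):
    """NTP-style (agent - master) offset and round-trip network delay."""
    offset = ((s.t1 - s.t0) + (s.t2 - s.t3)) / 2.0
    delay = (s.t3 - s.t0) - (s.t2 - s.t1)
    return offset, delay


def best_offset(samples):
    """Offset from the sample with the smallest delay (least queuing noise)."""
    return offset_and_delay(min(samples, key=lambda s: offset_and_delay(s)[1]))[0]


def to_master_time(agent_ts, offset):
    """Map an agent timestamp onto the master's timeline; no clock is changed."""
    return agent_ts - offset
```

Repeating the measurement periodically would track drift without stepping any clock, which also avoids time going backwards in the logs. The trade-off is that timestamps have to be translated after the fact instead of being correct at the source.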
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> Why is there such a strict 10 ms requirement? Any ideas?
>>>>>>>>
>>>>>>>
>>>>>>> It is easily achievable in most cases. That may not be true for VMs.
>>>>>>>
>>>>>>> On some VM architectures, however, the OS does not get scheduled until
>>>>>>> well after that, which causes problems. You may be able to measure
>>>>>>> performance on those VMs, but you don't want to use such VMs as
>>>>>>> drivers. Your response time measurements will be way off.
>>>>>>>
>>>>>>> The physical clock sync is not really rigorous, and you can turn it
>>>>>>> off. It is more to keep the systems in good time sync. If your VM
>>>>>>> stands in the way, just turn it off. The driver's virtual clock sync is
>>>>>>> much more picky in comparison. This is because the start time of the
>>>>>>> steady state should be the same (within a very small tolerance) no
>>>>>>> matter how many drivers are driving. Otherwise the measurement period
>>>>>>> won't be the same when viewed from different drivers, and the results
>>>>>>> won't be reliable.
>>>>>>>
>>>>>>>
>>>>>>>> Even with NTP it's hard to guarantee 10 ms.
>>>>>>>>
>>>>>>>
>>>>>>> That's why we don't use ntp ;-)
>>>>>>
>>>>>>
>>>>>> Just out of curiosity: the physical clocks are set only once at the
>>>>>> beginning (right?), so for long runs the 10 ms difference will not be
>>>>>> guaranteed, no? Especially under VMs, I've seen significant clock
>>>>>> differences within a few minutes.
>>>>>> At least NTP can periodically resync (of course, doing so might screw
>>>>>> up the logs with time going backwards, etc.).
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> I am thinking of modifying this function to always report that the
>>>>>>>> time difference is less than 10 ms (so that I do not have to wait for
>>>>>>>> the timeouts all the time).
>>>>>>>>
>>>>>>>
>>>>>>> Why bother? If you don't like it, just turn it off. It is useful in
>>>>>>> most configurations we're dealing with, and it avoids NTP inaccuracies.
>>>>>>>
>>>>>>>
>>>>>>>> Will this break anything in Olio?
>>>>>>>>
>>>>>>>
>>>>>>> Nope. Except the times in your logs will appear out of sequence.
>>>>>>> They rely on the local time on the originating systems.
>>>>>>>
>>>>>>>
>>>>>>>> 4) Warnings like:
>>>>>>>> 09:39:48:WARNING:Image at
>>>>>>>> http://olio-web:80/fileService.php?cache=false&file=e168t.jpg size
>>>>>>>> of 249 bytes is too small. Image may not exist
>>>>>>>> can be ignored, right?
>>>>>>>>
>>>>>>>
>>>>>>> Well, something is wrong. We don't have images that small. Check
>>>>>>> whether e168t.jpg is really that small. That's why we have that warning.
>>>>>>>
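Akara's check can be repeated by hand on the filestore. The hedged sketch below flags files that are suspiciously small or that do not start with the JPEG start-of-image marker (bytes FF D8). A 249-byte response could plausibly be an error page rather than an image, but that is a guess. The 1 KB threshold and the directory layout are assumptions, not Olio settings:

```python
import os

JPEG_SOI = b"\xff\xd8"  # every JPEG file starts with this start-of-image marker


def suspicious_images(root, min_bytes=1024):
    """Yield (path, size, reason) for .jpg files under root that look broken."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".jpg"):
                continue
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            with open(path, "rb") as f:
                head = f.read(2)
            if head != JPEG_SOI:
                yield path, size, "not a JPEG"
            elif size < min_bytes:
                yield path, size, "too small"
```

Usage: `for path, size, reason in suspicious_images("<path>/filestore"): print(path, size, reason)`.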
>>>>>>>
>>>>>>
>>>>>> It's kind of funny: my problem was that I had the Olio webkit version
>>>>>> installed, and then I downloaded the version from the online svn
>>>>>> repository. I built the driver but forgot to update the web pages on my
>>>>>> Apache server, which, as expected, was the source of many of my issues.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>       5) Last and most important:
>>>>>>>> I can run the benchmark and all operations succeed except login.
>>>>>>>> I get a bunch of:
>>>>>>>>
>>>>>>>> 09:39:25:WARNING:UIDriverAgent[0].2.doLogin: Found login prompt at index 2926, Login as at786o08x, 2178 failed.
>>>>>>>> Note: Error not counted in result.
>>>>>>>> Either transaction start or end time is not within steady state.
>>>>>>>> java.lang.RuntimeException: Found login prompt at index 2926, Login as at786o08x, 2178 failed.
>>>>>>>>    at org.apache.olio.workload.driver.UIDriver.doLogin (404)
>>>>>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
>>>>>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke (39)
>>>>>>>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
>>>>>>>>    at java.lang.reflect.Method.invoke (597)
>>>>>>>>    at com.sun.faban.driver.engine.TimeThread.doRun (169)
>>>>>>>>    at com.sun.faban.driver.engine.AgentThread.run (202)
>>>>>>>>
>>>>>>>> Any ideas? I do get
>>>>>>>>
>>>>>>>
>>>>>>> You likely have cookie issues. It can't seem to hold on to a
>>>>>>> session.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> Well, there was a permission issue with the http_session dir: I could
>>>>>> not write to it. Running chmod 777 on it fixed this.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> (I've found this online:
>>>>>>>> http://www.mail-archive.com/olio-dev@incubator.apache.org/msg00647.html
>>>>>>>> which is similar, but when I added
>>>>>>>>
>>>>>>>> com.sun.faban.driver.transport.sunhttp.ThreadCookieHandler.level=FINER
>>>>>>>>  in build.properties
>>>>>>>> I did not see any cookie-related warnings. Those should appear in
>>>>>>>> the Olio run log or the Apache log, right? Am I just looking in the
>>>>>>>> wrong place?)
>>>>>>>>
>>>>>>>
>>>>>>> Yes, that's applicable only to the Sun HTTP Transport. The version of
>>>>>>> Olio you're using is based on the Apache HTTP Transport (Apache
>>>>>>> HttpClient 3.1). The ThreadCookieHandler is not used by the Apache
>>>>>>> transport, and that's why you don't see any logs. Try upgrading to
>>>>>>> Faban 1.0 before looking at other things.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> It's a long email, I know. Your feedback would be most appreciated.
>>>>>>>>
>>>>>>>> -Regards
>>>>>>>>
>>>>>>>
>>>>>>> Thanks for all the questions/comments.
>>>>>>>
>>>>>>> -Akara
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> And now some more questions/ comments:
>>>>>> 1) I get the following error:
>>>>>>
>>>>>> 15:13:05:SEVERE:CmdService: Getting - exception reading /usr/data/olio-db.err
>>>>>> java.io.FileNotFoundException: File /usr/data/olio-db.err does not exist.
>>>>>>     at com.sun.faban.common.FileTransfer.<init> (70)
>>>>>>     at com.sun.faban.harness.agent.FileAgentImpl.get (315)
>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke (39)
>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
>>>>>>     at java.lang.reflect.Method.invoke (597)
>>>>>>     at sun.rmi.server.UnicastServerRef.dispatch (305)
>>>>>>     at sun.rmi.transport.Transport$1.run (159)
>>>>>>     at java.security.AccessController.doPrivileged (null)
>>>>>>     at sun.rmi.transport.Transport.serviceCall (155)
>>>>>>     at sun.rmi.transport.tcp.TCPTransport.handleMessages (535)
>>>>>>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0 (790)
>>>>>>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run (649)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask (885)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run (907)
>>>>>>     at java.lang.Thread.run (619)
>>>>>>     at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer (255)
>>>>>>     at sun.rmi.transport.StreamRemoteCall.executeCall (233)
>>>>>>     at sun.rmi.server.UnicastRef.invoke (142)
>>>>>>     at com.sun.faban.harness.agent.FileAgentImpl_Stub.get (null)
>>>>>>     at com.sun.faban.harness.engine.CmdService.get (1334)
>>>>>>     at com.sun.faban.harness.RunContext.getFile (346)
>>>>>>     at com.sun.services.MySQLService.getLogs (197)
>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke (39)
>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
>>>>>>     at java.lang.reflect.Method.invoke (597)
>>>>>>     at com.sun.faban.harness.util.Invoker.invoke (98)
>>>>>>     at com.sun.faban.harness.services.ServiceWrapper.getLogs (200)
>>>>>>     at com.sun.faban.harness.services.ServiceManager.getLogs (642)
>>>>>>     at com.sun.faban.harness.engine.GenericBenchmark.start (323)
>>>>>>     at com.sun.faban.harness.engine.RunDaemon.run (338)
>>>>>>     at java.lang.Thread.run (619)
>>>>>> 15:13:05:WARNING:Could not copy /usr/data/olio-db.err to
>>>>>> /home/gdhiman/faban.1.0/faban/output/OlioDriver.2D/mysql_err.log.olio-db
>>>>>>
>>>>>> Apparently something is misconfigured in my db-server. Any ideas?
>>>>>>
>>>>>> 2) I get the following error:
>>>>>> 15:13:16:WARNING:[/home/gdhiman/faban.1.0/faban/bin/fenxi, process, /home/gdhiman/faban.1.0/faban/output/OlioDriver.2D/, /home/gdhiman/faban.1.0/faban/output/OlioDriver.2D//post/, OlioDriver.2D]
>>>>>> stderr:
>>>>>> Error in executing perl /home/gdhiman/faban.1.0/faban/master/webapps/fenxi/txt2db/mpstat.pl
>>>>>> Error in executing perl /home/gdhiman/faban.1.0/faban/master/webapps/fenxi/txt2db/mpstat.pl
>>>>>>
>>>>>> I actually traced this one back. The problem is the difference in
>>>>>> output format between Sun's mpstat and the default GNU mpstat.
>>>>>> This is the output of my mpstat:
>>>>>>
>>>>>> gdhiman@olio-client00:~/faban.1.0/faban/output/OlioDriver.2D$ mpstat 1
>>>>>> Linux 2.6.18.8-xen (olio-client00)     01/16/10
>>>>>>
>>>>>> 16:25:06     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
>>>>>> 16:25:07     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     52.48
>>>>>> 16:25:08     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     50.50
>>>>>> 16:25:09     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     79.21
>>>>>> 16:25:10     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     45.54
>>>>>> 16:25:11     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     55.45
>>>>>>
>>>>>> The first line, as well as the timestamp at the beginning of each entry,
>>>>>> breaks the parsing in mpstat.pl (the fields are also different).
>>>>>>   Any plans to support this?
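For reference, the Linux output can be parsed by locating the header row (the one containing `CPU`) and aligning each data row to it from the right. This tolerates the leading timestamp, a possible AM/PM field, and the banner line. The function below is a hypothetical helper, not part of Fenxi or mpstat.pl, and the column set differs between sysstat versions:

```python
def parse_linux_mpstat(text):
    """Parse Linux (sysstat) `mpstat` output into one dict per sample line."""
    rows, header = [], None
    for line in text.splitlines():
        tokens = line.split()
        if "CPU" in tokens:
            # Header row: keep the column names from "CPU" onwards.
            header = tokens[tokens.index("CPU"):]
            continue
        if header is None or len(tokens) < len(header):
            continue  # banner ("Linux ..."), blank lines, anything before the header
        if tokens[0].startswith("Average"):
            continue  # skip the summary rows
        # Align from the right so a leading timestamp (and AM/PM) is ignored.
        values = tokens[-len(header):]
        row = {"CPU": values[0]}
        row.update({name: float(v) for name, v in zip(header[1:], values[1:])})
        rows.append(row)
    return rows
```

Keying columns by name rather than position means the extra `%steal` and `intr/s` fields do not shift anything. A Perl fix in mpstat.pl would likely need the same idea.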
>>>>>>
>>>>>> 3) Scaling questions.
>>>>>> - So far I have not had a single passing experiment. Some are pretty
>>>>>> close, with only one metric check failing:
>>>>>>
>>>>>> Average images loaded per Home Page  2.79  >= 3  FAILED
>>>>>>
>>>>>> Any ideas? Could it be that the disk is not fast enough? I am just
>>>>>> using the local filesystem for the filestore.
>>>>>>
>>>>>> - As I double the number of concurrent users, I observe linear scaling
>>>>>> in throughput:
>>>>>> Concurrent users   Throughput
>>>>>>               25        4.967
>>>>>>               50       10.06
>>>>>>              100       19.375
>>>>>>              200       40.21
>>>>>>              400       75.818
>>>>>>              800        0.383
>>>>>>             1000        0.483
>>>>>>
>>>>>> The linear scaling stops at 400 concurrent users (only one agent).
>>>>>> It would actually be exactly linear (a value of ~80), but almost half of
>>>>>> the login operations failed. I am looking into it.
>>>>>> Any insights on what might be the first thing failing?
>>>>>>
>>>>>> For the 800 and 1000 experiments there are no failed operations
>>>>>> logged. It looks like those are being discarded... (?)
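One quick way to see where scaling breaks is to normalize throughput per user against the smallest run. The numbers below are copied from the table above; the 0.9 cut-off is an arbitrary choice:

```python
def scaling_efficiency(results):
    """Per-user throughput of each run relative to the smallest run (1.0 = linear)."""
    base_users = min(results)
    base_per_user = results[base_users] / base_users
    return {users: (tput / users) / base_per_user
            for users, tput in sorted(results.items())}


# Concurrent users -> throughput, copied from the table above.
measured = {25: 4.967, 50: 10.06, 100: 19.375, 200: 40.21,
            400: 75.818, 800: 0.383, 1000: 0.483}

for users, eff in scaling_efficiency(measured).items():
    note = "" if eff >= 0.9 else "  <- scaling broke down"
    print(f"{users:5d} users: {eff:7.1%} of linear{note}")
```

With these numbers, 400 users comes out at roughly 95% of linear, while 800 and 1000 users fall below 1%. That drop looks more like a hard failure than gradual saturation.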
>>>>>>
>>>>>> Bonus question:
>>>>>> In the runtime statistics
>>>>>> <runtimeStats enabled="true">
>>>>>>          <interval>30</interval>
>>>>>>  </runtimeStats>
>>>>>>
>>>>>> only the 90th percentile response time is reported. Is there an easy way
>>>>>> to also report the 99th percentile (or do I need to add code for that)?
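If the raw per-operation response times can be exported from a run (how to get them out of Faban is not covered here), the 99th percentile is simple to compute. The sketch below uses the nearest-rank definition. Faban's own reporting may compute percentiles differently (for example, from histogram buckets), so the numbers need not match exactly:

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank in the sorted list
    return ordered[rank - 1]
```

Usage: `percentile(response_times, 99)` alongside `percentile(response_times, 90)`, where `response_times` is a hypothetical list of seconds for one operation type.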
>>>>>>
>>>>>>
>>>>>> Thanks a lot again in advance.
>>>>>> -VK
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
