incubator-olio-user mailing list archives

From Shanti Subramanyam <shanti.subraman...@gmail.com>
Subject Re: Olio Scaling
Date Fri, 12 Feb 2010 23:30:32 GMT
If you want to run multiple webservers on different systems, you must have
access to the filestore from all of them. The easiest way to do this is to
nfs-mount the filestore from the server it resides on so it is accessible to
the other machines as well.
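
Once the mount is in place, a quick check on each web server can confirm that the filestore path is really backed by the shared mount and is readable. The following is only a minimal sketch, assuming a Linux host that exposes /proc/mounts; the helper names find_mount and check_filestore are made up for illustration and are not part of Olio or Faban.

```python
import os

def find_mount(path, mounts_text):
    """Return (device, mount_point, fs_type) for the mount that contains path.

    mounts_text uses the /proc/mounts layout: device, mount point, type, ...
    The longest matching mount point is the one that backs the path.
    """
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fs_type = fields[:3]
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            # ">=" lets a later mount on the same point override an earlier one.
            if best is None or len(mount_point) >= len(best[1]):
                best = (device, mount_point, fs_type)
    return best

def check_filestore(root, mounts_file="/proc/mounts"):
    """Report the mount behind root and whether the current user can read it."""
    with open(mounts_file) as f:
        mount = find_mount(root, f.read())
    return mount, os.access(root, os.R_OK)
```

Calling check_filestore('/home/gdhiman/filestore') (the localfsRoot quoted below) on a web server that does not host the filestore should then report an nfs filesystem type rather than a local one.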

Shanti

On Thu, Feb 11, 2010 at 9:42 PM, Vasileios Kontorinis <bkontorinis@gmail.com> wrote:

> Shanti hi again,
>    Sorry for not submitting the JIRA on time, I am extremely busy lately.
>
> I have a quick question regarding the way the webserver interacts with the
> filestore. I ran some scaling studies with one, two and three different
> servers while having only one filestore (I do specify that in the run.xml
> configuration file, webServer and dataStorage).
> The filestore is a local folder on one of the server machines. However, in
> the oliophp/etc/config.php I also specify on each server
>
> $olioconfig['fileSystem'] = 'LocalFS';
> $olioconfig['localfsRoot'] = '/home/gdhiman/filestore';
>
> As a result, I do get WARNINGS for missing files on the webservers that do
> not host the filestore. What is the right configuration for
> oliophp/etc/config.php? Can I somehow detach the filestore from the
> webserver so that it requests files remotely?
>
>
> Thanks again.
> -------------------------------------------------------------------
> Kontorinis Vasileios
> Phd student, University of California San Diego
> San Diego, CA 92122
> Cell. phone: (858) 717 6899
> bkontorinis@gmail.com, vkontori@ucsd.edu
> -------------------------------------------------------------------
>
>
> 2010/2/8 Shanti Subramanyam <shanti.subramanyam@gmail.com>
>
>
>>
>> On Mon, Feb 8, 2010 at 3:53 PM, Vasileios Kontorinis <
>> bkontorinis@gmail.com> wrote:
>>
>>>
>>>
>>>> We need to look into this issue - I suspect that something subtle has
>>>> changed in 0.2 which hasn't been accounted for in the expected #images
>>>> loaded. Can I please request that you file a JIRA on this?
>>>>
>>>
>>> How do I do this? Pointers?
>>>
>>
>> http://issues.apache.org
>>
>>
>>> I tried runs of 20mins to verify that longer runs will not make it better
>>> and it's still failing for just 50 users.
>>>
>>
>> What worries me is that you're saying it  fails for 1800 users too - I can
>> understand it may fail for 50 users, but if it fails for larger #users, then
>> it is a bug.
>>
>>>
>>>
>>
>>> and I do get the repetitive patterns you mentioned. However, the cache_MB
>>> never exceeds 0.05...
>>> I would expect that memcache size is really important for the application
>>> scaling. What is the point of having a separate memcache server if we are
>>> only using less than 50KB(?) of memory for caching?
>>>
>>>
>> Try running without memcached - it can be easily configured in the app's
>> etc/config.php. Then you will see what difference the cache makes. The
>> reduction in db traffic is dramatic, resulting in the response times you see.
>> The reason the size is small is because we are currently only caching the
>> home page which is shared. We have not bothered to implement any additional
>> caching as this level of caching is sufficient to reduce the db load.
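
To illustrate why a cache of well under 1 MB can still cut the db load so sharply, here is a minimal cache-aside sketch. It is not the Olio PHP code: a plain in-process object stands in for memcached, and the HomePageCache name, the render callback and the time-to-live are all assumptions made for the example. The point is that one shared entry serves every user, so the database is queried once per expiry period rather than once per request.

```python
class HomePageCache:
    """Cache-aside wrapper: render once, then serve the shared copy until it expires."""

    def __init__(self, render, ttl_seconds, clock):
        self._render = render      # expensive function that queries the database
        self._ttl = ttl_seconds    # hypothetical expiry period
        self._clock = clock        # injected so that expiry can be exercised in a test
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        # Miss or expired entry: rebuild the page and remember when it goes stale.
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._render()
            self._expires_at = now + self._ttl
        return self._value
```

With a thousand requests inside one expiry period, render runs once, which matches the picture of a very small cache that still removes most of the home page queries.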
>>
>> Regards
>>> -VK
>>>
>>>  Shanti
>>
>>>
>>>
>>>> Shanti
>>>>
>>>>
>>>>> Thanks again
>>>>>
>>>>>
>>>>> 2010/1/27 Shanti Subramanyam <shanti.subramanyam@gmail.com>
>>>>>
>>>>>> Yes - these are problems that I'm already aware of.
>>>>>> The best solution to the filestore issue is to change ownership of the
>>>>>> directory to the same user/group as the apache process. We could have the
>>>>>> fileloader.sh change write access I guess, but since that's a big security
>>>>>> hole, we may not want to do that automatically without letting the user know
>>>>>> about it.
>>>>>>
>>>>>> The fact that your response times are so high indicates that you're
>>>>>> running a far larger load than the system can handle and/or you still need
>>>>>> some tuning.
>>>>>> I suggest you start over from say 100 users and see at what point your
>>>>>> response times start getting really large. The apache error log should be
>>>>>> pulled in as part of the 'Statistics' tab, so do keep monitoring that.
>>>>>>
>>>>>> Shanti
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 27, 2010 at 1:34 AM, Vasileios Kontorinis <
>>>>>> bkontorinis@gmail.com> wrote:
>>>>>>
>>>>>>> Shanti hi again,
>>>>>>>    I checked my apache logs and there were a bunch of errors.
>>>>>>> It looks like there are some issues with
>>>>>>> webapp/php/trunk/classes/ImageUtil.php in the last release of olio. (I
>>>>>>> downloaded
>>>>>>> http://www.alliedquotes.com/mirrors/apache/incubator/olio/0.2/apache-olio-php-src-0.2.tar.gz)
>>>>>>> 1) There is a line that needs to be commented out. php complains ("1.5.
>>>>>>> Must be greater than zero.").
>>>>>>> 2) Then, it was complaining that it cannot find the function
>>>>>>> fastimagecopyresampled. To work around that, I moved the function
>>>>>>> fastimagecopyresampled above createThumb (this might not be required) and
>>>>>>> declared it static.
>>>>>>>     Finally, I call the function from createThumb with
>>>>>>> self::fastimagecopyresampled.
>>>>>>> 3) Then, it started complaining because it could not write to the
>>>>>>> filestore. The problem is that it wants to write the new images as www-data
>>>>>>> from apache, while the filestore does not have write permission for
>>>>>>> others. Manually
>>>>>>>     giving access solves the problem (chmod -R o+w <path>/filestore)
>>>>>>> but since the directories in filestore are generated automatically, maybe
>>>>>>> the chmod command should be added in fileloader.sh
>>>>>>>
>>>>>>> Funnily enough, after fixing those issues, I still cannot pass the:
>>>>>>> Average images loaded per Home Page 2.65   >=3       FAILED
>>>>>>>
>>>>>>> and on top of that I also have:
>>>>>>> Response Times (secs)
>>>>>>> AddPerson     5.190  13.194  3.387 8.800     3.000 FAILED
>>>>>>> AddEvent       5.904  16.784  3.159 10.400   4.000 FAILED
>>>>>>>
>>>>>>> Think times for AddPerson and AddEvent fail as well.
>>>>>>>
>>>>>>> Any insights are welcome .... :-(
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2010/1/26 Shanti Subramanyam <shanti.subramanyam@gmail.com>
>>>>>>>
>>>>>>>> Yes - 0.2 requires a lot more disk space as we changed the ratio of
>>>>>>>> concurrent users to registered users to 1:100. If you haven't already,
>>>>>>>> please check out our published Blueprints for detailed performance
>>>>>>>> characteristics of the workload:
>>>>>>>> Deploying Web 2.0 Applications on Sun Servers and the OpenSolaris
>>>>>>>> Operating System
>>>>>>>> <http://wikis.sun.com/display/BluePrints/Deploying+Web+2.0+Applications+on+Sun+Servers+and+the+OpenSolaris+Operating+System>
>>>>>>>> If you run for long enough, you should get passing runs. Have you
>>>>>>>> verified that there are no errors in the run logs when you see the 'Avg.
>>>>>>>> images loaded per home page' fail ?
>>>>>>>>
>>>>>>>> On to your open files error  - you may have to tune your networking
>>>>>>>> tier and/or #open file descriptors. I don't believe we have ever seen as
>>>>>>>> many files open as you are seeing. Can you determine whether these are from
>>>>>>>> the file store or network ? We also typically run the filestore on a
>>>>>>>> different system and nfs-mount it on the webserver box.
>>>>>>>> You will have to tune your system to ensure good performance since
>>>>>>>> you will need memory for both apache and files.
>>>>>>>>
>>>>>>>> Shanti
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jan 25, 2010 at 5:06 PM, Vasileios Kontorinis <
>>>>>>>> bkontorinis@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Akara and Shanti hi,
>>>>>>>>>    I did migrate to Olio 0.2. With the last version of Olio I came
>>>>>>>>> across some new interesting things.
>>>>>>>>>
>>>>>>>>> Scaling issues:
>>>>>>>>>   - I am still getting the:
>>>>>>>>> Average images loaded per Home Page 2.55   >= 3     FAILED
>>>>>>>>>  - additionally, when I scale the concurrent users to 800 I run out
>>>>>>>>> of diskspace since my filestore occupies more than 62GB.
>>>>>>>>> Actually for 600 users it occupies 50GB. I was curious if that
>>>>>>>>> makes sense. How much space will I need to reach 1000 users?
>>>>>>>>> In the php_setup.html it suggests that we will need 50GB but
>>>>>>>>> apparently we need way more for a large number of users.
>>>>>>>>>
>>>>>>>>>  - Finally and most importantly, for 600 users many of the
>>>>>>>>> operations fail with the exception:
>>>>>>>>> Message: java.net.SocketException: Too many open files
>>>>>>>>> Stack Trace:
>>>>>>>>>  Class  Method  Line
>>>>>>>>>  java.net.PlainSocketImpl  socketAccept
>>>>>>>>>  java.net.PlainSocketImpl  accept  390
>>>>>>>>>  java.net.ServerSocket  implAccept  453
>>>>>>>>>  java.net.ServerSocket  accept  421
>>>>>>>>>  sun.rmi.transport.tcp.TCPTransport$AcceptLoop  executeAcceptLoop  369
>>>>>>>>>  sun.rmi.transport.tcp.TCPTransport$AcceptLoop  run  341
>>>>>>>>>  java.lang.Thread  run  619
>>>>>>>>> or
>>>>>>>>>
>>>>>>>>> java.net.SocketException: Too many open files
>>>>>>>>> Stack Trace:
>>>>>>>>>  Class  Method  Line
>>>>>>>>>  java.net.Socket  createImpl  394
>>>>>>>>>  java.net.Socket  getImpl  457
>>>>>>>>>  java.net.Socket  bind  571
>>>>>>>>>  com.sun.faban.driver.transport.hc3.ProtocolTimedSocketFactory  createSocket  60
>>>>>>>>>  org.apache.commons.httpclient.HttpConnection  open  707
>>>>>>>>>  org.apache.commons.httpclient.HttpMethodDirector  executeWithRetry  387
>>>>>>>>>  org.apache.commons.httpclient.HttpMethodDirector  executeMethod  171
>>>>>>>>>  org.apache.commons.httpclient.HttpClient  executeMethod  397
>>>>>>>>>  org.apache.commons.httpclient.HttpClient  executeMethod  323
>>>>>>>>>  com.sun.faban.driver.transport.hc3.ApacheHC3Transport  readURL  274
>>>>>>>>>  org.apache.olio.workload.driver.UIDriver  doLogin  398
>>>>>>>>>  org.apache.olio.workload.driver.UIDriver  doLogin  424
>>>>>>>>>  sun.reflect.GeneratedMethodAccessor8  invoke
>>>>>>>>>  sun.reflect.DelegatingMethodAccessorImpl  invoke  25
>>>>>>>>>  java.lang.reflect.Method  invoke  597
>>>>>>>>>  com.sun.faban.driver.engine.TimeThread  doRun  169
>>>>>>>>>  com.sun.faban.driver.engine.AgentThrea
>>>>>>>>>
>>>>>>>>> I am monitoring the number of open files in the web-server with
>>>>>>>>> `watch "lsof | wc"` and olio starts failing when around 65,000-70,000
>>>>>>>>> files are open. lsof shows that for each apache2 thread there are around 100
>>>>>>>>> files open. Therefore there are around 650-700 different apache2 threads
>>>>>>>>> that create the bulk of those open file descriptors.
>>>>>>>>> The soft and hard limit is set to 403238, which means that there
>>>>>>>>> should be many more open files before it starts failing.
>>>>>>>>> (Actually, I verified the limit by opening a bunch of files with a
>>>>>>>>> python script and it does reach the limit of 403238.)
>>>>>>>>> Any insights? Is there any chance that the file descriptors take
>>>>>>>>> more time than usual to be reclaimed after being closed in the xen vm I use
>>>>>>>>> for my web-server? Does it make sense for olio in the first place to have so
>>>>>>>>> many files open at the same time?
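
One caveat on the measurement: `lsof | wc` counts every row lsof prints, including entries that are not descriptors (memory-mapped libraries, working directories, program text), and the open-files limit is enforced per process rather than on the machine-wide total. Since the exception above is raised inside Java code, it may also be worth checking the Java processes and not only apache2. The sketch below is one possible way to compare the descriptors a single process holds with the limits that apply to it; it assumes a Linux kernel that provides /proc/<pid>/fd and /proc/<pid>/limits, and the function names are hypothetical.

```python
import os

def to_int(value):
    """Convert a limit column to int; 'unlimited' becomes None."""
    return None if value == "unlimited" else int(value)

def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return to_int(soft), to_int(hard)
    return None

def fd_usage(pid):
    """Count the descriptors a process holds and pair that with its own limits."""
    open_fds = len(os.listdir("/proc/%d/fd" % pid))
    with open("/proc/%d/limits" % pid) as f:
        return open_fds, parse_max_open_files(f.read())
```

Running fd_usage on the pid of each suspect process (with enough privileges to read its /proc entries) shows whether that particular process is close to its own soft limit, which the aggregate lsof count cannot tell you.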
>>>>>>>>>
>>>>>>>>> Thanks again.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2010/1/16 Shanti Subramanyam <shanti.subramanyam@gmail.com>
>>>>>>>>>
>>>>>>>>>> I would really recommend that you migrate to Olio 0.2. In addition
>>>>>>>>>> to bug fixes, there are some major feature changes in it. See Olio
>>>>>>>>>> 0.2 released <http://perfwork.wordpress.com/2010/01/13/olio-0-2-relesed/>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Shanti
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, Jan 16, 2010 at 4:49 PM, Vasileios Kontorinis <
>>>>>>>>>> bkontorinis@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Akara hi again,
>>>>>>>>>>>    Below I have comments on your suggestions and at the end some
>>>>>>>>>>> bonus questions... Thanks again.
>>>>>>>>>>>
>>>>>>>>>>> 2010/1/13 Akara Sucharitakul <Akara.Sucharitakul@sun.com>
>>>>>>>>>>>
>>>>>>>>>>>> With your permission, I'd like to copy the Olio and Faban user
>>>>>>>>>>>> aliases going forward. I feel it will help a much wider audience. Please see
>>>>>>>>>>>> below for answers/comments:
>>>>>>>>>>>>
>>>>>>>>>>>> Sure. I cced olio user alias. I am not sure which is the right
>>>>>>>>>>> faban list.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Vasileios Kontorinis wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Akara hi,
>>>>>>>>>>>>>   I am a grad student at UCSD and I use Olio for a research
>>>>>>>>>>>>> project where we want to measure olio performance under live virtual machine
>>>>>>>>>>>>> migration. We use ubuntu 8.04 on nehalem servers.
>>>>>>>>>>>>> I have checked out the last version of olio from the online svn
>>>>>>>>>>>>> repository and downloaded the last version of faban (faban-kit-101509.tar.gz
>>>>>>>>>>>>> <http://faban.sunsource.net/nightly/faban-kit-101509.tar.gz>)
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 101509 is fairly recent. But the latest on the web site is
>>>>>>>>>>>> 111109 (Faban 1.0). There were just bug fixes between those releases.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I have upgraded to Faban 1.0, still using Olio 0.1 though (the
>>>>>>>>>>> release of 0.2 was announced, will switch to it if I run into bugs that have
>>>>>>>>>>> been fixed)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> So far, I employed a bunch of hacks to get most of it to work
>>>>>>>>>>>>> and I am almost there. In the process I got a bunch of questions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Questions (some of them might be just faban related, not olio
>>>>>>>>>>>>> so bear with me):
>>>>>>>>>>>>> 1) Is there any way to deploy OlioDriver.jar through the
>>>>>>>>>>>>> command line? Firefox through ssh forwarding is dead slow and I'd rather
>>>>>>>>>>>>> avoid it if I can.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Just drop the jar into faban/benchmarks/ and it will deploy
>>>>>>>>>>>> itself. This is documented at
>>>>>>>>>>>> http://faban.sunsource.net/1.0/docs/guide/harnessdev/deploybenchmark.html
>>>>>>>>>>>> under "Alternate Deployment Methods."
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  2) The services ApacheHttpdService, MemcachedService,
>>>>>>>>>>>>> MySQLService that come with Faban should be deployed before running Olio?
>>>>>>>>>>>>>    I was getting some very weird errors. e.g.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, you should. Olio will search for those.
>>>>>>>>>>>>
>>>>>>>>>>>> Done
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> 03:50:27:WARNING:UIDriverAgent[0]: Forcefully terminating
>>>>>>>>>>>>> benchmark run
>>>>>>>>>>>>> 03:50:27:WARNING:UIDriverAgent[0]: 25 threads forcefully
>>>>>>>>>>>>> terminated.
>>>>>>>>>>>>> java.lang.Throwable: Stack of non-terminating thread.
>>>>>>>>>>>>>    at java.net.SocketInputStream.socketRead0 (null)
>>>>>>>>>>>>>    at java.net.SocketInputStream.read (129)
>>>>>>>>>>>>>    at java.io.FilterInputStream.read (116)
>>>>>>>>>>>>>    at com.sun.faban.driver.transport.util.TimedInputStream.read
>>>>>>>>>>>>> (139)
>>>>>>>>>>>>>    at java.io.BufferedInputStream.fill (218)
>>>>>>>>>>>>>    at java.io.BufferedInputStream.read (237)
>>>>>>>>>>>>>    at org.apache.commons.httpclient.HttpParser.readRawLine (78)
>>>>>>>>>>>>>    at org.apache.commons.httpclient.HttpParser.readLine (106)
>>>>>>>>>>>>>    at org.apache.commons.httpclient.HttpConnection.readLine
>>>>>>>>>>>>> (1116)
>>>>>>>>>>>>>    at
>>>>>>>>>>>>> org.apache.commons.httpclient.HttpMethodBase.readStatusLine (1973)
>>>>>>>>>>>>>    at org.apache.commons.httpclient.HttpMethodBase.readResponse
>>>>>>>>>>>>> (1735)
>>>>>>>>>>>>>    at org.apache.commons.httpclient.HttpMethodBase.execute
>>>>>>>>>>>>> (1098)
>>>>>>>>>>>>>    at
>>>>>>>>>>>>> org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry (398)
>>>>>>>>>>>>>    at
>>>>>>>>>>>>> org.apache.commons.httpclient.HttpMethodDirector.executeMethod (171)
>>>>>>>>>>>>>    at org.apache.commons.httpclient.HttpClient.executeMethod
>>>>>>>>>>>>> (397)
>>>>>>>>>>>>>    at org.apache.commons.httpclient.HttpClient.executeMethod
>>>>>>>>>>>>> (323)
>>>>>>>>>>>>>    at
>>>>>>>>>>>>> com.sun.faban.driver.transport.hc3.ApacheHC3Transport.fetchURL (529)
>>>>>>>>>>>>>    at
>>>>>>>>>>>>> com.sun.faban.driver.transport.hc3.ApacheHC3Transport.fetchURL (552)
>>>>>>>>>>>>>    at org.apache.olio.workload.driver.UIDriver.doHomePage (355)
>>>>>>>>>>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
>>>>>>>>>>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke (39)
>>>>>>>>>>>>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
>>>>>>>>>>>>>    at java.lang.reflect.Method.invoke (597)
>>>>>>>>>>>>>    at com.sun.faban.driver.engine.TimeThread.doRun (169)
>>>>>>>>>>>>>    at com.sun.faban.driver.engine.AgentThread.run (202)
>>>>>>>>>>>>>
>>>>>>>>>>>>> and afterwards the master was waiting for threads to join for
>>>>>>>>>>>>> ever... (I attached gdb to verify that something was wrong) and hence I had
>>>>>>>>>>>>> to kill the benchmark.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> These threads are hanging reading the server responses, that
>>>>>>>>>>>> never came.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Building the services from Faban probably fixes it.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> In the Olio log there are WARNINGS  complaining about not
>>>>>>>>>>>>> deploying those. After building those and manually copying them to
>>>>>>>>>>>>> /faban/services (ant deploy did not place them there... :-(  )
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Yes. But ant deploy should get them there. If not, can you
>>>>>>>>>>>> please let me know the ant messages?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Ant was deploying them indeed. I had a mistake in
>>>>>>>>>>> build.properties.
>>>>>>>>>>> I had:  faban.url=http://<hostname>:9980/   instead of
>>>>>>>>>>> faban.url=http://localhost:9980/
>>>>>>>>>>> After I changed that it started working...
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  it worked. (mostly worked)
>>>>>>>>>>>>>
>>>>>>>>>>>>> 3) I still have warnings like:
>>>>>>>>>>>>> 01:38:08:INFO:Time difference to host olio-web is 269 ms.
>>>>>>>>>>>>> Attempting to set clock.
>>>>>>>>>>>>> 01:38:08:INFO:Time difference to host olio-db is 263 ms.
>>>>>>>>>>>>> Attempting to set clock.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> These two are OK. Just trying to do a clock sync between the
>>>>>>>>>>>> systems.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  01:38:08:WARNING:olio-web wakeup-before time reached 700ms
>>>>>>>>>>>>> limit. System is too busy. Giving up.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This is one of Faban's clock-setting calibrations. If the system
>>>>>>>>>>>> is too busy or you run on some virtualization architectures, the lag time
>>>>>>>>>>>> between an intended end of sleep and the actual time when the thread really
>>>>>>>>>>>> wakes up (gets scheduled/executed) is too high, calibrations will fail.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  01:38:08:INFO:Time difference to host olio-mem is 262 ms.
>>>>>>>>>>>>> Attempting to set clock.
>>>>>>>>>>>>> 01:38:10:WARNING:olio-db wakeup-before time reached 700ms
>>>>>>>>>>>>> limit. System is too busy. Giving up.
>>>>>>>>>>>>> 09:38:09:WARNING:[date, -u, 011309382010.10]
>>>>>>>>>>>>> stderr:
>>>>>>>>>>>>> date: cannot set date: Operation not permitted
>>>>>>>>>>>>> 09:38:09:WARNING:Error on "[date, -u, 011309382010.10]" command
>>>>>>>>>>>>> trying to set the date. Exit value: 1
>>>>>>>>>>>>> 09:38:10:WARNING:[date, -u, 011309382010.11]
>>>>>>>>>>>>> stderr:
>>>>>>>>>>>>> date: cannot set date: Operation not permitted
>>>>>>>>>>>>> 09:38:10:WARNING:Error on "[date, -u, 011309382010.11]" command
>>>>>>>>>>>>> trying to set the date. Exit value: 1
>>>>>>>>>>>>> 09:38:09:WARNING:[date, -u, 011309382010.10]
>>>>>>>>>>>>> stderr:
>>>>>>>>>>>>> date: cannot set date: Operation not permitted
>>>>>>>>>>>>> 09:38:09:WARNING:Error on "[date, -u, 011309382010.10]" command
>>>>>>>>>>>>> trying to set the date. Exit value: 1
>>>>>>>>>>>>> 09:38:10:WARNING:[date, -u, 011309382010.11]
>>>>>>>>>>>>> stderr:
>>>>>>>>>>>>> date: cannot set date: Operation not permitted
>>>>>>>>>>>>>
>>>>>>>>>>>>> Letting faban change the vm clock sounds like a bad idea from the
>>>>>>>>>>>>> beginning.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> OK. So it is xen. Yes, this is what Faban is trying to solve.
>>>>>>>>>>>> You can certainly turn it off. Please see:
>>>>>>>>>>>>   http://faban.sunsource.net/1.0/docs/howdoi/physclocksync.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> I added <fh:timeSync>false</fh:timeSync> to my run.xml file
>>>>>>>>>>> (btw, there is a mistake in the link above: it shows
>>>>>>>>>>> <fh:timeSync>false<fh:timeSync>, where the second tag should be the
>>>>>>>>>>> closing tag </fh:timeSync>; the "/" is missing).
>>>>>>>>>>> That made the warnings go away.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately, xen is really bad at maintaining an accurate
>>>>>>>>>>>>> clock. As a result there is usually a time difference between the different
>>>>>>>>>>>>> virtual machines
>>>>>>>>>>>>> of more than 10ms. I went over the setTime function in Faban
>>>>>>>>>>>>> source (/faban/com/sun/faban/harness/agent/CmdAgentImpl.java), it's big and
>>>>>>>>>>>>> ugly (very ugly)
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the compliments! I think you mean
>>>>>>>>>>>> CmdService.setClockTask. Time sensitive code ain't pretty. It is the
>>>>>>>>>>>> complexity of dealing with the clock and trying to achieve good accuracy. If
>>>>>>>>>>>> you think you can simplify this, I'm listening (without losing the
>>>>>>>>>>>> accuracy, of course). In comparison, CmdAgentImpl has nothing.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Yes, you are right, it is CmdService.setClockTask. The previous
>>>>>>>>>>> email was composed at 3am ... :-)
>>>>>>>>>>> I am still a little confused. The setClockTask is used to set
>>>>>>>>>>> the clock so that all the machines are synchronized with the master. From what
>>>>>>>>>>> you mentioned the physical clock sync is only used for the logs.
>>>>>>>>>>> Why do we need to do that since 1) it requires root privileges
>>>>>>>>>>> (which might not always be available) 2) I could imagine an alternative that
>>>>>>>>>>> uses deltas from the actual physical clock without having to set it.
>>>>>>>>>>> (I am probably missing something... :-)
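
For reference, the delta-based alternative raised in this question could look roughly like the sketch below. This is not how Faban's setClockTask works; it is only an illustration of estimating a per-host offset from one request/response exchange and applying it to timestamps, so that no root privileges are needed. The function names are hypothetical, and the estimate assumes the network delay is roughly the same in both directions.

```python
def estimate_offset(t_send, t_remote, t_recv):
    """Estimate remote_clock - local_clock from one request/response exchange.

    t_send and t_recv are local timestamps taken just before sending and just
    after receiving; t_remote is the remote timestamp carried in the reply.
    Assumes the delay is symmetric, so the remote reading corresponds to the
    midpoint of the local interval.
    """
    midpoint = (t_send + t_recv) / 2.0
    return t_remote - midpoint

def to_master_time(local_timestamp, offset):
    """Translate a local timestamp into the master's time base without setting the clock."""
    return local_timestamp + offset
```

If the master is the remote side of the exchange, log timestamps from an agent can be shifted by the estimated offset when the logs are merged; the offset would need to be re-estimated periodically on hosts whose clocks drift.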
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>  Why there is this strict requirement for 10ms difference? Any
>>>>>>>>>>>>> ideas?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> It is easily achievable in most cases. May not be true for VMs.
>>>>>>>>>>>>
>>>>>>>>>>>> On some VM architectures, the OS however does not get scheduled
>>>>>>>>>>>> till way after that, thus causing problems. You may be able to measure
>>>>>>>>>>>> performance on those VMs. But you don't want to use such VMs to be a driver.
>>>>>>>>>>>> Your response time measurements will be way off.
>>>>>>>>>>>>
>>>>>>>>>>>> The physical clock sync is not really rigorous. And you can turn
>>>>>>>>>>>> it off. It is more to keep the systems in good time sync. If your VM stands
>>>>>>>>>>>> in the way, just turn it off. The driver's virtual clock sync is much more
>>>>>>>>>>>> picky in comparison. This is because the start time for the steady state
>>>>>>>>>>>> should be the same (with a very small tolerance) no matter how many drivers
>>>>>>>>>>>> are driving. Otherwise the measurement period won't be the same when viewed
>>>>>>>>>>>> from different drivers and the results won't be reliable.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  Even with ntp it's hard to provide the 10ms guarantee.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> That's why we don't use ntp ;-)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Just out of curiosity, the physical clocks are set only once at
>>>>>>>>>>> the beginning (right?), therefore for long runs the 10ms difference will not
>>>>>>>>>>> be guaranteed. No? Especially under VMs I've seen significant clock
>>>>>>>>>>> differences within a few minutes.
>>>>>>>>>>> At least ntp can periodically resync (of course doing so might
>>>>>>>>>>> screw up the logs with time going backwards etc.)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  I am thinking of modifying this function to always return that
>>>>>>>>>>>>> the time difference is less than 10ms (so that I do not have to wait all the
>>>>>>>>>>>>> time for the timeouts.)
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Why bother. Don't like it, just turn it off. It has good use in
>>>>>>>>>>>> most configurations we're dealing with. And, it avoids ntp inaccuracies.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  Will this break anything in Olio?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Nope. Except the times in your logs will appear out of sequence.
>>>>>>>>>>>> They rely on the local time on the originating systems.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> 4) Warning like:
>>>>>>>>>>>>> 09:39:48:WARNING:Image at
>>>>>>>>>>>>> http://olio-web:80/fileService.php?cache=false&file=e168t.jpg
>>>>>>>>>>>>> size of 249 bytes is too small. Image may not exist
>>>>>>>>>>>>> can be ignored, right?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Well, something is wrong. We don't have images that small. Check
>>>>>>>>>>>> whether e168t.jpg is really that small. That's why we have that warning.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> It's kind of funny: my problem was that I had the olio webkit version
>>>>>>>>>>> installed and then I downloaded the version from the online svn repository.
>>>>>>>>>>> I built the driver but forgot to update the web pages on my apache server,
>>>>>>>>>>> which, as expected, was the source of many of my issues.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> 5) Last and most important.
>>>>>>>>>>>>> I can run the benchmark and all the operations succeed except for
>>>>>>>>>>>>> login.
>>>>>>>>>>>>> I get a bunch of:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 09:39:25:WARNING:UIDriverAgent[0].2.doLogin: Found login prompt
>>>>>>>>>>>>> at index 2926, Login as at786o08x, 2178 failed.
>>>>>>>>>>>>> Note: Error not counted in result.
>>>>>>>>>>>>> Either transaction start or end time is not within steady
>>>>>>>>>>>>> state.
>>>>>>>>>>>>> java.lang.RuntimeException: Found login prompt at index 2926,
>>>>>>>>>>>>> Login as at786o08x, 2178 failed.
>>>>>>>>>>>>>    at org.apache.olio.workload.driver.UIDriver.doLogin (404)
>>>>>>>>>>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
>>>>>>>>>>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke (39)
>>>>>>>>>>>>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
>>>>>>>>>>>>>    at java.lang.reflect.Method.invoke (597)
>>>>>>>>>>>>>    at com.sun.faban.driver.engine.TimeThread.doRun (169)
>>>>>>>>>>>>>    at com.sun.faban.driver.engine.AgentThread.run (202)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any ideas? I do get
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> You likely have cookie issues. It can't seem to hold on to a
>>>>>>>>>>>> session.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Well, there was a permission issue with the http_session dir. I
>>>>>>>>>>> could not write to it. Running chmod 777 on it fixed this.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> (I've found online:
>>>>>>>>>>>>> http://www.mail-archive.com/olio-dev@incubator.apache.org/msg00647.html
>>>>>>>>>>>>> which is similar, but when I added
>>>>>>>>>>>>>
>>>>>>>>>>>>> com.sun.faban.driver.transport.sunhttp.ThreadCookieHandler.level=FINER
>>>>>>>>>>>>>  in build.properties
>>>>>>>>>>>>> I did not see any cookie related warnings. Those should appear
>>>>>>>>>>>>> in the olio run log or the apache log, right? Am I just looking in the wrong
>>>>>>>>>>>>> place? )
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, that's applicable only to the Sun Http Transport. The
>>>>>>>>>>>> version of Olio you're using is based on the Apache Http Transport (Apache
>>>>>>>>>>>> HttpClient 3.1). The ThreadCookieHandler is not used for the Apache
>>>>>>>>>>>> transport and that's why you don't see any logs. Try upgrading to Faban 1.0
>>>>>>>>>>>> before looking at other things.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> It's a long email I know. Your feedback would be most
>>>>>>>>>>>>> appreciated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Regards
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for all the questions/comments.
>>>>>>>>>>>>
>>>>>>>>>>>> -Akara
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> And now some more questions/ comments:
>>>>>>>>>>> 1) I get the following error:
>>>>>>>>>>>
>>>>>>>>>>> 15:13:05:SEVERE:CmdService: Getting - exception reading
>>>>>>>>>>> /usr/data/olio-db.err
>>>>>>>>>>> java.io.FileNotFoundException: File /usr/data/olio-db.err does
>>>>>>>>>>> not exist.
>>>>>>>>>>>     at com.sun.faban.common.FileTransfer.<init> (70)
>>>>>>>>>>>     at com.sun.faban.harness.agent.FileAgentImpl.get (315)
>>>>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
>>>>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke (39)
>>>>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
>>>>>>>>>>>     at java.lang.reflect.Method.invoke (597)
>>>>>>>>>>>     at sun.rmi.server.UnicastServerRef.dispatch (305)
>>>>>>>>>>>     at sun.rmi.transport.Transport$1.run (159)
>>>>>>>>>>>     at java.security.AccessController.doPrivileged (null)
>>>>>>>>>>>     at sun.rmi.transport.Transport.serviceCall (155)
>>>>>>>>>>>     at sun.rmi.transport.tcp.TCPTransport.handleMessages (535)
>>>>>>>>>>>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0
>>>>>>>>>>> (790)
>>>>>>>>>>>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run
>>>>>>>>>>> (649)
>>>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
>>>>>>>>>>> (885)
>>>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run (907)
>>>>>>>>>>>     at java.lang.Thread.run (619)
>>>>>>>>>>>     at
>>>>>>>>>>> sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer (255)
>>>>>>>>>>>     at sun.rmi.transport.StreamRemoteCall.executeCall (233)
>>>>>>>>>>>     at sun.rmi.server.UnicastRef.invoke (142)
>>>>>>>>>>>     at com.sun.faban.harness.agent.FileAgentImpl_Stub.get (null)
>>>>>>>>>>>     at com.sun.faban.harness.engine.CmdService.get (1334)
>>>>>>>>>>>     at com.sun.faban.harness.RunContext.getFile (346)
>>>>>>>>>>>     at com.sun.services.MySQLService.getLogs (197)
>>>>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
>>>>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke (39)
>>>>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
>>>>>>>>>>>     at java.lang.reflect.Method.invoke (597)
>>>>>>>>>>>     at com.sun.faban.harness.util.Invoker.invoke (98)
>>>>>>>>>>>     at com.sun.faban.harness.services.ServiceWrapper.getLogs (200)
>>>>>>>>>>>     at com.sun.faban.harness.services.ServiceManager.getLogs (642)
>>>>>>>>>>>     at com.sun.faban.harness.engine.GenericBenchmark.start (323)
>>>>>>>>>>>     at com.sun.faban.harness.engine.RunDaemon.run (338)
>>>>>>>>>>>     at java.lang.Thread.run (619)
>>>>>>>>>>> 15:13:05:WARNING:Could not copy /usr/data/olio-db.err to /home/gdhiman/faban.1.0/faban/output/OlioDriver.2D/mysql_err.log.olio-db
>>>>>>>>>>>
>>>>>>>>>>> Apparently something is misconfigured in my db-server. Any ideas?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2) I get the following error:
>>>>>>>>>>> 15:13:16:WARNING:[/home/gdhiman/faban.1.0/faban/bin/fenxi,
>>>>>>>>>>> process, /home/gdhiman/faban.1.0/faban/output/OlioDriver.2D/,
>>>>>>>>>>> /home/gdhiman/faban.1.0/faban/output/OlioDriver.2D//post/, OlioDriver.2D]
>>>>>>>>>>> stderr:
>>>>>>>>>>> Error in executing perl /home/gdhiman/faban.1.0/faban/master/webapps/fenxi/txt2db/mpstat.pl
>>>>>>>>>>> Error in executing perl /home/gdhiman/faban.1.0/faban/master/webapps/fenxi/txt2db/mpstat.pl
>>>>>>>>>>>
>>>>>>>>>>> Actually, I traced this one back. The problem is the difference in
>>>>>>>>>>> output format between Sun's mpstat and the default Linux (sysstat)
>>>>>>>>>>> mpstat. This is the output of my mpstat:
>>>>>>>>>>>
>>>>>>>>>>> gdhiman@olio-client00:~/faban.1.0/faban/output/OlioDriver.2D$
>>>>>>>>>>> mpstat 1
>>>>>>>>>>> Linux 2.6.18.8-xen (olio-client00)     01/16/10
>>>>>>>>>>>
>>>>>>>>>>> 16:25:06     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
>>>>>>>>>>> 16:25:07     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     52.48
>>>>>>>>>>> 16:25:08     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     50.50
>>>>>>>>>>> 16:25:09     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     79.21
>>>>>>>>>>> 16:25:10     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     45.54
>>>>>>>>>>> 16:25:11     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     55.45
>>>>>>>>>>>
>>>>>>>>>>> The first line, as well as the timestamp at the beginning of each
>>>>>>>>>>> entry, breaks the parsing in mpstat.pl (the fields are also
>>>>>>>>>>> different). Are there any plans to support this format?
>>>>>>>>>>>
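The two parsing obstacles described above (the "Linux ..." banner line and the leading timestamp on every row) can be handled by keying on the timestamp and reading column names from the header row. The following is only a minimal sketch in Python, not a patch to FenXi's mpstat.pl; the function name parse_linux_mpstat and the SAMPLE text are hypothetical, and the AM/PM handling assumes a 12-hour locale prints that extra token after the time.

```python
import re

# Two sample rows copied from the mpstat output quoted above.
SAMPLE = """\
Linux 2.6.18.8-xen (olio-client00)     01/16/10

16:25:06     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
16:25:07     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     52.48
16:25:08     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     50.50
"""

TIME_RE = re.compile(r"^\d{2}:\d{2}:\d{2}$")


def parse_linux_mpstat(text):
    """Return one dict per sample row of Linux-style mpstat output."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the banner line, and anything else that
        # does not start with an HH:MM:SS timestamp.
        if not fields or not TIME_RE.match(fields[0]):
            continue
        rest = fields[1:]
        # Assumption: 12-hour locales add an AM/PM token after the time.
        if rest and rest[0] in ("AM", "PM"):
            rest = rest[1:]
        if not rest:
            continue
        if rest[0] == "CPU":
            # Header row: remember the column names without the '%' sign.
            header = [name.lstrip("%") for name in rest[1:]]
            continue
        if header is None:
            continue
        row = dict(zip(header, (float(v) for v in rest[1:])))
        row["time"] = fields[0]
        row["cpu"] = rest[0]
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in parse_linux_mpstat(SAMPLE):
        print(row["time"], row["cpu"], row["idle"])
```

Because the column names come from the header row rather than fixed positions, the differing field set between the two mpstat variants does not need to be hard-coded.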
>>>>>>>>>>> 3) Scaling questions.
>>>>>>>>>>> - So far I have not had a single experiment pass. Some come
>>>>>>>>>>> pretty close, with only one metric check failing:
>>>>>>>>>>>
>>>>>>>>>>> Average images loaded per Home Page    2.79    >= 3    FAILED
>>>>>>>>>>>
>>>>>>>>>>> Any ideas? Could it be that the disk is not fast enough? I am
>>>>>>>>>>> just using the local filesystem for the filestore.
>>>>>>>>>>>
>>>>>>>>>>> - As I double the number of concurrent users, I observe linear
>>>>>>>>>>> scaling in the throughput.
>>>>>>>>>>> Concurrent users    Throughput
>>>>>>>>>>>    25                  4.967
>>>>>>>>>>>    50                 10.06
>>>>>>>>>>>   100                 19.375
>>>>>>>>>>>   200                 40.21
>>>>>>>>>>>   400                 75.818
>>>>>>>>>>>   800                  0.383
>>>>>>>>>>>  1000                  0.483
>>>>>>>>>>>
>>>>>>>>>>> The linear scaling stops at 400 concurrent users (with only one
>>>>>>>>>>> agent). It would actually be exactly linear (a value of ~80), but
>>>>>>>>>>> almost half of the login operations failed. I am looking into it.
>>>>>>>>>>> Any insights into what might be the first thing to fail?
>>>>>>>>>>>
>>>>>>>>>>> For the 800 and 1000 experiments, no failed operations are
>>>>>>>>>>> logged. It looks like they are being discarded (?)
>>>>>>>>>>>
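The scaling claim above can be checked by dividing each throughput by what perfectly linear scaling from the smallest run would predict. This is a rough sketch using the figures from the table; the function names and the 0.9 threshold are arbitrary choices for illustration, not part of Faban or Olio.

```python
# (concurrent users, throughput) pairs copied from the table above.
RESULTS = [
    (25, 4.967),
    (50, 10.06),
    (100, 19.375),
    (200, 40.21),
    (400, 75.818),
    (800, 0.383),
    (1000, 0.483),
]


def scaling_efficiency(results):
    """Map each user count to measured / ideal-linear throughput.

    The ideal is extrapolated from the first (smallest) run, so that
    run always has an efficiency of 1.0.
    """
    base_users, base_tput = results[0]
    per_user = base_tput / base_users
    return {users: tput / (users * per_user) for users, tput in results}


def below_linear(results, threshold=0.9):
    """Return the user counts whose efficiency falls under threshold."""
    eff = scaling_efficiency(results)
    return [users for users, _ in results if eff[users] < threshold]


if __name__ == "__main__":
    for users, ratio in scaling_efficiency(RESULTS).items():
        print(f"{users:5d} users: {ratio:6.3f} of linear")
    print("below threshold:", below_linear(RESULTS))
```

By this measure the 400-user run still reaches roughly 95% of the linear prediction, while the 800 and 1000 runs collapse to well under 1%, which matches the description of the two largest experiments behaving differently from the rest.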
>>>>>>>>>>> Bonus question:
>>>>>>>>>>> In the runtime statistics
>>>>>>>>>>> <runtimeStats enabled="true">
>>>>>>>>>>>     <interval>30</interval>
>>>>>>>>>>> </runtimeStats>
>>>>>>>>>>>
>>>>>>>>>>> only the 90th-percentile response time is reported. Is there an
>>>>>>>>>>> easy way to also report the 99th percentile, or do I need to add
>>>>>>>>>>> code for that?
>>>>>>>>>>>
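For reference, this is what a 90th or 99th percentile means when computed directly from raw response times with the nearest-rank method. It is only an illustrative sketch: it says nothing about how Faban computes or exposes its percentiles, and the function name is hypothetical.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples (nearest-rank method).

    The result is the smallest sample such that at least pct percent
    of all samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical response times in seconds, for illustration only.
    times = [0.12, 0.15, 0.11, 0.95, 0.14, 0.13, 0.18, 0.16, 0.17, 2.40]
    print("90th:", percentile_nearest_rank(times, 90))
    print("99th:", percentile_nearest_rank(times, 99))
```

With few samples per interval the 99th percentile is dominated by the single slowest request, so it is only meaningful over intervals with enough operations.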
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot again in advance.
>>>>>>>>>>> -VK
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
