From: Shanti Subramanyam
To: Vasileios Kontorinis
Cc: olio-user@incubator.apache.org
Date: Fri, 12 Feb 2010 15:30:32 -0800
Subject: Re: Olio Scaling
Message-ID: <59d35f41002121530ha145205ka3fbf856b0195956@mail.gmail.com>

If you want to run multiple webservers on different systems, you must have access to the filestore from all of them. The easiest way to do this is to NFS-mount the filestore from the server it resides on so it is accessible to the other machines as well.

Shanti

On Thu, Feb 11, 2010 at 9:42 PM, Vasileios Kontorinis wrote:

> Shanti hi again,
> Sorry for not submitting the JIRA on time, I am extremely busy lately.
>
> I have a quick question regarding the way the webserver interacts with the
> filestore. I ran some scaling studies with one, two and three different
> servers while having only one filestore (I do specify that in the run.xml
> configuration file, webServer and dataStorage ).
> The filestore is a local folder on one of the server machines.
> However, in the oliophp/etc/config.php I also specify on each server
>
> $olioconfig['fileSystem'] = 'LocalFS';
> $olioconfig['localfsRoot'] = '/home/gdhiman/filestore';
>
> As a result, I do get WARNINGS for missing files on the webservers that do
> not host a filestore. What is the right configuration for
> oliophp/etc/config.php? Can I somehow detach the filestore from the
> webserver so that it requests files remotely?
>
> Thanks again.
> -------------------------------------------------------------------
> Kontorinis Vasileios
> Phd student, University of California San Diego
> San Diego, CA 92122
> Cell. phone: (858) 717 6899
> bkontorinis@gmail.com, vkontori@ucsd.edu
> -------------------------------------------------------------------
>
> 2010/2/8 Shanti Subramanyam
>
>> On Mon, Feb 8, 2010 at 3:53 PM, Vasileios Kontorinis <bkontorinis@gmail.com> wrote:
>>
>>>> We need to look into this issue - I suspect that something subtle has
>>>> changed in 0.2 which hasn't got accounted for in the expected #images
>>>> loaded. Can I please request that you file a JIRA on this ?
>>>
>>> How do I do this? Pointers?
>>
>> http://issues.apache.org
>>
>>> I tried runs of 20mins to verify that longer runs will not make it better
>>> and it's still failing for just 50 users.
>>
>> What worries me is that you're saying it fails for 1800 users too - I can
>> understand it may fail for 50 users, but if it fails for larger #users, then
>> it is a bug.
>>
>>> and I do get the repetitive patterns you mentioned. However, the cache_MB
>>> never exceeds 0.05...
>>> I would expect that memcache size is really important for the application
>>> scaling. What is the point of having a separate memcache server if we are
>>> only using less than 50KB(?) of memory for caching?
>>
>> Try running without memcached - it can be easily configured in the app's
>> etc/config.php. Then you will see what difference the cache makes.
The >> reduction in db traffic is dramatic resulting in the response times you see. >> The reason the size is small is because we are currently only caching the >> home page which is shared. We have not bothered to implement any additional >> caching as this level of caching is sufficient to reduce the db load. >> >> Regards >>> -VK >>> >>> Shanti >> >>> >>> >>>> Shanti >>>> >>>> >>>>> Thanks again >>>>> ------------------------------------------------------------------- >>>>> Kontorinis Vasileios >>>>> Phd student, University of California San Diego >>>>> San Diego, CA 92122 >>>>> Cell. phone: (858) 717 6899 >>>>> bkontorinis@gmail.com, vkontori@ucsd.edu >>>>> ------------------------------------------------------------------- >>>>> >>>>> >>>>> 2010/1/27 Shanti Subramanyam >>>>> >>>>>> Yes - these are problems that I'm already aware of. >>>>>> The best solution to the filestore issue is to change ownership of the >>>>>> directory to the same user/group as the apache process. We could have the >>>>>> fileloader.sh change write access I guess, but since that's a big security >>>>>> hole, we may not want to do that automatically without letting the user know >>>>>> about it. >>>>>> >>>>>> The fact that your response times are so high indicate that you're >>>>>> running a far larger load than the system can handle and/or you still need >>>>>> some tuning. >>>>>> I suggest you start over from say 100 users and see at what point your >>>>>> response times start getting really large. The apache error log should be >>>>>> pulled in as part of the 'Statistics' tab, so do keep monitoring that. >>>>>> >>>>>> Shanti >>>>>> >>>>>> >>>>>> On Wed, Jan 27, 2010 at 1:34 AM, Vasileios Kontorinis < >>>>>> bkontorinis@gmail.com> wrote: >>>>>> >>>>>>> Shanti hi again, >>>>>>> I checked my apache logs and there were a bunch of errors. >>>>>>> It looks like there some issues with the >>>>>>> webapp/php/trunk/classes/ImageUtil.php in the last release of olio. 
(I downloaded
>>>>>>> http://www.alliedquotes.com/mirrors/apache/incubator/olio/0.2/apache-olio-php-src-0.2.tar.gz)
>>>>>>> 1) There is a line that needs to be commented out. php complains ("1.5.
>>>>>>> Must be greater than zero.").
>>>>>>> 2) Then, it was complaining that it cannot find function
>>>>>>> fastimagecopyresampled . To work around that I moved the function
>>>>>>> fastimagecopyresampled above createThumb (this might not be required ) and
>>>>>>> declared it static.
>>>>>>> Finally, I call the function from createThumb with
>>>>>>> self::fastimagecopyresampled .
>>>>>>> 3) Then, it started complaining because it could not write to the
>>>>>>> filestore. The problem is that it wants to write the new images as www-data
>>>>>>> from apache, while the filestore does not have write permission for
>>>>>>> others. Manually
>>>>>>> giving access solves the problem (chmod -R o+w /filestore)
>>>>>>> but since the directories in filestore are generated automatically, maybe
>>>>>>> the chmod command should be added in fileloader.sh
>>>>>>>
>>>>>>> Funnily enough, after fixing those issues, I still cannot pass the:
>>>>>>> Average images loaded per Home Page 2.65 >=3 FAILED
>>>>>>>
>>>>>>> and on top of that I also have:
>>>>>>> Response Times (secs)
>>>>>>> AddPerson 5.190 13.194 3.387 8.800 3.000 FAILED
>>>>>>> AddEvent 5.904 16.784 3.159 10.400 4.000 FAILED
>>>>>>>
>>>>>>> Think times for AddPerson and AddEvent fail as well.
>>>>>>>
>>>>>>> Any insights are welcome .... :-(
>>>>>>>
>>>>>>> -------------------------------------------------------------------
>>>>>>> Kontorinis Vasileios
>>>>>>> Phd student, University of California San Diego
>>>>>>> San Diego, CA 92122
>>>>>>> Cell.
phone: (858) 717 6899 >>>>>>> bkontorinis@gmail.com, vkontori@ucsd.edu >>>>>>> ------------------------------------------------------------------- >>>>>>> >>>>>>> >>>>>>> 2010/1/26 Shanti Subramanyam >>>>>>> >>>>>>>> Yes - 0.2 requires a lot more disk space as we changed the ratio of >>>>>>>> concurrent users to registered users to 1:100. If you haven't already, >>>>>>>> please check out our published Blueprints for detailed performance >>>>>>>> characteristics of the workload: >>>>>>>> Deploying Web 2.0 Applications on Sun Servers and the OpenSolaris >>>>>>>> Operating System >>>>>>>> >>>>>>>> If you run for long enough, you should get passing runs. Have you >>>>>>>> verified that there are no errors in the run logs when you see the 'Avg. >>>>>>>> images loaded per home page' fail ? >>>>>>>> >>>>>>>> On to your open files error - you may have to tune your networking >>>>>>>> tier and/or #open file descriptors. I don't believe we have ever seen as >>>>>>>> many files open as you are seeing. Can you determine whether these are from >>>>>>>> the file store or network ? We also typically run the filestore on a >>>>>>>> different system and nfs-mount it on the webserver box. >>>>>>>> You will have to tune your system to ensure good performance since >>>>>>>> you will need memory for both apache and files. >>>>>>>> >>>>>>>> Shanti >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Jan 25, 2010 at 5:06 PM, Vasileios Kontorinis < >>>>>>>> bkontorinis@gmail.com> wrote: >>>>>>>> >>>>>>>>> Akara and Shanti hi, >>>>>>>>> I did migrate to Olio 0.2. With the last version of Olio I came >>>>>>>>> across some new interesting things. >>>>>>>>> >>>>>>>>> Scaling issues: >>>>>>>>> - I am still getting the: >>>>>>>>> Average images loaded per Home Page2.55>= 3 >>>>>>>>> FAILED >>>>>>>>> - additionally, when I scale the concurrent users to 800 I run out >>>>>>>>> of diskspace since my filestore occupies more than 62GB. >>>>>>>>> Actually for 600 users it occupies 50GB. 
I was curious if that >>>>>>>>> makes sense. How much space I will need to reach 1000 users? >>>>>>>>> In the php_setup.html it suggests that we will need 50GB but >>>>>>>>> apparently we need way more for large number of users. >>>>>>>>> >>>>>>>>> - Finally and most importantly, for 600 users many of the >>>>>>>>> operations fail with the exception: >>>>>>>>> Message: java.net.SocketException: Too many open files >>>>>>>>> Stack Trace: >>>>>>>>> Class Method Line java.net.PlainSocketImpl socketAccept >>>>>>>>> java.net.PlainSocketImpl accept 390 java.net.ServerSocket >>>>>>>>> implAccept 453 java.net.ServerSocket accept 421 >>>>>>>>> sun.rmi.transport.tcp.TCPTransport$AcceptLoop executeAcceptLoop >>>>>>>>> 369 sun.rmi.transport.tcp.TCPTransport$AcceptLoop run 341 >>>>>>>>> java.lang.Thread run 619 >>>>>>>>> or >>>>>>>>> >>>>>>>>> java.net.SocketException: Too many open files >>>>>>>>> Stack Trace: >>>>>>>>> Class Method Line java.net.Socket createImpl 394 java.net.Socket >>>>>>>>> getImpl 457 java.net.Socket bind 571 >>>>>>>>> com.sun.faban.driver.transport.hc3.ProtocolTimedSocketFactory >>>>>>>>> createSocket 60 org.apache.commons.httpclient.HttpConnection open >>>>>>>>> 707 org.apache.commons.httpclient.HttpMethodDirector >>>>>>>>> executeWithRetry 387 >>>>>>>>> org.apache.commons.httpclient.HttpMethodDirector executeMethod 171 >>>>>>>>> org.apache.commons.httpclient.HttpClient executeMethod 397 >>>>>>>>> org.apache.commons.httpclient.HttpClient executeMethod 323 >>>>>>>>> com.sun.faban.driver.transport.hc3.ApacheHC3Transport readURL 274 >>>>>>>>> org.apache.olio.workload.driver.UIDriver doLogin 398 >>>>>>>>> org.apache.olio.workload.driver.UIDriver doLogin 424 >>>>>>>>> sun.reflect.GeneratedMethodAccessor8 invoke >>>>>>>>> sun.reflect.DelegatingMethodAccessorImpl invoke 25 >>>>>>>>> java.lang.reflect.Method invoke 597 >>>>>>>>> com.sun.faban.driver.engine.TimeThread doRun 169 >>>>>>>>> com.sun.faban.driver.engine.AgentThrea >>>>>>>>> >>>>>>>>> I am monitoring 
the number of open files in the web-server with
>>>>>>>>> `watch "lsof | wc"` and olio starts failing when around 65,000-70,000
>>>>>>>>> files are open. lsof shows that for each apache2 thread there are around 100
>>>>>>>>> files open. Therefore there are around 650-700 different apache2 threads
>>>>>>>>> that create the bulk of those open file descriptors.
>>>>>>>>> The soft and hard limits are set to 403238, which means that there
>>>>>>>>> should be many more open files before it starts failing.
>>>>>>>>> (Actually, I verified the limit by opening a bunch of files with a
>>>>>>>>> python script and it does reach the limit of 403238.)
>>>>>>>>> Any insights? Is there any chance that the file descriptors take
>>>>>>>>> more time than usual to be reclaimed after being closed in the xen vm I use
>>>>>>>>> for my web-server? Does it make sense for olio in the first place to have so
>>>>>>>>> many files open at the same time?
>>>>>>>>>
>>>>>>>>> Thanks again.
>>>>>>>>>
>>>>>>>>> -------------------------------------------------------------------
>>>>>>>>> Kontorinis Vasileios
>>>>>>>>> Phd student, University of California San Diego
>>>>>>>>> San Diego, CA 92122
>>>>>>>>> Cell. phone: (858) 717 6899
>>>>>>>>> bkontorinis@gmail.com, vkontori@ucsd.edu
>>>>>>>>> -------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> 2010/1/16 Shanti Subramanyam
>>>>>>>>>
>>>>>>>>>> I would really recommend that you migrate to Olio 0.2. In addition
>>>>>>>>>> to bug fixes, there are some major feature changes in it. See Olio
>>>>>>>>>> 0.2 released
>>>>>>>>>>
>>>>>>>>>> Shanti
>>>>>>>>>>
>>>>>>>>>> On Sat, Jan 16, 2010 at 4:49 PM, Vasileios Kontorinis <bkontorinis@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Akara hi again,
>>>>>>>>>>> Below I have comments on your suggestions and at the end some
>>>>>>>>>>> bonus questions... Thanks again.
>>>>>>>>>>> >>>>>>>>>>> 2010/1/13 Akara Sucharitakul >>>>>>>>>>> >>>>>>>>>>>> With your permission, I'd like to copy the Olio and Faban user >>>>>>>>>>>> aliases going forward. I feel it will help a much wider audience. Please see >>>>>>>>>>>> below for answers/comments: >>>>>>>>>>>> >>>>>>>>>>>> Sure. I cced olio user alias. I am not sure which is the right >>>>>>>>>>> faban list. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Vasileios Kontorinis wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Akara hi, >>>>>>>>>>>>> I am a grad student at UCSD and I use Olio for a research >>>>>>>>>>>>> project where we want to measure olio performance under live virtual machine >>>>>>>>>>>>> migration. We use ubuntu 8.04 on nehalem servers. >>>>>>>>>>>>> I have co ed the last version of olio from the online svn >>>>>>>>>>>>> repository and downloaded the last version of faban (faban-kit-101509.tar.gz >>>>>>>>>>>>> ) >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 101509 is fairly recent. But the latest on the web site is >>>>>>>>>>>> 111109 (Faban 1.0). There were just bug fixes between those releases. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I have upgraded to Faban 1.0, still using olio1.0 though ( the >>>>>>>>>>> release of 2.0 was announced, will switch to it if I run into bugs that have >>>>>>>>>>> been fixed) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> So far, I employed a bunch of hacks to get most of it to work >>>>>>>>>>>>> and I am almost there. In the process I got a bunch of questions. >>>>>>>>>>>>> >>>>>>>>>>>>> Questions (some of them might be just faban related, not olio >>>>>>>>>>>>> so bear with me): >>>>>>>>>>>>> 1) In there any way to deploy OlioDriver.jar through the >>>>>>>>>>>>> command line? Firefox through ssh forwarding is dead slow and I d rather >>>>>>>>>>>>> avoid if I can. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Just drop the jar into faban/benchmarks/ and it will deploy >>>>>>>>>>>> itself. 
This is documented at >>>>>>>>>>>> http://faban.sunsource.net/1.0/docs/guide/harnessdev/deploybenchmark.htmlunder "Alternate Deployment Methods." >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 2) The services ApacheHttpdService, MemcachedService, >>>>>>>>>>>>> MySQLService that come with Faban should be deployed before running Olio? >>>>>>>>>>>>> I was getting some very weird errors. e.g. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Yes, you should. Olio will search for those. >>>>>>>>>>>> >>>>>>>>>>>> Done >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> 03:50:27:WARNING:UIDriverAgent[0]: Forcefully terminating >>>>>>>>>>>>> benchmark run >>>>>>>>>>>>> 03:50:27:WARNING:UIDriverAgent[0]: 25 threads forcefully >>>>>>>>>>>>> terminated. >>>>>>>>>>>>> java.lang.Throwable: Stack of non-terminating thread. >>>>>>>>>>>>> at java.net.SocketInputStream.socketRead0 (null) >>>>>>>>>>>>> at java.net.SocketInputStream.read (129) >>>>>>>>>>>>> at java.io.FilterInputStream.read (116) >>>>>>>>>>>>> at com.sun.faban.driver.transport.util.TimedInputStream.read >>>>>>>>>>>>> (139) >>>>>>>>>>>>> at java.io.BufferedInputStream.fill (218) >>>>>>>>>>>>> at java.io.BufferedInputStream.read (237) >>>>>>>>>>>>> at org.apache.commons.httpclient.HttpParser.readRawLine (78) >>>>>>>>>>>>> at org.apache.commons.httpclient.HttpParser.readLine (106) >>>>>>>>>>>>> at org.apache.commons.httpclient.HttpConnection.readLine >>>>>>>>>>>>> (1116) >>>>>>>>>>>>> at >>>>>>>>>>>>> org.apache.commons.httpclient.HttpMethodBase.readStatusLine (1973) >>>>>>>>>>>>> at org.apache.commons.httpclient.HttpMethodBase.readResponse >>>>>>>>>>>>> (1735) >>>>>>>>>>>>> at org.apache.commons.httpclient.HttpMethodBase.execute >>>>>>>>>>>>> (1098) >>>>>>>>>>>>> at >>>>>>>>>>>>> org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry (398) >>>>>>>>>>>>> at >>>>>>>>>>>>> org.apache.commons.httpclient.HttpMethodDirector.executeMethod (171) >>>>>>>>>>>>> at org.apache.commons.httpclient.HttpClient.executeMethod 
>>>>>>>>>>>>> (397) >>>>>>>>>>>>> at org.apache.commons.httpclient.HttpClient.executeMethod >>>>>>>>>>>>> (323) >>>>>>>>>>>>> at >>>>>>>>>>>>> com.sun.faban.driver.transport.hc3.ApacheHC3Transport.fetchURL (529) >>>>>>>>>>>>> at >>>>>>>>>>>>> com.sun.faban.driver.transport.hc3.ApacheHC3Transport.fetchURL (552) >>>>>>>>>>>>> at org.apache.olio.workload.driver.UIDriver.doHomePage (355) >>>>>>>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0 (null) >>>>>>>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke (39) >>>>>>>>>>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke (25) >>>>>>>>>>>>> at java.lang.reflect.Method.invoke (597) >>>>>>>>>>>>> at com.sun.faban.driver.engine.TimeThread.doRun (169) >>>>>>>>>>>>> at com.sun.faban.driver.engine.AgentThread.run (202) >>>>>>>>>>>>> >>>>>>>>>>>>> and afterwards the master was waiting for threads to join for >>>>>>>>>>>>> ever... (I attached gdb to verify that something was wrong) and hence I had >>>>>>>>>>>>> to kill the benchmark. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> These threads are hanging reading the server responses, that >>>>>>>>>>>> never came. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> Building the services from Faban probably fixes it. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> In the Olio log there are WARNINGS complaining about not >>>>>>>>>>>>> deploying those. After building those and manually copying them to >>>>>>>>>>>>> /faban/services (ant deploy did not place them there... :-( ) >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Yes. But ant deploy should get them there. If not, can you >>>>>>>>>>>> please let me know the ant messages? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Ant was deploying them indeed. I had a mistake in >>>>>>>>>>> building.properties. >>>>>>>>>>> I had: faban.url=http://:9980/ instead of >>>>>>>>>>> faban.url=http://localhost:9980/ >>>>>>>>>>> After I changed that it started working... >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> it worked. 
(mostly worked) >>>>>>>>>>>>> >>>>>>>>>>>>> 3) I still have warnings like: >>>>>>>>>>>>> 01:38:08:INFO:Time difference to host olio-web is 269 ms. >>>>>>>>>>>>> Attempting to set clock. >>>>>>>>>>>>> 01:38:08:INFO:Time difference to host olio-db is 263 ms. >>>>>>>>>>>>> Attempting to set clock. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> These two are OK. Just trying to do a clock sync between the >>>>>>>>>>>> systems. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 01:38:08:WARNING:olio-web wakeup-before time reached 700ms >>>>>>>>>>>>> limit. System is too busy. Giving up. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> This is one of Faban's clock-setting calibrations. If the system >>>>>>>>>>>> is too busy or you run on some virtualization architectures, the lag time >>>>>>>>>>>> between an intended end of sleep and the actual time when the thread really >>>>>>>>>>>> wakes up (gets scheduled/executed) is too high, calibrations will fail. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 01:38:08:INFO:Time difference to host olio-mem is 262 ms. >>>>>>>>>>>>> Attempting to set clock. >>>>>>>>>>>>> 01:38:10:WARNING:olio-db wakeup-before time reached 700ms >>>>>>>>>>>>> limit. System is too busy. Giving up. >>>>>>>>>>>>> 09:38:09:WARNING:[date, -u, 011309382010.10] >>>>>>>>>>>>> stderr: >>>>>>>>>>>>> date: cannot set date: Operation not permitted >>>>>>>>>>>>> 09:38:09:WARNING:Error on "[date, -u, 011309382010.10]" command >>>>>>>>>>>>> trying to set the date. Exit value: 1 >>>>>>>>>>>>> 09:38:10:WARNING:[date, -u, 011309382010.11] >>>>>>>>>>>>> stderr: >>>>>>>>>>>>> date: cannot set date: Operation not permitted >>>>>>>>>>>>> 09:38:10:WARNING:Error on "[date, -u, 011309382010.11]" command >>>>>>>>>>>>> trying to set the date. Exit value: 1 >>>>>>>>>>>>> 09:38:09:WARNING:[date, -u, 011309382010.10] >>>>>>>>>>>>> stderr: >>>>>>>>>>>>> date: cannot set date: Operation not permitted >>>>>>>>>>>>> 09:38:09:WARNING:Error on "[date, -u, 011309382010.10]" command >>>>>>>>>>>>> trying to set the date. 
Exit value: 1
>>>>>>>>>>>>> 09:38:10:WARNING:[date, -u, 011309382010.11]
>>>>>>>>>>>>> stderr:
>>>>>>>>>>>>> date: cannot set date: Operation not permitted
>>>>>>>>>>>>>
>>>>>>>>>>>>> Letting faban change the vm clock sounds like a bad idea from the
>>>>>>>>>>>>> beginning.
>>>>>>>>>>>>
>>>>>>>>>>>> OK. So it is xen. Yes, this is what Faban is trying to solve.
>>>>>>>>>>>> You can certainly turn it off. Please see:
>>>>>>>>>>>> http://faban.sunsource.net/1.0/docs/howdoi/physclocksync.html
>>>>>>>>>>>
>>>>>>>>>>> I added the false in my run.xml file
>>>>>>>>>>> ( btw in the link above there is a mistake : false
>>>>>>>>>>> is correct, the second needs a
>>>>>>>>>>> closing tag, the "/" is missing)
>>>>>>>>>>> that made the warnings go away.
>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately, xen is really bad at maintaining an accurate
>>>>>>>>>>>>> clock. As a result there is usually a time difference between the different
>>>>>>>>>>>>> virtual machines
>>>>>>>>>>>>> of more than 10ms. I went over the setTime function in Faban
>>>>>>>>>>>>> source (/faban/com/sun/faban/harness/agent/CmdAgentImpl.java), it's big and
>>>>>>>>>>>>> ugly (very ugly)
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the compliments! I think you mean
>>>>>>>>>>>> CmdService.setClockTask. Time sensitive code ain't pretty. It is the
>>>>>>>>>>>> complexities dealing with the clock and trying to achieve good accuracy. If
>>>>>>>>>>>> you think you can simplify this, I'm listening (without losing the
>>>>>>>>>>>> accuracy, of course). In comparison, CmdAgentImpl has nothing.
>>>>>>>>>>>
>>>>>>>>>>> Yes, you are right it is CmdService.setClockTask. The previous
>>>>>>>>>>> email was composed at 3am ... :-)
>>>>>>>>>>> I am still a little confused.
the setClockTask is used to set >>>>>>>>>>> the clock so that all the machines are synchronized with master. From what >>>>>>>>>>> you mentioned the physical clock sync is only used for the logs. >>>>>>>>>>> Why do we need to do that since 1) it requires root privileges >>>>>>>>>>> (which might not be always available) 2) I could imagine an alternative that >>>>>>>>>>> uses deltas from the actual physical clock without having to set it. >>>>>>>>>>> ( I am probably missing something... :-) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Why there is this strict requirement for 10ms difference? Any >>>>>>>>>>>>> ideas? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> It is easily achievable in most cases. May not be true for VMs. >>>>>>>>>>>> >>>>>>>>>>>> On some VM architectures, the OS however does not get scheduled >>>>>>>>>>>> till way after that, thus causing problems. You may be able to measure >>>>>>>>>>>> performance on those VMs. But you don't want to use such VMs to be a driver. >>>>>>>>>>>> Your response time measurements will be way off. >>>>>>>>>>>> >>>>>>>>>>>> The physical clock sync is not really rigorous. And you can turn >>>>>>>>>>>> it off. It is more to keep the systems in good time sync. If your VM stands >>>>>>>>>>>> in the way, just turn it off. The driver's virtual clock sync is much more >>>>>>>>>>>> picky in comparison. This is because the start time for the steady state >>>>>>>>>>>> should be the same (with a very small tolerance) no matter how many drivers >>>>>>>>>>>> are driving. Otherwise the measurement period won't be the same when viewed >>>>>>>>>>>> from different drivers and the results won't be reliable. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Even with ntp it's hard to provide the 10ms guarantee. 
>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> That's why we don't use ntp ;-) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Just out of curiosity, the physical clocks are set only once at >>>>>>>>>>> the beginning (right?), therefore for long runs the 10ms difference will not >>>>>>>>>>> be guaranteed. Nope? Especially under VMs I 've seen significant clock >>>>>>>>>>> difference withing a few minutes. >>>>>>>>>>> At least ntp can periodically resync (of course doing so, might >>>>>>>>>>> screw up the logs with time going backwards etc) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I am thinking of modifying this function to always return that >>>>>>>>>>>>> the time difference is less than 10ms (so that I do not have to wait all the >>>>>>>>>>>>> time for the timeouts.) >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Why bother. Don't like it, just turn it off. It has good use in >>>>>>>>>>>> most configurations we're dealing with. And, it avoids ntp inaccuracies. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Will this break anything in Olio? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Nope. Except the times in your logs will appear out of sequence. >>>>>>>>>>>> They rely on the local time on the originating systems. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> 4) Warning like: >>>>>>>>>>>>> 09:39:48:WARNING:Image at >>>>>>>>>>>>> http://olio-web:80/fileService.php?cache=false&file=e168t.jpg< >>>>>>>>>>>>> http://olio-web:80/fileService.php?cache=false&file=e168t.jpg> >>>>>>>>>>>>> size of 249 bytes is too small. Image may not exist >>>>>>>>>>>>> can be ignored, right? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Well, something is wrong. We don't have images that small. Check >>>>>>>>>>>> whether e168t.jpg is really that small. That's why we have that warning. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> It kinda funny, my problem was that I had the olio webkit version >>>>>>>>>>> installed and then I downloaded the version from the online svn repository. 
>>>>>>>>>>> I built the driver but forgot to update the webpage for my apache server,
>>>>>>>>>>> which
>>>>>>>>>>> as expected was the source for many of my issues.
>>>>>>>>>>>
>>>>>>>>>>>>> 5) Last and most important.
>>>>>>>>>>>>> I can run the benchmark and all the operations succeed except
>>>>>>>>>>>>> login.
>>>>>>>>>>>>> I get a bunch of:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 09:39:25:WARNING:UIDriverAgent[0].2.doLogin: Found login prompt
>>>>>>>>>>>>> at index 2926, Login as at786o08x, 2178 failed.
>>>>>>>>>>>>> Note: Error not counted in result.
>>>>>>>>>>>>> Either transaction start or end time is not within steady
>>>>>>>>>>>>> state.
>>>>>>>>>>>>> java.lang.RuntimeException: Found login prompt at index 2926,
>>>>>>>>>>>>> Login as at786o08x, 2178 failed.
>>>>>>>>>>>>> at org.apache.olio.workload.driver.UIDriver.doLogin (404)
>>>>>>>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
>>>>>>>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke (39)
>>>>>>>>>>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
>>>>>>>>>>>>> at java.lang.reflect.Method.invoke (597)
>>>>>>>>>>>>> at com.sun.faban.driver.engine.TimeThread.doRun (169)
>>>>>>>>>>>>> at com.sun.faban.driver.engine.AgentThread.run (202)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any ideas? I do get
>>>>>>>>>>>>
>>>>>>>>>>>> You likely have cookie issues. It can't seem to hold on to a
>>>>>>>>>>>> session.
>>>>>>>>>>>
>>>>>>>>>>> Well there was a permission issue with the http_session dir. I
>>>>>>>>>>> could not write to it. chmod 777 on it fixed this.
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> (I ve found online: >>>>>>>>>>>>> http://www.mail-archive.com/olio-dev@incubator.apache.org/msg00647.htmlwhich is similar, but when I added >>>>>>>>>>>>> >>>>>>>>>>>>> com.sun.faban.driver.transport.sunhttp.ThreadCookieHandler.level=FINER >>>>>>>>>>>>> in build.properties >>>>>>>>>>>>> I did not see any cookie related warnings. Those should appear >>>>>>>>>>>>> in the olio run log or the apache log, right? Am i just looking at the wrong >>>>>>>>>>>>> place? ) >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Yes, that's applicable only to the Sun Http Transport. The >>>>>>>>>>>> version of Olio you're using is based on the Apache Http Transport (Apache >>>>>>>>>>>> HttpClient 3.1). The ThreadCookieHandler is not used for the Apache >>>>>>>>>>>> transport and that's why you don't see any logs. Try upgrade to Faban 1.0 >>>>>>>>>>>> before looking at other things. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> It's a long email I know. Your feedback would be most >>>>>>>>>>>>> appreciated. >>>>>>>>>>>>> >>>>>>>>>>>>> -Regards >>>>>>>>>>>>> >>>>>>>>>>>>> ------------------------------------------------------------------- >>>>>>>>>>>>> Kontorinis Vasileios >>>>>>>>>>>>> Phd student, University of California San Diego >>>>>>>>>>>>> San Diego, CA 92122 >>>>>>>>>>>>> Cell. phone: (858) 717 6899 >>>>>>>>>>>>> bkontorinis@gmail.com , >>>>>>>>>>>>> vkontori@ucsd.edu >>>>>>>>>>>>> >>>>>>>>>>>>> -------------------------------------------------------------------\ >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks for all the questions/comments. 
>>>>>>>>>>>> >>>>>>>>>>>> -Akara >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> And now some more questions/ comments: >>>>>>>>>>> 1) I get the following error: >>>>>>>>>>> >>>>>>>>>>> 15:13:05:SEVERE:CmdService: Getting - exception reading >>>>>>>>>>> /usr/data/olio-db.err >>>>>>>>>>> java.io.FileNotFoundException: File /usr/data/olio-db.err does >>>>>>>>>>> not exist. >>>>>>>>>>> at com.sun.faban.common.FileTransfer. (70) >>>>>>>>>>> at com.sun.faban.harness.agent.FileAgentImpl.get (315) >>>>>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0 (null) >>>>>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke (39) >>>>>>>>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke (25) >>>>>>>>>>> at java.lang.reflect.Method.invoke (597) >>>>>>>>>>> at sun.rmi.server.UnicastServerRef.dispatch (305) >>>>>>>>>>> at sun.rmi.transport.Transport$1.run (159) >>>>>>>>>>> at java.security.AccessController.doPrivileged (null) >>>>>>>>>>> at sun.rmi.transport.Transport.serviceCall (155) >>>>>>>>>>> at sun.rmi.transport.tcp.TCPTransport.handleMessages (535) >>>>>>>>>>> at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0 >>>>>>>>>>> (790) >>>>>>>>>>> at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run >>>>>>>>>>> (649) >>>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask >>>>>>>>>>> (885) >>>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run (907) >>>>>>>>>>> at java.lang.Thread.run (619) >>>>>>>>>>> at >>>>>>>>>>> sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer (255) >>>>>>>>>>> at sun.rmi.transport.StreamRemoteCall.executeCall (233) >>>>>>>>>>> at sun.rmi.server.UnicastRef.invoke (142) >>>>>>>>>>> at com.sun.faban.harness.agent.FileAgentImpl_Stub.get (null) >>>>>>>>>>> at com.sun.faban.harness.engine.CmdService.get (1334) >>>>>>>>>>> at com.sun.faban.harness.RunContext.getFile (346) >>>>>>>>>>> at com.sun.services.MySQLService.getLogs (197) >>>>>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0 (null) 
>>>>>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke (39) >>>>>>>>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke (25) >>>>>>>>>>> at java.lang.reflect.Method.invoke (597) >>>>>>>>>>> at com.sun.faban.harness.util.Invoker.invoke (98) >>>>>>>>>>> at com.sun.faban.harness.services.ServiceWrapper.getLogs >>>>>>>>>>> (200) >>>>>>>>>>> at com.sun.faban.harness.services.ServiceManager.getLogs >>>>>>>>>>> (642) >>>>>>>>>>> at com.sun.faban.harness.engine.GenericBenchmark.start (323) >>>>>>>>>>> at com.sun.faban.harness.engine.RunDaemon.run (338) >>>>>>>>>>> at java.lang.Thread.run (619) >>>>>>>>>>> 15:13:05:WARNING:Could not copy /usr/data/olio-db.err to >>>>>>>>>>> /home/gdhiman/faban.1.0/faban/output/OlioDriver.2D/mysql_err.log.olio-db >>>>>>>>>>> >>>>>>>>>>> Apparently something is misconfigured in my db-server. Any ideas? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> 2) I get the following error: >>>>>>>>>>> 15:13:16:WARNING:[/home/gdhiman/faban.1.0/faban/bin/fenxi, >>>>>>>>>>> process, /home/gdhiman/faban.1.0/faban/output/OlioDriver.2D/, >>>>>>>>>>> /home/gdhiman/faban.1.0/faban/output/OlioDriver.2D//post/, OlioDriver.2D] >>>>>>>>>>> stderr: >>>>>>>>>>> Error in executing perl >>>>>>>>>>> /home/gdhiman/faban.1.0/faban/master/webapps/fenxi/txt2db/ >>>>>>>>>>> mpstat.pl >>>>>>>>>>> Error in executing perl >>>>>>>>>>> /home/gdhiman/faban.1.0/faban/master/webapps/fenxi/txt2db/ >>>>>>>>>>> mpstat.pl >>>>>>>>>>> >>>>>>>>>>> Actually I traced back this one. The problem is the difference in >>>>>>>>>>> output format of the Sun's mpstat and default GNU mpstat. 
>>>>>>>>>>> This is my output of my mpstat: >>>>>>>>>>> >>>>>>>>>>> gdhiman@olio-client00:~/faban.1.0/faban/output/OlioDriver.2D$ >>>>>>>>>>> mpstat 1 >>>>>>>>>>> Linux 2.6.18.8-xen (olio-client00) 01/16/10 >>>>>>>>>>> >>>>>>>>>>> 16:25:06 CPU %user %nice %sys %iowait %irq %soft >>>>>>>>>>> %steal %idle intr/s >>>>>>>>>>> 16:25:07 all 0.00 0.00 0.00 0.00 0.00 >>>>>>>>>>> 0.00 0.00 100.00 52.48 >>>>>>>>>>> 16:25:08 all 0.00 0.00 0.00 0.00 0.00 >>>>>>>>>>> 0.00 0.00 100.00 50.50 >>>>>>>>>>> 16:25:09 all 0.00 0.00 0.00 0.00 0.00 >>>>>>>>>>> 0.00 0.00 100.00 79.21 >>>>>>>>>>> 16:25:10 all 0.00 0.00 0.00 0.00 0.00 >>>>>>>>>>> 0.00 0.00 100.00 45.54 >>>>>>>>>>> 16:25:11 all 0.00 0.00 0.00 0.00 0.00 >>>>>>>>>>> 0.00 0.00 100.00 55.45 >>>>>>>>>>> >>>>>>>>>>> The first line as well as the time at the beginning of each entry >>>>>>>>>>> messing up the parsing at mpstat.pl. (also the fields are >>>>>>>>>>> different) Any plans to support this?? >>>>>>>>>>> >>>>>>>>>>> 3) Scaling questions. >>>>>>>>>>> - So far I did not have a single experiment passing. Some are >>>>>>>>>>> pretty close with only one metric check failing. >>>>>>>>>>> >>>>>>>>>>> Average images loaded per Home Page2.79>= 3 >>>>>>>>>>> FAILED >>>>>>>>>>> Any ideas? Is it the case that the disc is not fast enough? I am >>>>>>>>>>> just using the local filesystem for the filestore. >>>>>>>>>>> >>>>>>>>>>> - As I double the number of concurrent users I observe linear >>>>>>>>>>> scaling in the thoughput. >>>>>>>>>>> Con Users Throughput >>>>>>>>>>> 25 4.967 >>>>>>>>>>> 50 10.06 >>>>>>>>>>> 100 19.375 >>>>>>>>>>> 200 40.21 >>>>>>>>>>> 400 75.818 >>>>>>>>>>> 800 0.383 >>>>>>>>>>> 1000 0.483 >>>>>>>>>>> >>>>>>>>>>> The linear scaling stops for 400 concurrent users ( only one >>>>>>>>>>> agent). Actually it would be exactly linear (value of ~80) but almost half >>>>>>>>>>> of the login operations failed. I am looking into it. >>>>>>>>>>> Any insights on what might be the first thing failing? 
>>>>>>>>>>>
>>>>>>>>>>> For the 800 and 1000 experiments there are no failed operations
>>>>>>>>>>> logged. It looks like those are being discarded... (?)
>>>>>>>>>>>
>>>>>>>>>>> Bonus question:
>>>>>>>>>>> In the runtime statistics
>>>>>>>>>>>
>>>>>>>>>>> 30
>>>>>>>>>>>
>>>>>>>>>>> only the 90% response time is reported. Is there an easy way to
>>>>>>>>>>> also report the 99%? (Or do I need to add code for that?)
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot again in advance.
>>>>>>>>>>> -VK

If you want to run multiple webservers on different systems, you must have access to the filestore from all of them. The easiest way to do this is to nfs-mount the filestore from the server it resides on so it is accessible to the other machines as well.

Shanti

On Thu, Feb 11, 2010 at 9:42 PM, Vasileios Kontorinis <bkontorinis@gmail.com> wrote:
Shanti hi again,
   Sorry for not submitting the JIRA on time, I am extremely busy lately.

I have a quick question regarding the way the webserver interacts with the filestore. I ran some scaling studies with one, two and three different servers while having only one filestore (I do specify that in the run.xml configuration file, webServer and dataStorage).
The filestore is a local folder on one of the server machines. However, in the oliophp/etc/config.php I also specify on each server

$olioconfig['fileSystem'] = 'LocalFS';
$olioconfig['localfsRoot'] = '/home/gdhiman/filestore';

As a result, I do get WARNINGS for missing files on the webservers that do not host a filestore. What is the right configuration for oliophp/etc/config.php? Can I somehow detach the filestore from the webserver so that it requests files remotely?


Thanks again.
-------------------------------------------------------------------
Kontorinis Vasileios
Phd student, University of California San Diego
San Diego, CA 92122
Cell. phone: (858) 717 6899
bkontorinis@gmail.com, vkontori@ucsd.edu
-------------------------------------------------------------------


2010/2/8 Shanti Subramanyam <shanti.subramanyam@gmail.com>



On Mon, Feb 8, 2010 at 3:53 PM, Vasileios Kontorinis <bkontorinis@gmail.com> wrote:


We need to look into this issue - I suspect that something subtle has changed in 0.2 which hasn't been accounted for in the expected #images loaded. Can I please request that you file a JIRA on this?

How do I do this? Pointers?

I tried runs of 20 mins to verify that longer runs will not make it better and it's still failing for just 50 users.

What worries me is that you're saying it fails for 1800 users too - I can understand it may fail for 50 users, but if it fails for larger #users, then it is a bug.

and I do get the repetitive patterns you mentioned. However, the cache_MB never exceeds 0.05... I would expect that memcache size is really important for the application scaling. What is the point of having a separate memcache server if we are only using less than 50KB(?) of memory for caching?


Try running without memcached - it can be easily configured in the app's etc/config.php. Then you will see what difference the cache makes. The reduction in db traffic is dramatic, resulting in the response times you see. The reason the size is small is because we are currently only caching the home page, which is shared. We have not bothered to implement any additional caching as this level of caching is sufficient to reduce the db load.

Regards
-VK

Shanti


Thanks again


2010/1/27 Shanti Subramanyam <shanti.subramanyam@gmail.com>
Yes - these are problems that I'm already aware of.
The best solution to the filestore issue is to change ownership of the directory to the same user/group as the apache process. We could have the fileloader.sh change write access I guess, but since that's a big security hole, we may not want to do that automatically without letting the user know about it.

The fact that your response times are so high indicates that you're running a far larger load than the system can handle and/or you still need some tuning.
I suggest you start over from say 100 users and see at what point your response times start getting really large. The apache error log should be pulled in as part of the 'Statistics' tab, so do keep monitoring that.

Shanti


On Wed, Jan 27, 2010 at 1:34 AM, Vasileios Kontorinis <bkontorinis@gmail.com> wrote:
Shanti hi again,
   I checked my apache logs and there were a bunch of errors.
It looks like there are some issues with webapp/php/trunk/classes/ImageUtil.php in the last release of olio. (I downloaded http://www.alliedquotes.com/mirrors/apache/incubator/olio/0.2/apache-olio-php-src-0.2.tar.gz )
1) There is a line that needs to be commented out. PHP complains ("1.5. Must be greater than zero.").
2) Then, it was complaining that it cannot find the function fastimagecopyresampled. To work around that, I moved the function fastimagecopyresampled above createThumb (this might not be required) and declared it static.
    Finally, I call the function from createThumb with self::fastimagecopyresampled.
3) Then, it started complaining because it could not write to the filestore. The problem is that it wants to write the new images as www-data from apache, while the filestore does not have write permission for others.
    Manually giving access solves the problem (chmod -R o+w <path>/filestore) but since the directories in filestore are generated automatically, maybe the chmod command should be added in fileloader.sh
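
To see which filestore directories would still block the www-data user after a load, a small check along the following lines may help. This is only a sketch: the function name dirs_not_world_writable is made up for illustration, and it tests only the o+w bit that the chmod above sets. It does not check ownership by the apache user, which is the fix recommended in the reply above.

```python
import os
import stat


def dirs_not_world_writable(root):
    """Return directories under root (root included) lacking the o+w bit."""
    missing = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        mode = os.stat(dirpath).st_mode
        # S_IWOTH is the "write by others" permission bit set by chmod o+w.
        if not mode & stat.S_IWOTH:
            missing.append(dirpath)
    return missing
```

Calling dirs_not_world_writable('/home/gdhiman/filestore') after running fileloader.sh would list any automatically generated directories that were created without write access for others.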

Funnily enough, after fixing those issues, I still cannot pass the:
Average images loaded per Home Page   2.65   >= 3   FAILED

and on top of that I also have:
Response Times (secs)
AddPerson   5.190   13.194   3.387   8.800    3.000   FAILED
AddEvent    5.904   16.784   3.159   10.400   4.000   FAILED

Think times for AddPerson and AddEvent fail as well.

Any insights are welcome .... :-(



2010/1/26 Shanti Subramanyam <shanti.subramanyam@gmail.com>
Yes - 0.2 requires a lot more disk space as we changed the ratio of concurrent users to registered users to 1:100. If you haven't already, please check out our published Blueprints for detailed performance characteristics of the workload:

If you run for long enough, you should get passing runs. Have you verified that there are no errors in the run logs when you see the 'Avg. images loaded per home page' fail?

On to your open files error - you may have to tune your networking tier and/or #open file descriptors. I don't believe we have ever seen as many files open as you are seeing. Can you determine whether these are from the file store or network? We also typically run the filestore on a different system and nfs-mount it on the webserver box.
You will have to tune your system to ensure good performance since you will need memory for both apache and files.
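
Returning to the disk-space point at the top of this reply: a rough way to size the filestore is to extrapolate from the one complete data point reported in the quoted message below (50GB at 600 concurrent users). The sketch assumes the filestore grows linearly with the number of concurrent users, which the thread does not confirm, and the helper name is hypothetical.

```python
def estimate_filestore_gb(target_users, known_users=600, known_gb=50.0):
    """Linearly extrapolate filestore size from one measured data point."""
    gb_per_user = known_gb / known_users
    return gb_per_user * target_users


if __name__ == "__main__":
    for users in (800, 1000):
        print(users, "users ->", round(estimate_filestore_gb(users), 1), "GB")
```

Under that assumption 800 users would need roughly 67GB, which is at least consistent with running out of space somewhere above 62GB, and 1000 users would need roughly 83GB.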

Shanti


On Mon, Jan 25, 2010 at 5:06 PM, Vasileios Kontorinis <bkontorinis@gmail.com> wrote:
Akara and Shanti hi,
   I did migrate to Olio 0.2. With the last version of Olio I came across some new interesting things.

Scaling issues:
 - I am still getting the:
Average images loaded per Home Page   2.55   >= 3   FAILED

 - Additionally, when I scale the concurrent users to 800 I run out of disk space since my filestore occupies more than 62GB.
Actually for 600 users it occupies 50GB. I was curious if that makes sense. How much space will I need to reach 1000 users?
In the php_setup.html it suggests that we will need 50GB but apparently we need way more for a large number of users.

 - Finally and most importantly, for 600 users many of the operations fail with the exception:
Message:
java.net.SocketException: Too many open files

Stack Trace:
Class Method Line
java.net.PlainSocketImpl socketAccept
java.net.PlainSocketImpl accept 390
java.net.ServerSocket implAccept 453
java.net.ServerSocket accept 421
sun.rmi.transport.tcp.TCPTransport$AcceptLoop executeAcceptLoop 369
sun.rmi.transport.tcp.TCPTransport$AcceptLoop run 341
java.lang.Thread run 619

or

java.net.SocketException: Too many open files
Stack Trace:
Class Method Line
java.net.Socket createImpl 394
java.net.Socket getImpl 457
java.net.Socket bind 571
com.sun.faban.driver.transport.hc3.ProtocolTimedSocketFactory createSocket 60
org.apache.commons.httpclient.HttpConnection open 707
org.apache.commons.httpclient.HttpMethodDirector executeWithRetry 387
org.apache.commons.httpclient.HttpMethodDirector executeMethod 171
org.apache.commons.httpclient.HttpClient executeMethod 397
org.apache.commons.httpclient.HttpClient executeMethod 323
com.sun.faban.driver.transport.hc3.ApacheHC3Transport readURL 274
org.apache.olio.workload.driver.UIDriver doLogin 398
org.apache.olio.workload.driver.UIDriver doLogin 424
sun.reflect.GeneratedMethodAccessor8 invoke
sun.reflect.DelegatingMethodAccessorImpl invoke 25
java.lang.reflect.Method invoke 597
com.sun.faban.driver.engine.TimeThread doRun 169
com.sun.faban.driver.engine.AgentThrea


I am monitoring the number of open files on the web server with `watch "lsof | wc"`, and olio starts failing when around 65,000-70,000 files are open. lsof shows that for each apache2 thread there are around 100 files open. Therefore there are around 650-700 different apache2 threads that create the bulk of those open file descriptors.
The soft and hard limit is set to 403238, which means that there should be many more open files before it starts failing.
(Actually, I verified the limit by opening a bunch of files with a python script and it does reach the limit of 403238.)
Any insights? Is there any chance that the file descriptors take more time than usual to be reclaimed after being closed in the xen vm I use for my web-server? Does it make sense for olio in the first place to have so many files open at the same time?
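
One thing that may be worth checking here: the open-file limit is enforced per process, while `lsof | wc` counts entries across the whole machine, so the two numbers are not directly comparable. The stack traces above also come from Java code (the Faban driver's socket creation and an RMI accept loop), so the limit that matters may be the one seen by that JVM rather than by apache2. The sketch below is Linux-only and uses made-up function names; it reads the limit and the live descriptor count of a single process from /proc.

```python
import os
import resource


def own_nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def parse_max_open_files(limits_text):
    """Extract (soft, hard) from the text of /proc/<pid>/limits.

    Values are returned as strings because either may be 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            return fields[3], fields[4]
    return None


def open_fd_count(pid):
    """Count descriptors currently open in one process (needs read access)."""
    return len(os.listdir("/proc/%s/fd" % pid))
```

Feeding the contents of /proc/<pid>/limits for the driver agent's JVM (and for one apache2 process) to parse_max_open_files, and comparing the result with open_fd_count for the same pid, would show whether an individual process is hitting a lower limit than the 403238 set in the shell.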

Thanks again.




2010/1/16 Shanti Subramanyam <shanti.subramanyam@gmail.com>

I would really recommend that you migrate to Olio 0.2. In addition to bug fixes, there are some major feature changes in it. See Olio 0.2 released

Shanti

On Sat, Jan 16, 2010 at 4:49 PM, Vasileios Kontorinis <bkontorinis@gmail.com> wrote:
Akara hi again,
   Below I have comments on your suggestions and at the end some bonus questions... Thanks again.

2010/1/13 Akara Sucharitakul <Akara.Sucharitakul@sun.com>
With your permission, I'd like to copy the Olio and Faban user aliases going forward. I feel it will help a much wider audience. Please see below for answers/comments:

Sure. I cc'ed the olio user alias. I am not sure which is the right faban list.

Vasileios Kontorinis wrote:
Akara hi,
  I am a grad student at UCSD and I use Olio for a research project where we want to measure olio performance under live virtual machine migration. We use ubuntu 8.04 on nehalem servers.
I have checked out the last version of olio from the online svn repository and downloaded the last version of faban (faban-kit-101509.tar.gz <http://faban.sunsource.net/nightly/faban-kit-101509.tar.gz>)

101509 is fairly recent. But the latest on the web site is 111109 (Faban 1.0). There were just bug fixes between those releases.

I have upgraded to Faban 1.0, still using olio1.0 though (the release of 2.0 was announced, will switch to it if I run into bugs that have been fixed)




So far, I employed a bunch of hacks to get most of it to work and I am almost there. In the process I got a bunch of questions.

Questions (some of them might be just faban related, not olio so bear with me):
1) Is there any way to deploy OlioDriver.jar through the command line? Firefox through ssh forwarding is dead slow and I'd rather avoid it if I can.

Just drop the jar into faban/benchmarks/ and it will deploy itself. This is documented at http://faban.sunsource.net/1.0/docs/guide/harnessdev/deploybenchmark.html under "Alternate Deployment Methods."


2) Should the services ApacheHttpdService, MemcachedService, MySQLService that come with Faban be deployed before running Olio?
   I was getting some very weird errors, e.g.

Yes, you should. Olio will search for those.

Done



03:50:27:WARNING:UIDriverAgent[0]: Forcefully terminating benchmark run
03:50:27:WARNING:UIDriverAgent[0]: 25 threads forcefully terminated.
java.lang.Throwable: Stack of non-terminating thread.
   at java.net.SocketInputStream.socketRead0 (null)
   at java.net.SocketInputStream.read (129)
   at java.io.FilterInputStream.read (116)
   at com.sun.faban.driver.transport.util.TimedInputStream.read (139)
   at java.io.BufferedInputStream.fill (218)
   at java.io.BufferedInputStream.read (237)
   at org.apache.commons.httpclient.HttpParser.readRawLine (78)
   at org.apache.commons.httpclient.HttpParser.readLine (106)
   at org.apache.commons.httpclient.HttpConnection.readLine (1116)
   at org.apache.commons.httpclient.HttpMethodBase.readStatusLine (1973)
   at org.apache.commons.httpclient.HttpMethodBase.readResponse (1735)
   at org.apache.commons.httpclient.HttpMethodBase.execute (1098)
   at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry (398)
   at org.apache.commons.httpclient.HttpMethodDirector.executeMethod (171)
   at org.apache.commons.httpclient.HttpClient.executeMethod (397)
   at org.apache.commons.httpclient.HttpClient.executeMethod (323)
   at com.sun.faban.driver.transport.hc3.ApacheHC3Transport.fetchURL (529)
   at com.sun.faban.driver.transport.hc3.ApacheHC3Transport.fetchURL (552)
   at org.apache.olio.workload.driver.UIDriver.doHomePage (355)
   at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
   at sun.reflect.NativeMethodAccessorImpl.invoke (39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
   at java.lang.reflect.Method.invoke (597)
   at com.sun.faban.driver.engine.TimeThread.doRun (169)
   at com.sun.faban.driver.engine.AgentThread.run (202)

and afterwards the master was waiting for threads to join forever... (I attached gdb to verify that something was wrong) and hence I had to kill the benchmark.

These threads are hanging reading the server responses, which never came.

Building the services from Faban probably fixes it.


In the Olio log there are WARNINGS complaining about not deploying those. After building those and manually copying them to /faban/services (ant deploy did not place them there... :-( )

Yes. But ant deploy should get them there. If not, can you please let me know the ant messages?

Ant was deploying them indeed. I had a mistake in build.properties.
I had: faban.url=http://<hostname>:9980/ instead of faban.url=http://localhost:9980/
After I changed that it started working...

it worked. (mostly worked)

3) I still have warnings like:
01:38:08:INFO:Time difference to host olio-web is 269 ms. Attempting to set clock.
01:38:08:INFO:Time difference to host olio-db is 263 ms. Attempting to set clock.

These two are OK. Just trying to do a clock sync between the systems.

01:38:08:WARNING:olio-web wakeup-before time reached 700ms limit. System is too busy. Giving up.

This is one of Faban's clock-setting calibrations. If the system is too busy or you run on some virtualization architectures, the lag time between an intended end of sleep and the actual time when the thread really wakes up (gets scheduled/executed) is too high, and calibrations will fail.


01:38:08:INFO:Time difference to host olio-mem is 262 ms. Attempting to set clock.
01:38:10:WARNING:olio-db wakeup-before time reached 700ms limit. System is too busy. Giving up.
09:38:09:WARNING:[date, -u, 011309382010.10]
stderr:
date: cannot set date: Operation not permitted
09:38:09:WARNING:Error on "[date, -u, 011309382010.10]" command trying to set the date. Exit value: 1
09:38:10:WARNING:[date, -u, 011309382010.11]
stderr:
date: cannot set date: Operation not permitted
09:38:10:WARNING:Error on "[date, -u, 011309382010.11]" command trying to set the date. Exit value: 1
09:38:09:WARNING:[date, -u, 011309382010.10]
stderr:
date: cannot set date: Operation not permitted
09:38:09:WARNING:Error on "[date, -u, 011309382010.10]" command trying to set the date. Exit value: 1
09:38:10:WARNING:[date, -u, 011309382010.11]
stderr:
date: cannot set date: Operation not permitted

Letting faban change the vm clock sounds like a bad idea from the beginning.

OK. So it is xen. Yes, this is what Faban is trying to solve. You can certainly turn it off. Please see:
  http://faban.sunsource.net/1.0/docs/howdoi/physclocksync.html

I added <fh:timeSync>false</fh:timeSync> in my run.xml file (btw in the link above there is a mistake: the page shows <fh:timeSync>false<fh:timeSync>, but <fh:timeSync>false</fh:timeSync> is correct; the second <fh:timeSync> needs to be a closing tag, the "/" is missing)
that made the warnings go away.

=A0

Unfortunately, xen is really bad at maintaining an accurate clock. As a result there is usually a time difference of more than 10ms between the different virtual machines. I went over the setTime function in Faban source (/faban/com/sun/faban/harness/agent/CmdAgentImpl.java), it's big and ugly (very ugly)

Thanks for the compliments! I think you mean CmdService.setClockTask. Time sensitive code ain't pretty. It is the complexities dealing with the clock and trying to achieve good accuracy. If you think you can simplify this, I'm listening (without losing the accuracy, of course). In comparison, CmdAgentImpl has nothing.


Yes, you're right, it is CmdService.setClockTask. The previous email was composed at 3am ... :-)
I am still a little confused. The setClockTask is used to set the clock so that all the machines are synchronized with the master. From what you mentioned the physical clock sync is only used for the logs.
Why do we need to do that since 1) it requires root privileges (which might not always be available) and 2) I could imagine an alternative that uses deltas from the actual physical clock without having to set it.
(I am probably missing something... :-)
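
The delta idea raised above can be sketched as follows. This is not how Faban's CmdService.setClockTask works; it is a minimal illustration of Cristian's algorithm, with hypothetical function names, that estimates a host's offset from the master and applies it to log timestamps instead of setting the system clock, so no root privileges are needed.

```python
def estimate_offset(request_sent, reply_received, master_time):
    """Cristian-style estimate of (master clock - local clock), in seconds.

    request_sent and reply_received are local clock readings taken just
    before asking the master for its time and just after the answer arrives.
    """
    round_trip = reply_received - request_sent
    # Assume the master read its clock halfway through the round trip.
    return master_time + round_trip / 2.0 - reply_received


def to_master_time(local_timestamp, offset):
    """Convert a local log timestamp to the master's timeline."""
    return local_timestamp + offset
```

The accuracy of such an estimate is bounded by half the round-trip time, so on a VM with erratic scheduling it would suffer from the same delays the thread describes.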



Why is there this strict requirement for a 10ms difference? Any ideas?

It is easily achievable in most cases. May not be true for VMs.

On some VM architectures, however, the OS does not get scheduled till way after that, thus causing problems. You may be able to measure performance on those VMs. But you don't want to use such VMs as a driver. Your response time measurements will be way off.

The physical clock sync is not really rigorous. And you can turn it off. It is more to keep the systems in good time sync. If your VM stands in the way, just turn it off. The driver's virtual clock sync is much more picky in comparison. This is because the start time for the steady state should be the same (with a very small tolerance) no matter how many drivers are driving. Otherwise the measurement period won't be the same when viewed from different drivers and the results won't be reliable.


Even with ntp it's hard to provide the 10ms guarantee.

That's why we don't use ntp ;-)

Just out of curiosity, the physical clocks are set only once at the beginning (right?), therefore for long runs the 10ms difference will not be guaranteed. Nope? Especially under VMs I've seen significant clock difference within a few minutes.
At least ntp can periodically resync (of course doing so might screw up the logs with time going backwards etc.)



I am thinking of modifying this function to always return that the time difference is less than 10ms (so that I do not have to wait all the time for the timeouts.)

Why bother? Don't like it, just turn it off. It has good use in most configurations we're dealing with. And, it avoids ntp inaccuracies.

Will this break anything in Olio?

Nope. Except the times in your logs will appear out of sequence. They rely on the local time on the originating systems.

4) Warning like:
09:39:48:WARNING:Image at http://olio-web:80/fileService.php?cache=false&file=e168t.jpg size of 249 bytes is too small. Image may not exist
can be ignored, right?

Well, something is wrong. We don't have images that small. Check whether e168t.jpg is really that small. That's why we have that warning.



5) Last and most important.
I can run the benchmark and all the operations succeed except for login.
I get a bunch of:

09:39:25:WARNING:UIDriverAgent[0].2.doLogin: Found login prompt at index 2926, Login as at786o08x, 2178 failed.
Note: Error not counted in result.
Either transaction start or end time is not within steady state.
java.lang.RuntimeException: Found login prompt at index 2926, Login as at786o08x, 2178 failed.
   at org.apache.olio.workload.driver.UIDriver.doLogin (404)
   at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
   at sun.reflect.NativeMethodAccessorImpl.invoke (39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
   at java.lang.reflect.Method.invoke (597)
   at com.sun.faban.driver.engine.TimeThread.doRun (169)
   at com.sun.faban.driver.engine.AgentThread.run (202)

Any ideas? I do get

You likely have cookie issues. It can't seem to hold on to a session.





(I've found online:
http://www.mail-archive.com/olio-dev@incubator.apache.org/msg00647.html which is similar, but when I added

com.sun.faban.driver.transport.sunhttp.ThreadCookieHandler.level=FINER in build.properties
I did not see any cookie related warnings. Those should appear in the olio run log or the apache log, right? Am I just looking in the wrong place?)

Yes, that's applicable only to the Sun Http Transport. The version of Olio you're using is based on the Apache Http Transport (Apache HttpClient 3.1). The ThreadCookieHandler is not used for the Apache transport and that's why you don't see any logs. Try upgrading to Faban 1.0 before looking at other things.



It's a long email I know. Your feedback would be most appreciated.

-Regards

Thanks for all the questions/comments.

-Akara



And now some more questions/comments:
1) I get the following error:

15:13:05:SEVERE:CmdService: Getting - exception reading /usr/data/olio-db.err
java.io.FileNotFoundException: File /usr/data/olio-db.err does not exist.
    at com.sun.faban.common.FileTransfer.<init> (70)
    at com.sun.faban.harness.agent.FileAgentImpl.get (315)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
    at sun.reflect.NativeMethodAccessorImpl.invoke (39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
    at java.lang.reflect.Method.invoke (597)
    at sun.rmi.server.UnicastServerRef.dispatch (305)
    at sun.rmi.transport.Transport$1.run (159)
    at java.security.AccessController.doPrivileged (null)
    at sun.rmi.transport.Transport.serviceCall (155)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages (535)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0 (790)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run (649)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask (885)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run (907)
    at java.lang.Thread.run (619)
    at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer (255)
    at sun.rmi.transport.StreamRemoteCall.executeCall (233)
    at sun.rmi.server.UnicastRef.invoke (142)
    at com.sun.faban.harness.agent.FileAgentImpl_Stub.get (null)
    at com.sun.faban.harness.engine.CmdService.get (1334)
    at com.sun.faban.harness.RunContext.getFile (346)
    at com.sun.services.MySQLService.getLogs (197)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (null)
    at sun.reflect.NativeMethodAccessorImpl.invoke (39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (25)
    at java.lang.reflect.Method.invoke (597)
    at com.sun.faban.harness.util.Invoker.invoke (98)
    at com.sun.faban.harness.services.ServiceWrapper.getLogs (200)
    at com.sun.faban.harness.services.ServiceManager.getLogs (642)
    at com.sun.faban.harness.engine.GenericBenchmark.start (323)
    at com.sun.faban.harness.engine.RunDaemon.run (338)
    at java.lang.Thread.run (619)
15:13:05:WARNING:Could not copy /usr/data/olio-db.err to /home/gdhiman/faban.1.0/faban/output/OlioDriver.2D/mysql_err.log.olio-db

Apparently something is misconfigured on my DB server. Any ideas?

2) I get the following error:
15:13:16:WARNING:[/home/gdhiman/faban.1.0/faban/bin/fenxi, process, /home/gdhiman/faban.1.0/faban/output/OlioDriver.2D/, /home/gdhiman/faban.1.0/faban/output/OlioDriver.2D//post/, OlioDriver.2D]
stderr:
Error in executing perl /home/gdhiman/faban.1.0/faban/master/webapps/fenxi/txt2db/mpstat.pl
Error in executing perl /home/gdhiman/faban.1.0/faban/master/webapps/fenxi/txt2db/mpstat.pl

Actually, I traced this one back. The problem is the difference in output format between Sun's mpstat and the default GNU/Linux mpstat.
This is the output of my mpstat:

gdhiman@olio-client00:~/faban.1.0/faban/output/OlioDriver.2D$ mpstat 1
Linux 2.6.18.8-xen (olio-client00)     01/16/10

16:25:06     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
16:25:07     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     52.48
16:25:08     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     50.50
16:25:09     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     79.21
16:25:10     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     45.54
16:25:11     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     55.45

The first line, as well as the timestamp at the beginning of each entry, is messing up the parsing in mpstat.pl (the fields are also different). Are there any plans to support this?
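
To illustrate the format difference, here is a minimal sketch of how the Linux output could be parsed. It is written in Python rather than the Perl of mpstat.pl, and parse_linux_mpstat is a hypothetical name, not part of Fenxi. It only assumes the layout shown above: a "Linux ..." banner, a header row containing CPU, and a timestamp in front of every row.

```python
def parse_linux_mpstat(text):
    """Parse Linux `mpstat <interval>` output into a list of dicts.

    Hypothetical helper for illustration only; not part of Faban/Fenxi.
    """
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "Linux ..." banner, and trailing averages.
        if not fields or fields[0] in ("Linux", "Average:"):
            continue
        if "CPU" in fields:
            # Header row: drop the leading timestamp, keep the column names.
            header = fields[fields.index("CPU"):]
            continue
        if header is None or len(fields) <= len(header):
            continue  # no header seen yet, or a row too short to match it
        # Align values with the header from the right, so that whatever
        # precedes them (the timestamp) does not shift the columns.
        values = fields[-len(header):]
        row = {"time": fields[0]}
        for name, value in zip(header, values):
            row[name] = value if name == "CPU" else float(value)
        rows.append(row)
    return rows


# Two sample rows copied from the mpstat output above.
sample = """\
Linux 2.6.18.8-xen (olio-client00)     01/16/10

16:25:06     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
16:25:07     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     52.48
16:25:08     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     50.50
"""

rows = parse_linux_mpstat(sample)
for row in rows:
    print(row["time"], row["CPU"], row["%idle"], row["intr/s"])
```

Keying each value by its column name, instead of by position, is what would let the same data feed a loader that expects a different field order.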

3) Scaling questions.
- So far I have not had a single experiment pass. Some come pretty close, with only one metric check failing:

Average images loaded per Home Page    2.79    >= 3    FAILED

Any ideas? Could it be that the disk is not fast enough? I am just using the local filesystem for the filestore.

- As I double the number of concurrent users, I observe linear scaling in the throughput.

Con. Users    Throughput
    25           4.967
    50          10.06
   100          19.375
   200          40.21
   400          75.818
   800           0.383
  1000           0.483

Linear scaling stops at 400 concurrent users (with only one agent). Actually, it would be exactly linear (a value of ~80), but almost half of the login operations failed. I am looking into it.
Any insights on what might be the first thing to fail?

For the 800 and 1000 experiments there are no failed operations logged. It looks like those are being discarded (?).
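
As a sanity check on the numbers above, here is a small Python sketch that compares each run against perfectly linear scaling from the 25-user run. The throughput values are copied from the table; scaling_report and the 10% tolerance are my own illustrative choices, not anything Faban provides.

```python
# Throughput per run, copied from the table above
# (concurrent users -> throughput).
runs = {25: 4.967, 50: 10.06, 100: 19.375, 200: 40.21,
        400: 75.818, 800: 0.383, 1000: 0.483}


def scaling_report(runs, tolerance=0.10):
    """Compare each run with linear scaling from the smallest run.

    Returns (users, measured, expected, ratio, is_linear) tuples, where
    `tolerance` (an arbitrary illustrative threshold) is the allowed
    relative deviation from the linear expectation.
    """
    base_users = min(runs)
    per_user = runs[base_users] / base_users
    report = []
    for users in sorted(runs):
        expected = per_user * users
        ratio = runs[users] / expected
        report.append((users, runs[users], expected, ratio,
                       abs(ratio - 1.0) <= tolerance))
    return report


for users, measured, expected, ratio, is_linear in scaling_report(runs):
    print(f"{users:5d}  measured={measured:8.3f}  expected={expected:8.3f}  "
          f"ratio={ratio:5.3f}  {'linear' if is_linear else 'NOT linear'}")
```

With these numbers the 400-user run comes out at roughly 95% of the linear expectation (about 79.5), while the 800 and 1000 runs fall well under 1% of it.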

Bonus question:
In the runtime statistics
<runtimeStats enabled="true">
        <interval>30</interval>
</runtimeStats>

only the 90th-percentile response time is reported. Is there an easy way to also report the 99th percentile, or do I need to add code for that?


Thanks a lot again in advance.
-VK











--00504502e31a533104047f6fa728--