Date: Thu, 27 May 2010 14:59:14 -0700 (PDT)
From: "(David) Ming Xia"
Subject: Resend -- About weblog view data access
To: user@roller.apache.org, Mailing List Apache Roller Developer

Hi, Dave.

Sorry for the messed-up text. I am re-sending my last mail below. This is still about weblog view data access.

The names listed in the Roller property rendering.weblogMapper.rollerProtectedUrls are all for the user account console and never appear in user-created websites, so they are not a concern. What concerns us are requests with the URI pattern '/roller-ui/rendering/resources', which are specified in theme.xml as elements of . WeblogRequestMapper validates the handle of an incoming text/html page request, and then validates the handle of every follow-up request that the browser sends for the URL links in that page. The validating function is WeblogRequestMapper.isWeblog(String potentialHandle).
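To make the per-request check concrete, here is a minimal Java sketch of the behaviour described above, under stated assumptions: the potential handle is taken to be the first path segment of the request URI, the protected names come from a comma-separated value like rendering.weblogMapper.rollerProtectedUrls, and a caller-supplied predicate stands in for WeblogRequestMapper.isWeblog() and its database query. The class HandleCheckSketch and its methods are hypothetical and are not Roller's actual code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Hypothetical sketch of per-request handle validation.
 * Not Roller's WeblogRequestMapper; names and parsing are assumptions.
 */
public class HandleCheckSketch {

    // Assumed: parsed from a comma-separated property such as rollerProtectedUrls
    private final Set<String> protectedNames;
    // Stands in for isWeblog(handle), i.e. one database query per call
    private final Predicate<String> isWeblog;

    public HandleCheckSketch(String protectedUrlsProperty, Predicate<String> isWeblog) {
        this.protectedNames = new HashSet<>(Arrays.asList(protectedUrlsProperty.split("\\s*,\\s*")));
        this.isWeblog = isWeblog;
    }

    /** Returns true if the request should be handled as a weblog request. */
    public boolean isWeblogRequest(String requestUri) {
        // Assumed: first path segment is the potential handle, e.g. "/myblog/a.css" -> "myblog"
        String path = requestUri.startsWith("/") ? requestUri.substring(1) : requestUri;
        int slash = path.indexOf('/');
        String potentialHandle = slash >= 0 ? path.substring(0, slash) : path;
        if (potentialHandle.isEmpty() || protectedNames.contains(potentialHandle)) {
            return false; // protected names are skipped without a lookup
        }
        return isWeblog.test(potentialHandle); // every other request costs one lookup
    }

    public static void main(String[] args) {
        int[] lookups = {0};
        HandleCheckSketch mapper = new HandleCheckSketch(
                "roller-ui,images,theme,themes,page,resource",
                handle -> { lookups[0]++; return handle.equals("myblog"); });

        // One page request plus ten requests for the CSS, JS and image links in it
        mapper.isWeblogRequest("/myblog/");
        for (int i = 0; i < 10; i++) {
            mapper.isWeblogRequest("/myblog/resource/file" + i + ".css");
        }
        System.out.println("lookups = " + lookups[0]);
    }
}
```

Under these assumptions main prints "lookups = 11": a name on the protected list is skipped for free, but every resource served under the weblog's own path repeats the same handle lookup.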
Take an example: for a web page that has ten links to CSS, JS and image files, we get one request for the page and then a request for each link, eleven requests in total. For each request Roller does the following:

    1. Retrieve a connection instance from the connection pool, or create a new JDBC connection.
    2. Retrieve the prepared statement from the server statement cache, or create a prepared statement for the named query.
    3. Set the parameter 'handle', execute the SQL query and get all the data for the specified weblog, including instances of the root category and categories.
    4. Recycle the connection, or close it and discard it for GC.
    5. Create a new weblog object and populate it with the data.

So in this example, for one web page request Roller consumes eleven JDBC connection instances and creates eleven weblog objects just to check whether the weblog exists. If some websites on Roller receive a high volume of HTTP requests, the Roller database could easily be overwhelmed and become deadlocked. With all the later incoming requests queued up, memory usage will hit the ceiling. The database then becomes the single point of failure: without the database there to validate the weblog handle of each request and the Last-Modified value of each text/html request, visitors will see a blank white page that goes nowhere. I believe this is quite possible. Looking at the technical parameters and typical usage of database servers, it is clear that they are not designed for the kind of task Roller now performs in validating each HTTP request.

I would suggest that a cache be used for weblog page views. Put simply, Roller should have a cache for weblogs and
weblog entries. Roller users manage their accounts, persist the changes to the database and update them in the cache. Roller users' passwords are not cached, for security reasons. Roller viewers retrieve web content; everything they see comes from the cache, and they should never touch the database. Data such as referrer addresses or hit counts would be cached and persisted to the database when the server stops, or on an administrator's command.

The current caching system does not fit the task I described. Current Roller caches are just local hash maps or hash tables. They are not distributed, and they do not synchronize weblog content, especially the 'Last-Modified' value, across multiple server instances. Yet most production environments today are clustered, composed of multiple JVMs and application server runtimes.

I learned that Ehcache supports a distributed map, and I know that the WebSphere cache instance implements the IBM distributed map. The best solution for Roller would be an interface to a third-party distributed cache accessed through a JNDI lookup; otherwise, bundling Ehcache with Roller would also be very good.

Thank you.
David

--- On Wed, 5/26/10, (David) Ming Xia wrote:

From: (David) Ming Xia
Subject: About weblog view data access
To: user@roller.apache.org, "Mailing List Apache Roller Developer"
Date: Wednesday, May 26, 2010, 8:30 PM
--- On Wed, 5/26/10, Dave wrote:

From: Dave
Subject: Re: Roller's implementation on conditional Get
To: user@roller.apache.org, david.ming.xia@ibol.biz
Date: Wednesday, May 26, 2010, 7:59 AM

On Wed, May 26, 2010 at 12:11 AM, (David) Ming Xia wrote:
>   I took a look into it and found another place that has very intensive database queries:
>
>   RequestMappingFilter.doFilter() --> WeblogRequestMapper.handleRequest().
>
>   RequestMappingFilter's URL mapping is /*, so it checks every HTTP request.
>
>   WeblogRequestMapper.handleRequest() verifies ALL requests, I mean including those CSS, JS and image files, with named JPA queries.
>
>   Actually, both PageServlet and RequestMappingFilter query the weblog by handle. It looks like the database is used as a hash table in these two functions, while a database is usually used for account data transactions and relational data management.
>
>   Now for each web page request there are at least eleven database queries: one for the text/html content in PageServlet and ten in the mapping filter for everything, including the text/html.
>
>   I feel that there could be even more database wires, since many people work on Roller and everyone tends to add a few more.
>
>   It seems that there should be a top-down design solution for this issue.
>
>   I'd like to hear from you.
Hi David,

You are correct, WeblogRequestMapper is invoked on every request, but does nothing when it encounters URLs that begin with these patterns:

   rendering.weblogMapper.rollerProtectedUrls=\
   roller-ui,images,theme,themes,CommentAuthenticatorServlet,\
   index.jsp,favicon.ico,robots.txt,\
   page,flavor,rss,atom,language,search,comments,rsd,resource,xmlrpc,planetrss

It ignores static theme resources (images, CSS, JS, etc.) and everything else that is not dynamically generated by a weblog page template.

Perhaps the problem is not quite as bad as you think. There have not been that many people working on Roller, and the ones that have worked on the code have been pretty disciplined about when database calls are made. But of course, even disciplined developers make mistakes.

I'm sure there is much room for improvement and I encourage you to continue your research into performance bottlenecks. If you have a proposal for a top-down solution, or some patches to improve things, I'd be happy to review them or even commit them for you if they look good.

- Dave
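One possible direction for the top-down solution discussed in this thread, offered only as a sketch: put a read-through cache in front of the handle check, so repeated requests for the same handle are answered from memory instead of the database. The class HandleCacheSketch is hypothetical; a caller-supplied predicate stands in for the database query, and ConcurrentHashMap.computeIfAbsent does the caching. It is a local, single-JVM cache, so by itself it does not address the clustering concern raised above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Predicate;

/**
 * Hypothetical sketch, not Roller code: a read-through cache in front of the
 * "is this a weblog handle?" database query. Local to one JVM, not distributed.
 */
public class HandleCacheSketch {

    // handle -> whether a weblog with that handle exists
    private final ConcurrentMap<String, Boolean> cache = new ConcurrentHashMap<>();
    // Stands in for the named JPA query behind isWeblog(); an assumption of this sketch
    private final Predicate<String> databaseLookup;

    public HandleCacheSketch(Predicate<String> databaseLookup) {
        this.databaseLookup = databaseLookup;
    }

    /** Answers from memory; runs the database lookup only on a cache miss. */
    public boolean isWeblog(String handle) {
        return cache.computeIfAbsent(handle, databaseLookup::test);
    }

    /** To be called when a weblog is created, renamed or deleted. */
    public void invalidate(String handle) {
        cache.remove(handle);
    }

    public static void main(String[] args) {
        int[] queries = {0};
        HandleCacheSketch handles = new HandleCacheSketch(h -> {
            queries[0]++; // count simulated database round trips
            return h.equals("myblog");
        });

        // One page plus ten linked resources: eleven checks of the same handle
        for (int i = 0; i < 11; i++) {
            handles.isWeblog("myblog");
        }
        System.out.println("database queries = " + queries[0]);
    }
}
```

In the page-plus-ten-links scenario, main prints "database queries = 1". Two caveats apply: the map is unbounded and also caches negative answers, so a production version would need size limits or expiry (the kind of policy a library such as Ehcache provides); and in a cluster, invalidate() would have to reach every node, which is the job of a distributed cache.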