Return-Path: Delivered-To: apmail-lucene-java-dev-archive@www.apache.org Received: (qmail 6181 invoked from network); 17 Feb 2007 04:14:34 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 17 Feb 2007 04:14:34 -0000 Received: (qmail 52456 invoked by uid 500); 17 Feb 2007 04:14:35 -0000 Delivered-To: apmail-lucene-java-dev-archive@lucene.apache.org Received: (qmail 52413 invoked by uid 500); 17 Feb 2007 04:14:35 -0000 Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-dev@lucene.apache.org Delivered-To: mailing list java-dev@lucene.apache.org Received: (qmail 52402 invoked by uid 99); 17 Feb 2007 04:14:35 -0000 Received: from herse.apache.org (HELO herse.apache.org) (140.211.11.133) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 16 Feb 2007 20:14:35 -0800 X-ASF-Spam-Status: No, hits=0.0 required=10.0 tests= X-Spam-Check-By: apache.org Received: from [140.211.11.4] (HELO brutus.apache.org) (140.211.11.4) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 16 Feb 2007 20:14:26 -0800 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id B456D7141E6 for ; Fri, 16 Feb 2007 20:14:06 -0800 (PST) Message-ID: <12163273.1171685646735.JavaMail.jira@brutus> Date: Fri, 16 Feb 2007 20:14:06 -0800 (PST) From: "Paul Cowan (JIRA)" To: java-dev@lucene.apache.org Subject: [jira] Updated: (LUCENE-806) Synchronization bottleneck in FieldSortedHitQueue with many concurrent readers In-Reply-To: <9797706.1171683305520.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/LUCENE-806?page=3Dcom.atlassia= n.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Cowan updated LUCENE-806: ------------------------------ Attachment: lucene-806.patch Just to clarify, the specific issue here is that RuleBasedCollator (the onl= y concrete implementation of Collator in the JDK, and the one that is alway= s returned by Collator.getInstance(Locale)) has a synchronized compare(), m= eaning that if you have many threads building FieldSortedHitQueues with lar= ge results and locale-sensitive sorting, and they share a Collator, they en= d up waiting for each other in that method (which can obviously be called t= ens of thousands of times during a very large search). The way to get a Col= lator is by calling Collator.getInstance(Locale), which makes it look like = this problem is the JDK's fault; however, Collator.getInstance(Locale) actu= ally returns a clone() of the object from the cache. The caching mechanism = seems to be to prevent having to rebuld the rule tables, rather than the ob= jects themselves. Therefore, the JDK version balances performance and threa= d safety quite well.=20 On the Lucene end, though, the FieldSortedHitQueue implements its own cachi= ng mechanism, meaning that the generated ScoreDocComparators are cached (wi= th no way to disable this behaviour, even if you wanted to). Therefore, one= you've got your comparator (the unique key being {reader, fieldname, type,= locale, factory}), every search using sorting on the same field in the sam= e way on the same reader will use the same Collator, possibly causing a syn= chronization bottleneck. Even providong your own factory to the SortField d= oesn't REALLY help, as they're cached one level 'above' that (you can work = around this; see below) Attached is a patch which provides a 'quick and dirty' way of dealing with = this. NOTE: THIS PATCH IS NOT PRODUCTION QUALITY, it's just a proof of conc= ept. If people like the idea, I'll tidy it up substantially. This works by adding a flag, usePerThreadLocaleComparators, set by a static= method, to FieldSortedHitQueue. If this flag is NOT set, behaviour remains= the same. If it's set to true, however, createValue calls a per-thread ver= sion of comparatorStringLocale, which returns a simple wrapper ScoreDocComp= arator which delegates to the one created by comparatorStringLocale, using = a ThreadLocal to make sure they're not shared between threads. For demonstration purposes, I have added a quick demo main() method to Fiel= dSortedHitQueue, which does a simple timing test -- 20 threads each inserti= ng 20000 dummy documents into a 200-element FieldSortedHitQueue. Note that = it uses CountDownLatches to coordinate the threads, so this dummy test will= only run under Java 5. Sorry, but as a quick demo it will do for now. By c= hanging the values of final int threadCount =3D 20; final int docCount =3D 10000; final int queueSize =3D 200; you can change the parameters I mentioned above. However, the figures seem = to roughly the same proportion regardless of how high or low those numbers = are, within reason; the parameters provided are enough to spend a LOT of t= ime waiting for locks; making them higher doesn't really make that much dif= ference. If anything, making the queuesize larger makes the new version of = the code (with the flag set) look better in comparison. On my dev machine (= 1.8 GHz Celeron laptop) the test as-is gives the following figures: Shared=3D5219ms PerThread=3D2140ms this is a pretty substantial difference, and (I think) makes it worth pursu= ing this further. Your mileage may vary, but between 2x-4x faster seems typ= ical for anywhere above, say, 5 threads, 1000 docs, and queue size of 50. So if people are happy for me to proceed down this path, I'm equally happy = to tidy up and produce a cleaner, documented etc. version of this patch. Ho= wever, the more I look at this, the more I'd like to refactor this code -- = it's not the nicest code in Lucene, and I think it could be tidied up (pers= onally). My proposal would be something along the lines of changing all tho= se static methods in FieldSortedHitQueue (comparatorXXXX) to be implementat= ions of SortComparatorSource. There'd be a StringComparatorSource, AutoComp= aratorSource, etc. Everything in FSHQ would be made to deal with the SortCo= mparatorSources only, abstracting out all the hard work. The logic of the c= reate() method could be replaced by a PerFieldComparatorSource, which prod= uces one or more of the others depending on the field type, much as it does= now. Everything else could be implemented (possibly) using the Decorator p= attern to implement new SortComparatorSources. Namely, a CachingComparatorS= ource would use some sort of caching mechanism (possibly the FieldCacheImpl= .Cache, as it does now, though that seems like an odd coupling) to cache th= e SortComparatorSources produced by the PerFieldComparatorSource; then we'r= e basically back where we are now. Along comes the PerThreadComparatorSourc= e, which uses ThreadLocals to do basically what my patch above does. All th= ese classes would be available externally, for people to wrap around their = own SortComparatorSources when setting up their SortFields; if no factory i= s provided in the SortField, things work much as they do now. What do people think? Is the quick and dirty way (a) worthwhile, and (b) go= od enough? Should I look at implementing the bigger, fancier patch which wi= ll be more work and more complicated but ultimately (I think) make FieldSor= tedHitQueues much cleaner? Or is there another alternative (for example, an= other low-impact option would be doing something like what I've done now, = but instead using a ThreadLocal in a Comparator implementation, which could= mean that the API for FieldSortedHitQueue doesn't need to change at all). = Or is this not worth making part of the source tree, given that there are w= ays around it (supplying your own SortComparatorSource which manages its ow= n ThreadLocals). The performance gain IS substantial, though... Apologies for length, this is quite a confusing area and I wanted to make s= ure I hadn't forgotten anything. > Synchronization bottleneck in FieldSortedHitQueue with many concurrent re= aders > -------------------------------------------------------------------------= ----- > > Key: LUCENE-806 > URL: https://issues.apache.org/jira/browse/LUCENE-806 > Project: Lucene - Java > Issue Type: Improvement > Components: Search > Affects Versions: 2.0.0 > Reporter: Paul Cowan > Priority: Minor > Attachments: lucene-806.patch > > > The below is from a post by (my colleague) Paul Smith to the java-users l= ist: > --- > Hi ho peoples. > We have an application that is internationalized, and stores data from ma= ny languages (each project has it's own index, mostly aligned with a single= language, maybe 2). > Anyway, I've noticed during some thread dumps diagnosing some performance= issues, that there appears to be a _potential_ synchronization bottleneck = using Locale-based sorting of Strings. I don't think this problem is the r= oot cause of our performance problem, but I thought I'd mention it here. H= ere's the stack dump of a thread waiting: > "http-1001-Processor245" daemon prio=3D1 tid=3D0x31434da0 nid=3D0x3744 wa= iting for monitor entry [0x2cd44000..0x2cd45f30] > at java.text.RuleBasedCollator.compare(RuleBasedCollator.java) > - waiting to lock <0x6b1e8c68> (a java.text.RuleBasedCollator) > at org.apache.lucene.search.FieldSortedHitQueue$4.compare(FieldSo= rtedHitQueue.java:320) > at org.apache.lucene.search.FieldSortedHitQueue.lessThan(FieldSor= tedHitQueue.java:114) > at org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java= :120) > at org.apache.lucene.util.PriorityQueue.put(PriorityQueue.java:47= ) > at org.apache.lucene.util.PriorityQueue.insert(PriorityQueue.java= :58) > at org.apache.lucene.search.FieldSortedHitQueue.insert(FieldSorte= dHitQueue.java:90) > at org.apache.lucene.search.FieldSortedHitQueue.insert(FieldSorte= dHitQueue.java:97) > at org.apache.lucene.search.TopFieldDocCollector.collect(TopField= DocCollector.java:47) > at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.j= ava:291) > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.ja= va:132) > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.ja= va:110) > at com.aconex.index.search.FastLocaleSortIndexSearcher.search(Fas= tLocaleSortIndexSearcher.java:90) > ..... > In our case we had 12 threads waiting like this, while one thread had the= lock on the RuleBasedCollator. Turns out RuleBasedCollator's.compare(...)= method is synchronized. I wonder if a ThreadLocal based collator would be= better here... ? There doesn't appear to be a reason for other threads se= arching the same index to wait on this sort. Be just as easy to use their = own. (Is RuleBasedCollator a "heavy" object memory wise? Wouldn't have th= ought so, per thread) > Thoughts? > --- > I've investigated this somewhat, and agree that this is a potential probl= em with a series of possible workarounds. Further discussion (including pro= of-of-concept patch) to follow. --=20 This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org For additional commands, e-mail: java-dev-help@lucene.apache.org