From: "Michael McCandless (JIRA)"
To: java-dev@lucene.apache.org
Date: Mon, 23 Nov 2009 21:17:39 +0000 (UTC)
Subject: [jira] Commented: (LUCENE-2075) Share the Term -> TermInfo cache across threads
Message-ID: <823602863.1259011059582.JavaMail.jira@brutus>
In-Reply-To: <1441997239.1258407279571.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781627#action_12781627 ]

Michael McCandless commented on
LUCENE-2075:
--------------------------------------------

bq. I am quite sure that Robert's test is also random (as he explained).

It's not random -- it's the specified pattern, parsed to WildcardQuery, run 10 times, then take best or avg time.

{quote}
I fixed the test to only test a few queries and repeat them quite often. For precStep=4 and long values, I got about 28 seeks per query, but there was no speed improvement. Maybe 28 seeks per query is too few to show an effect. The number of terms seen per query was 70, so about 2.5 terms/seek, which is typical for precStep=4 with this index value density (5 million random numbers in the range -2^63..2^63). It is also important that the random ranges hit many documents (on average 1/3 of all docs), so in my opinion most of the time is spent collecting the results. Maybe I should try shorter and limited ranges.
{quote}

OK... it sounds like the differences may simply be in the noise for NRQ.

bq. If you deprecate SimpleLRUCache, you can also deprecate the MapCache abstract super class. But I wouldn't like to deprecate these classes, as I use them myself in my own code, e.g. for caching queries.

Hmm... I felt that because nothing in Lucene uses SimpleLRUCache anymore, we should deprecate & remove it. I don't think we should be in the business of creating/exporting (as public classes) such collections, unless we continue to use them.

I even wonder why we don't put these classes into oal.index, and make them package private. Ie, I think we make them public only to share them across packages within Lucene, not because we want/expect apps to consume them. The term "public" is heavily overloaded, unfortunately.

Also, I already put a strong note to this effect in DoubleBarrelLRUCache, ie, we reserve the right to remove the class in the future.

bq. And even if you deprecate the Map, why remove the tests? They should stay alive until the class is removed.

Oh good point -- I'll resurrect & keep it.

> Share the Term -> TermInfo cache across threads
> -----------------------------------------------
>
>                 Key: LUCENE-2075
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2075
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: ConcurrentLRUCache.java, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch
>
>
> Right now each thread creates its own (thread private) SimpleLRUCache,
> holding up to 1024 terms.
> This is rather wasteful, since if there are a high number of threads
> that come through Lucene, you're multiplying the RAM usage.  You're
> also cutting way back on likelihood of a cache hit (except the known
> multiple times we lookup a term within-query, which uses one thread).
> In NRT search we open new SegmentReaders (on tiny segments) often
> which each thread must then spend CPU/RAM creating & populating.
> Now that we are on 1.5 we can use java.util.concurrent.*, eg
> ConcurrentHashMap.  One simple approach could be a double-barrel LRU
> cache, using 2 maps (primary, secondary).  You check the cache by
> first checking primary; if that's a miss, you check secondary and if
> you get a hit you promote it to primary.  Once primary is full you
> clear secondary and swap them.
> Or... any other suggested approach?
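
For readers following the thread, here is a minimal sketch of the double-barrel idea from the issue description: two maps, promotion on a secondary hit, and a clear-and-swap once the primary fills. It is only an illustration under stated assumptions, not the code in the attached patches or in Lucene's DoubleBarrelLRUCache. The class name DoubleBarrelCacheSketch, its fields, and the choice to approximate "primary is full" by counting insertions are all assumptions made for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a double-barrel cache; not Lucene's actual class. */
class DoubleBarrelCacheSketch<K, V> {
    private final int maxSize;
    private final Map<K, V> mapA = new ConcurrentHashMap<K, V>();
    private final Map<K, V> mapB = new ConcurrentHashMap<K, V>();
    // Insertions left before the next swap. Assumption: counting puts is a
    // cheap stand-in for "primary is full"; re-puts of a key also count.
    private final AtomicInteger countdown;
    // false: mapA is primary; true: mapB is primary.
    private volatile boolean swapped;

    DoubleBarrelCacheSketch(int maxSize) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be >= 1");
        }
        this.maxSize = maxSize;
        this.countdown = new AtomicInteger(maxSize);
    }

    V get(K key) {
        final boolean s = swapped; // read once so both maps are chosen consistently
        final Map<K, V> primary = s ? mapB : mapA;
        final Map<K, V> secondary = s ? mapA : mapB;
        V value = primary.get(key);
        if (value == null) {
            value = secondary.get(key);
            if (value != null) {
                put(key, value); // hit in secondary: promote to primary
            }
        }
        return value;
    }

    void put(K key, V value) {
        final boolean s = swapped;
        final Map<K, V> primary = s ? mapB : mapA;
        final Map<K, V> secondary = s ? mapA : mapB;
        primary.put(key, value);
        if (countdown.decrementAndGet() == 0) {
            // Primary is considered full: drop the old secondary and swap roles.
            secondary.clear();
            swapped = !s;
            countdown.set(maxSize);
        }
    }

    public static void main(String[] args) {
        DoubleBarrelCacheSketch<String, Integer> cache =
            new DoubleBarrelCacheSketch<String, Integer>(2);
        cache.put("a", 1);
        cache.put("b", 2);                  // second put triggers a swap
        System.out.println(cache.get("a")); // found in secondary, promoted
        cache.put("c", 3);                  // triggers another swap; "b" is dropped
        System.out.println(cache.get("b"));
    }
}
```

Entries that are never touched between two swaps fall out when their map is cleared, which gives approximate LRU behaviour without per-entry bookkeeping. Under concurrent access the countdown and swap are not atomic together, so the capacity bound in this sketch is loose; a real implementation would need to decide how much of that slack is acceptable.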