Date: Wed, 10 Jun 2009 16:26:31 -0400
Subject: Re: Lucene memory usage
From: Michael McCandless
To: java-dev@lucene.apache.org

On Wed, Jun 10, 2009 at 4:13 PM, Jason Rutherglen wrote:
> Great! If I understand correctly it looks like RAM savings? Will
> there be an improvement in lookup speed? (We're using binary
> search here?).

Yes, sizable RAM reduction for apps that have many unique terms. And
init'ing (warming) the reader should be faster.

Lookup speed should be faster too (binary search against the terms in
a single field, not all terms).

> Is there a precedent in database systems for what was mentioned
> about placing the term dict, delDocs, and filters onto disk and
> reading them from there (with the IO cache taking care of
> keeping the data in RAM)? (Would there be a future advantage to
> this approach when SSDs are more prevalent?) It seems like we
> could have some generalized pluggable system where one could try
> out this or the current heap approach, and benchmark.

LUCENE-1458 creates exactly such a pluggable system. I.e., it lets you
swap in your own codec for terms, freq, prox, etc.

But: I'm leery of having the terms dict live entirely on disk, though
we should certainly explore it.

> Given our continued inability to properly measure Java RAM
> usage, this approach may be a good one for Lucene? Where heap
> based LRU caches are a shot in the dark when it comes to mem
> size, as we never really know how much they're using.

Well, remember mmap uses an LRU policy to decide when pages are
swapped to disk... so a search that's unlucky can easily hit many page
faults just in consulting the terms dict. You could be at 200 msec
cost before you even hit a postings list...

I prefer to have the terms index RAM resident (of course the OS can
still swap THAT out too...).

> Once we generalize delDocs, filters, and field caches
> (LUCENE-831?), then perhaps CSF is a good place to test out this
> approach? We could have a generic class that handles the
> underlying IO that simply returns values based on a position or
> iteration.

I agree, a CSF codec that uses mmap seems like a good place to
start...

Mike
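
To illustrate the lookup-speed point above (binary search within one field's terms rather than across all terms), here is a minimal sketch. It is not Lucene's actual terms index: the class PerFieldTermsIndexSketch and its methods addField, lookup, and searchSpace are hypothetical names made up for this example, and the index is assumed to be a plain sorted String array per field held on the heap.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; this is not Lucene's terms index. */
final class PerFieldTermsIndexSketch {
    // Field name -> sorted array of that field's indexed terms (assumed layout).
    private final Map<String, String[]> termsByField = new TreeMap<>();

    /** Registers a field and stores a sorted copy of its terms. */
    void addField(String field, String... terms) {
        String[] sorted = terms.clone();
        Arrays.sort(sorted);
        termsByField.put(field, sorted);
    }

    /** Returns the term's position within its own field, or -1 if absent. */
    int lookup(String field, String term) {
        String[] terms = termsByField.get(field);
        if (terms == null) {
            return -1; // unknown field
        }
        // The binary search only ever touches this one field's terms.
        int pos = Arrays.binarySearch(terms, term);
        return pos >= 0 ? pos : -1;
    }

    /** Number of terms a lookup in this field has to search over. */
    int searchSpace(String field) {
        String[] terms = termsByField.get(field);
        return terms == null ? 0 : terms.length;
    }

    public static void main(String[] args) {
        PerFieldTermsIndexSketch index = new PerFieldTermsIndexSketch();
        index.addField("title", "lucene", "apache", "search");
        index.addField("body", "codec", "mmap", "terms", "index");

        System.out.println("position of 'lucene' in title: "
                + index.lookup("title", "lucene"));
        System.out.println("terms searched for a title lookup: "
                + index.searchSpace("title"));
    }
}
```

The search space for a lookup is the size of one field's array, so a field with few terms stays cheap to search even when other fields carry many unique terms.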