From: Todd Benge
To: java-user@lucene.apache.org
Subject: Re: FieldCache Question
Date: Wed, 4 Feb 2009 11:01:17 -0700
Message-ID: <847931a70902041001r4945bacele00c24962c070066@mail.gmail.com>
In-Reply-To: <4989D35D.6040009@gmail.com>

On Wed, Feb 4, 2009 at 10:41 AM, Mark Miller wrote:

> Todd Benge wrote:
>
>> The intent is to reduce the amount of memory that is held in cache. As it
>> is now, it looks like there is an array of comparators for each index
>> reader. Most of the data in the array appears to be the same for each
>> cache, so there is duplication for each type (string, float).
>
> Use an array cachekey and override it as not mergeable.
>
> I suppose in terms of the unique terms array, you could see some
> duplication.
>
> I don't think there should be much duplication though - in the non String
> cases, each SegmentIndexReader will only hold the values for itself. The
> size of the sub arrays would be the same as the full array.
> In the String case, you will have duplicates for the unique terms array, so
> if you have a lot, that may cause issues, but the ordinal array will not be
> any larger. And the unique terms array shouldn't be terrible - the number of
> terms per segment should drop logarithmically.
> I'm not sure you'll see much of a difference, and it would only be with
> String sorts.
>
> That is, unless you are creating your own separate FieldCaches on
> multisegmentreaders - then you would double everything.
>
>> Yes - we're running about 80G in the indices so there's not enough RAM
>> for all the data in the fieldcache.
>
> That is a large index. Can you share how many documents?

I don't have the exact number but I think it's 200 - 250 million documents.

I'll see if I can get some more realistic numbers and re-post.

Thanks for the help.

Todd
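
For a rough sense of scale, the figures in this thread can be turned into a back-of-the-envelope estimate. The sketch below is not Lucene code. It assumes a numeric cache holds one 4-byte value per document, and a String sort cache holds one int ordinal per document plus the unique terms, as described above. The class name, method names, unique-term count, and average term size are all hypothetical values chosen for illustration.

```java
// Rough FieldCache sizing sketch; not part of Lucene. All names are hypothetical.
public class FieldCacheEstimate {

    /** Bytes for a numeric cache: one fixed-width value per document. */
    public static long numericBytes(long maxDoc, int bytesPerValue) {
        return maxDoc * bytesPerValue;
    }

    /**
     * Bytes for a String sort cache: one int ordinal per document plus the
     * unique terms. avgTermBytes is an assumed average footprint per term,
     * including object overhead.
     */
    public static long stringBytes(long maxDoc, long uniqueTerms, int avgTermBytes) {
        long ordinals = maxDoc * 4L;              // ordinal array, one int per doc
        long terms = uniqueTerms * avgTermBytes;  // unique terms array
        return ordinals + terms;
    }

    public static void main(String[] args) {
        long docs = 250_000_000L; // upper end of the 200 - 250 million estimate
        System.out.println("int/float field: " + numericBytes(docs, 4) + " bytes");
        // 10 million unique terms at about 50 bytes each is a made-up workload.
        System.out.println("String field:    " + stringBytes(docs, 10_000_000L, 50) + " bytes");
    }
}
```

Under these assumptions each sorted int or float field costs on the order of 1 GB at 250 million documents, and a String field costs that much for ordinals alone, before the terms themselves. Real numbers depend on the actual document count and term distribution.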