Date: Thu, 17 Dec 2009 05:34:13 -0500
Message-ID: <9ac0c6aa0912170234v606f58f4se75b6f02e46342cc@mail.gmail.com>
Subject: Re: heap memory issues when sorting by a string field
From: Michael McCandless
To: te@statsbiblioteket.dk
Cc: "java-user@lucene.apache.org"

I think this'd make a nice contribution -- eg it could be bundled up
as a FieldComparator impl, eg LowMemoryStringComparator, that would
compute the global ords in multiple passes with limited RAM usage.
It'd give users the space/time tradeoff...

Mike

On Mon, Dec 14, 2009 at 9:09 AM, Toke Eskildsen wrote:
> On Fri, 2009-12-11 at 14:53 +0100, Michael McCandless wrote:
>> How long does Lucene take to build the ords for the toplevel reader?
>>
>> You should be able to just time FieldCache.getStringIndex(topLevelReader).
>>
>> I think your 8.5 seconds for first Lucene search was with the
>> StringIndex computed per segment?
>
> Cold disk-cache (directly after reboot):
> [2009-12-14 14:44:10,914] Requesting StringIndex for field sort_title
> [2009-12-14 14:44:20,326] Got StringIndex of length 2916008 in 9 seconds, 412 ms
>
> Warm disk-cache (3 minutes after first test):
> [2009-12-14 14:44:10,914] Requesting StringIndex for field sort_title
> [2009-12-14 14:44:20,326] Got StringIndex of length 2916008 in 8 seconds, 414 ms
>
> The response time for the first sorted search was about 8,5 seconds, but
> that was after 6 non-sorted searches without the use of explicit field
> cache, so some amount of warm-up was performed.
>
> Caveat: I must stress that this is very much ad hoc testing.
>
>
> ----------------- FieldCache test code
>
>    // Meant for testing
>    private FieldCache.StringIndex getStringIndex(
>            IndexReader reader, String field) {
>        log.info("Requesting StringIndex for field " + field);
>        Profiler profiler = new Profiler();
>        FieldCache.StringIndex stringIndex;
>        try {
>            stringIndex = FieldCache.DEFAULT.getStringIndex(reader, field);
>        } catch (IOException e) {
>            log.error("Could not retrieve StringIndex", e);
>            return null;
>        }
>        log.info("Got StringIndex of length " + stringIndex.order.length
>                + " in " + profiler.getSpendTime());
>        return stringIndex;
>    }
>
> ----------------- Lucene 2.4 index
>
> ls -l index/sb/20091201-115941/lucene/
>
> -rw-rw-r-- 1 summatst summatst 12840211452 Dec  2 11:21 _0.cfx
> -rw-rw-r-- 1 summatst summatst   361027455 Dec  2 11:19 _32.cfs
> -rw-rw-r-- 1 summatst summatst   373374178 Dec  2 11:19 _65.cfs
> -rw-rw-r-- 1 summatst summatst   438076782 Dec  2 11:21 _98.cfs
> -rw-rw-r-- 1 summatst summatst   463141239 Dec  2 11:19 _cb.cfs
> -rw-rw-r-- 1 summatst summatst  1862427706 Dec  2 11:19 _rm.cfs
> -rw-rw-r-- 1 summatst summatst         203 Dec  2 11:21 segments_3
> -rw-rw-r-- 1 summatst summatst          20 Dec  2 11:18 segments.gen
>
> -----------------
>
> Regards,
> Toke Eskildsen
>
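
The "global ords in multiple passes with limited RAM" idea from the top of this message can be illustrated with a rough sketch. This is not Lucene code and not a FieldComparator implementation: the class LowMemoryOrds, the method computeOrds and the parameter maxTermsInMemory are hypothetical names, and a plain String[] stands in for reading each document's term from the index. Each pass keeps only the smallest not-yet-numbered terms in memory, so a smaller budget means more scans over the documents.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in Lucene.
final class LowMemoryOrds {

    /**
     * Returns one sort ordinal per document. Documents without a term
     * (null) get ordinal 0 (a convention chosen for this sketch);
     * real terms are numbered from 1 in natural String order.
     * The working set holds at most maxTermsInMemory + 1 distinct
     * terms at any moment.
     */
    static int[] computeOrds(String[] docTerms, int maxTermsInMemory) {
        if (maxTermsInMemory < 1) {
            throw new IllegalArgumentException("maxTermsInMemory must be >= 1");
        }
        int[] ords = new int[docTerms.length];
        String lowerBound = null; // largest term that already has an ordinal
        int nextOrd = 1;

        while (true) {
            // Scan 1: collect the smallest terms above lowerBound.
            TreeSet<String> window = new TreeSet<String>();
            for (String term : docTerms) {
                if (term == null) {
                    continue;
                }
                if (lowerBound != null && term.compareTo(lowerBound) <= 0) {
                    continue; // numbered in an earlier pass
                }
                window.add(term);
                if (window.size() > maxTermsInMemory) {
                    window.pollLast(); // evict the largest to stay in budget
                }
            }
            if (window.isEmpty()) {
                break; // every term has an ordinal
            }

            // Scan 2: number the documents whose term is in the window.
            String[] sorted = window.toArray(new String[0]);
            for (int doc = 0; doc < docTerms.length; doc++) {
                String term = docTerms[doc];
                if (term == null) {
                    continue;
                }
                int pos = Arrays.binarySearch(sorted, term);
                if (pos >= 0) {
                    ords[doc] = nextOrd + pos;
                }
            }
            nextOrd += sorted.length;
            lowerBound = sorted[sorted.length - 1];
        }
        return ords;
    }
}
```

With U unique terms and a budget of K terms, this sketch needs about U/K passes of two scans each, which is the space/time tradeoff in its crudest form. A real implementation would presumably walk the already-sorted term dictionaries of the segments rather than rescanning documents.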