From: Lebiram
To: java-user@lucene.apache.org
Date: Wed, 24 Dec 2008 06:43:12 -0800 (PST)
Subject: Re: Optimize and Out Of Memory Errors

Hello Mark,

At the moment the index cannot be rebuilt to remove norms.

Right now I'm trying to figure out what Luke is doing by going through its source code. Using whatever settings I find, I'm building a very small app that just does a bit of searching. This small app has 1600 MB of heap space, while Luke has a maximum heap of only 256 MB.

Reading the same large single-segment index with 166 million docs, Luke fails during checkIndex when it checks the norms, but searching works as long as I limit it to, say, a few thousand documents. That is not the case for my app: I have been trying to limit it the same way, but it still reads far too much data.
I'm wondering if this has anything to do with Similarity and scoring. Could you point me to some settings or any clever tweaks?

This problem will haunt me this Christmas. :O

________________________________
From: Mark Miller
To: java-user@lucene.apache.org
Sent: Wednesday, December 24, 2008 2:20:23 PM
Subject: Re: Optimize and Out Of Memory Errors

We don't know that those norms are "the" problem. Luke is loading norms if it's searching that index. But what else is Luke doing? What else is your app doing? I suspect your app requires more RAM than Luke? How much RAM do you have, and how much are you allocating to the JVM?

The norms are not necessarily the problem you have to solve, but it would appear they are taking up over 2 GB of memory. Unless you have some to spare (and it sounds like you may not), it could be a good idea to turn them off for particular fields.

- Mark

Lebiram wrote:
> Is there a way to not factor in norms data in scoring somehow?
>
> I'm just stumped as to how Luke is able to do a search (with limit) on the docs but in my code it just dies with OutOfMemory errors.
> How does Luke not allocate these norms?
>
> ________________________________
> From: Mark Miller
> To: java-user@lucene.apache.org
> Sent: Tuesday, December 23, 2008 5:25:30 PM
> Subject: Re: Optimize and Out Of Memory Errors
>
> Mark Miller wrote:
>
>> Lebiram wrote:
>>
>>> Also, what are norms
>> Norms are a byte value per field stored in the index that is factored into the score. They are used for length normalization (shorter documents = more important) and index-time boosting. If you want either of those, you need norms. When norms are loaded into an IndexReader, they are loaded into a byte[maxdoc] array for each field, so even if only one document out of 400 million has a field, it's still going to load byte[maxdoc] for that field (so a lot of wasted RAM). Did you say you had 400 million docs and 7 fields?
>> Google says that would be:
>>
>> **400 million x 7 byte = 2 670.28809 megabytes**
>>
>> On top of your other RAM usage.
>>
> Just to avoid confusion, that should really read a byte per document per field. If I remember right, it gives 255 boost possibilities, limited to 25 with length normalization.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
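
For reference, the norms arithmetic quoted above can be reproduced with a minimal sketch. It assumes one norm byte per document per field, as Mark describes; the class and method names are hypothetical, and it does not call any Lucene API.

```java
import java.util.Locale;

/** Back-of-the-envelope estimate of heap used by norms (hypothetical helper, not a Lucene class). */
public class NormsMemoryEstimate {

    /** One byte per document per field that has norms: maxDoc * fieldsWithNorms. */
    public static long normsBytes(long maxDoc, int fieldsWithNorms) {
        return Math.multiplyExact(maxDoc, (long) fieldsWithNorms);
    }

    /** Converts bytes to binary megabytes (1 MB = 1024 * 1024 bytes), the unit of the quoted figure. */
    public static double toMegabytes(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Figures from the thread: 400 million docs, 7 fields.
        long bytes = normsBytes(400_000_000L, 7);
        System.out.println(String.format(Locale.ROOT, "%d bytes = %.2f MB", bytes, toMegabytes(bytes)));
        // prints "2800000000 bytes = 2670.29 MB"
    }
}
```

Under the same assumption, each field whose norms are turned off saves maxDoc bytes, so the estimate can be rerun with a smaller field count to see the effect.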