Subject: RE: 7GB index taking forever to return hits
Date: Tue, 15 Aug 2006 16:00:22 -0400
From: "Van Nguyen"

I am tokenizing CONTENTS... But unfortunately, it's part of the requirements that we search for:

+CONTENTS:*white* +CONTENTS:*hard* +CONTENTS:*hat*

-----Original Message-----
From: Rob Staveley (Tom) [mailto:rstaveley@seseit.com]
Sent: Tuesday, August 15, 2006 3:26 AM
To: java-user@lucene.apache.org
Subject: RE: 7GB index taking forever to return hits

Sounds like you want to tokenise CONTENTS, if you are not already doing so. Then you could simply have:

+CONTENTS:white +CONTENTS:hard +CONTENTS:hat

-----Original Message-----
From: Van Nguyen [mailto:vnguyen@ur.com]
Sent: 15 August 2006 01:30
To: java-user@lucene.apache.org
Subject: RE: 7GB index taking forever to return hits

It was how I was implementing the search.

I am using a boolean query. Prior to the 7GB index, I was searching over a 150MB index that consists of a very small part of the bigger index. I was able to call BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE) and that worked fine. But I think that's the cause of my problem with this bigger index. Commenting that out, I get a TooManyClauses exception.
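The TooManyClauses error follows from how a wildcard is evaluated: each dictionary term that matches the pattern becomes one clause of a rewritten boolean query, and a pattern with a leading * gives no prefix to seek to, so every term has to be checked. The sketch below is a toy model of that behaviour in plain Java, not Lucene's implementation; the class WildcardExpansionSketch, its methods, its exception type, and the sample terms are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy model of wildcard expansion over a sorted term dictionary (not Lucene code). */
public class WildcardExpansionSketch {

    /** Hypothetical stand-in for the "too many clauses" failure. */
    public static class TooManyClausesException extends RuntimeException {
        public TooManyClausesException(String message) {
            super(message);
        }
    }

    /**
     * Expands an infix pattern such as *hat* into one clause per matching term.
     * A leading wildcard gives no prefix to seek to, so every term is visited.
     */
    public static List<String> expandInfix(SortedSet<String> terms, String infix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms) { // full scan of the dictionary
            if (term.contains(infix)) {
                if (clauses.size() == maxClauses) {
                    throw new TooManyClausesException(
                            "more than " + maxClauses + " clauses for *" + infix + "*");
                }
                clauses.add(term);
            }
        }
        return clauses;
    }

    /** An exact term is a single lookup and yields at most one clause. */
    public static List<String> expandExact(SortedSet<String> terms, String term) {
        return terms.contains(term)
                ? Collections.singletonList(term)
                : Collections.<String>emptyList();
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(
                Arrays.asList("chat", "hat", "hatch", "hard", "that", "white"));
        System.out.println("infix *hat*: " + expandInfix(terms, "hat", 1024));
        System.out.println("exact hat:   " + expandExact(terms, "hat"));
        try {
            expandInfix(terms, "hat", 2); // artificially low limit to show the failure
        } catch (TooManyClausesException e) {
            System.out.println("limit hit:   " + e.getMessage());
        }
    }
}
```

In this model the infix pattern touches all six terms and produces four clauses, while the exact term produces one. On a 5.5 million document index the dictionary is far larger, so raising the limit to Integer.MAX_VALUE removes the exception but not the work of building and scoring every clause.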
A typical query would look something like this:

+CONTENTS:*white* +CONTENTS:*hard* +CONTENTS:*hat* +COMPANY_CODE:u1 +LANGUAGE:enu -SKU_DESC_ID:0 +IS_DC:d +LOCATION:b72

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.WildcardQuery;

    BooleanQuery q = new BooleanQuery();
    // WildcardQuery and TermQuery are constructed from a Term (field, text)
    WildcardQuery wc1 = new WildcardQuery(new Term("CONTENTS", "*white*"));
    WildcardQuery wc2 = new WildcardQuery(new Term("CONTENTS", "*hard*"));
    WildcardQuery wc3 = new WildcardQuery(new Term("CONTENTS", "*hat*"));
    q.add(wc1, BooleanClause.Occur.MUST);
    q.add(wc2, BooleanClause.Occur.MUST);
    q.add(wc3, BooleanClause.Occur.MUST);
    // Exact-match filters on the remaining fields
    TermQuery t1 = new TermQuery(new Term("COMPANY_CODE", "u1"));
    q.add(t1, BooleanClause.Occur.MUST);
    TermQuery t2 = new TermQuery(new Term("LANGUAGE", "enu"));
    q.add(t2, BooleanClause.Occur.MUST);
    . . .

I take it this is not the most optimal way to go about this.

So that leads me to my next question... What is the most optimal way to go about this?

Van

-----Original Message-----
From: yueyu lin [mailto:popeyelin@gmail.com]
Sent: Monday, August 14, 2006 11:30 AM
To: java-user@lucene.apache.org
Subject: Re: 7GB index taking forever to return hits

The 2GB limitation only exists when you want to load the index into memory on a 32-bit box. Our index is larger than 13 gigabytes, and it works fine. I think there must be an error in your design. You can use Luke to see what is happening in your index.

On 8/14/06, Van Nguyen wrote:
>
> Hi,
>
> I have a 7GB index (about 45 fields per document X roughly 5.5 million
> docs) running on a Windows 2003 32bit machine (dual proc, 2GB memory).
> The index is optimized. Performing a search on this index will just
> "hang" when performing the search (wild card query with a sort). At
> first the CPU usage is 100%, then drops down to 50% after a minute or
> so, and then no CPU utilization... but the thread is still trying to
> perform the search. I've tried this in my J2EE app and in a main
> program. Is this due to the 2GB limitation of the 32bit OS (I didn't
> realize the index would be this big... just let it run over the weekend).
>
> If this is due to the 2GB limitation of the 32bit OS and since I have
> this 7GB index built already (and optimized), is there a way to split
> this into 2GB indices w/o having to re-index? Or is this due to another factor?
>
> Van
>
> United Rentals
> Consider it done.(tm)
> 800-UR-RENTS
> unitedrentals.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

--
Yueyu Lin