From: "Chew Yee Chuang"
Reply-To: java-user@lucene.apache.org
Subject: RE: High CPU usage duing index and search
Date: Mon, 13 Aug 2007 09:26:43 +0800
Message-ID: <005401c7dd49$01cafda0$0560f8e0$@com>
In-Reply-To: <12035676.post@talk.nabble.com>

Hi testn,

I have tested Filter. It is pretty fast, but it still takes a lot of CPU
resources. Maybe that is due to the number of filters I run.

Thank you,
eChuang, Chew

-----Original Message-----
From: testn [mailto:test1@doramail.com]
Sent: Tuesday, August 07, 2007 10:37 PM
To: java-user@lucene.apache.org
Subject: RE: High CPU usage duing index and search

Check out the Filter class. You can create a separate filter for each field
and then chain them together using ChainFilter. If you cache the filters, it
will be pretty fast.

Chew Yee Chuang wrote:
>
> Greetings,
>
> Yes, processing a little and then pausing for a while really does reduce
> the CPU usage, but I need to find a balance so that indexing and searching
> are not delayed too much.
>
> We execute 20,000 queries at a time because the process generates the
> aggregation data for reporting,
> e.g. Gender (M, F), Department (Accounting, R&D, Financial, ... etc.):
> 1Q - Gender:M AND Department: Accounting
> 2Q - Gender:M AND Department: R&D
> 3Q - Gender:M AND Department: Financial
> 4Q - Gender:F AND Department: Accounting
> 5Q - ....
> Thus, the more combinations there are, the more queries need to run. For
> now, I still have no idea how to reduce the number. I am just thinking that
> maybe there is a different way to index the data so that I can get the
> aggregates easily.
>
> Any help would be appreciated.
>
> Thanks
> eChuang, Chew
>
> -----Original Message-----
> From: karl wettin [mailto:karl.wettin@gmail.com]
> Sent: Thursday, August 02, 2007 7:11 AM
> To: java-user@lucene.apache.org
> Subject: Re: High CPU usage duing index and search
>
> It sounds like you have a fairly busy system, perhaps 100% load on the
> process is not that strange, at least not during short periods of time.
>
> A simpler solution would be to nice the process a little bit in order to
> give your background jobs some more time to think.
>
> Running a profiler is still the best advice I can think of. It should
> clearly show you what is going on when you run out of CPU.
>
> --
> karl
>
> 1 Aug 2007 at 04:29, Chew Yee Chuang wrote:
>
>> Hi,
>>
>> Thanks for the links provided. I actually went through those articles
>> when I was developing the index and search functions for my application.
>> I haven't tried a profiler yet, but I monitor the CPU usage and notice
>> that whenever indexing or searching is running, the CPU usage rises to
>> 100%. Below I will try to elaborate on what my application does and how
>> I index and search.
>>
>> There are many concurrent processes running. First, the application
>> writes the records it receives into a text file, with the fields
>> separated by tabs. The application switches to a new file every 10
>> minutes, so each file contains only 10 minutes of records, approximately
>> 600,000 records per file. Then the indexing process checks whether there
>> is a text file to be indexed; if there is, the thread wakes up and starts
>> indexing.
>>
>> The indexing process first adds documents to a RAMDir and later adds the
>> RAMDir to the FSDir by calling addIndexesNoOptimize() once there are
>> 100,000 documents (32 fields per doc) in the RAMDir. Only one
>> IndexWriter (FSDir) is created, but several IndexWriters (RAMDir) are
>> created during the whole process. Below is the configuration of the
>> IndexWriters I mentioned:
>>
>> IndexWriter (RAMDir)
>> - SimpleAnalyzer
>> - setMaxBufferedDocs(10000)
>> - Field.Store.YES
>> - Field.Index.NO_NORMS
>>
>> IndexWriter (FSDir)
>> - SimpleAnalyzer
>> - setMergeFactor(20)
>> - addIndexesNoOptimize()
>>
>> For searching, many queries (20,000) are run continuously to generate
>> the aggregate table for reporting purposes.
>> All these queries are run in a nested loop, and only one Searcher is
>> created. I tried both a searcher and a filter; the filter gives better
>> results, but both use a lot of CPU resources.
>>
>> Hope this info helps, and sorry for my bad English.
>>
>> Thanks
>> eChuang, Chew
>>
>> -----Original Message-----
>> From: karl wettin [mailto:karl.wettin@gmail.com]
>> Sent: Tuesday, July 31, 2007 5:54 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: High CPU usage duing index and search
>>
>>
>> 31 Jul 2007 at 05:25, Chew Yee Chuang wrote:
>>> But I just noticed that when Lucene performs a search or an index
>>> operation, the CPU usage on my machine rises to 100%. Because of this,
>>> some of my other backend processes eventually slow down. I just want
>>> to know whether anyone has faced this problem before, and whether
>>> there is any idea on how to overcome it.
>>
>> Did you run a profiler to see what it is that consumes all the
>> resources? It is very hard to guess based on the information you
>> supplied. Start here:
>>
>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
>> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
>>
>>
>> --
>> karl
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
--
View this message in context: http://www.nabble.com/High-CPU-usage-duing-index-and-search-tf4190756.html#a12035676
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
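
For readers following the thread, here is a minimal sketch of the idea behind the cached-filter suggestion, written in plain Java rather than against the Lucene API. It assumes each field value can be represented as a java.util.BitSet over record numbers (Lucene filters of that era exposed their matches as a BitSet, which is why the analogy is used). The class name AggregationCounts, its methods, and the sample data are hypothetical and not taken from the application discussed above. The point it illustrates: build one bit set per field value once, then answer every "Gender:X AND Department:Y" combination with a cheap AND plus a bit count instead of running a separate query.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; no Lucene classes are used here.
class AggregationCounts {

    // Build one bit set per distinct value of a field: bit i is set when
    // record i carries that value. This plays the role of a cached filter.
    static Map<String, BitSet> bitsPerValue(List<String> column) {
        Map<String, BitSet> bits = new LinkedHashMap<>();
        for (int doc = 0; doc < column.size(); doc++) {
            bits.computeIfAbsent(column.get(doc), v -> new BitSet()).set(doc);
        }
        return bits;
    }

    // Count every (gender, department) combination by AND-ing the cached sets.
    static Map<String, Integer> countCombinations(List<String> genders,
                                                  List<String> departments) {
        Map<String, BitSet> genderBits = bitsPerValue(genders);
        Map<String, BitSet> deptBits = bitsPerValue(departments);
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> g : genderBits.entrySet()) {
            for (Map.Entry<String, BitSet> d : deptBits.entrySet()) {
                // Clone first so the cached set is not modified by and().
                BitSet both = (BitSet) g.getValue().clone();
                both.and(d.getValue());
                counts.put("Gender:" + g.getKey() + " AND Department:" + d.getKey(),
                           both.cardinality());
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample records; position i in both lists is record i.
        List<String> genders = Arrays.asList("M", "F", "M", "F", "M");
        List<String> departments =
            Arrays.asList("Accounting", "R&D", "Accounting", "Financial", "R&D");
        countCombinations(genders, departments)
            .forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

With this layout the number of expensive passes over the data grows with the number of distinct field values, not with the number of combinations; each combination then costs only one bitwise AND. Whether that maps cleanly onto a given Lucene version's Filter and ChainFilter classes should be checked against that version's documentation.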