From: Doug Cutting
To: Lucene Users List <lucene-user@jakarta.apache.org>
Subject: Re: problems with lucene in multithreaded environment
Date: Mon, 07 Jun 2004 09:50:05 -0700
Message-ID: <40C49CBD.8050806@apache.org>
In-Reply-To: <20040605110703.80644.qmail@web8301.mail.in.yahoo.com>

Jayant Kumar wrote:
> Thanks for the patch. It helped in increasing the
> search speed to a good extent.

Good. I'll commit it. Thanks for testing it.

> But when we tried to
> give about 100 queries in 10 seconds, then again we
> found that after about 15 seconds, the response time
> per query increased.

This still sounds very slow to me. Is your index optimized? What JVM are you using?

You might also consider ramping up your benchmark more slowly, to warm the filesystem's cache. So, when you first launch the server, give it a few queries at a lower rate, then, after those have completed, try a higher rate.

> We were able to simplify the searches further by
> consolidating the fields in the index but that
> resulted in increasing the index size to 2.5 GB as we
> required fields 2-5 and fields 1-7 in different
> searches.

That will slow updates a bit, but searching should be faster.

How about your range searches? Do you know how many terms they match? The easiest way to determine this might be to insert a print statement in RangeQuery.rewrite() that shows the query before it is returned.
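(As a rough alternative to patching RangeQuery.rewrite(), something like the following prints the rewritten query from application code; the number of clauses printed is the number of terms the range matched. This is only a sketch against the Lucene 1.4 API, and the index path, field name and bounds are made up for illustration.)

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.RangeQuery;

  public class ShowRewrittenRange {
    public static void main(String[] args) throws Exception {
      IndexReader reader = IndexReader.open("/path/to/index");      // hypothetical path
      Query range = new RangeQuery(new Term("date", "20040101"),    // hypothetical field and bounds
                                   new Term("date", "20041231"), true);
      // rewrite() expands the range into one TermQuery clause per matching term,
      // so printing the rewritten query shows exactly what the search will run.
      System.out.println(range.rewrite(reader));
      reader.close();
    }
  }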
> Our indexes are on the local disk, therefore
> there is no network i/o involved.

It does look like file i/o is now your bottleneck. The traces below show that you're using the compound file format, which combines many files into one. When two threads try to read two logically different files (.prx and .frq below), they must synchronize when the compound format is used; if your application did not use the compound format, this synchronization would not be required. So you should try rebuilding your index with the compound format turned off.

(The fastest way to do this is simply to add and/or delete a single document, then re-optimize the index with compound format turned off. This will cause the index to be re-written in non-compound format. A rough sketch of this appears after the stack traces below.)

Is this on Linux? If so, please try running 'iostat -x 1' while you perform your benchmark (iostat is installed by the 'sysstat' package). What percentage is the disk utilized (%util)? What is the percentage of idle CPU (%idle)? What is the rate of data that is read (rkB/s)?

If things really are i/o bound, then you might consider spreading the data over multiple disks, e.g., with lvm striping or a RAID controller.

If you have a lot of RAM, then you could also consider moving certain files of the index onto a ramfs-based drive. For example, moving the .tis, .frq and .prx files can greatly improve performance. Also, having these files in RAM means that the cache does not need to be warmed.

Hope this helps!

Doug

> "Thread-23" prio=1 tid=0x08169f38 nid=0x2867 waiting for monitor entry [69bd4000..69bd48c8]
>   at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:217)
>   - waiting to lock <0x46f1b828> (a org.apache.lucene.store.FSInputStream)
>   at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
>   at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
>   at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
>   at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:58)
>
> "Thread-22" prio=1 tid=0x08159f78 nid=0x2866 waiting for monitor entry [69b53000..69b538c8]
>   at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:217)
>   - waiting to lock <0x46f1b828> (a org.apache.lucene.store.FSInputStream)
>   at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
>   at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
>   at org.apache.lucene.store.InputStream.readVInt(InputStream.java:86)
>   at org.apache.lucene.index.SegmentTermDocs.read(SegmentTermDocs.java:126)
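(In code, the rebuild step described in the parenthetical above might look roughly like the following. This is only a sketch against the Lucene 1.4 API; the index path and the dummy field are made up, and the throwaway document can be deleted again afterwards.)

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;

  public class RebuildNonCompound {
    public static void main(String[] args) throws Exception {
      // Open the existing index (create=false) with the compound file format disabled.
      IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
      writer.setUseCompoundFile(false);

      // Add a single throwaway document so that optimize() has something to rewrite.
      Document dummy = new Document();
      dummy.add(Field.Keyword("dummy", "placeholder"));   // hypothetical marker field
      writer.addDocument(dummy);

      // Re-optimize: the segments are rewritten, this time in non-compound format.
      writer.optimize();
      writer.close();
    }
  }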