lucene-java-user mailing list archives

From Jayant Kumar <>
Subject Re: problems with lucene in multithreaded environment
Date Tue, 08 Jun 2004 07:59:09 GMT
--- Doug Cutting <> wrote:
> Jayant Kumar wrote:
> > Thanks for the patch. It helped in increasing the
> > search speed to a good extent.
> Good.  I'll commit it.  Thanks for testing it.
> > But when we tried to give about 100 queries in 10
> > seconds, then again we found that after about 15
> > seconds, the response time per query increased.
> This still sounds very slow to me.  Is your index
> optimized?  What JVM 
> are you using?

Yes, our index is optimized and we are using Java
version 1.4.2. What would be the difference between
searching an optimized index and an unoptimized one?

> You might also consider ramping up your benchmark
> more slowly, to warm 
> the filesystem's cache.  So, when you first launch
> the server, give it a 
> few queries at a lower rate, then, after those have
> completed, try a 
> higher rate.

We have tried this out and, since we have lots of RAM,
once the indexes go into the OS cache, results come
out fast.
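
For anyone repeating this test, here is a minimal sketch of a
benchmark that ramps up as suggested above: a few queries at a low
rate first, then the full load. It is only an illustration. The
Search interface, the class name and the sleep-based placeholder are
hypothetical stand-ins for the real search call, and the code uses
lambdas, so it needs a newer Java than the 1.4.2 mentioned in this
thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class RampedBenchmark {
    /** Hypothetical stand-in for whatever executes one search. */
    public interface Search {
        void run(String query) throws Exception;
    }

    /** Runs every query on a pool of the given size; returns mean latency in ms. */
    public static double runPhase(Search search, List<String> queries, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (String q : queries) {
                Callable<Long> task = () -> {
                    long start = System.nanoTime();
                    search.run(q);
                    return System.nanoTime() - start;
                };
                results.add(pool.submit(task));
            }
            long totalNanos = 0;
            for (Future<Long> f : results) {
                totalNanos += f.get();  // rethrows any failure from the search
            }
            return queries.isEmpty() ? 0.0 : totalNanos / 1e6 / queries.size();
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }

    public static void main(String[] args) throws Exception {
        Search fake = q -> Thread.sleep(5);  // placeholder, not a real search
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            queries.add("query" + i);
        }

        // Warm-up phase: a few queries, one at a time, timings discarded.
        runPhase(fake, queries.subList(0, 10), 1);

        // Measured phase: the full query set at higher concurrency.
        double meanMs = runPhase(fake, queries, 10);
        System.out.println("mean latency ms: " + meanMs);
    }
}
```

Keeping the warm-up timings out of the reported mean separates the
cold-cache cost from the steady-state response time.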

> > We were able to simplify the searches further by
> > consolidating the fields in the index but that
> > resulted in increasing the index size to 2.5 GB as
> > we required fields 2-5 and fields 1-7 in different
> > searches.
> That will slow updates a bit, but searching should
> be faster.
> How about your range searches?  Do you know how many
> terms they match? 
> The easiest way to determine this might be to insert
> a print statement 
> in RangeQuery.rewrite() that shows the query before
> it is returned.
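
To make the "how many terms" question concrete, here is a small
sketch that counts the distinct terms falling inside a range. It does
not use Lucene at all: a sorted set stands in for the index's term
dictionary, and the class name, method name and yyyyMMdd date field
are assumptions made up for the example. The point is only that a
range is as expensive as the number of distinct indexed terms it
covers.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

public class RangeTermCount {
    /** Counts the distinct terms in [lower, upper], both ends inclusive. */
    public static int countTermsInRange(NavigableSet<String> terms,
                                        String lower, String upper) {
        if (lower.compareTo(upper) > 0) {
            return 0;  // empty range; subSet would throw on reversed bounds
        }
        return terms.subSet(lower, true, upper, true).size();
    }

    public static void main(String[] args) {
        // Hypothetical date field stored as sortable yyyyMMdd strings.
        NavigableSet<String> dates = new TreeSet<>(Arrays.asList(
                "20040101", "20040215", "20040301", "20040520", "20040608"));

        int matched = countTermsInRange(dates, "20040201", "20040531");
        System.out.println("terms matched: " + matched);  // prints "terms matched: 3"
    }
}
```

If a field holds very fine-grained values (for example timestamps to
the second), even a narrow range can cover thousands of terms, so
indexing a coarser value is one way to keep the count down.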
> > Our indexes are on the local disk, therefore
> > there is no network i/o involved.
> It does look like file i/o is now your bottleneck.  The
> traces below show 
> that you're using the compound file format, which
> combines many files 
> into one.  When two threads try to read two
> logically different files 
> (.prx and .frq below) they must synchronize when the
> compound format is 
> used.  But if your application did not use the
> compound format this 
> synchronization would not be required.  So you
> should try rebuilding 
> your index with the compound format turned off. 
> (The fastest way to do 
> this is simply to add and/or delete a single
> document, then re-optimize 
> the index with compound format turned off.  This
> will cause the index to 
> be re-written in non-compound format.)

We have changed the IndexWriter to write the index in
non-compound format and have noticed a little
difference in the response time.
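
For readers wondering why the compound format forces threads to
synchronize, here is a rough model of the situation. It is not
Lucene's code. SharedFileReader is a hypothetical class that packs
two logical "sub-files" into one physical file and reads them through
a single handle. Because seek and read on a shared handle have to
happen as one step, every reader takes the same lock, even when the
threads want logically different data.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedHandleDemo {
    /** Reads slices of one physical file through a single shared handle. */
    public static final class SharedFileReader implements AutoCloseable {
        private final RandomAccessFile file;

        public SharedFileReader(File f) throws IOException {
            this.file = new RandomAccessFile(f, "r");
        }

        /** seek() and readFully() must not interleave, so all readers share one lock. */
        public byte[] read(long offset, int length) throws IOException {
            byte[] buf = new byte[length];
            synchronized (file) {
                file.seek(offset);
                file.readFully(buf);
            }
            return buf;
        }

        @Override
        public void close() throws IOException {
            file.close();
        }
    }

    /** Reads the same slice repeatedly and counts results that differ from expected. */
    private static void readLoop(SharedFileReader reader, long offset,
                                 String expected, AtomicInteger mismatches) {
        for (int i = 0; i < 1000; i++) {
            try {
                String got = new String(reader.read(offset, 4), StandardCharsets.US_ASCII);
                if (!got.equals(expected)) {
                    mismatches.incrementAndGet();
                }
            } catch (IOException e) {
                mismatches.incrementAndGet();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("compound", ".bin");
        f.deleteOnExit();
        // Two logical sub-files packed back to back: "AAAA" then "BBBB".
        Files.write(f.toPath(), "AAAABBBB".getBytes(StandardCharsets.US_ASCII));

        AtomicInteger mismatches = new AtomicInteger();
        try (SharedFileReader reader = new SharedFileReader(f)) {
            Thread t1 = new Thread(() -> readLoop(reader, 0, "AAAA", mismatches));
            Thread t2 = new Thread(() -> readLoop(reader, 4, "BBBB", mismatches));
            t1.start();
            t2.start();
            t1.join();
            t2.join();
        }
        System.out.println("mismatched reads: " + mismatches.get());
    }
}
```

With separate physical files each thread could hold its own handle
and no shared lock would be needed, which is the gain expected from
turning the compound format off.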

> Is this on linux?  If so, please try running 'iostat
> -x 1' while you 
> perform your benchmark (iostat is installed by the
> 'sysstat' package). 
> What percentage is the disk utilized (%util)?  What
> is the percentage of 
> idle CPU (%idle)?  What is the rate of data that is
> read (rkB/s)?  If 
> things really are i/o bound then you might consider
> spreading the data 
> over multiple disks, e.g., with lvm striping or a
> RAID controller.
> If you have a lot of RAM, then you could also
> consider moving certain 
> files of the index onto a ramfs-based drive.  For
> example, moving the 
> .tis, .frq and .prx can greatly improve performance.
>  Also, having these 
> files in RAM means that the cache does not need to
> be warmed.
> Hope this helps!

Thanks for the help. We will continue interacting with
you as and when we face problems. We would also like
you to know that lucene is an excellent search engine
compared to any other.

