lucene-java-user mailing list archives

From Dan OConnor <docon...@acquiremedia.com>
Subject simultaneous indexing and searching causing intermittently long searches.
Date Sat, 04 Apr 2009 02:21:35 GMT
All,

I have several questions regarding query response time, and I would appreciate any help you
can provide.

We have a system that indexes approximately 200,000 documents per day at a fairly constant
rate and holds them for 8 days in a compound-file (CFS) index in a file-system directory. The
index is approximately 50 GB when optimized, which we do twice a month.
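For scale, the rates above imply about 1.6 million documents in the index at any time and roughly 8,300 documents indexed per hour, which is consistent with the approximately 8,000 deleted each hour. A small Java sketch of the arithmetic, assuming a perfectly constant rate and reading 50 GB as 50 * 10^9 bytes; the per-document size is an average derived from these figures, not a measurement:

```java
// Back-of-envelope sizing for the retention window described above.
// Inputs come from the message; the per-document average is derived, not measured.
public class RetentionMath {
    // Documents held when a constant daily rate is kept for a fixed number of days.
    static long totalDocs(long docsPerDay, int retentionDays) {
        return docsPerDay * retentionDays;
    }

    // Documents indexed (and, an hour's worth at a time, later deleted) per hour.
    static long docsPerHour(long docsPerDay) {
        return docsPerDay / 24;
    }

    // Average on-disk bytes per document for an optimized index of the given size.
    static long avgBytesPerDoc(long indexBytes, long docs) {
        return indexBytes / docs;
    }

    public static void main(String[] args) {
        long docs = totalDocs(200_000L, 8);
        System.out.println("docs held:     " + docs);                     // 1600000
        System.out.println("docs per hour: " + docsPerHour(200_000L));    // 8333
        // 50 GB taken as 50 * 10^9 bytes (decimal) for this estimate.
        System.out.println("avg bytes/doc: " + avgBytesPerDoc(50_000_000_000L, docs)); // 31250
    }
}
```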

We are running Lucene 2.3.2 with JRE 1.6.0_10 on CentOS 5 on 64-bit Dell 2950s (3 GHz dual/quad-core
processors) with local ext3 RAID-5 15k RPM disks (approximately 1.7 TB). The box has 16 GB of RAM,
and the JVM is allocated 11 GB (both -Xms and -Xmx).

Every 15 minutes, we flush the IndexWriter and create a new IndexSearcher to expose the newly
indexed content.

Every hour, approximately one hour's worth of content (approximately 8,000 documents) is deleted,
we flush the IndexWriter, and create a new IndexSearcher.
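The 15-minute and hourly cycles above both amount to "flush, open a new searcher, switch queries over, close the old one". One common way to make the switch safe is reference counting, so that a replaced searcher is closed only after its last in-flight query finishes. The JDK-only sketch below is a hypothetical helper, not a Lucene class: the type parameter `S` stands in for `IndexSearcher` and the `closer` callback for its `close()` call, and the message does not say how the existing system switches searchers.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical helper, not a Lucene class. S stands in for IndexSearcher and
// `closer` for its close() call. Queries run against whichever searcher was
// current when they started; a replaced searcher is closed only after its
// last in-flight query releases it.
public class SearcherSwap<S> {
    // A searcher plus its user count; the swapper itself holds one reference.
    public static final class Ref<T> {
        public final T searcher;
        final AtomicInteger refs = new AtomicInteger(1);
        Ref(T searcher) { this.searcher = searcher; }
    }

    private final AtomicReference<Ref<S>> current;
    private final Consumer<S> closer;

    public SearcherSwap(S initial, Consumer<S> closer) {
        this.current = new AtomicReference<>(new Ref<>(initial));
        this.closer = closer;
    }

    // Take a reference for one query; always pair with release().
    public Ref<S> acquire() {
        while (true) {
            Ref<S> r = current.get();
            int n = r.refs.get();
            // n == 0 means r was already swapped out and closed; retry.
            if (n > 0 && r.refs.compareAndSet(n, n + 1)) {
                return r;
            }
        }
    }

    // Drop one reference; the last one out closes the searcher.
    public void release(Ref<S> r) {
        if (r.refs.decrementAndGet() == 0) {
            closer.accept(r.searcher);
        }
    }

    // Call after the writer has flushed and the new searcher is open (and warmed).
    public void publish(S fresh) {
        Ref<S> old = current.getAndSet(new Ref<>(fresh));
        release(old); // drop the swapper's own reference to the old searcher
    }
}
```

Each query would acquire a reference before searching and release it in a `finally` block, so queries that started on the old searcher can finish while new queries already see the fresh one.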

Q1: Given these settings, are there general rules of thumb for setting the MergeFactor, MaxMergeDocs,
MaxBufferedDocs, and RAMBufferSizeMB?
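To reason about MergeFactor, a deliberately simplified model can help: Lucene's log-based merging roughly combines MergeFactor similarly sized segments into one larger segment. The sketch below is not Lucene's merge policy code; it assumes every flush produces exactly one segment of a fixed size and ignores deletions, MaxMergeDocs, and RAM-triggered flushes. It reports how many segments a search would see and how many documents merges have rewritten, a rough proxy for merge I/O.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (not Lucene's merge policy code) of how mergeFactor shapes
// segment count and merge work. Each flush adds one level-0 segment; whenever
// mergeFactor segments share a level, they merge into one segment one level up.
// Deletions, maxMergeDocs, and RAM-triggered flushes are ignored.
public class MergeModel {
    final int mergeFactor;
    final int docsPerFlush;                               // assumed constant per flush
    final List<Integer> levelCounts = new ArrayList<>();  // segments per level
    long docsRewritten = 0;                               // docs copied by merges
    long largestMergeDocs = 0;                            // biggest single merge

    MergeModel(int mergeFactor, int docsPerFlush) {
        if (mergeFactor < 2) throw new IllegalArgumentException("mergeFactor must be >= 2");
        this.mergeFactor = mergeFactor;
        this.docsPerFlush = docsPerFlush;
    }

    void flush() {
        bump(0);
        int level = 0;
        while (levelCounts.get(level) == mergeFactor) {
            levelCounts.set(level, 0);
            // The merged segment lands on level + 1 and holds docsPerFlush * mergeFactor^(level + 1) docs.
            long merged = (long) docsPerFlush * pow(mergeFactor, level + 1);
            docsRewritten += merged;
            largestMergeDocs = Math.max(largestMergeDocs, merged);
            level++;
            bump(level);
        }
    }

    int segmentCount() {
        int n = 0;
        for (int c : levelCounts) n += c;
        return n;
    }

    private void bump(int level) {
        while (levelCounts.size() <= level) levelCounts.add(0);
        levelCounts.set(level, levelCounts.get(level) + 1);
    }

    private static long pow(int base, int exp) {
        long r = 1;
        for (int i = 0; i < exp; i++) r *= base;
        return r;
    }

    public static void main(String[] args) {
        // 96 flushes is one day at one flush every 15 minutes; 2,083 docs per flush
        // assumes the 200,000 docs/day rate is perfectly even.
        for (int mf : new int[] {10, 30}) {
            MergeModel m = new MergeModel(mf, 2_083);
            for (int i = 0; i < 96; i++) m.flush();
            System.out.println("mergeFactor=" + mf
                + " segments=" + m.segmentCount()
                + " docsRewritten=" + m.docsRewritten
                + " largestMerge=" + m.largestMergeDocs);
        }
    }
}
```

Because the model ignores deletions and buffer-driven flushes, its numbers are only useful for comparing candidate settings against each other, not for predicting real merge times.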

We run a series of warm-up searches every time we create a new IndexSearcher. Right now we
call IndexSearcher.search() directly with a query, a null filter, and 10 as the number of documents
to return. We run searches against all of the index fields.
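The warm-up step described above could be structured as follows: run a fixed list of representative queries against the new searcher, time each one, and only then expose the searcher to user queries. This is a JDK-only sketch; `QueryRunner` is a hypothetical stand-in for a call such as `IndexSearcher.search(query, null, 10)`, and the query strings are placeholders, not the system's actual warm-up set.

```java
import java.util.List;

// Hypothetical warm-up step, not a Lucene API. QueryRunner stands in for
// something like searcher.search(query, null, 10); plain strings are used
// instead of parsed Query objects only to keep the sketch self-contained.
public class Warmup {
    public interface QueryRunner<S> {
        // Runs one query and returns the number of hits collected.
        int search(S searcher, String query, int topN);
    }

    // Runs every warm-up query once and returns per-query elapsed time in milliseconds.
    public static <S> long[] warm(S searcher, List<String> queries, int topN, QueryRunner<S> runner) {
        long[] elapsedMs = new long[queries.size()];
        for (int i = 0; i < queries.size(); i++) {
            long start = System.nanoTime();
            runner.search(searcher, queries.get(i), topN);
            elapsedMs[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return elapsedMs;
    }
}
```

Logging the returned timings shows whether the last warm-up queries have dropped to steady-state latency before the searcher goes live; if they have not, the warm-up list may not yet touch everything that real queries (ranges, sorting) need.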

Q1: Are there any rules of thumb for the number or complexity of warm-up searches?
Q2: Is it important to "warm up" the query parser, the analyzer, etc., or the ranges we use in
queries, or the sorting?

When the system is receiving regular queries, for example between 1 and 5 per second, the
search response times are extremely fast (under 500 ms) and mostly independent of query complexity.
We see slower query responses (on the order of 2-4 seconds) for the first few queries when
using a newly created IndexSearcher; however, the extremely fast response times return quickly
and persist.

When the system has not received any search requests for a period of time, as little as 5
seconds, the response time for even a simple query starts climbing (5-8 seconds), and the
longer the idle period between queries, the longer the response time (growing to 15-30 seconds
if the idle time is 30 seconds to a minute). NOTE: the system is still indexing new content and
removing old content when there are no incoming queries.

Q3: Is there a known issue where the IndexSearcher cache empties over time?

Finally, there are times when query response times go completely off the charts, reaching
hundreds of seconds.

Q4: Is it possible that this is due to segments being merged? If so, besides the MergeFactor
and related settings, are there ways to mitigate this?

Thanks in advance for any help you can provide.

Regards,
Dan

Dan O'Connor
SVP, Engineering
Acquire Media <http://www.acquiremedia.com/>
77 South Bedford Street, Suite 350
Burlington, MA 01803
e: doconnor@acquiremedia.com
o: 781-250-0565
f: 877-861-7724


