lucene-java-user mailing list archives

From Vitaly Funstein <vfunst...@gmail.com>
Subject Re: In memory Lucene configuration
Date Mon, 16 Jul 2012 05:25:44 GMT
Have you tried sharding your data? Since you have a fast multi-core
box, why not split your indices N-ways, say the smaller one into 4,
and the larger into 8. Then you can have a pool of dedicated search
threads, executing the same query against separate physical indices
within each "logical" one in parallel, then put the results together
in the calling thread. Yes, it's more code to write and test in the
app layer, but it may turn out to be well worth it. Due to GC overhead
and poor synchronization characteristics, RAMDirectory is definitely
not the way to go at this scale, as you probably already suspect.
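
A minimal sketch of the fan-out/merge idea above, assuming Java 8+ syntax. The `Shard` interface and `Hit` class are hypothetical stand-ins (in a real application each `Shard` would wrap an IndexSearcher over one physical index); nothing here is Lucene API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

final class ShardedSearch {

    /** One scored result; a simplified, hypothetical stand-in for a Lucene ScoreDoc. */
    static final class Hit {
        final int shard;   // which physical index produced the hit
        final int doc;     // document id local to that shard
        final float score;

        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Hypothetical wrapper around one physical index. */
    interface Shard {
        List<Hit> search(String query, int topN) throws Exception;
    }

    /** Runs the query on every shard in parallel and returns the global top N by score. */
    static List<Hit> searchAll(List<Shard> shards, ExecutorService pool,
                               String query, int topN) {
        // Submit the same query to each shard on the dedicated search pool.
        List<Future<List<Hit>>> pending = new ArrayList<>();
        for (Shard shard : shards) {
            Callable<List<Hit>> task = () -> shard.search(query, topN);
            pending.add(pool.submit(task));
        }

        // Gather the per-shard results in the calling thread.
        List<Hit> merged = new ArrayList<>();
        for (Future<List<Hit>> f : pending) {
            try {
                merged.addAll(f.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while searching", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("shard search failed", e.getCause());
            }
        }

        // Each shard returned its own top N, so the global top N is among them.
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
    }
}
```

The pool would typically be created once (for example with Executors.newFixedThreadPool) and reused across queries. Note that scores computed against separately built indices may not be strictly comparable, so merging by raw score is an approximation.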

On Sun, Jul 15, 2012 at 3:40 AM, Doron Yaacoby
<dorony@gingersoftware.com> wrote:
> Thanks for the quick input!
> I ran a few more tests with your suggested configuration (-Xmx1G -Xms1G
> with MMapDirectory). The third time I ran the same test I finally got
> an improvement: an average of ~30ms per query, although it's still not
> as fast as I need it to be.
> The test contains about 2200 different queries (well, some are repeated
> twice or thrice), and includes search time and doc loading (reading the
> two fields I mentioned). The queries are all straight boolean
> conjunctions, and yes, I am dropping the first few queries when
> calculating averages.
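
The measurement approach described above (time every query, then drop the first few before averaging) could be sketched roughly as follows. `QueryRunner` is a hypothetical hook standing in for "search plus doc loading"; the maximum is reported as well because an average alone hides the multi-second spikes mentioned elsewhere in the thread.

```java
import java.util.List;

final class LatencyBenchmark {

    /** Hypothetical hook that runs one query, including doc loading. */
    interface QueryRunner {
        void run(String query) throws Exception;
    }

    /** Times each query once, in order, and returns the elapsed nanoseconds. */
    static long[] timeAll(List<String> queries, QueryRunner runner) {
        long[] nanos = new long[queries.size()];
        for (int i = 0; i < nanos.length; i++) {
            long start = System.nanoTime();
            try {
                runner.run(queries.get(i));
            } catch (Exception e) {
                throw new IllegalStateException("query failed: " + queries.get(i), e);
            }
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Mean of the samples after skipping the first `warmup` entries. */
    static double meanAfterWarmup(long[] nanos, int warmup) {
        checkWarmup(nanos, warmup);
        long sum = 0;
        for (int i = warmup; i < nanos.length; i++) {
            sum += nanos[i];
        }
        return (double) sum / (nanos.length - warmup);
    }

    /** Worst sample after warm-up; exposes spikes that the mean hides. */
    static long maxAfterWarmup(long[] nanos, int warmup) {
        checkWarmup(nanos, warmup);
        long max = nanos[warmup];
        for (int i = warmup + 1; i < nanos.length; i++) {
            max = Math.max(max, nanos[i]);
        }
        return max;
    }

    private static void checkWarmup(long[] nanos, int warmup) {
        if (warmup < 0 || warmup >= nanos.length) {
            throw new IllegalArgumentException(
                "warmup must be in [0, " + nanos.length + ")");
        }
    }
}
```

How many queries to treat as warm-up is a judgment call; the thread only says "the first few".
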
>
> BTW, didn't mention before that I'm using Lucene 3.5 and Java 1.7.
>
> -----Original Message-----
> From: Simon Willnauer [mailto:simon.willnauer@gmail.com]
> Sent: 15 July 2012 11:56
> To: java-user@lucene.apache.org
> Subject: Re: In memory Lucene configuration
>
> hey there,
>
> On Sun, Jul 15, 2012 at 10:41 AM, Doron Yaacoby <dorony@gingersoftware.com> wrote:
>> Hi, I have the following situation:
>>
>> I have two pretty large indices. One consists of about 1 billion
>> documents (takes ~6GB on disk) and the other has about 2 billion
>> documents (~10GB on disk). The documents are very short (4-5 terms
>> each in the text field, and one numeric field with a long value). This
>> is a read-only index: I'm only going to read from it and never write.
>> There is only one segment in each index (at least there should be; I
>> called forceMerge(1) on them).
>>
>> Search latency is the most important thing to me. I need it to be
>> blazing fast, ~20ms per query. Queries are always of the type
>> +term1 +term2 +term3, and I'm asking for 10 results from each index
>> (searching is done simultaneously on both indices).
>>
>> I have a fast server (12 cores @ 3GHz each) with 32GB RAM (running
>> Linux) and I can keep both indices in memory when using a RAMDirectory.
>> This didn't achieve the expected result (average query time = ~43ms).
>> I'm seeing latency spikes, where the same query is sometimes answered
>> in 10ms, but on a different occasion takes 2-3 seconds. I'm guessing
>> this is due to GC (as explained here
>> <http://lucene.472066.n3.nabble.com/Plans-to-remove-RAMDirectory-td3601156.html>).
>> Using a warmed-up MMapDirectory didn't help; the average query time was
>> a bit slower. I tried using InstantiatedIndex, but its memory
>> consumption is huge; I couldn't even load the smaller 6GB index.
>
> It's very hard to believe that you can't get this returning results
> faster, though. I'd definitely recommend MMapDirectory here;
> NIOFSDirectory should do too. When you measure this, do you measure a
> large number of different queries or just a handful? Do you discard the
> first queries until caches are warmed up? What are you measuring: pure
> search time, or search time including doc loading?
> If you use MMapDirectory, how much memory do you grant to your JVM? I'd
> recommend summing up the term index file size (.tii) and the norms file
> size (.nrm) and giving the JVM something like 3x that size as -Xmx and
> -Xms, provided you don't need any more memory elsewhere. A guess from
> the given index is that -Xmx1G -Xms1G should do the job; let the
> filesystem cache use the rest (that is important for Lucene if you use
> MMapDirectory / NIOFSDirectory).
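
The sizing rule of thumb above (sum the .tii and .nrm file sizes, then allow roughly 3x that as heap) can be sketched as a small helper. The class and method names are illustrative, and the factor of 3 is taken directly from the suggestion in this message rather than from any Lucene documentation.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class HeapEstimate {

    /** Multiplier suggested in the thread; a rule of thumb, not a guarantee. */
    static final long FACTOR = 3;

    /**
     * Sums the sizes of the *.tii and *.nrm files directly inside indexDir
     * and returns FACTOR times that total, in bytes.
     */
    static long suggestedHeapBytes(Path indexDir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> files =
                 Files.newDirectoryStream(indexDir, "*.{tii,nrm}")) {
            for (Path file : files) {
                total += Files.size(file);
            }
        }
        return FACTOR * total;
    }
}
```

The result would then be rounded up and passed as both -Xmx and -Xms, leaving the remaining RAM to the operating system's file cache.
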
>
> Are your queries straight boolean conjunctions, or do you use
> positions, i.e. phrase queries or spans?
>
> simon
>>
>> Any ideas about what could be the ideal configuration for me?
>> Thanks.
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
