lucene-java-user mailing list archives

From Doron Yaacoby
Subject In memory Lucene configuration
Date Sun, 15 Jul 2012 08:41:24 GMT
Hi, I have the following situation:

I have two pretty large indices. One consists of about 1 billion documents (takes ~6GB on
disk) and the other has about 2 billion documents (~10GB on disk). The documents are very
short (4-5 terms each in the text field, and one numeric field with a long value). Both
indices are read-only - I'm only going to read from them and never write. There is only one
segment in each index (at least there should be; I called forceMerge(1) on them).

Search latency is the most important thing to me. I need it to be blazing fast, ~20ms per
query. Queries are always of the type +term1 +term2 +term3, and I'm asking for 10 results
from each index (searching is done simultaneously on both indices).
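
To make the search pattern above concrete, here is a rough sketch of the fan-out: the same all-terms-required query is submitted to both indices in parallel and the top 10 hits are collected from each. It uses only the JDK. TopHitsSearcher is a hypothetical stand-in for whatever wraps the real per-index searcher, not a Lucene type, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class DualIndexSearch {

    /** Hypothetical per-index searcher; a real one would wrap the actual index. */
    interface TopHitsSearcher {
        List<String> search(List<String> requiredTerms, int topN) throws Exception;
    }

    /**
     * Submits the same conjunctive query to both searchers at once and waits
     * for both. Element 0 holds the first index's hits, element 1 the second's.
     */
    static List<List<String>> searchBoth(ExecutorService pool,
                                         TopHitsSearcher first,
                                         TopHitsSearcher second,
                                         List<String> requiredTerms,
                                         int topN) throws Exception {
        Callable<List<String>> taskA = () -> first.search(requiredTerms, topN);
        Callable<List<String>> taskB = () -> second.search(requiredTerms, topN);

        // Both tasks are queued before either result is awaited, so the
        // total latency is roughly that of the slower index.
        Future<List<String>> futureA = pool.submit(taskA);
        Future<List<String>> futureB = pool.submit(taskB);

        List<List<String>> results = new ArrayList<>();
        results.add(futureA.get());
        results.add(futureB.get());
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Fake searchers that return placeholder hits instead of real documents.
            TopHitsSearcher small = (terms, n) -> Arrays.asList("small-index hit for " + terms);
            TopHitsSearcher large = (terms, n) -> Arrays.asList("large-index hit for " + terms);
            List<String> terms = Arrays.asList("term1", "term2", "term3");
            System.out.println(searchBoth(pool, small, large, terms, 10));
        } finally {
            pool.shutdown();
        }
    }
}
```

The pool is created once and reused for every query, so no threads are started on the query path.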

I have a fast server (12 cores at 3GHz each) with 32GB RAM (running Linux) and I can keep both
indices in memory when using a RAMDirectory. This didn't achieve the expected result (average
query time = ~43ms). I'm seeing latency spikes, where the same query is sometimes answered
in 10ms, but on another occasion takes 2-3 seconds. I'm guessing this is due to GC (as
explained here<>).
Using a warmed up MMapDirectory didn't help; the average query time was a bit slower. I tried
using InstantiatedIndex, but its memory consumption is huge; I couldn't even load the smaller
6GB index.
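
The GC explanation is only a guess so far. One way it might be checked is to record, for each query, the wall-clock time together with how much collector activity the JVM reported while it ran. The sketch below does this with the standard java.lang.management beans. GcAwareTimer and Sample are hypothetical names, and the Runnable stands in for the actual search call.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcAwareTimer {

    /** Wall-clock time of one task plus the GC activity observed while it ran. */
    static final class Sample {
        final long elapsedNanos;
        final long gcMillis;
        final long gcCount;

        Sample(long elapsedNanos, long gcMillis, long gcCount) {
            this.elapsedNanos = elapsedNanos;
            this.gcMillis = gcMillis;
            this.gcCount = gcCount;
        }
    }

    /** Sum of accumulated collection time over all collectors, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) total += t;
        }
        return total;
    }

    /** Sum of collection counts over all collectors. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 when the collector does not report it
            if (c > 0) total += c;
        }
        return total;
    }

    /** Runs the task once and reports its latency and the GC deltas around it. */
    static Sample time(Runnable task) {
        long gcMillisBefore = totalGcMillis();
        long gcCountBefore = totalGcCount();
        long start = System.nanoTime();
        task.run();
        long elapsed = System.nanoTime() - start;
        return new Sample(elapsed,
                          totalGcMillis() - gcMillisBefore,
                          totalGcCount() - gcCountBefore);
    }

    public static void main(String[] args) {
        // Placeholder workload; a real run would execute one search here.
        Sample s = time(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100000; i++) sb.append(i);
        });
        System.out.println("elapsed ms: " + s.elapsedNanos / 1000000.0
                + ", GC ms: " + s.gcMillis + ", collections: " + s.gcCount);
    }
}
```

The counters are JVM-wide, so when several queries run concurrently the attribution is approximate. Even so, if the 2-3 second queries consistently show a non-zero collection count and the 10ms ones do not, that would support the GC theory.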

Any ideas about what could be the ideal configuration for me?
