lucene-java-user mailing list archives

From "Erick Erickson" <>
Subject Re: Combining results of multiple indexes
Date Wed, 17 Dec 2008 15:20:56 GMT
The very first question is always "are you opening a new searcher
each time you query"? But you've looked at the Wiki so I assume not.
This question is closely tied to what kind of latency you can tolerate.

A few more details, please. What's slow? Queries? Indexing?

How slow? 100ms? 100s? What are your target times and
what are you seeing?

How big is your index? 100M? 100G? What kind of VM
parameters are you specifying?

As an aside, do note that there's no requirement in Lucene that
each document have the same fields, so it's unclear why you
need two indexes, but perhaps some of the answers to the above
will help us understand.
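
To make the aside concrete, here is a minimal sketch of a single index in which documents carry different fields and an AND query still works. It is a toy in-memory stand-in written against the Java standard library only (Java 9+ for Map.of), not Lucene's API; ToyIndex, add and searchAll are hypothetical names, and the field names f1 and f7 are borrowed from the question quoted below.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy in-memory inverted index; an illustration only, not Lucene's API.
class ToyIndex {
    // "field=value" term -> ids of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Index one document; each document may carry a different set of fields.
    public void add(String id, Map<String, String> fields) {
        for (Map.Entry<String, String> e : fields.entrySet()) {
            String term = e.getKey() + "=" + e.getValue();
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // Boolean AND: ids of the documents that match every field=value pair.
    public Set<String> searchAll(Map<String, String> required) {
        Set<String> result = null;
        for (Map.Entry<String, String> e : required.entrySet()) {
            String term = e.getKey() + "=" + e.getValue();
            Set<String> ids = postings.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(ids);   // first clause seeds the result
            } else {
                result.retainAll(ids);         // later clauses narrow it
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("log-1", Map.of("f1", "1234", "f7", "ABCD"));
        index.add("log-2", Map.of("f1", "1234"));              // has no f7
        index.add("log-3", Map.of("f7", "ABCD", "f9", "42"));  // has no f1
        // Only log-1 carries both values.
        System.out.println(index.searchAll(Map.of("f1", "1234", "f7", "ABCD")));
    }
}
```

Because every field of a document lives in the same index, the AND is a plain intersection of document ids and nothing has to be correlated across indexes.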

Also, be very very careful what you measure when you measure
queries. You absolutely *have* to put some instrumentation in
the code since "slow queries" can result from things other than
searching. For instance, iterating over a Hits object for 100s of
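
As a rough sketch of the kind of instrumentation meant here, the class below times each phase of a request separately, so time spent searching can be told apart from time spent walking the results. PhaseTimer, the phase names "search" and "iterate", and the stand-in work in main are all hypothetical; swap in your own search call and your own result loop.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Minimal per-phase timer; the phases and the stand-in work are hypothetical.
class PhaseTimer {
    // Phase name -> accumulated elapsed nanoseconds, in first-seen order
    private final Map<String, Long> nanos = new LinkedHashMap<>();

    // Run one phase, add its elapsed time to that phase's total, return its result.
    public <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanos.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    public Map<String, Long> totals() {
        return nanos;
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();

        // Stand-in for the actual search call.
        int[] hits = timer.time("search", () -> new int[1000]);

        // Stand-in for iterating over the results and loading each one.
        Long total = timer.time("iterate", () -> {
            long sum = 0;
            for (int h : hits) {
                sum += h;
            }
            return sum;
        });

        timer.totals().forEach((phase, ns) ->
                System.out.println(phase + ": " + ns / 1_000_000.0 + " ms"));
    }
}
```

If the "iterate" number dwarfs the "search" number, the query itself is not the problem.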

Mostly, I'm wondering if this is an "XY" problem. You're asking
for specific information to accomplish "X" when there might be
a much better solution "Y" we can suggest if we knew more
about your problem....

Show the code, man <G>!


On Wed, Dec 17, 2008 at 9:40 AM, Preetham Kajekar <> wrote:

> Hi Grant,
> Thanks for your response. Replies inline.
> Grant Ingersoll wrote:
>> On Dec 17, 2008, at 12:57 AM, Preetham Kajekar wrote:
>>  Hi,
>>> I am new to Lucene. I am not using it as a pure text indexer.
>>> I am trying to index a Java object which has about 10 fields (like id,
>>> time, srcIp, dstIp) - most of them being numerical values.
>>> In order to speed up indexing, I figured that having two separate
>>> indexers, each of them indexing different set of fields works great. So I
>>> have the first 5 fields in index1 and the remaining in index2.
>> Can you explain this a bit more?  Are those two fields really large or
>> something?  How are you obtaining them?  How are you correlating the
>> documents between the two indexes?  Did you actually try a single index and
>> it was too slow?
> I have a java object which has about 10 fields. However, the fields are not
> fixed. The java object is essentially a representation of Syslogs from
> network devices. So different syslogs have different fields. Each field has
> a unique id and a value (mostly numeric types, so I convert it to a string).
> There are some fixed fields. So the object is a list of fields which is
> produced by a parser.
> I am trying to index using two indexers in two separate threads: one for
> the fixed fields and another for the non-fixed fields. Except for a unique id,
> I do not store the fields in Lucene; I just index them. From the index, I get
> the unique id, which is all I care about. (The objects are stored elsewhere
> and can be looked up based on this unique id.)
> I did try using a single indexer, but things were quite slow. Getting high
> throughput is crucial and having two indexers seemed to do very well. (more
> than twice as fast)
> Further, the index will never be modified and I can have just one thread
> writing to the index. Any other performance tips would be very
> helpful. I have already looked at the wiki link regarding performance and
> am using some of them.
> Thanks,
> ~preetham
>>> Now, I want to run boolean AND queries looking for values in both
>>> indexes, like f1=1234 AND f7=ABCD, where f1 and f7 are present in two
>>> separate indexes. Would using the MultiIndexReader help? Since I am doing
>>> an AND, I don't expect that it would work.
>>> Thanks,
>>> ~preetham
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail:
>> --------------------------
>> Grant Ingersoll
>> Lucene Helpful Hints:
