lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matthew Hall <mh...@informatics.jax.org>
Subject Re: Performance issue
Date Mon, 02 Feb 2009 19:34:59 GMT
Do you NEED to be using 7 fields here?

Like Erick said, if you could give us an example of the types of data 
you are trying to search against, it would be quite helpful.

Its possible that you might be able to say collapse your 7 fields down 
to a single field, which would likely reduce the overall number of or 
clauses in your searches, speeding things up nicely.

At my project we search two letter prefix searches in sub seconds, for 
much larger datasets.  Alot of this however is directly due to how our 
indexes are structured.

-Matt

Erick Erickson wrote:
> Prefix queries are expensive here. The problem is
> that each one forms a very large OR clause on all
> the terms that start with those two letters. For instance,
> if a field in your index contained
> mine
> milanta
> mica
>
> a prefix search on "mi" would form
> mine OR milanta OR mica.
>
> Doing this across seven fields could get expensive.
>
> Two things:
> 1> what is the problem you are trying to solve? Perhaps some
> of the folks on the list can give you some suggestions. You can
> think about many strategies depending upon what you want
> to accomplish. A 300M index isn't very big, so you could, for
> instance, think about indexing a separate field that contains only
> the two beginning letters and search *that* in this case. I'll
> assume that three letter prefix queries are OK.
>
> 2> How are you measuring query time? If you're measuring the
> time it takes when you first start a searcher, be aware that the
> first few queries are usually slow because the caches haven't
> been filled. Further, are you measuring total response time or
> are you measuring *just* the query time? It's possible that the
> time is being spent assembling the response in your code
> rather than actual searching. You might insert some timers
> to determine that.
>
> Best
> Erick
>
> On Mon, Feb 2, 2009 at 2:58 AM, Mittal, Sourabh (IDEAS) <
> Sourabh-931.Mittal@morganstanley.com> wrote:
>
>   
>> Hi All,
>>
>> We face serious performance issues when users do 2 letter search e.g ho,
>> jo, pa ma, um ar, ma fi etc. time taken between 10 - 15 secs.
>> Below is our implementation details:
>>
>> 1. Search performs on 7 fields.
>> 2. PrefixQuery implementation on all fields
>> 3. AND search.
>> 4. Our indexer size is 300 MB.
>> 5. We show only 100 top documents only on the basis of score.
>> 6. We user StandardAnalyzer & StandardTokenizer for indexing &
>> searching.
>> 7. Lucene 2.4
>> 8. JDK1 .6
>>
>> Please suggest me how can we improve the performance.
>>
>> Regards,
>> Sourabh Mittal
>> Morgan Stanley | IDEAS Practice Areas
>> Manikchand Ikon | South Wing 18 | Dhole Patil Road
>> Pune, 411001
>> Phone: +91 20 2620-7053
>> Sourabh-931.Mittal@morganstanley.com
>>
>>
>>
>> --------------------------------------------------------------------------
>> NOTICE: If received in error, please destroy and notify sender. Sender does
>> not intend to waive confidentiality or privilege. Use of this email is
>> prohibited when received in error.
>>
>>     
>
>   


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message