lucene-java-user mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: lucene (search) performance tuning
Date Sat, 26 May 2012 11:38:08 GMT
On Sat, May 26, 2012 at 2:59 AM, Yang <teddyyyy123@gmail.com> wrote:
> I tested with more threads/processes. Indeed, this is completely
> CPU-bound: running 1 thread gives the same latency as 4 threads (my
> box has 4 cores).
>
>
> Given this, is there any way to simplify the scoring computation? I'm only
> using Lucene as a first-level "rough" search, so search quality is not
> a huge issue here. For example, could fewer fields be evaluated, or a
> simpler scoring function be used?
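One way to picture a "simpler scoring function": rank candidates only by how many query terms they contain, with no tf, idf or length norms. The sketch below is a hypothetical illustration, assuming postings are plain sorted doc-id arrays rather than Lucene's on-disk format. In Lucene itself, a roughly comparable effect might come from wrapping clauses in ConstantScoreQuery or from a custom Similarity. The postings still have to be iterated either way, so the savings are limited to the per-document arithmetic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: score each document by the number of query terms it
 * matches (pure "coordination"), skipping tf, idf and length-norm math.
 * Postings are plain sorted doc-id arrays, not Lucene's index format.
 */
class MatchCountScoring {

    /** Returns docId -> number of query terms whose posting list contains it. */
    static Map<Integer, Integer> matchCounts(List<int[]> postingsPerTerm) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int[] postings : postingsPerTerm) {
            for (int doc : postings) {
                counts.merge(doc, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns up to k doc ids, highest match count first, ties broken by lower doc id. */
    static List<Integer> topK(Map<Integer, Integer> counts, int k) {
        List<Integer> docs = new ArrayList<>(counts.keySet());
        docs.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : Integer.compare(a, b);
        });
        return docs.subList(0, Math.min(k, docs.size()));
    }

    public static void main(String[] args) {
        // Hypothetical postings for the terms "joe", "pizza" and "main".
        List<int[]> postings = Arrays.asList(
            new int[] {1, 4, 9},
            new int[] {4, 9, 12},
            new int[] {4, 7});
        Map<Integer, Integer> counts = matchCounts(postings);
        System.out.println(topK(counts, 3)); // prints [4, 9, 1]
    }
}
```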

Are you using disjunction or conjunction queries? Can you make some
parts of the query mandatory?
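Why mandatory clauses can help: when every clause is required (BooleanClause.Occur.MUST, or a leading "+" in query syntax), only documents matching all of them need scoring, and that set is no larger than the shortest posting list. With optional clauses, every document matching any term becomes a candidate. The sketch below is a simplified, hypothetical model using sorted int arrays as postings and a linear merge; Lucene's ConjunctionScorer additionally uses skip data to jump over most of a common term's postings, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model of conjunction (AND) vs disjunction (OR) matching over
 * sorted posting lists. Not Lucene's implementation: no skip lists, no scoring.
 */
class ConjunctionVsDisjunction {

    /** Conjunction: documents present in both sorted posting lists. */
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    /** Disjunction: documents present in either sorted posting list, each once. */
    static List<Integer> union(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length || j < b.length) {
            if (j == b.length || (i < a.length && a[i] < b[j])) {
                out.add(a[i++]);
            } else if (i == a.length || b[j] < a[i]) {
                out.add(b[j++]);
            } else { // same doc id in both lists: emit once, advance both
                out.add(a[i]);
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical postings: a common term and a rarer one.
        int[] common = {1, 2, 3, 5, 8, 13, 21, 34};
        int[] rare = {5, 21, 40};
        System.out.println("AND candidates: " + intersect(common, rare));
        // prints AND candidates: [5, 21]
        System.out.println("OR candidates: " + union(common, rare));
        // prints OR candidates: [1, 2, 3, 5, 8, 13, 21, 34, 40]
    }
}
```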

simon
>
> thanks
> Yang
>
> On Fri, May 25, 2012 at 5:47 PM, Yang <teddyyyy123@gmail.com> wrote:
>
>> Thanks a lot, guys.
>>
>>
>> On Tue, May 22, 2012 at 1:34 AM, Ian Lea <ian.lea@gmail.com> wrote:
>>
>>> Lots of good tips in
>>> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, linked from
>>> the FAQ.
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Tue, May 22, 2012 at 2:08 AM, Li Li <fancyerii@gmail.com> wrote:
>>> > Something went wrong when I wrote my previous message from my Android
>>> > client. If RAMDirectory does not help, I think the bottleneck is the CPU.
>>> > You may try to tune the JVM, but I do not expect much improvement.
>>> > The best option is to split your index into 2 or more smaller ones;
>>> > you can then use Solr's distributed searching.
>>> > If the CPU is not fully used, you can do this on one physical machine.
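The single-machine variant of the index-splitting idea can be sketched as: search each shard on its own thread, keep each shard's top k hits, then merge them into a global top k. This is a hypothetical stdlib-only illustration; a "shard" here is just an array of precomputed scores (0.0 meaning no match), standing in for a separate Lucene index or Solr core.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch: parallel search over shards on one machine, followed
 * by a merge of the per-shard top-k lists into a global top-k.
 */
class ShardedSearch {

    static final class Hit {
        final int shard;
        final int doc;
        final double score;

        Hit(int shard, int doc, double score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }

        @Override
        public String toString() {
            return "shard" + shard + "/doc" + doc + "=" + score;
        }
    }

    static final Comparator<Hit> BY_SCORE_DESC =
        Comparator.comparingDouble((Hit h) -> h.score).reversed();

    /** Stand-in for running the query against one shard: its k best matches. */
    static List<Hit> searchShard(int shard, double[] scores, int k) {
        List<Hit> hits = new ArrayList<>();
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] > 0) { // only matching documents are collected
                hits.add(new Hit(shard, doc, scores[doc]));
            }
        }
        hits.sort(BY_SCORE_DESC);
        return new ArrayList<>(hits.subList(0, Math.min(k, hits.size())));
    }

    /** Searches every shard on its own thread, then merges into a global top-k. */
    static List<Hit> search(double[][] shards, int k) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.length));
        try {
            List<Future<List<Hit>>> futures = new ArrayList<>();
            for (int s = 0; s < shards.length; s++) {
                final int shard = s;
                futures.add(pool.submit(() -> searchShard(shard, shards[shard], k)));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : futures) {
                merged.addAll(f.get()); // blocks until that shard has finished
            }
            merged.sort(BY_SCORE_DESC);
            return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for shards", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a shard search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[][] shards = {
            {0.0, 2.5, 0.7, 0.0},
            {1.9, 0.0, 3.1, 0.4}
        };
        System.out.println(search(shards, 3));
        // prints [shard1/doc2=3.1, shard0/doc1=2.5, shard1/doc0=1.9]
    }
}
```

One caveat with merging by raw score: TF-IDF scores from separate shards are only approximately comparable, because each shard computes idf from its own document frequencies.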
>>> >
>>> > On 2012-5-22 at 8:50 AM, "Li Li" <fancyerii@gmail.com> wrote:
>>> >>
>>> >>
>>> >> On 2012-5-22 at 4:59 AM, "Yang" <teddyyyy123@gmail.com> wrote:
>>> >>
>>> >> >
>>> >> > I'm trying to make my search faster. Right now a query like
>>> >> >
>>> >> > name:Joe Moe Pizza   address:77 main street  city:San Francisco
>>> >> Is this a conjunction query or a disjunction query?
>>> >>
>>> >> > in an index with 20 million such short business descriptions (total
>>> >> > size about 3 GB) takes about 100-200 ms.
>>> >> 20M is not a small index. How many results does a query match on average?
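On the conjunction/disjunction question: in the classic Lucene query-parser syntax, a field prefix applies only to the term it directly precedes, so "name:Joe Moe Pizza" looks for "Joe" in name but for "Moe" and "Pizza" in the default field, and with the parser's default OR operator the whole string is one large disjunction. Grouping with parentheses, e.g. "name:(Joe Moe Pizza)", keeps every word in its field, and a leading "+" makes a clause required. The helper below is a hypothetical sketch (not a Lucene API) that builds such a string, assuming the values contain no query-syntax special characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical helper: turns field -> free text into a query-parser string in
 * which every word stays in its own field, e.g. +name:(Joe Moe Pizza).
 * Assumes values contain no special characters; real code should escape them
 * (Lucene's QueryParser provides a static escape method for that).
 */
class FieldGroupedQuery {

    static String build(Map<String, String> fieldValues, boolean requireEachField) {
        StringJoiner query = new StringJoiner(" ");
        for (Map.Entry<String, String> e : fieldValues.entrySet()) {
            String value = e.getValue().trim();
            if (value.isEmpty()) {
                continue; // skip empty fields rather than emit "field:()"
            }
            String prefix = requireEachField ? "+" : "";
            // Parentheses make the field apply to every word, not just the first.
            query.add(prefix + e.getKey() + ":(" + value + ")");
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("name", "Joe Moe Pizza");
        fields.put("address", "77 main street");
        fields.put("city", "San Francisco");
        System.out.println(build(fields, true));
        // prints +name:(Joe Moe Pizza) +address:(77 main street) +city:(San Francisco)
    }
}
```

A required group such as +name:(Joe Moe Pizza) still needs only one of its words to match. Requiring every word would mean +name:(+Joe +Moe +Pizza) or switching the parser's default operator to AND, which may be too strict for rough business-name matching.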
>>> >>
>>> >> > I profiled the query; most time is spent in TermScorer.score(), as
>>> >> > shown by the attached YourKit screenshot.
>>> >> That's true: for a query, matching and scoring are very time-consuming
>>> >> and CPU-intensive. The other big cost is I/O for reading postings.
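For reference, the classic Lucene 3.x ranking (DefaultSimilarity, described as the "practical scoring function" in the Similarity javadoc) is roughly as follows. TermScorer evaluates the per-term factor for every document in a term's postings, so its cost grows with the number of matching documents rather than with the number of results returned. Exact details vary between versions.

```latex
\begin{aligned}
\mathrm{score}(q,d) &= \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{boost}(t)\cdot\mathrm{norm}(t,d) \\
\mathrm{tf}(t,d) &= \sqrt{\mathrm{freq}(t,d)} \\
\mathrm{idf}(t) &= 1 + \ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t) + 1}
\end{aligned}
```

Here norm(t,d) bundles the field's length norm and index-time boosts. A "simpler" scoring function amounts to replacing some of these factors with constants, but each matching document still has to be visited.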
>>> >>
>>> >> >
>>> >> >
>>> >> >
>>> >> > I tried loading the index onto tmpfs (in-memory block device), and
>>> >> > also tried RAMDirectory; neither helps much.
>>> >> If that is true, it seems that I/O is not the bottleneck.
>>> >> > I am reading
>>> >> > http://www.cnlp.org/presentations/slides/AdvancedLuceneEU.pdf
>>> >> > It mentions:
>>> >> > Size
>>> >> > – Stopword removal
>>> >> > – Stemming
>>> >> > • Lucene has a number of stemmers available
>>> >> > • Light versus Aggressive
>>> >> > • May prevent fine-grained matches in some cases
>>> >> > – Not a linear factor (usually) due to index compression
>>> >> >
>>> >> > So for "stopword removal": I'm already using the standard analyzer, so
>>> >> > stopword removal is already included, right?
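Stopword removal happens at analysis time: tokens in a fixed set are dropped before indexing and querying, so they never produce postings to iterate or score. In the Lucene 3.x line, StandardAnalyzer's default constructor applies an English stopword set (StopAnalyzer.ENGLISH_STOP_WORDS_SET), so the answer there should be yes; later major versions changed that default, so it is worth checking the javadoc for the version in use. The sketch below is a simplified, hypothetical analysis chain; its small stopword set is an illustrative sample, not Lucene's list, and its tokenizer is much cruder than StandardTokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Simplified analysis chain: lowercase, split on anything that is not an
 * ASCII letter or digit, then drop stopwords. Illustrative only.
 */
class StopwordSketch {

    // Small sample stopword set chosen for illustration; not Lucene's default list.
    static final Set<String> STOPWORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "of", "on", "the"));

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            // split() can yield a leading empty token; skip it along with stopwords
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Pizza Place on Main Street"));
        // prints [pizza, place, main, street]
    }
}
```

For short business descriptions, the main gain is skipping very common words such as "the", whose posting lists are the longest in the index.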
>>> >> >
>>> >> > Also, are there any other general tricks for reducing search
>>> >> > latency?
>>> >> >
>>> >> > Thanks!
>>> >> > Yang
>>> >> >
>>> >> >
>>> >> > ---------------------------------------------------------------------
>>> >> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> >> > For additional commands, e-mail: java-user-help@lucene.apache.org
>>
