lucene-java-user mailing list archives

From Li Li <fancye...@gmail.com>
Subject Re: lucene (search) performance tuning
Date Sat, 26 May 2012 11:46:39 GMT
If you skip scoring and sort by id instead, it may be a little faster, but
in 3.x you can hardly speed things up with a simpler scoring function. In
your situation the bottleneck is CPU, so the way to speed up is to
parallelize: split the index and search the pieces concurrently, so that
all the CPUs are fully used.
You can split the index and search in parallel in plain Lucene, but I
recommend Solr because it scales to many nodes without much pain.
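
To make the split-and-search-concurrently idea concrete, here is a minimal
sketch in plain Java. It is not Lucene or Solr code: the class
ShardedSearchSketch, the round-robin sharding by doc id, and the toy
term-count score are all assumptions made up for illustration. It only shows
the pattern of searching each smaller piece on its own thread and merging the
per-piece top hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical, library-free illustration; each "shard" stands in for one smaller index.
class ShardedSearchSketch {

    // One matching document: its id and a toy score.
    static final class Hit implements Comparable<Hit> {
        final int docId;
        final int score;
        Hit(int docId, int score) { this.docId = docId; this.score = score; }
        // Higher score first; ties broken by lower doc id.
        public int compareTo(Hit other) {
            if (score != other.score) return Integer.compare(other.score, score);
            return Integer.compare(docId, other.docId);
        }
    }

    // Toy score (an assumption, not Lucene's formula): number of query terms present in the doc.
    static int score(String doc, String[] terms) {
        List<String> words = Arrays.asList(doc.toLowerCase().split("\\s+"));
        int s = 0;
        for (String t : terms) {
            if (words.contains(t.toLowerCase())) s++;
        }
        return s;
    }

    // Searches one shard: the docs whose id is congruent to `shard` modulo `numShards`.
    static List<Hit> searchShard(List<String> docs, int shard, int numShards,
                                 String[] terms, int k) {
        List<Hit> hits = new ArrayList<>();
        for (int id = shard; id < docs.size(); id += numShards) {
            int s = score(docs.get(id), terms);
            if (s > 0) hits.add(new Hit(id, s));
        }
        Collections.sort(hits);
        return new ArrayList<>(hits.subList(0, Math.min(k, hits.size())));
    }

    // Runs every shard on its own thread, then merges the per-shard top-k lists.
    static List<Integer> search(List<String> docs, String query, int numShards, int k) {
        final String[] terms = query.split("\\s+");
        ExecutorService pool = Executors.newFixedThreadPool(numShards);
        try {
            List<Future<List<Hit>>> futures = new ArrayList<>();
            for (int s = 0; s < numShards; s++) {
                final int shard = s;
                Callable<List<Hit>> task = () -> searchShard(docs, shard, numShards, terms, k);
                futures.add(pool.submit(task));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : futures) merged.addAll(f.get());
            Collections.sort(merged);
            List<Integer> ids = new ArrayList<>();
            for (Hit h : merged.subList(0, Math.min(k, merged.size()))) ids.add(h.docId);
            return ids;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "joe moe pizza", "main street bakery", "pizza on main street",
            "san francisco pizza", "joe coffee");
        // Two shards searched concurrently, top 3 doc ids after merging.
        System.out.println(search(docs, "pizza main street", 2, 3)); // prints [2, 1, 0]
    }
}
```

Because each shard returns its own top k before the merge, the merged top k
is the same as a single-threaded search over everything; only the wall-clock
time changes. In a real setup each shard would be a separate index, and
whether this helps depends on having idle cores to run the extra threads.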

On Sat, May 26, 2012 at 8:59 AM, Yang <teddyyyy123@gmail.com> wrote:
> I tested with more threads / processes. indeed this is completely
> cpu-bound, since running 1 thread gives the same latency as 4 threads (my
> box has 4 cores)
>
>
> given this, is there any way to simplify the scoring computation (i'm only
> using lucene as a first level "rough" search, so the search quality is not
> a huge issue here) , so that, for example, fewer fields are evaluated or a
> simpler scoring function is used?
>
> thanks
> Yang
>
> On Fri, May 25, 2012 at 5:47 PM, Yang <teddyyyy123@gmail.com> wrote:
>
>> thanks a lot guys
>>
>>
>> On Tue, May 22, 2012 at 1:34 AM, Ian Lea <ian.lea@gmail.com> wrote:
>>
>>> Lots of good tips in
>>> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, linked from
>>> the FAQ.
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Tue, May 22, 2012 at 2:08 AM, Li Li <fancyerii@gmail.com> wrote:
>>> > Something went wrong when I wrote that in my Android client.
>>> > If RAMDirectory does not help, I think the bottleneck is CPU. You may try to
>>> > tune the JVM, but I do not expect much improvement.
>>> > The best option is to split your index into 2 or more smaller ones.
>>> > You can then use Solr's distributed searching.
>>> > If the CPU is not fully used, you can do this on one physical machine.
>>> >
>>> > On 2012-5-22 at 8:50 AM, "Li Li" <fancyerii@gmail.com> wrote:
>>> >>
>>> >>
>>> >> On 2012-5-22 at 4:59 AM, "Yang" <teddyyyy123@gmail.com> wrote:
>>> >>
>>> >> >
>>> >> > I'm trying to make my search faster. right now a query like
>>> >> >
>>> >> > name:Joe Moe Pizza   address:77 main street  city:San Francisco
>>> >> Is this a conjunction query or a disjunction query?
>>> >>
>>> >> > in an index with 20mil such short business descriptions (total size
>>> >> > about 3GB) takes about 100--200ms.
>>> >> 20m is not a small size. How many results does a query return on average?
>>> >>
>>> >> > I profiled the query, most time is spent in TermScorer.score(), as is
>>> >> > shown by the attached yourkit screenshot.
>>> >> That's true. For a query, matching and scoring are very time consuming
>>> >> and CPU intensive. Another cost is IO for reading postings.
>>> >>
>>> >> >
>>> >> >
>>> >> >
>>> >> > I tried loading the index onto tmpfs (in-memory block device), and also
>>> >> > tried RAMDirectory, neither helps much.
>>> >> If that is true, it seems that IO is not the
>>> >> > I am reading
>>> >> > http://www.cnlp.org/presentations/slides/AdvancedLuceneEU.pdf
>>> >> > it mentions
>>> >> > Size
>>> >> > – Stopword removal
>>> >> > – Stemming
>>> >> > • Lucene has a number of stemmers available
>>> >> > • Light versus Aggressive
>>> >> > • May prevent fine-grained matches in some cases
>>> >> > – Not a linear factor (usually) due to index compression
>>> >> >
>>> >> > so for "stopword removal", I'm already using the standard analyzer, so
>>> >> > stop word removal is already included, right?
>>> >> >
>>> >> > also generally any other tricks to try for reducing the search latency?
>>> >> >
>>> >> > Thanks!
>>> >> > Yang
>>> >> >
>>> >> >
>>> >> > ---------------------------------------------------------------------
>>> >> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> >> > For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
