lucene-java-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: lucene (search) performance tuning
Date Tue, 29 May 2012 00:55:06 GMT
And no, RAMDirectory does not help.

On Mon, May 28, 2012 at 5:54 PM, Lance Norskog <goksron@gmail.com> wrote:
> Can you use filter queries? Filters short-circuit a lot of search
> processing. "City:San Francisco" is a classic filter - it is a small
> part of the documents and it is reused a lot.
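
The filter idea above can be illustrated without Lucene. What follows is a minimal sketch in plain Java, assuming a toy in-memory index; the class FilterSketch, its methods, and the sample data are hypothetical and are not Lucene APIs. It builds a reusable bit set for the restrictive clause once, then skips every posting whose document is outside that set, so no scoring work is spent on it.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of a cached filter; not the Lucene API.
final class FilterSketch {

    // Build the filter once: one bit per document that matches the city.
    // In a real system this bit set would be cached and reused across queries.
    static BitSet buildCityFilter(String[] cityByDoc, String city) {
        BitSet bits = new BitSet(cityByDoc.length);
        for (int doc = 0; doc < cityByDoc.length; doc++) {
            if (cityByDoc[doc].equals(city)) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Walk the postings of each query term, but only score documents that
    // pass the filter. The "score" here is simply the number of matching terms.
    static Map<Integer, Integer> scoreFiltered(int[][] postings, BitSet filter) {
        Map<Integer, Integer> scores = new TreeMap<>();
        for (int[] termPostings : postings) {
            for (int doc : termPostings) {
                if (!filter.get(doc)) {
                    continue; // filtered out: no scoring work for this doc
                }
                scores.merge(doc, 1, Integer::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        String[] cityByDoc = {"San Francisco", "Oakland", "San Francisco", "San Jose", "Oakland"};
        int[][] postings = {{0, 1, 3}, {1, 2, 4}, {0, 2, 3}};
        BitSet sf = buildCityFilter(cityByDoc, "San Francisco");
        System.out.println(scoreFiltered(postings, sf)); // prints {0=2, 2=2}
    }
}
```

The saving comes from the filter being cheap to test and reusable, while the scoring loop only ever touches the small fraction of documents that pass it.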
>
> On Sat, May 26, 2012 at 7:32 AM, Yang <teddyyyy123@gmail.com> wrote:
>> I'm using a disjunction (OR) query. Unfortunately, all of the clauses are
>> optional.
>>
>> On Sat, May 26, 2012 at 4:38 AM, Simon Willnauer <simon.willnauer@googlemail.com> wrote:
>>
>>> On Sat, May 26, 2012 at 2:59 AM, Yang <teddyyyy123@gmail.com> wrote:
>>> > I tested with more threads / processes. Indeed, this is completely
>>> > CPU-bound, since running 1 thread gives the same latency as 4 threads (my
>>> > box has 4 cores).
>>> >
>>> > Given this, is there any way to simplify the scoring computation (I'm only
>>> > using Lucene as a first-level "rough" search, so the search quality is not
>>> > a huge issue here), so that, for example, fewer fields are evaluated or a
>>> > simpler scoring function is used?
>>>
>>> Are you using disjunction or conjunction queries? Can you make some
>>> parts of the query mandatory?
>>>
>>> simon
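
The reason mandatory clauses matter can be shown with a small sketch. This is plain Java over hypothetical postings lists (sorted arrays of document ids); BooleanQuerySketch and its methods are illustrative names, not Lucene classes. A pure OR query has to score every document that appears in any list, while requiring all clauses shrinks the candidate set to the intersection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of candidate-set size; not the Lucene API.
final class BooleanQuerySketch {

    // Pure disjunction: every document in any postings list must be scored.
    static Set<Integer> disjunction(List<int[]> postings) {
        Set<Integer> docs = new TreeSet<>();
        for (int[] termPostings : postings) {
            for (int doc : termPostings) {
                docs.add(doc);
            }
        }
        return docs;
    }

    // Pure conjunction: only documents present in every postings list are scored.
    static Set<Integer> conjunction(List<int[]> postings) {
        Set<Integer> docs = new TreeSet<>();
        for (int i = 0; i < postings.size(); i++) {
            Set<Integer> current = new TreeSet<>();
            for (int doc : postings.get(i)) {
                current.add(doc);
            }
            if (i == 0) {
                docs.addAll(current);
            } else {
                docs.retainAll(current);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        List<int[]> postings = new ArrayList<>();
        postings.add(new int[] {1, 2, 3, 4});
        postings.add(new int[] {2, 4, 6});
        postings.add(new int[] {4, 5});
        System.out.println("OR scores  " + disjunction(postings).size() + " docs"); // 6
        System.out.println("AND scores " + conjunction(postings).size() + " docs"); // 1
    }
}
```

Making even one selective clause mandatory moves the query part of the way from the first case toward the second.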
>>> >
>>> > thanks
>>> > Yang
>>> >
>>> > On Fri, May 25, 2012 at 5:47 PM, Yang <teddyyyy123@gmail.com> wrote:
>>> >
>>> >> thanks a lot guys
>>> >>
>>> >>
>>> >> On Tue, May 22, 2012 at 1:34 AM, Ian Lea <ian.lea@gmail.com> wrote:
>>> >>
>>> >>> Lots of good tips in
>>> >>> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, linked from
>>> >>> the FAQ.
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Ian.
>>> >>>
>>> >>>
>>> >>> On Tue, May 22, 2012 at 2:08 AM, Li Li <fancyerii@gmail.com> wrote:
>>> >>> > Something went wrong when writing in my Android client.
>>> >>> > If RAMDirectory does not help, I think the bottleneck is CPU. You may try
>>> >>> > to tune the JVM, but I do not expect much improvement.
>>> >>> > The best option is splitting your index into 2 or more smaller ones.
>>> >>> > You can then use Solr's distributed searching.
>>> >>> > If the CPU is not fully used, you can do this on one physical machine.
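
The split-index suggestion above can be sketched in plain Java. This is only an illustration under stated assumptions: ShardedSearchSketch, Hit, and the precomputed per-shard hit lists are hypothetical stand-ins for real per-shard searches, and none of this is the Solr or Lucene API. Each shard computes its own top k on a separate thread, and the partial results are merged into a global top k.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of searching index shards in parallel; not a real API.
final class ShardedSearchSketch {

    // A scored hit; a higher score is better.
    static final class Hit {
        final int doc;
        final double score;

        Hit(int doc, double score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Return the k best hits, highest score first.
    static List<Hit> topK(List<Hit> hits, int k) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Double.compare(b.score, a.score));
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    // Search every shard on its own thread, then merge the per-shard top k.
    // Each inner list stands in for the hits one shard's search would produce.
    static List<Hit> search(List<List<Hit>> shards, int k) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<List<Hit>>> futures = new ArrayList<>();
            for (List<Hit> shard : shards) {
                Callable<List<Hit>> task = () -> topK(shard, k);
                futures.add(pool.submit(task));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> future : futures) {
                merged.addAll(future.get());
            }
            return topK(merged, k);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each shard only needs to return its own k best hits, the merge step stays small regardless of shard size. Whether this lowers latency on a single machine depends on there being idle cores for the extra threads.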
>>> >>> >
>>> >>> > On 2012-5-22 at 8:50 AM, "Li Li" <fancyerii@gmail.com> wrote:
>>> >>> >>
>>> >>> >>
>>> >>> >> On 2012-5-22 at 4:59 AM, "Yang" <teddyyyy123@gmail.com> wrote:
>>> >>> >>
>>> >>> >> >
>>> >>> >> > I'm trying to make my search faster. right now a query like
>>> >>> >> >
>>> >>> >> > name:Joe Moe Pizza   address:77 main street  city:San Francisco
>>> >>> >>
>>> >>> >> is this a conjunction query or a disjunction query?
>>> >>> >>
>>> >>> >> > in an index with 20mil such short business descriptions (total size
>>> >>> >> > about 3GB) takes about 100--200ms.
>>> >>> >>
>>> >>> >> 20M is not a small size. How many results does a query return on average?
>>> >>> >>
>>> >>> >> > I profiled the query, most time is spent in TermScorer.score(), as is
>>> >>> >> > shown by the attached yourkit screenshot.
>>> >>> >>
>>> >>> >> That's true: for a query, matching and scoring are very time-consuming
>>> >>> >> and CPU-intensive. Another cost is I/O for reading postings.
>>> >>> >>
>>> >>> >> >
>>> >>> >> > I tried loading the index onto tmpfs (in-memory block device), and also
>>> >>> >> > tried RAMDirectory; neither helps much.
>>> >>> >>
>>> >>> >> If that is true, it seems that I/O is not the
>>> >>> >>
>>> >>> >> > I am reading
>>> >>> >> > http://www.cnlp.org/presentations/slides/AdvancedLuceneEU.pdf
>>> >>> >> > it mentions
>>> >>> >> > Size
>>> >>> >> > – Stopword removal
>>> >>> >> > – Stemming
>>> >>> >> > • Lucene has a number of stemmers available
>>> >>> >> > • Light versus Aggressive
>>> >>> >> > • May prevent fine-grained matches in some cases
>>> >>> >> > – Not a linear factor (usually) due to index compression
>>> >>> >> >
>>> >>> >> > so for "stopword removal", I'm already using the standard analyzer, so
>>> >>> >> > stop word removal is already included, right?
>>> >>> >> >
>>> >>> >> > also generally any other tricks to try for reducing the search latency?
>>> >>> >> >
>>> >>> >> > Thanks!
>>> >>> >> > Yang
>>> >>> >> >
>>> >>> >> >
>>> >>> >> >
>>> ---------------------------------------------------------------------
>>> >>> >> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> >>> >> > For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
> --
> Lance Norskog
> goksron@gmail.com



-- 
Lance Norskog
goksron@gmail.com
