lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Poor performances with Shingle and Phrase query
Date Thu, 21 Jan 2016 20:14:55 GMT
Shingles should make a huge difference in phrase query performance if
1) the phrase queries involve high frequency terms and 2) you have a
substantial number of documents in the index (so that
time-to-visit-postings dominates over time-to-lookup-terms).

118 rec/sec is already very fast for a long phrase on a large index
... how many documents are in your index?

You could also try using CommonGramsFilter instead: it's like
shingles, but only for high frequency terms, so you get a smaller
increase in index size but still big perf gains for the otherwise slow
phrase queries.
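
To make the difference concrete, here is a minimal sketch in plain
Scala of the tokens each approach would roughly produce. It uses no
Lucene classes: GramSketch, shingles, commonGrams and the common-word
set are hypothetical names for illustration only, and the sketch
ignores token positions and offsets, which the real filters track.

```scala
// Illustrative only: plain Scala, not the Lucene filters themselves.
object GramSketch {
  // Bigram shingles with unigrams dropped, similar in spirit to
  // ShingleFilter(filter, 2, 2) with setOutputUnigrams(false).
  def shingles(tokens: Seq[String]): Seq[String] =
    tokens.sliding(2).collect { case Seq(a, b) => s"$a $b" }.toSeq

  // Common-grams style: keep every unigram, and add a bigram only when
  // at least one of the two adjacent tokens is in the common-word set.
  // The set passed in is an assumption; pick your own frequent terms.
  def commonGrams(tokens: Seq[String], common: Set[String]): Seq[String] = {
    val bigrams = tokens.sliding(2).collect {
      case Seq(a, b) if common(a) || common(b) => s"${a}_$b"
    }.toSeq
    tokens ++ bigrams
  }
}
```

With shingles every adjacent pair becomes a term, while with the
common-grams approach only pairs that touch a frequent word do, which
is why the index grows less.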

Mike McCandless

http://blog.mikemccandless.com


On Thu, Jan 21, 2016 at 1:23 PM, Bertil Chapuis <bchapuis@gmail.com> wrote:
> Hello,
>
> I'm trying to improve the speed of an index when searching for long phrases. I
> performed some tests with the benchmark module. With a simple analyzer and
> PhraseQueries I get a throughput of 118 rec/sec. My test dataset is the
> latest dump of Wikipedia. Here are the filters I use at index and query
> time:
>
> var filter: TokenFilter = new StandardFilter(tokenizer)
> filter = new LowerCaseFilter(filter)
> filter = new EnglishPossessiveFilter(filter)
> filter = new StopFilter(filter, StopAnalyzer.ENGLISH_STOP_WORDS_SET)
> filter = new SnowballFilter(filter, "English")
>
> In order to improve performance I tried adding a ShingleFilter and ran
> some benchmarks with PhraseQueries and BooleanQueries (Should, Must), and in
> both cases got a lower throughput (83 rec/sec and 84 rec/sec respectively).
> Here is the filter:
>
> var filter: TokenFilter = new StandardFilter(tokenizer)
> filter = new LowerCaseFilter(filter)
> filter = new EnglishPossessiveFilter(filter)
> filter = new StopFilter(filter, StopAnalyzer.ENGLISH_STOP_WORDS_SET)
> filter = new SnowballFilter(filter, "English")
> val shingleFilter = new ShingleFilter(filter, 2, 2)
> shingleFilter.setOutputUnigrams(false)
> filter = shingleFilter
>
> From what I read, performance should be better, but I'm unable to get
> the desired results. Does anyone have advice on the best way to use shingles
> in order to improve performance? Should I use some other form of Query?
>
> Thank you in advance for your help.
>
> Regards,
>
> Bertil


