lucene-solr-dev mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: MM Parameter and Performance in Solr
Date Tue, 30 Jun 2009 16:47:05 GMT

: Date: Thu, 4 Jun 2009 19:30:10 -0700
: From: Kaktu Chakarabati
: Subject: MM Parameter and Performance in Solr

Kaktu: It doesn't look like you ever got a reply to your question 
(possibly because you sent it to solr-dev, but it's more appropriate for 
solr-user).

I haven't done any specific performance comparisons of dismax with mm=100% 
vs mm=X%, but in truth that would be an apples to oranges comparison.

The lower the percentage, the more combinations of input terms there are 
that can produce matches, and the more documents that will match -- in 
which case Solr by definition is doing more work.
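
To make that concrete, here is a toy sketch in Python. It is not Solr or 
Lucene code: the five "documents", the helper names, and the assumptions 
that a percentage is rounded down and that at least one term must always 
match are all mine, for illustration only. It just counts which documents 
qualify as the mm percentage drops.

```python
# Toy model only: real Lucene matching and scoring is far more involved.
# Each hypothetical document is represented by its set of indexed terms.
DOCS = {
    1: {"solr", "dismax", "performance"},
    2: {"solr", "performance"},
    3: {"dismax"},
    4: {"lucene", "index"},
    5: {"solr", "dismax", "performance", "index"},
}


def required_clauses(num_terms, mm_percent):
    """Terms a document must contain, assuming the percentage rounds down."""
    return (num_terms * mm_percent) // 100


def matching_docs(terms, mm_percent):
    """Ids of documents containing at least the required number of terms."""
    # Assumption: a document with zero matching terms never qualifies.
    need = max(1, required_clauses(len(terms), mm_percent))
    wanted = set(terms)
    return sorted(doc_id for doc_id, words in DOCS.items()
                  if len(words & wanted) >= need)


query = ["solr", "dismax", "performance"]
for mm in (100, 67, 34):
    # Lower mm -> fewer required terms -> a larger candidate set.
    print(mm, matching_docs(query, mm))
```

With mm=100 only the documents containing all three terms qualify; at 
mm=34 every document containing any one of them does, so there is simply 
more to collect and score.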

Asking for a workaround or best practice for dealing with something like 
this is akin to asking for workarounds for queries that are slow because 
they contain lots of terms and match lots of documents -- there aren't 
really a lot of options, other than preventing your users from executing 
those queries.

The question I would ask in your shoes is whether having the partial 
matching of mm=X% is worth the added search time, or if you'd be happier 
having more exact matching (mm=100%) and faster searches.
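
One way to put numbers on that trade-off is to send the same user query 
twice, changing only mm, and compare the QTime each response reports. The 
Python sketch below only builds the two request URLs. The host, handler 
path, field names, and sample query are hypothetical placeholders; 
defType, q, qf, mm, rows and wt are the standard request parameters.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical endpoint and field list; adjust to your own setup.
BASE_URL = "http://localhost:8983/solr/select"
QF = "title body"


def dismax_url(user_query, mm):
    """Build a dismax request URL for the given query and mm value."""
    params = {
        "defType": "dismax",
        "q": user_query,
        "qf": QF,
        "mm": mm,
        "rows": 10,
        "wt": "json",
    }
    # urlencode percent-escapes the values, so "100%" is sent as "100%25".
    return BASE_URL + "?" + urlencode(params)


strict = dismax_url("cheap red running shoes", "100%")
loose = dismax_url("cheap red running shoes", "50%")

# Fetch each URL against your own index and compare the QTime values
# reported in the response headers; no request is made here.
print(strict)
print(loose)
```

Running a representative sample of real user queries through both 
variants, rather than a single query, should give a fairer picture of 
what the looser matching costs.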

: Hey guys,
: I've been noticing for quite a long time that using the minmatch
: parameter with a value less than 100% alongside the dismax qparser
: seriously degrades performance. My particular use case involves using
: dismax over a set of 4-6 textual fields, about half of which do *not*
: filter stop words (so yes, these do involve iterating over a large
: portion of my index in some cases).
:
: This is somewhat understandable, as the task of constructing result sets
: is no longer simply intersection-based. However, I do wonder what
: workarounds / standard solutions exist for this problem and which are
: applicable in the Solr/Lucene environment (e.g. dividing the index into
: 'primary' / 'secondary' sections, using n-gram indices, caching
: configuration; sharding might help?).
:
: I'm working with a not-so-large corpus (~20 million documents) and the
: query processing time is way too long to my mind (my goal is for
: 90th-percentile QTime to hit around 200ms; I can say that currently it's
: more than double that).
:
: Can anyone please share some knowledge? What is practiced at e.g.
: Google, Yahoo? Are there any plans to address these issues in
: Solr/Lucene, or am I just using it wrongly?



-Hoss

