lucene-solr-user mailing list archives

From Haishan Chen <hais...@msn.com>
Subject RE: Phrase Query Performance Question and score threshold
Date Mon, 05 Nov 2007 23:18:18 GMT



> Date: Mon, 5 Nov 2007 14:55:21 -0500
> From: yonik@apache.org
> To: solr-user@lucene.apache.org
> Subject: Re: Phrase Query Performance Question and score threshold
>
> On 11/5/07, Haishan Chen <haishan@msn.com> wrote:
> > If I limit the documents returned based on a score threshold
> > (filter by score) will it be able to improve query performance?
>
> No.
>
> Taking a different approach can really speed up queries though.
> To figure out what approach you should take, we need to know what you
> are trying to do.
> As Hoss said: http://people.apache.org/~hossman/#xyproblem
>
> How many different phrase queries are you having performance issues with?
>
> -Yonik
 
 
 
Thanks for replying, Yonik.
 
Out of strong curiosity, I am trying to reimplement a search application that my colleague
has already built very successfully. I want to use SOLR to build the same application and see
if it works. Basically, there are millions of documents. They are categorized, and the content
of each document is constructed by a program using its category as input. A search application
searches the content and brings up the document. This way of constructing the documents has
proven to be excellent in terms of relevancy. Of course, it relies on slop phrase
queries. Now I want to build something that can search the content and bring up the
documents fast. That is basically what I want to do.
 
I can't go into any more detail on how the document content is constructed because the company
I work for has a patent pending on it, so I dare not discuss it in public. But the way it is
constructed seems to be the reason why the document frequency is so high (for many phrases)
and why a search usually brings up a large result set. The top-scoring documents, however,
have very good relevancy. So I am facing two issues: one is to make the slop phrase queries
faster, and the second is to make the result set smaller.
 
Using a score threshold may solve the second issue. It would be great if you could point me
to how to achieve that.
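
For illustration, a score cutoff of this kind can be sketched as a post-filter on the hits
that come back from a search. This is only a minimal Python sketch, not a SOLR feature: the
(doc_id, score) tuples, the function name and the min_ratio parameter are all assumptions
made for the example. The cutoff is taken relative to the top score, since raw scores are
generally not comparable from one query to the next. As Yonik said above, filtering like
this only trims the result set; it does not make the query itself any faster.

```python
def filter_by_relative_score(hits, min_ratio=0.5):
    """Keep only hits scoring at least min_ratio times the best score.

    hits is a list of (doc_id, score) tuples. This shape is hypothetical;
    it stands in for whatever the search client actually returns.
    """
    if not hits:
        return []
    top_score = max(score for _, score in hits)
    cutoff = top_score * min_ratio
    return [(doc_id, score) for doc_id, score in hits if score >= cutoff]


# Made-up scores, purely for illustration.
hits = [("doc1", 8.0), ("doc2", 6.0), ("doc3", 3.5), ("doc4", 0.5)]
trimmed = filter_by_relative_score(hits, min_ratio=0.5)
```

With these made-up scores the cutoff is 4.0, so only doc1 and doc2 are kept.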
 
As for the first issue: I have found about 10 different phrase queries with performance
issues so far. I believe there will be a lot more that I just haven't tried. It could
be solved by using faster hardware, though. I also believe it would help if SOLR had a
distributed search architecture similar to NUTCH's, so that it could scale out instead of
scale up.
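
To make "scale out" concrete, here is a rough Python sketch of the general scatter-gather
idea: each shard searches its own slice of the index and returns its best hits, and a
front end merges them into one global top-k list. It is not how NUTCH or SOLR actually
implement it; the shard lists, the function name and the (doc_id, score) shape are
assumptions for the example, and it presumes scores from different shards are comparable.

```python
import heapq


def merge_shard_results(shard_results, k):
    """Merge per-shard hit lists into a single top-k list by score.

    shard_results is a list of lists of (doc_id, score) tuples, one list
    per shard. This shape is hypothetical and assumes that scores from
    different shards can be compared directly.
    """
    all_hits = (hit for hits in shard_results for hit in hits)
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[1])


# Made-up results from two shards, purely for illustration.
shard_a = [("a1", 9.0), ("a2", 4.0)]
shard_b = [("b1", 7.5), ("b2", 6.0)]
top3 = merge_shard_results([shard_a, shard_b], k=3)
```

Each shard only has to evaluate the slop phrase query against its own part of the index,
which is where the speedup would come from.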
 
 
 
 
Thanks a lot
Haishan