From: Gert Brinkmann
To: solr-user@lucene.apache.org
Date: Tue, 27 Jan 2009 17:48:23 +0100
Message-ID: <497F3AD7.5070300@netcologne.de>
Subject: query with stemming, prefix and fuzzy?
Hello,

I am trying to get Solr to work properly. I have set up a Solr test server (using Jetty, as described in the tutorial). I also had to modify schema.xml so that I have separate fields, each with its own stemmer, for the different languages that occur in the content management system I am indexing. So far everything works fine, including snippet highlighting.

But now I am having problems with two things:

A) Fuzzy search

When I try a fuzzy search, the analyzers seem to break up a search string like "house~0.6" into "house", "0" and "6", so that, for example, a single "6" is highlighted, too. So I tried an additional raw field without any stemming, using only a lowercase and whitespace analyzer. This seems to work fine. But the fuzzy query is very slow: it takes 100% CPU for several seconds, with only one query running at a time.

What can I do to speed up the fuzzy query? For example, I have found a Lucene parameter prefixLength, but no corresponding Solr option. Does one exist? Are there other options I should pay attention to?

B) Combining stemming, prefix and fuzzy search

Is there a way to combine all three query types in one query, especially stemming and prefix search? I think it would be problematic, because "house*" would be analyzed to "house" by the usual analyzers that are required for stemming. Do I need a separate field for each query type, combined with a boolean OR in the query? Something like

    data:house OR data_fuzzy:house~0.6 OR data_prefix:house*

This feels a bit roundabout. Is there a way to use "house*~.6" including correct stemming?

Thank you,
Gert
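For illustration, the multi-field query from B) could be assembled on the client side roughly as in the sketch below. The field names data, data_fuzzy and data_prefix are the ones from the example in this mail. The function names, the escaping pattern, and the lowercasing of the fuzzy and prefix terms are assumptions. The lowercasing presumes the raw fields are lowercased at index time and that wildcard and fuzzy terms are not passed through the analyzer. The sketch only builds the query string; it does not answer whether a single field can do all three.

```python
import re

# Characters and operators with special meaning in the Lucene query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_term(term):
    """Backslash-escape query-syntax characters in a user-supplied term."""
    return _SPECIAL.sub(r'\\\1', term)


def build_query(term, similarity=0.6):
    """Combine stemmed, fuzzy and prefix clauses with a boolean OR.

    Field names are taken from the mail; everything else is hypothetical.
    """
    stemmed = escape_term(term)
    # Assumption: the raw fields are lowercased at index time, and fuzzy and
    # prefix terms bypass the analyzer, so lowercase them here.
    raw = escape_term(term.lower())
    clauses = [
        f"data:{stemmed}",
        f"data_fuzzy:{raw}~{similarity}",
        f"data_prefix:{raw}*",
    ]
    return " OR ".join(clauses)


if __name__ == "__main__":
    print(build_query("house"))
```

Escaping the term first keeps characters such as ":" or "~" typed by a user from being read as query operators, so only the "~0.6" and "*" appended by the helper carry their special meaning.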