Date: Mon, 13 Nov 2006 13:52:25 -0500
From: "Yonik Seeley"
To: solr-user@lucene.apache.org
Subject: Re: Index & search questions; special cases

On 11/12/06, Michael Imbeault wrote:
> - Somewhat related : Let's say I index "Polymyxin B". If I stopword
> single letters, would a phrase search ("Polymyxin B") still find the
> right documents (I don't think so, but still)? If not, I'll have to
> index single letters; how do I prevent the same problem as in the first
> question (i.e., a search on Polymyxin B yielding documents with
> Polymyxin and B, but not close to one another).

The general problem seems to be that you can't tell what should be in a
phrase search and what shouldn't.

You could try throwing everything into a sloppy phrase query, so at least
scores will go up when terms are closer together (in general).

You could also try an exact phrase query, and if you don't get enough
results, follow it up with another strategy (like what you have below).

> My thought is to parse the user query and rephrase it to do phrase
> searches on nearby terms containing single letters / numbers. If an user
> search for HIV 1 hepatitis, I'd rewrite it as ("HIV 1" AND hepatitis) OR
> ("1 hepatitis" AND hiv). Is it a sensible solution?

That might work.

Whatever general strategy you end up trying, you can probably boost
relevancy with some domain-specific knowledge injected with something
like the SynonymFilter.

-Yonik
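
For illustration, here is a rough client-side sketch of the two ideas discussed above: the rewrite of queries around single letters / numbers, and the sloppy phrase query. The function names (rewrite_query, sloppy_phrase), the definition of a "short" token, and the default slop value are assumptions made for this example, not part of Solr. The output strings use the standard Lucene query parser syntax ("..." for phrases, "..."~N for proximity, AND / OR). The sketch does not escape special characters in the user input.

```python
import re

# Assumption: a "short" token is a single ASCII letter or a run of digits.
SHORT_TOKEN = re.compile(r"^(?:[A-Za-z]|\d+)$")


def rewrite_query(query: str) -> str:
    """Pair each short token with its neighbours as an exact phrase.

    Each alternative is one two-word phrase ANDed with the remaining terms;
    the alternatives are ORed together. Queries without short tokens are
    returned unchanged.
    """
    tokens = query.split()
    pair_starts = []  # index of the first token of each candidate phrase
    for i, tok in enumerate(tokens):
        if not SHORT_TOKEN.match(tok):
            continue
        # Try (previous, tok) and (tok, next), skipping duplicates.
        for start in (i - 1, i):
            if start >= 0 and start + 1 < len(tokens) and start not in pair_starts:
                pair_starts.append(start)
    if not pair_starts:
        return query

    alternatives = []
    for start in pair_starts:
        phrase = '"%s %s"' % (tokens[start], tokens[start + 1])
        rest = tokens[:start] + tokens[start + 2:]
        clause = " AND ".join([phrase] + rest)
        alternatives.append("(%s)" % clause if rest else phrase)
    return " OR ".join(alternatives)


def sloppy_phrase(query: str, slop: int = 10) -> str:
    """Wrap the whole query in a proximity (sloppy) phrase query."""
    return '"%s"~%d' % (" ".join(query.split()), slop)


if __name__ == "__main__":
    print(rewrite_query("HIV 1 hepatitis"))
    print(rewrite_query("Polymyxin B"))
    print(sloppy_phrase("HIV 1 hepatitis"))
```

The fallback strategy could then be layered on top by the client: send the exact or rewritten query first, and only if too few results come back, send the sloppy variant.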