From: "Nguyen, Vincent (CDC/OSELS/PHITPO) (CTR)"
To: solr-user@lucene.apache.org, yonik@lucidimagination.com
Subject: RE: Solr returning irrelevant results
Date: Wed, 15 Sep 2010 12:34:34 -0400
Message-ID: <86C85FDA2DB86F4F91A7DFEFED7B69F70C1364@LTA3VS021.ees.hhs.gov>
References: <86C85FDA2DB86F4F91A7DFEFED7B69F70C1363@LTA3VS021.ees.hhs.gov>

Sorry about that, I made it uppercase to emphasize it.
The word was just "examined".

Vincent Vu Nguyen
Division of Science Quality and Translation
Office of the Associate Director for Science
Centers for Disease Control and Prevention (CDC)
404-498-6154
Century Bldg 2400
Atlanta, GA 30329

-----Original Message-----
From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik Seeley
Sent: Wednesday, September 15, 2010 11:40 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr returning irrelevant results

On Wed, Sep 15, 2010 at 11:29 AM, Nguyen, Vincent (CDC/OSELS/PHITPO) (CTR) wrote:
> I was running a query on the word "mining" and got results from
> documents that have nothing to do with mining. I got results with a
> score of 0.2997284 and less. It looks like Solr was querying the
> dsm.fulltext field for "mine" as well, which is ok except there were no
> "mine" words in the document. However, I did find words like
> "exaMINEd".

Was the "MINE" in "exaMINEd" actually uppercase, or did you do that
for emphasis? If it was actually uppercased, one could argue it is a
relevant document since someone was trying to get "MINE" to stand out
for some reason.

Anyway, if you don't want that behavior then turn off splitting on case change.
splitOnCaseChange="0" in WordDelimiterFilterFactory
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8
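
For reference, a rough sketch of where the splitOnCaseChange setting would go in schema.xml. Only splitOnCaseChange="0" comes from the advice above; the field type name, the tokenizer, and the other filter attributes are assumptions modeled on the stock Solr example schema, not taken from the configuration discussed in this thread. The setting only matters when the indexed text really contains a lowercase-to-uppercase change inside a word.

```xml
<!-- Hypothetical field type: the name and the surrounding filters are
     assumptions based on the stock Solr example schema. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- splitOnCaseChange="0" keeps a token such as "exaMINEd" whole
         instead of splitting it where lowercase changes to uppercase. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

If the index-time analyzer is changed this way, documents indexed earlier keep their old tokens until they are reindexed.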