lucene-solr-user mailing list archives

From "Nguyen, Vincent (CDC/OSELS/PHITPO) (CTR)" <v...@cdc.gov>
Subject RE: Solr returning irrelevant results
Date Wed, 15 Sep 2010 16:34:34 GMT
Sorry about that; I made it uppercase for emphasis. The word was just "examined".

Vincent Vu Nguyen
Division of Science Quality and Translation
Office of the Associate Director for Science
Centers for Disease Control and Prevention (CDC)
404-498-6154
Century Bldg 2400
Atlanta, GA 30329 


-----Original Message-----
From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik Seeley
Sent: Wednesday, September 15, 2010 11:40 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr returning irrelevant results

On Wed, Sep 15, 2010 at 11:29 AM, Nguyen, Vincent (CDC/OSELS/PHITPO)
(CTR) <vng0@cdc.gov> wrote:
> I was running a query on the word "mining" and got results from
> documents that have nothing to do with mining.  I got results with a
> score of 0.2997284 and less.  It looks like Solr was querying the
> dsm.fulltext field for "mine" as well, which is ok except there were no
> "mine" words in the document.  However, I did find words like
> "exaMINEd".

Was the "MINE" in "exaMINEd" actually uppercase, or did you do that
for emphasis?

If it was actually uppercased, one could argue it is a relevant
document since someone was trying to get "MINE" to stand out for some
reason.

Anyway, if you don't want that behavior, turn off splitting on case change
by setting splitOnCaseChange="0" on WordDelimiterFilterFactory:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
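To illustrate why the case matters, here is a simplified Python sketch of case-change splitting. It is not Solr's implementation. It assumes the filter breaks a token only where a lowercase letter is followed by an uppercase one (the usual "PowerShot" -> "Power", "Shot" case). It also ignores the filter's other options, such as delimiters, numerics and catenation. The function name split_on_case_change is hypothetical.

```python
def split_on_case_change(token: str, split_on_case_change: bool = True) -> list[str]:
    """Hypothetical, simplified model of case-change splitting.

    Assumption: a break happens only at a lowercase->uppercase transition;
    an uppercase->lowercase transition (e.g. "Ed") stays in one part.
    """
    if not split_on_case_change or not token:
        # With the option off (splitOnCaseChange="0"), the word stays whole.
        return [token]
    parts = []
    start = 0
    for i in range(1, len(token)):
        prev, cur = token[i - 1], token[i]
        if prev.islower() and cur.isupper():
            parts.append(token[start:i])
            start = i
    parts.append(token[start:])
    return parts


if __name__ == "__main__":
    for word in ("exaMINEd", "examined", "PowerShot"):
        for flag in (True, False):
            # Lowercase the parts, as a later lowercase filter typically would.
            print(word, flag, [p.lower() for p in split_on_case_change(word, flag)])
```

Under these assumptions, "exaMINEd" yields a separate "mined" part when splitting is on, while an all-lowercase "examined" is never split. That is why the answer depends on whether the uppercase was really in the document.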

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8


