lucene-solr-user mailing list archives

From "Teague James" <teag...@insystechinc.com>
Subject RE: Of, To, and Other Small Words
Date Tue, 15 Jul 2014 01:21:53 GMT
Jack,

Thanks for replying and for the suggestion. I replied to another suggestion with my field type,
and I do have
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />.
There's nothing in stopwords.txt. I even cleaned out stopwords_en.txt just to
be certain. Any other suggestions on how to control this behavior?

-Teague

-----Original Message-----
From: Jack Krupansky [mailto:jack@basetechnology.com] 
Sent: Monday, July 14, 2014 4:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Of, To, and Other Small Words

Or, if you happen to leave off the "words" attribute of the stop filter (or misspell the attribute
name), it will use the internal Lucene hardwired list of stop words.

-- Jack Krupansky
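
One way to check for the situation Jack describes is to scan the schema for stop filter declarations that carry no "words" attribute. The following is a minimal sketch, not part of Solr: the schema fragment, the field type names, and the function name stop_filters_without_words are all made up for illustration, and a real schema.xml may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment for illustration only; not the poster's schema.xml.
SAMPLE_SCHEMA = """
<schema name="example" version="1.5">
  <types>
    <fieldType name="text_ok" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_missing" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"/>
      </analyzer>
    </fieldType>
  </types>
</schema>
"""


def stop_filters_without_words(schema_xml):
    """Return the names of field types containing a stop filter with no 'words' attribute."""
    root = ET.fromstring(schema_xml)
    flagged = []
    for field_type in root.iter("fieldType"):
        for flt in field_type.iter("filter"):
            is_stop_filter = flt.get("class") == "solr.StopFilterFactory"
            if is_stop_filter and "words" not in flt.attrib:
                flagged.append(field_type.get("name"))
    return flagged


print(stop_filters_without_words(SAMPLE_SCHEMA))
```

To run it against a real file, read the file's contents and pass them in place of SAMPLE_SCHEMA. A field type with separate index and query analyzers is reported once per offending filter.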

-----Original Message-----
From: Anshum Gupta
Sent: Monday, July 14, 2014 4:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Of, To, and Other Small Words

Hi Teague,

The StopFilterFactory (which I think you're using) by default uses lang/stopwords_en.txt (which
wouldn't be empty if you check).
What you're looking at is stopwords.txt. You could either empty that file out or change
the field type for your field.


On Mon, Jul 14, 2014 at 12:53 PM, Teague James <teaguej@insystechinc.com>
wrote:
> Hello all,
>
> I am working with Solr 4.9.0 and am searching for phrases that contain 
> words like "of" or "to" that Solr seems to be ignoring at index time. 
> Here's what I tried:
>
> curl http://localhost/solr/update?commit=true -H "Content-Type: text/xml"
> --data-binary '<add><doc><field name="id">100</field><field
> name="content">blah blah blah knowledge of science blah blah 
> blah</field></doc></add>'
>
> Then, using a browser:
>
> http://localhost/solr/collection1/select?q="knowledge+of+science"&fq=id:100
>
> I get zero hits. If I search for "knowledge" or "science" alone, I get hits.
> If I search for "knowledge of" or "of science", I get zero hits. I don't
> want to use proximity if I can avoid it, as this may introduce too many
> undesirable results. Stopwords.txt is blank, yet clearly Solr is
> ignoring "of" and "to", and possibly more words that I have not
> discovered through testing yet. Is there some other configuration file
> that contains these small words? Is there any way to force Solr to pay
> attention to them and not drop them from the phrase? Any advice is
> appreciated! Thanks!
>
> -Teague
>
>



-- 

Anshum Gupta
http://www.anshumgupta.net 
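
For reference, the hardwired Lucene list mentioned earlier in the thread does contain both "of" and "to". The sketch below is a toy model, not Lucene code, of what a stop filter using that list would do to the phrase from the original question. It assumes simple whitespace tokenization and lowercasing, which is cruder than Solr's tokenizers; the names analyze and DEFAULT_ENGLISH_STOP_WORDS are invented for this example. Whether this is what is happening in Teague's index depends on the field type actually in use.

```python
# Lucene's built-in English stop word set, copied here for illustration.
DEFAULT_ENGLISH_STOP_WORDS = frozenset([
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
    "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
    "such", "that", "the", "their", "then", "there", "these", "they",
    "this", "to", "was", "will", "with",
])


def analyze(text, stop_words=DEFAULT_ENGLISH_STOP_WORDS):
    """Toy analyzer: lowercase, split on whitespace, drop stop words.

    Returns (position, token) pairs, where position is the 1-based
    position of the token in the original text, so gaps left by
    removed stop words stay visible.
    """
    kept = []
    for position, token in enumerate(text.lower().split(), start=1):
        if token not in stop_words:
            kept.append((position, token))
    return kept


# With the built-in list, "of" is dropped and a gap is left at position 2.
print(analyze("knowledge of science"))

# With an empty stop list, every token survives.
print(analyze("knowledge of science", stop_words=frozenset()))
```

Comparing the two outputs shows what an empty stop list should preserve. Solr's Analysis screen in the admin UI gives the real token stream for a given field type.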

