lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: How to figure out whether stopwords are being indexed or not
Date Wed, 22 Feb 2017 01:22:36 GMT
Attach &debug=query to your query and look at the parsed query that's returned.
That'll tell you what was searched at least.
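
As a rough sketch of what that request could look like, the snippet below rebuilds the query from the original message with debug=query added. The host, port, collection and field names are taken from the quoted URL; the helper name debug_query_url is made up for illustration, and rows=0 is just an assumption to keep the response small.

```python
from urllib.parse import urlencode

def debug_query_url(base_url, collection, fq, q="*:*"):
    """Build a /select URL that asks Solr to return the parsed query (debug=query)."""
    params = {
        "q": q,
        "fq": fq,
        "rows": 0,          # assumption: only the debug section is of interest
        "wt": "json",
        "indent": "on",
        "debug": "query",   # adds a "debug" section with the parsed query
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"

# Same filter as in the original message, now with debug output requested.
url = debug_query_url("http://localhost:8081/solr", "collection1",
                      "Description_note:* and *")
print(url)
```

Open the printed URL in a browser (or fetch it with any HTTP client) and compare the parsed form of the filter in the "debug" section with what you expected to be searched.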

You can also use the TermsComponent to examine terms in a field directly.
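
A minimal sketch of such a TermsComponent request, assuming a /terms request handler is enabled on the collection (that depends on your solrconfig.xml). The helper name terms_url is hypothetical; terms.fl, terms.prefix and terms.limit are standard TermsComponent parameters.

```python
from urllib.parse import urlencode

def terms_url(base_url, collection, field, prefix, limit=10):
    """Build a /terms URL listing indexed terms in `field` that start with `prefix`."""
    params = {
        "terms": "true",
        "terms.fl": field,       # field whose indexed terms are listed
        "terms.prefix": prefix,  # only terms starting with this prefix
        "terms.limit": limit,
        "wt": "json",
    }
    return f"{base_url}/{collection}/terms?{urlencode(params)}"

# Assumes the /terms handler exists; host and names come from the original message.
url = terms_url("http://localhost:8081/solr", "collection1",
                "Description_note", "and")
print(url)
```

Since terms.prefix is a prefix match, longer terms that merely start with "and" may show up too. If the exact term "and" is not in the returned list, the stopword did not make it into the index for that field.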

Best,
Erick

On Tue, Feb 21, 2017 at 2:52 PM, Pratik Patel <pratik@semandex.net> wrote:
> I have a field type in my schema to which a stopwords list has been applied.
> I have verified that the path to the stopwords file is correct and that the
> file loads fine in the Solr admin UI. When I analyse these fields using the
> "Analysis" tab of the Solr admin UI, I can see that stopwords are being
> filtered out. However, when I query with some of these stopwords, I still get
> results back, which makes me think that the stopwords are probably being
> indexed.
>
> For example, when I run the following query, I do get results back. I have the
> word "and" in the stopwords list, so I expect no results for this query.
>
> http://localhost:8081/solr/collection1/select?fq=Description_note:*%20and%20*&indent=on&q=*:*&rows=100&start=0&wt=json
>
> Does this mean that the word "and" is being indexed and the stopwords are not
> being used?
>
> The following is the field type of the field Description_note:
>
> <fieldType name="text_general" class="solr.TextField"
>            positionIncrementGap="100" omitNorms="true">
>   <analyzer type="index">
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>                 pattern="((?m)[a-z]+)'s" replacement="$1s"/>
>     <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
>     <filter class="solr.KStemFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>                 pattern="((?m)[a-z]+)'s" replacement="$1s"/>
>     <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
>     <filter class="solr.KStemFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt"/>
>   </analyzer>
> </fieldType>
