lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Steven A Rowe <sar...@syr.edu>
Subject RE: ShingleFilter failing with more terms than index phrase
Date Tue, 13 Jul 2010 15:45:32 GMT
Hi Ethan,

You'll probably get better answers about Solr specific stuff on the solr-user@a.l.o list.

Check out PositionFilterFactory - it may address your issue:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory

Steve

> -----Original Message-----
> From: Ethan Collins [mailto:collins.ethans@gmail.com]
> Sent: Tuesday, July 13, 2010 3:42 AM
> To: java-user@lucene.apache.org
> Subject: ShingleFilter failing with more terms than index phrase
> 
> I am using lucene 2.9.3 (via Solr 1.4.1) on windows and am trying to
> understand ShingleFilter. I wrote the following code and find that if I
> provide more words than the actual phrase indexed in the field, then the
> search on that field fails (no score found with debugQuery=true).
> 
> Here is an example to reproduce, with field names:
> Id: 1
> title_1: Nina Simone
> title_2: I put a spell on you
> 
> Query (dismax) with:
> - “Nina Simone I put”  <- Fails i.e. no score shown from title_1 search
> (using debugQuery)
> - “Nina Simone” <- SUCCESS
> 
> But, when I used Solr’s Field Analysis with the ‘shingle’ field (given
> below) and tried “Nina Simone I put”, it succeeds. It’s only during the
> query that no score is provided. I also checked ‘parsedquery’ and it shows
> disjunctionMaxQuery issuing the string “Nina_Simone Simone_I I_put” to the
> title_1 field.
> 
> title_1 and title_2 fields are of type ‘shingle’, defined as:
> 
>    <fieldType name="shingle" class="solr.TextField"
> positionIncrementGap="100" indexed="true" stored="true">
>        <analyzer type="index">
>            <tokenizer class="solr.StandardTokenizerFactory"/>
>            <filter class="solr.LowerCaseFilterFactory"/>
>            <filter class="solr.ShingleFilterFactory"
> maxShingleSize="2" outputUnigrams="false"/>
>        </analyzer>
>        <analyzer type="query">
>            <tokenizer class="solr.StandardTokenizerFactory"/>
>            <filter class="solr.LowerCaseFilterFactory"/>
>            <filter class="solr.ShingleFilterFactory"
> maxShingleSize="2" outputUnigrams="false"/>
>        </analyzer>
>    </fieldType>
> 
> Note that I also have a catchall field which is text. I have qf set
> to: 'id^2 catchall' and pf set to: 'title_1^1.5 title_2^1.2'
> 
> If I am missing something or doing something wrong please let me know.
> 
> -Ethan
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

Mime
View raw message