lucene-java-user mailing list archives

From Ethan Collins <collins.eth...@gmail.com>
Subject Re: ShingleFilter failing with more terms than index phrase
Date Wed, 14 Jul 2010 07:15:44 GMT
Hi Steve,

Thanks for your kind response. I checked PositionFilterFactory (and
re-indexed as well), but that didn't solve the problem either.
Interestingly, the problem is not reproducible from Solr's Field Analysis
page; it manifests only in an actual query.
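For reference, the Solr wiki page linked in Steve's reply describes PositionFilter as keeping the first token's position increment and setting every later token's increment to a configurable `positionIncrement`, 0 by default, so all query-side shingles collapse onto a single position. The sketch below models only that step in plain Java; `Tok`, `applyPositionFilter` and `positions` are hypothetical names, not Lucene classes. It is further assumed, not checked against Solr 1.4.1, that Lucene 2.9's QueryParser turns several tokens sharing one position into an OR of optional term clauses instead of a phrase query, which is the usual motivation for pairing this filter with ShingleFilter on the query side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PositionFilterDemo {

    // Hypothetical token holder (not Lucene's Token class): term text plus position increment.
    public static final class Tok {
        public final String term;
        public final int posInc;

        public Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    // Model of PositionFilter: the first token keeps its own increment,
    // every later token gets `increment` (0 by default, per the wiki description).
    public static List<Tok> applyPositionFilter(List<Tok> in, int increment) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < in.size(); i++) {
            Tok t = in.get(i);
            out.add(new Tok(t.term, i == 0 ? t.posInc : increment));
        }
        return out;
    }

    // Absolute positions as a running sum of increments, starting at -1 so that
    // a first token with increment 1 lands at position 0.
    public static int[] positions(List<Tok> toks) {
        int[] pos = new int[toks.size()];
        int p = -1;
        for (int i = 0; i < toks.size(); i++) {
            p += toks.get(i).posInc;
            pos[i] = p;
        }
        return pos;
    }

    public static void main(String[] args) {
        // Query-side shingles for "Nina Simone I put", one position apart before filtering.
        List<Tok> shingles = Arrays.asList(
                new Tok("nina_simone", 1), new Tok("simone_i", 1), new Tok("i_put", 1));
        System.out.println(Arrays.toString(positions(shingles)));                          // [0, 1, 2]
        System.out.println(Arrays.toString(positions(applyPositionFilter(shingles, 0))));  // [0, 0, 0]
    }
}
```

Before filtering the three shingles sit at positions 0, 1 and 2; afterwards they all share position 0.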

I guess the subject of this post is not quite accurate. It's not that
ShingleFilter is failing; rather, when I use ShingleFilter, the shingle
field contributes no score if I pass more terms than were indexed. I
observe this using debugQuery.
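A simplified model of the behaviour described above, in plain Java rather than Lucene's API. It assumes (not verified against Solr 1.4.1) that dismax's pf parameter turns the whole query into an exact phrase over the query-side shingles, as the parsedquery quoted below suggests, and that the index holds only the shingles of the field value. `shingles` only roughly mimics StandardTokenizer + LowerCaseFilter + ShingleFilter with maxShingleSize=2 and outputUnigrams=false (it splits on whitespace), and `phraseMatches` is a hypothetical helper. Lucene's ShingleFilter joins tokens with a space by default; "_" is used here only to mirror the notation in the post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ShinglePhraseDemo {

    // Rough stand-in for StandardTokenizer + LowerCaseFilter: split on whitespace, lowercase.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Bigram shingles only (maxShingleSize=2, outputUnigrams=false).
    // In this model, fewer than two tokens yields no shingles at all.
    public static List<String> shingles(String text) {
        List<String> tokens = tokenize(text);
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + "_" + tokens.get(i + 1));
        }
        return out;
    }

    // Exact phrase match: every query shingle must occur, in order, at consecutive positions.
    public static boolean phraseMatches(List<String> indexed, List<String> query) {
        if (query.isEmpty()) {
            return false;
        }
        for (int start = 0; start + query.size() <= indexed.size(); start++) {
            if (indexed.subList(start, start + query.size()).equals(query)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> indexed = shingles("Nina Simone");   // [nina_simone]
        for (String q : Arrays.asList("Nina Simone", "Nina Simone I put")) {
            List<String> qs = shingles(q);
            System.out.println(qs + " -> " + phraseMatches(indexed, qs));
        }
    }
}
```

Running `main` prints `[nina_simone] -> true` and then `[nina_simone, simone_i, i_put] -> false`. Under this model a phrase built from more words than the field contains can never match, which would be consistent with title_1 contributing no score for “Nina Simone I put”.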

I had actually posted to solr-user but haven't received a response yet,
probably because the problem is not clear at first glance. However, I
included an example in that mail so that anyone interested can try it
out and check whether there's a problem. Let's see if I get any
response.

-Ethan

On Tue, Jul 13, 2010 at 9:15 PM, Steven A Rowe <sarowe@syr.edu> wrote:
> Hi Ethan,
>
> You'll probably get better answers about Solr-specific stuff on the solr-user@a.l.o list.
>
> Check out PositionFilterFactory - it may address your issue:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory
>
> Steve
>
>> -----Original Message-----
>> From: Ethan Collins [mailto:collins.ethans@gmail.com]
>> Sent: Tuesday, July 13, 2010 3:42 AM
>> To: java-user@lucene.apache.org
>> Subject: ShingleFilter failing with more terms than index phrase
>>
>> I am using Lucene 2.9.3 (via Solr 1.4.1) on Windows and am trying to
>> understand ShingleFilter. I wrote the following configuration and find that
>> if I provide more words than the phrase actually indexed in the field, the
>> search on that field fails (no score is shown with debugQuery=true).
>>
>> Here is an example that reproduces the problem (field name: value):
>> Id: 1
>> title_1: Nina Simone
>> title_2: I put a spell on you
>>
>> Query (dismax) with:
>> - “Nina Simone I put”  <- FAILS, i.e. no score is shown for the title_1
>> search (using debugQuery)
>> - “Nina Simone”  <- SUCCESS
>>
>> But when I used Solr’s Field Analysis page with the ‘shingle’ field type
>> (given below) and tried “Nina Simone I put”, it succeeded. It’s only during
>> the query that no score is provided. I also checked ‘parsedquery’, and it
>> shows a DisjunctionMaxQuery issuing the string “Nina_Simone Simone_I I_put”
>> to the title_1 field.
>>
>> The title_1 and title_2 fields are of type ‘shingle’, defined as:
>>
>>    <fieldType name="shingle" class="solr.TextField"
>> positionIncrementGap="100" indexed="true" stored="true">
>>        <analyzer type="index">
>>            <tokenizer class="solr.StandardTokenizerFactory"/>
>>            <filter class="solr.LowerCaseFilterFactory"/>
>>            <filter class="solr.ShingleFilterFactory"
>> maxShingleSize="2" outputUnigrams="false"/>
>>        </analyzer>
>>        <analyzer type="query">
>>            <tokenizer class="solr.StandardTokenizerFactory"/>
>>            <filter class="solr.LowerCaseFilterFactory"/>
>>            <filter class="solr.ShingleFilterFactory"
>> maxShingleSize="2" outputUnigrams="false"/>
>>        </analyzer>
>>    </fieldType>
>>
>> Note that I also have a catchall field of type text. I have qf set
>> to 'id^2 catchall' and pf set to 'title_1^1.5 title_2^1.2'.
>>
>> If I am missing something or doing something wrong, please let me know.
>>
>> -Ethan
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
