lucene-solr-user mailing list archives

From Lee Carroll <lee.a.carr...@googlemail.com>
Subject Re: Queried value and Indexed value are the same still no match in the query result
Date Mon, 13 Feb 2012 10:28:19 GMT
Hi, you have a lot of language processing for a field which, at least in
your example, contains non-words.

Do you need the synonyms, two lots of stemming (English and Finnish), the edge n-grams, etc.?
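
As a rough illustration only, a trimmed-down type for a field holding codes such as EHT2011-2012 could drop the synonyms, both stemmers and the edge n-grams, and keep the existing WordDelimiterFilterFactory settings. The type name `text_code` is hypothetical, and whether this chain fits depends on what the field is actually used for:

```xml
<!-- Hypothetical type for code-like values: no synonyms, stemming or n-grams -->
<fieldType name="text_code" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Splits EHT2011-2012 into eht / 2011 / 2012 and also emits the catenated 20112012 -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Same split at query time, without the catenated tokens -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The prMeta_* dynamic field would then point at type="text_code", and the documents would have to be reindexed after changing the index-time analyzer.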

What is the field for?
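
If the field is really an identifier that should only ever match as a whole value (an assumption about how it is used), one option is to index it untokenized with solr.StrField. The type name `string` follows the Solr example schema; declaring it is only needed if the schema does not already have it:

```xml
<!-- Untokenized type: the whole value is indexed as a single term -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<!-- Same dynamic field as before, pointed at the untokenized type -->
<dynamicField name="prMeta_*" type="string" indexed="true" stored="true"
              multiValued="true"/>
```

With this, prMeta_service:EHT2011-2012 should become a plain term query, but matching is then exact and case-sensitive, so a query for 2011-2012 on its own would no longer find those documents.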

>> "I don't believe that this last point is what actually causes
>> my unsatisfactory results"

It probably is.

On 13 February 2012 10:02, Dirceu Vieira <dirceuvjr@gmail.com> wrote:
> Hi,
>
> Does anybody have any thoughts about this?
> I'm really struggling with these problems; any hints would be very welcome!
>
> Regards,
>
> Dirceu
>
> On Fri, Feb 10, 2012 at 4:45 PM, Dirceu Vieira <dirceuvjr@gmail.com> wrote:
>
>> Hi Guys,
>>
>> Would someone have time to help me understand what's happening here:
>>
>> I have a dynamic field called *prMeta_service*, and the value *"EHT2011-2012"*
>> is indexed for various documents.
>>
>> When I search for the exact same value (*"EHT2011-2012"*), it does NOT
>> match those documents.
>> I have spent quite a lot of time lately trying to understand what happens,
>> reading all the documentation I could find about the token filters used
>> in this field, but I can't seem to find the answer.
>>
>> It seems to me that, for some reason, the parser gets lost because the
>> value contains both letters and numbers. I mention that because when I
>> query only for *"2011-2012"* or *"20112012"*, I get the expected results.
>>
>> I am using Solr 1.4, and I haven't tried it in any other version.
>>
>> Another interesting point is that the SnowballPorterFilterFactory is
>> removing a character from *"2011"*, so *"201"* is the value that is
>> actually indexed.
>> I don't believe that this last point is what actually causes
>> my unsatisfactory results, but I wanted to know whether anybody has had
>> issues with Finnish language stemming.
>>
>>
>> I would very much appreciate it if someone could spare some time to help
>> me with this issue.
>>
>>
>> My configuration looks like:
>>
>>
>> *- Dynamic field:*
>>
>> <dynamicField name="prMeta_*" type="text" indexed="true" stored="true"
>> multiValued="true"/>
>>
>> *- Field type:*
>>
>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>>             catenateAll="0"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>     <filter class="solr.SnowballPorterFilterFactory" language="Finnish"/>
>>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>             ignoreCase="true" expand="true"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>>             catenateAll="0"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>     <filter class="solr.SnowballPorterFilterFactory" language="Finnish"/>
>>   </analyzer>
>> </fieldType>
>>
>> *- The field analysis gives me this as a response:*
>>
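>> Index analyzer, one line per stage:
>>   WhitespaceTokenizerFactory:         EHT2011-2012
>>   StopFilterFactory:                  EHT2011-2012
>>   WordDelimiterFilterFactory:         EHT 2011 2012 20112012
>>   LowerCaseFilterFactory:             eht 2011 2012 20112012
>>   EnglishPorterFilterFactory:         eht 2011 2012 20112012
>>   RemoveDuplicatesTokenFilterFactory: eht 2011 2012 20112012
>>   SnowballPorterFilterFactory:        eht 201 2012 20112012
>>   EdgeNGramFilterFactory:             e eh eht 2 20 201 2 20 201 2012 2 20 201 2011 20112 201120 2011201 20112012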
>>
>> *- When I run the query in the admin interface in debug mode
>> (&debugQuery=true), this is the result:*
>>
>> <str name="rawquerystring">
>> prMeta_service:EHT2011-2012
>> </str>
>> <str name="querystring">
>> prMeta_service:EHT2011-2012
>> </str>
>> <str name="parsedquery">
>> PhraseQuery(prMeta_service:"eht 201 2012")
>> </str>
>> <str name="parsedquery_toString">
>> prMeta_service:"eht 201 2012"
>> </str>
>>
>>
>> Thank you very much in advance!
>>
>> Best regards,
>>
>> --
>> Dirceu Vieira Júnior
>> -------------------------------------------------------------------
>> +47 9753 2473
>> dirceuvjr.blogspot.com
>> twitter.com/dirceuvjr
>>
>>
