lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Weird: Solr Search result and Analysis Result not match?
Date Wed, 09 Nov 2011 17:33:02 GMT
Regarding <1>: take a look at the admin/analysis page and check the
tokenization, just to be sure.

Oh, and one more thing...
Putting LowerCaseFilterFactory in front of WordDelimiterFilterFactory
partly defeats the purpose of WordDelimiterFilterFactory. One of the
things WDF does is split on case change (splitOnCaseChange="1" in your
schema), and you're removing the case changes before WDF gets hold of
the tokens.
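
The ordering point can be illustrated with a minimal sketch of the index analyzer. It assumes the filter attributes stay exactly as posted in the original schema. Only the two filters in question are shown; the remaining charFilters and filters are omitted for brevity and would need to be added back. This is an untested fragment, not a confirmed fix for the reported problem.

```xml
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <!-- WordDelimiterFilterFactory runs first, so splitOnCaseChange still sees mixed-case tokens -->
  <filter class="solr.WordDelimiterFilterFactory"
          splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="1" catenateAll="0" preserveOriginal="1"/>
  <!-- Lowercasing is applied afterwards, to the parts WordDelimiterFilterFactory produced -->
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

As with any analysis change, the documents have to be re-indexed before the new order has any effect on stored terms.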

Best
Erick

On Tue, Nov 8, 2011 at 9:40 PM, Ellery Leung <elleryleung@be-o.com> wrote:
> Thanks Erick, here are my responses:
>
> 1. Yes. What I want to achieve is that when the index is filtered with EdgeNGram and
> the query is not, I can search on a partial string.
> 2. Good suggestion, will test it.
> 3. ok
> 4. Thank you
> 5/6. Will remove the synonyms and WordDelimiterFilterFactory from the query analyzer.
> 7. Will look at that using Luke. By the way, this is the first time I have seen that
> there is a tool for that. Thank you.
> 8. Yes.
>
> Will check that again, thank you.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: November 8, 2011 9:52 PM
> To: solr-user@lucene.apache.org; elleryleung@be-o.com
> Subject: Re: Weird: Solr Search result and Analysis Result not match?
>
> Several things:
>
> 1> You don't have EdgeNGramFilterFactory in your query analysis chain,
>     is this intentional?
> 2> You have a LOT of stuff going on here. You might try making your
>     analysis chain simpler and adding stuff back in until you see the
>     error. Don't forget to re-index!
> 3> Analysis doesn't take into account query *parsing*, so it's possible
>     to get a false sense of assurance when the analysis page matches
>     your expectations.
> 4> Even though nothing jumps out at me except the Edge.... factory,
>     nice job of including information.
> 5> It's unusual to expand synonyms both at query and index time;
>     usually it's one or the other, with index time preferred.
> 6> Same with WordDelimiterFilterFactory. If you put all the variants
>     in the index, you don't need to put all the variants in the query,
>     and vice versa.
> 7> Take a look at your actual contents, perhaps using Luke, to ensure
>     that what you expect to be in your index actually is.
> 8> You did re-index after your latest changes to your schema, right <G>?
>
> All of this is a way of saying that I don't quite see what the problem
> is, but at least there are some avenues to explore.
>
> Best
> Erick
>
> On Mon, Nov 7, 2011 at 9:29 PM, Ellery Leung <elleryleung@be-o.com> wrote:
>> Hi all.
>>
>>
>>
>> I am using Solr 3.4 under Win 7.
>>
>>
>>
>> In the schema there is a multivalued field indexed this way:
>>
>> ==========================
>>
>> Schema:
>>
>> ==========================
>>
>> <field name="myEvent" type="myCustomText" multiValued="true" indexed="true"
>> stored="true" omitNorms="true"/>
>>
>>
>>
>> <fieldType name="myCustomText" class="solr.TextField"
>> positionIncrementGap="100">
>>
>>        <analyzer type="index">
>>
>>                <charFilter class="solr.MappingCharFilterFactory"
>> mapping="../../filters/filter-mappings.txt"/>
>>
>>                <charFilter class="solr.HTMLStripCharFilterFactory"/>
>>
>>                <tokenizer class="solr.StandardTokenizerFactory"/>
>>
>>                <filter class="solr.TrimFilterFactory"/>
>>
>>                <filter class="solr.LowerCaseFilterFactory"/>
>>
>>                <filter class="solr.SynonymFilterFactory"
>> synonyms="../../filters/filter-synonyms.txt" ignoreCase="true"
>> expand="true"/>
>>
>>                <filter class="solr.ASCIIFoldingFilterFactory"/>
>>
>>                <filter class="solr.WordDelimiterFilterFactory"
>> splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1"
>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" preserveOriginal="1"/>
>>
>>                <filter class="solr.PhoneticFilterFactory"
>> encoder="DoubleMetaphone" inject="true"/>
>>
>>                <filter class="solr.PorterStemFilterFactory"/>
>>
>>                <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
>> maxGramSize="50" side="front"/>
>>
>>                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>
>>        </analyzer>
>>
>>        <analyzer type="query">
>>
>>                <charFilter class="solr.MappingCharFilterFactory"
>> mapping="../../filters/filter-mappings.txt"/>
>>
>>                <charFilter class="solr.HTMLStripCharFilterFactory"/>
>>
>>                <tokenizer class="solr.StandardTokenizerFactory"/>
>>
>>                <filter class="solr.TrimFilterFactory"/>
>>
>>                <filter class="solr.LowerCaseFilterFactory"/>
>>
>>                <filter class="solr.SynonymFilterFactory"
>> synonyms="../../filters/filter-synonyms.txt" ignoreCase="true"
>> expand="true"/>
>>
>>                <filter class="solr.ASCIIFoldingFilterFactory"/>
>>
>>                <filter class="solr.WordDelimiterFilterFactory"
>> splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1"
>> generateWordParts="0" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" preserveOriginal="1"/>
>>
>>                <filter class="solr.PhoneticFilterFactory"
>> encoder="DoubleMetaphone"/>
>>
>>                <filter class="solr.PorterStemFilterFactory"/>
>>
>>                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>
>>        </analyzer>
>>
>> </fieldType>
>>
>> ==========================
>>
>> Actual index:
>>
>> ==========================
>>
>> <arr name="myEvent">
>>
>> <str>2284e2</str>
>>
>> <str>2284e4</str>
>>
>> <str>2284e5</str>
>>
>> <str>1911e2</str>
>>
>> </arr>
>>
>>
>>
>> ==========================
>>
>> Question:
>>
>> ==========================
>>
>> Now when I do a search like this:
>>
>>
>>
>> myEvent:1911e2
>>
>>
>>
>> This should match the 4th item.  However, the "Full Interface" query page does
>> not return any result, while on the "analysis" page the matches are highlighted.
>>
>>
>>
>> By using Debug: the parsedquery is:
>>
>>
>>
>> MultiPhraseQuery(myEvent:"(1911e2 1911) (A e) 2")
>>
>>
>>
>> Parsedquery_toString:
>>
>>
>>
>> myEvent:"(1911e2 1911) (A e) 2"
>>
>>
>>
>> Can anyone please help me on this?
>>
>>
>
>
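
Points 5 and 6 in the quoted reply suggest expanding synonyms and word-delimiter variants on one side only, with index time preferred. A hedged sketch of what the query analyzer might look like after that change is given below. It assumes the index analyzer keeps SynonymFilterFactory and WordDelimiterFilterFactory as posted, and it simply drops those two filters from the query side. Every other element is copied from the original schema. The thread does not confirm whether this resolves the mismatch.

```xml
<analyzer type="query">
  <charFilter class="solr.MappingCharFilterFactory"
              mapping="../../filters/filter-mappings.txt"/>
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.TrimFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- SynonymFilterFactory and WordDelimiterFilterFactory removed here:
       the variants are assumed to be generated at index time only -->
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone"/>
  <filter class="solr.PorterStemFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```

After editing the field type, re-index and compare the debug parsedquery output for myEvent:1911e2 with the terms Luke shows in the index.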
