lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Search differences between solr 1.4.0 and 3.6.1
Date Wed, 28 Nov 2012 12:52:10 GMT
Well, I get the same results in 1.4 and 3.6. The only difference is that I
left out the
<http://cbrsrvmtr04:8983/solr/WISE/admin/file/?file=schema.xml>
line.

In both cases the 12 is missing from the query analysis but is present in the
index analysis, because catenateNumbers is 1 in the index analyzer and 0 in
the query analyzer.
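
For illustration only, here is a toy Python model of how catenateNumbers
interacts with generateNumberParts=0. It is a simplified sketch, not Lucene's
WordDelimiterFilter: the function name and its parameters are made up for this
example, and it ignores splitOnCaseChange, protwords.txt, catenateAll, and
token positions and offsets.

```python
import re


def word_delimiter(token, generate_word_parts=True, generate_number_parts=True,
                   catenate_words=False, catenate_numbers=False):
    """Toy model (an assumption, not Lucene code) of the generate*/catenate* flags."""
    # Split into runs of ASCII letters or digits; anything else is a delimiter.
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    out = []
    i = 0
    while i < len(parts):
        is_num = parts[i].isdigit()
        # Collect the run of adjacent parts of the same type (all words or all numbers).
        j = i
        while j < len(parts) and parts[j].isdigit() == is_num:
            j += 1
        run = parts[i:j]
        generate = generate_number_parts if is_num else generate_word_parts
        catenate = catenate_numbers if is_num else catenate_words
        if generate:
            out.extend(run)
        # Emit the catenated run unless it would only repeat a single part
        # that was already generated above.
        if catenate and (len(run) > 1 or not generate):
            out.append("".join(run))
        i = j
    return out


# Index analyzer flags from the schema in this thread.
index_tokens = word_delimiter("GAMES12", generate_word_parts=True,
                              generate_number_parts=False,
                              catenate_words=True, catenate_numbers=True)
# Query analyzer flags from the schema in this thread.
query_tokens = word_delimiter("GAMES12", generate_word_parts=True,
                              generate_number_parts=False,
                              catenate_words=False, catenate_numbers=False)
print(index_tokens)  # ['GAMES', '12']
print(query_tokens)  # ['GAMES']
```

Under these assumptions the index side keeps the 12 only because the
catenated number run is emitted, while the query side has no flag left that
would emit it.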

So I'm guessing there's something else going on that you're overlooking,
but I don't have any good clue what....

Best
Erick


On Wed, Nov 28, 2012 at 4:34 AM, Frederico Azeiteiro <
Frederico.Azeiteiro@cision.com> wrote:

> I just reloaded both indexes to make sure that all definitions are
> loaded.
> In the Analysis tool I can see differences, even though the fields are
> defined the same way:
>
> Query Analyser for 3.6.1
> org.apache.solr.analysis.WordDelimiterFilterFactory
> {protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0,
> catenateWords=0, luceneMatchVersion=LUCENE_36, generateWordParts=1,
> catenateAll=0, catenateNumbers=0}
> term text: GAMES
>
> Query Analyser for 1.4.0
> org.apache.solr.analysis.WordDelimiterFilterFactory
> {protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0,
> catenateWords=0, generateWordParts=1, catenateAll=0, catenateNumbers=0}
> term text: GAMES | 12
>
> The "12" is lost at query time in 3.6.1.
> The only difference I can see in the field definition is the
> "luceneMatchVersion=LUCENE_36"... Could it cause this issue?
>
> Thank you.
> Frederico
>
> -----Original message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Tuesday, 27 November 2012 12:26
> To: solr-user@lucene.apache.org
> Subject: Re: Search differences between solr 1.4.0 and 3.6.1
>
> Using the definition you provided, I don't get the same output. Are you
> sure you are doing what you think? The generateNumberParts=0 keeps the '12'
> from making it through the filter in 1.4 and 3.6 so I suspect you're not
> quite doing something the same way in both.
>
> Perhaps looking at index tokenization in one and query in the other?
>
> Best
> Erick
>
>
> On Mon, Nov 26, 2012 at 9:06 AM, Frederico Azeiteiro <
> Frederico.Azeiteiro@cision.com> wrote:
>
> > Hi,
> >
> > While updating our Solr to 3.6.1 I noticed some differences in results
> > when using search strings that combine letters and numbers.
> >
> > For a text field defined as:
> >
> > <analyzer type="index">
> > <http://cbrsrvmtr04:8983/solr/WISE/admin/file/?file=schema.xml>
> >
> > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >
> > <charFilter class="solr.MappingCharFilterFactory"
> > mapping="mapping-ISOLatin1Accent.txt"/>
> >
> > <filter class="solr.WordDelimiterFilterFactory"
> > protected="protwords.txt" splitOnCaseChange="1" catenateAll="0"
> > catenateNumbers="1" catenateWords="1" generateNumberParts="0"
> > generateWordParts="1" stemEnglishPossessive="0"/>
> >
> > </analyzer>
> >
> > <analyzer type="query">
> > <http://cbrsrvmtr04:8983/solr/WISE/admin/file/?file=schema.xml>
> >
> > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >
> > <filter class="solr.SynonymFilterFactory" ignoreCase="true"
> > expand="true" synonyms="synonyms.txt"/>
> >
> > <filter class="solr.WordDelimiterFilterFactory"
> > protected="protwords.txt" splitOnCaseChange="1" catenateAll="0"
> > catenateNumbers="0" catenateWords="0" generateNumberParts="0"
> > generateWordParts="1"/>
> >
> > </analyzer>
> >
> >
> >
> > Searching for string GAMES12 returns a lot of results on 3.6.1 that
> > are not returned on 1.4.0.
> >
> >
> >
> > It looks like WordDelimiterFilterFactory is behaving differently in
> > 3.6.1: the numeric part of the keyword is ignored and the search
> > is performed using only GAMES.
> >
> >
> >
> > Analysis output for 1.4.0:
> >
> > org.apache.solr.analysis.WordDelimiterFilterFactory
> > {protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0,
> > catenateWords=0, generateWordParts=1, catenateAll=0,
> > catenateNumbers=0}
> >
> > term position     | 1     | 2
> > term text         | GAMES | 12
> > term type         | word  | word
> > source start,end  | 0,5   | 5,7
> > payload           |       |
> >
> > And for 3.6.1:
> >
> > org.apache.solr.analysis.WordDelimiterFilterFactory
> > {protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0,
> > catenateWords=0, luceneMatchVersion=LUCENE_36, generateWordParts=1,
> > catenateAll=0, catenateNumbers=0}
> >
> > position        | 1
> > term text       | GAMES
> > startOffset     | 0
> > endOffset       | 5
> > type            | word
> > positionLength  | 1
> >
> > Is this something that can be modified/fixed to return the same results?
> >
> >
> >
> > Thank you.
> >
> >
> >
> > Regards,
> >
> > Frederico
>
