lucene-solr-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: search returns matches for non-starting wildcard prefix queries
Date Mon, 09 Feb 2009 18:11:13 GMT
Rupert,

Try using "string" field type instead of "text" and test it out with some unusual/rare last
name patterns.  For example, try it with last names that consist of more than one word and
see if you are happy with those results.
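
A minimal sketch of what that change might look like in schema.xml,
assuming the stock "string" type (solr.StrField) from the example schema.
The field name "last_name_exact" is hypothetical; the original message does
not name a second field. Keeping the existing "last_name" field and copying
into a new one is only one option; you could instead change the type of
"last_name" itself. Either way a reindex is needed.

    <!-- Untokenized type: the whole field value is indexed as one term -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"
               omitNorms="true"/>

    <!-- Hypothetical field name, for illustration only -->
    <field name="last_name_exact" type="string" indexed="true"
           stored="false" multiValued="true"/>

    <!-- Copy the raw value, before any analysis, into the new field -->
    <copyField source="last_name" dest="last_name_exact"/>

With this in place, last_name_exact:m* should match only values whose first
character is "m", since each value is a single term. The trade-off is the
one mentioned above: a multi-word name such as "murphy md" is matched by
m* only because of its first word, and no analysis (lowercasing included)
is applied, so values need to be lowercased before indexing.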


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch




________________________________
From: Rupert Fiasco <rufiasco@gmail.com>
To: solr-user@lucene.apache.org
Sent: Monday, February 9, 2009 12:46:15 PM
Subject: search returns matches for non-starting wildcard prefix queries

(I think I have a horrible subject line, but I wasn't sure how to
properly explain myself.)

I have a text field that I store last names in (and everything is
lowercased prior to insertion, not sure if that matters).

The field is described as:

    <field name="last_name" type="text" indexed="true" stored="false"
           multiValued="true"/>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>



When running a query such as

last_name:m*

I get data back like:

Pashman, Md
Maldonado
Manolidis
Fleisher, M.D., D.Ht., D.A.B.F.M.
Merino
Monroe
McLay
Maltsberger
McMurtray
Murphy Md
Loeb Md


As you can see, most are perfect matches, but there are some that
*don't* start with the letter "M" but do have "M" at the beginning of
another "word" in the field.

Wouldn't the query "m*" only match where the first letter of the whole
field is "M", and not the first letter of another "word" in that field?

Do I need to make another field to store last names and not perform
any analysis on that field (akin to a spell check field)?

Thanks in advance.

-Rupert
