lucene-solr-user mailing list archives

From Chris Fauerbach <chrisfauerb...@gmail.com>
Subject Re: Multiple Words in String
Date Mon, 04 Apr 2011 01:04:41 GMT
It's not only a specific case (e.g. microsoft.com); it's really a general
multi-word issue.

Other examples: carwash, bookkeeper, etc.

I'm ultimately looking for a schema for search and retrieval that's heavily
focused on names: people's names, business names, etc. This is not
content like large text fields or web sites, but
business data that I'm very successfully receiving using dataimport
handlers. It's these special cases that are really tripping me up; my
business folks keep coming up with them!


Chris Fauerbach
chrisfauerbach@gmail.com


On Sun, Apr 3, 2011 at 6:51 PM, Erick Erickson <erickerickson@gmail.com> wrote:

> Is this a general question or specific? You can handle specific ones by
> using synonyms.
>
> But the general case, that is, treating any pair of adjacent tokens as
> a single token, seems fraught with unintended consequences, but
> you know your problem space better than I do.
>
> Best
> Erick
>
> On Sat, Apr 2, 2011 at 2:21 PM, Chris Fauerbach <chrisfauerbach@gmail.com> wrote:
>
> > Good afternoon everyone!
> > I am stumped, and I would love some help. I'm new to Solr/Lucene,
> > but I have thrown myself into it, so I think I have a solid
> > understanding. Using the analysis tool in the admin interface, I see
> > these words stemmed and processed as I assume they would be, so I'm
> > stuck.
> >
> > In my index, I have two documents, each with a text field, and here
> > are example values
> >
> > 1) microsoft.com
> > 2) micro soft
> >
> > I want to do a search using microsoft or "micro soft" and find both.
> > I'm using the dismax interface, the fields are properly listed in the
> > config, and I can find both records, but never at the same time.
> > Here's my schema.xml for my text field, any thoughts on what I can do
> > to find these together?
> >
> >
> >    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >      <analyzer type="index">
> >        <tokenizer class="solr.StandardTokenizerFactory"/>
> >        <filter class="solr.LowerCaseFilterFactory"/>
> >        <filter class="solr.StopFilterFactory" ignoreCase="true"
> >                words="stopwords.txt" enablePositionIncrements="true"/>
> >        <filter class="solr.WordDelimiterFilterFactory"
> >                generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
> >                preserveOriginal="1"/>
> >        <filter class="solr.SynonymFilterFactory"
> >                synonyms="syn/index_synonyms.txt" ignoreCase="true" expand="true"/>
> >        <filter class="solr.EdgeNGramFilterFactory"
> >                minGramSize="2" maxGramSize="15" side="front"/>
> >        <filter class="solr.EdgeNGramFilterFactory"
> >                minGramSize="2" maxGramSize="15" side="back"/>
> >        <filter class="solr.SnowballPorterFilterFactory"
> >                language="English" protected="protwords.txt"/>
> >      </analyzer>
> >      <analyzer type="query">
> >        <tokenizer class="solr.StandardTokenizerFactory"/>
> >        <filter class="solr.LowerCaseFilterFactory"/>
> >        <filter class="solr.EdgeNGramFilterFactory"
> >                minGramSize="2" maxGramSize="15" side="front"/>
> >        <filter class="solr.EdgeNGramFilterFactory"
> >                minGramSize="2" maxGramSize="15" side="back"/>
> >        <filter class="solr.StopFilterFactory" ignoreCase="true"
> >                words="stopwords.txt" enablePositionIncrements="true"/>
> >        <filter class="solr.WordDelimiterFilterFactory"
> >                generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
> >                preserveOriginal="1"/>
> >        <filter class="solr.SnowballPorterFilterFactory"
> >                language="English" protected="protwords.txt"/>
> >      </analyzer>
> >    </fieldType>
> >
>
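
For illustration only, here is a minimal sketch of the "general case" discussed in this thread: joining every adjacent pair of tokens at index time, so that "micro soft" is also indexed as "microsoft". Nobody in the thread proposed this configuration. The field type name "text_names" is hypothetical, and the sketch assumes a Solr release whose solr.ShingleFilterFactory accepts the tokenSeparator attribute. It covers only one direction (a query for microsoft finding the "micro soft" document); splitting "microsoft.com" would still be left to something like the WordDelimiterFilterFactory in the posted schema.

```xml
<!-- Hypothetical field type; the name "text_names" is an assumption, not from the thread. -->
<fieldType name="text_names" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emit each single token plus every adjacent pair joined with no
         separator, so "micro soft" also indexes the term "microsoft". -->
    <filter class="solr.ShingleFilterFactory"
            maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
  </analyzer>
  <analyzer type="query">
    <!-- No shingles at query time: the query term "microsoft" is looked up as-is. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because every adjacent pair is joined, the index grows and unrelated word pairs can collide with real words, which is the kind of unintended consequence raised above. For a short, known list of cases, entries in the synonyms file are likely the safer choice.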
