lucene-solr-user mailing list archives

From Dennis Gearon <gear...@sbcglobal.net>
Subject Re: shingles work in analyzer but not real data
Date Thu, 02 Sep 2010 21:51:08 GMT
I thought shingles were either a viral infection or roof material?

(Hey, it's crazy Friday early for me)
Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Thu, 9/2/10, Jonathan Rochkind <rochkind@jhu.edu> wrote:

> From: Jonathan Rochkind <rochkind@jhu.edu>
> Subject: Re: shingles work in analyzer but not real data
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Cc: "Vishal Patel" <vishal_patel@silvertouch.com>, "Michiel Willekens" <Michiel.Willekens@globalorange.nl>
> Date: Thursday, September 2, 2010, 2:47 PM
> I've run into this before too. Both the dismax and solr-lucene
> _query parsers_ will tokenize a query on whitespace _before_ they
> pass the query to any field analyzers. There are some reasons for
> this; lots of things wouldn't work if they didn't do this.
> 
> But it makes your approach kind of hard. Try doing your search as a
> phrase search with double quotes, "apple pie". I bet it'll work then,
> because both dismax and solr-lucene will respect the phrase quotes
> and NOT tokenize the stuff inside there before it gets to the field
> analyzers.
> 
> So if non-tokenized fields like this are all that are included in
> your search, and if you can get your client application to just
> force phrase quoting of everything before sending to Solr, that
> might work. Otherwise... I don't know of a good solution. If you
> figure one out, let me know.
> 
> Jonathan
> 
> Jeff Rose wrote:
> > Hi,
> >   We are using SOLR to match query strings with a keyword database,
> > where some of the keywords are actually more than one word.  For
> > example a keyword might be "apple pie" and we only want it to match
> > for a query containing that word pair, but not one only containing
> > "apple".  Here is the relevant piece of the schema.xml, defining the
> > index and query pipelines:
> > 
> >   <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >     <analyzer type="index">
> >       <tokenizer class="solr.PatternTokenizerFactory" pattern=";"/>
> >       <filter class="solr.LowerCaseFilterFactory"/>
> >       <filter class="solr.TrimFilterFactory"/>
> >     </analyzer>
> >     <analyzer type="query">
> >       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >       <filter class="solr.LowerCaseFilterFactory"/>
> >       <filter class="solr.TrimFilterFactory"/>
> >       <filter class="solr.ShingleFilterFactory"/>
> >     </analyzer>
> >   </fieldType>
> > 
> > In the analysis tool this schema looks like it works correctly.  Our
> > multi-word keywords are indexed as a single entry, and then when a
> > search phrase contains one of these multi-word keywords it is
> > shingled and matched.  Unfortunately, when we do the same queries on
> > top of the actual index it responds with zero matches.  I can see in
> > the index histogram that the terms are correctly indexed from our
> > mysql datasource containing the keywords, but somehow the shingling
> > doesn't appear to work on this live data.  Does anyone have
> > experience with shingling that might have some tips for us, or
> > otherwise advice for debugging the issue?
> > 
> > Thanks,
> > Jeff
> > 
> >   
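
Jonathan's suggestion of having the client force phrase quoting before
the query reaches Solr could be sketched roughly as below. This is only
an illustration under assumptions: the function names and the field
name "keyword" are hypothetical, presplit() is a toy imitation of the
whitespace split described in the thread (not Solr code), and whether
the quoted query then matches still depends on the analyzer chain.

```python
import re
from urllib.parse import urlencode


def presplit(query: str) -> list[str]:
    """Toy model of the pre-analysis split described in the thread.

    Double-quoted phrases stay whole; everything else is cut on
    whitespace. This imitates the reported behaviour, it is not Solr code.
    """
    return re.findall(r'"[^"]*"|\S+', query)


def phrase_quote(user_query: str) -> str:
    """Wrap the whole user query in double quotes so it travels as one phrase."""
    cleaned = " ".join(user_query.split())  # collapse runs of whitespace
    # Escape backslashes first, then embedded double quotes.
    escaped = cleaned.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_query_string(user_query: str, field: str = "keyword") -> str:
    """Build URL parameters for a search request; `field` is a hypothetical name."""
    return urlencode({"q": f"{field}:{phrase_quote(user_query)}"})


if __name__ == "__main__":
    print(presplit("apple pie"))                # unquoted: split into two pieces
    print(presplit(phrase_quote("apple pie")))  # quoted: kept as one piece
    print(build_query_string("apple  pie"))
```

In the toy model the unquoted query reaches the analyzer one word at a
time, so a shingle filter would never see "apple" and "pie" together,
while the quoted form arrives as a single unit.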
