lucene-dev mailing list archives

From Mark Bennett <mbenn...@ideaeng.com>
Subject 4 quick questions about Fuzzy Search, including forcing SlowFuzzySearch
Date Fri, 09 Nov 2012 23:12:17 GMT
I've been checking the code a bit, but it's taking a while, and I have 4
questions:

Summary:

I want to submit fuzzy searches of long words via Solr, using low minimum
similarity values. I want to use the older method, even though it's slower.
(I realize low similarity thresholds on long words sound like a bad idea;
it's a very long story, and there's lots of other stuff going on.)

Also, I have query-time analyzer logic in schema.xml that needs to be used,
whether I'm doing a regular search or a fuzzy search.

Example:
        state:California~0.65
        (overly simple example of course)
Or even:
        state:CALIFORNI~0.65  (1 letter off)
And still have match:
        Content indexed as state:california
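To make the example concrete, here is a minimal self-contained Java sketch (not Lucene code; the class and method names are hypothetical) that computes plain Levenshtein edit distance, plus the classic pre-4.0 similarity. The similarity formula, 1 - distance / min(len(query), len(indexed)) with no prefix length, is my reading of the Lucene 3.x FuzzyTermEnum source and should be treated as an assumption.

```java
// Hypothetical helper, not part of Lucene or Solr: plain Levenshtein distance
// plus the classic (pre-4.0) similarity, ASSUMED here to be
// 1 - distance / min(len(a), len(b)) with no prefix length.
class ClassicFuzzy {

    // Standard dynamic-programming Levenshtein distance (insert, delete, substitute).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse arrays: prev now holds row i
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed classic similarity; a term would match if this exceeds the minimum similarity.
    static float classicSimilarity(String query, String indexed) {
        int minLen = Math.min(query.length(), indexed.length());
        if (minLen == 0) {
            return 0f; // simplification for empty strings
        }
        return 1f - (float) levenshtein(query, indexed) / minLen;
    }

    public static void main(String[] args) {
        String indexed = "california";
        for (String q : new String[] {"CALIFORNI", "californi"}) {
            System.out.println(q + " -> distance " + levenshtein(q, indexed)
                    + ", similarity " + classicSimilarity(q, indexed));
        }
    }
}
```

Running it should report a distance of 1 for "californi" and 10 for "CALIFORNI", so unless the query term is lowercased first, it cannot get anywhere near a 0.65 threshold.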

Things I'm worried about:

1: I need the parser to call SlowFuzzyQuery instead of FuzzyQuery (yup, we
know it's slow!)
    I'm not sure if this is about invoking the old parser, or if it's some
type of config issue instead.

2: I don't want the 0.65 minimum similarity being needlessly translated into
an integer edit distance and then needlessly capped at 2.
  I'm not sure if the approach is:
    * "don't bother converting from float to int",
      OR
    * "convert to int if you want, but don't cap it at 2"
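For question 2, a sketch of the float-to-int conversion. The capped version mirrors my understanding of Lucene 4.x's FuzzyQuery.floatToEdits: a similarity below 1 becomes (int) ((1 - similarity) * termLength), then is capped at LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE (2). I haven't re-checked this against the source, so treat it as an assumption; the uncapped method is a hypothetical illustration of the "convert to int, but don't cap it" option.

```java
// Sketch of how a float minimum similarity might be turned into an integer
// edit count. cappedEdits mirrors my ASSUMED reading of Lucene 4.x
// FuzzyQuery.floatToEdits; uncappedEdits is hypothetical.
class SimilarityToEdits {

    // Assumed value of LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE in Lucene 4.x.
    static final int MAX_AUTOMATON_EDITS = 2;

    // "Convert to int if you want, but don't cap it at 2."
    static int uncappedEdits(float minimumSimilarity, int termLength) {
        if (minimumSimilarity >= 1f) {
            return (int) minimumSimilarity; // values >= 1 are treated as edit counts
        }
        if (minimumSimilarity == 0f) {
            return 0;
        }
        return (int) ((1.0 - minimumSimilarity) * termLength);
    }

    // Same conversion, capped the way FuzzyQuery is assumed to cap it.
    static int cappedEdits(float minimumSimilarity, int termLength) {
        return Math.min(uncappedEdits(minimumSimilarity, termLength), MAX_AUTOMATON_EDITS);
    }

    public static void main(String[] args) {
        for (String term : new String[] {"californi", "internationalization"}) {
            System.out.println(term + " (" + term.length() + " chars): uncapped="
                    + uncappedEdits(0.65f, term.length())
                    + ", capped=" + cappedEdits(0.65f, term.length()));
        }
    }
}
```

With a 0.65 threshold, the 20-character term allows 7 edits uncapped but only 2 capped, which is exactly the gap question 2 is about. The "don't bother converting" option amounts to keeping the float and checking similarity term by term, which, as I understand it, is what the older, slower enumeration did.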

3: Schema.xml Analyzers apply lowercase to words at both index and search
time.
  (We actually have some other complex analyzers that *need* to happen,
just using lowercase as an example)
  But it seems that when I search state:CALIFORNI~0.65 (via Solr), it
doesn't work.
  I'm worried that Solr isn't running my text through the query analyzers
first!
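If Solr really doesn't apply the full query-time chain to fuzzy terms (the worry in question 3), one client-side workaround is to analyze the term text yourself before appending the ~0.65. A minimal sketch, assuming a single-token chain; QUERY_CHAIN is a hypothetical stand-in for the real schema.xml analyzers, and all names here are illustrative.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical client-side workaround: apply a stand-in for the query-time
// analysis chain to the fuzzy term before building the query string.
class FuzzyPreAnalysis {

    // Stand-in for the schema.xml query analyzer; the real chain is more complex.
    static final List<UnaryOperator<String>> QUERY_CHAIN = List.of(
            s -> s.toLowerCase(Locale.ROOT),
            String::trim);

    // Runs each step of the chain in order.
    static String analyze(String text) {
        String result = text;
        for (UnaryOperator<String> step : QUERY_CHAIN) {
            result = step.apply(result);
        }
        return result;
    }

    // Splits "field:term~similarity", analyzes only the term, then reassembles it.
    static String analyzeFuzzyClause(String clause) {
        int colon = clause.indexOf(':');
        int tilde = clause.lastIndexOf('~');
        if (colon < 0 || tilde < colon) {
            throw new IllegalArgumentException("expected field:term~similarity: " + clause);
        }
        String field = clause.substring(0, colon);
        String term = clause.substring(colon + 1, tilde);
        String similarity = clause.substring(tilde + 1);
        return field + ":" + analyze(term) + "~" + similarity;
    }

    public static void main(String[] args) {
        System.out.println(analyzeFuzzyClause("state:CALIFORNI~0.65"));
    }
}
```

This only works cleanly if the chain yields exactly one token per term; a tokenizer that splits the term would need a different approach.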

4: Would the XML parser help with any of this?  I think it's still somewhat
in limbo?
    We do programmatically build some parts of queries using the Lucene
API, then convert them to Strings.
    Then we pass the strings to Solr; this seemed to be the suggested
workaround I found online.
    I'm wondering if XML would bypass this step and give us more precise
control over slow fuzzy vs. fuzzy.
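For the build-then-stringify step in question 4, a sketch of producing a fuzzy clause string safely. This is a hypothetical, hand-rolled helper: in practice SolrJ's ClientUtils.escapeQueryChars or Lucene's QueryParser.escape would normally be preferred, and the list of special characters below is an assumption based on the classic query syntax.

```java
import java.util.Locale;

// Hypothetical helper for turning programmatically built fuzzy clauses into
// query-parser strings. Real code would normally use ClientUtils.escapeQueryChars
// (SolrJ) or QueryParser.escape (Lucene) instead of this hand-rolled escape.
class FuzzyClauseBuilder {

    // Characters the classic query syntax treats specially (assumed list).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    static String escape(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\'); // backslash-escape reserved characters and whitespace
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Locale.ROOT keeps the decimal separator a '.', whatever the JVM default locale is.
    static String fuzzyClause(String field, String term, float minimumSimilarity) {
        return String.format(Locale.ROOT, "%s:%s~%.2f", field, escape(term), minimumSimilarity);
    }

    public static void main(String[] args) {
        System.out.println(fuzzyClause("state", "californi", 0.65f));
        System.out.println(fuzzyClause("state", "new york", 0.7f));
    }
}
```

Formatting with Locale.ROOT matters because some default locales (German, for example) would write 0.65 as 0,65, which the parser would not read as a similarity.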


I'm not sure if this is a matter of forcing the old "classic" query
parser, or of setting some configuration option or -D directive regardless
of which parser is used.




--
Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
