lucene-solr-user mailing list archives

From Gert Brinkmann <g...@netcologne.de>
Subject query with stemming, prefix and fuzzy?
Date Tue, 27 Jan 2009 16:48:23 GMT
Hello,

I am trying to get Solr to work properly. I have set up a Solr test
server (using Jetty, as described in the tutorial). I also had to modify
schema.xml so that there are separate fields for the different languages
(each with its own stemmer) that occur in the content management system
I am indexing. So far everything works fine, including snippet
highlighting.

But now I am having some problems with two things:

A) fuzzy search

When I try a fuzzy search, the analyzers seem to break up a search
string like "house~0.6" into "house", "0" and "6", so that e.g. a single
"6" is highlighted, too. So I tried an additional raw field without any
stemming, using just lowercasing and whitespace tokenization.
This seems to work fine. But the fuzzy query is very slow: it takes 100%
CPU for several seconds with only one query at a time.

What can I do to speed up the fuzzy query? For example, I have found a
Lucene parameter prefixLength but no corresponding Solr option. Does one
exist? Are there other options I should pay attention to?
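
To illustrate why a prefix length should help, here is a minimal Python
sketch. It is not Lucene's implementation: the function names and the
similarity formula (1 - distance / shorter length) are assumptions made
for illustration only. The point is that a non-zero prefix length lets
most index terms be skipped before the costly edit-distance step.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_candidates(term, index_terms, min_similarity=0.6, prefix_length=0):
    """Return (matching terms, number of edit-distance comparisons).

    Hypothetical helper: terms that do not share the first
    `prefix_length` characters with `term` are skipped cheaply.
    """
    prefix = term[:prefix_length]
    matches = []
    compared = 0
    for candidate in index_terms:
        if not candidate.startswith(prefix):
            continue  # cheap rejection, no edit distance needed
        compared += 1
        shortest = min(len(term), len(candidate)) or 1  # avoid dividing by 0
        similarity = 1.0 - edit_distance(term, candidate) / shortest
        if similarity >= min_similarity:
            matches.append(candidate)
    return matches, compared


terms = ["house", "horse", "mouse", "houses", "home", "garden"]
all_matches, all_compared = fuzzy_candidates("house", terms)
ho_matches, ho_compared = fuzzy_candidates("house", terms, prefix_length=2)
```

With a prefix length of 2, only terms starting with "ho" are compared,
at the price of no longer matching terms like "mouse".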


B) combine stemming, prefix and fuzzy search

Is there a way to combine all three of these query types in one query,
especially stemming and prefix search? I think it would be problematic,
as "house*" would be analyzed to "house" by the usual analyzers that are
required for stemming.

Do I need a separate field for each query type, combined with a boolean
OR in the query? Something like

  data:house OR data_fuzzy:house~0.6 OR data_prefix:house*

This feels a little bit circuitous. Is there a way to use
"house*~.6" including correct stemming?
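
For illustration, the OR-combined query could be assembled on the client
side roughly like this. This is only a sketch: the function name is
hypothetical, the field names data, data_fuzzy and data_prefix are the
ones from the example query, and it handles a single word without any
escaping of special query characters.

```python
def build_combined_query(word: str, similarity: float = 0.6) -> str:
    """Build one query string that ORs stemmed, fuzzy and prefix clauses.

    Assumes three fields exist in the schema: `data` (stemmed),
    `data_fuzzy` and `data_prefix` (both unstemmed).
    """
    if not word or any(ch.isspace() for ch in word):
        raise ValueError("expected a single non-empty word")
    clauses = [
        f"data:{word}",                      # stemmed match
        f"data_fuzzy:{word}~{similarity}",   # fuzzy match on the raw field
        f"data_prefix:{word}*",              # prefix match on the raw field
    ]
    return " OR ".join(clauses)


query = build_combined_query("house")
```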

Thank you,
Gert
