lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Dynamic analizer settings change
Date Wed, 11 Sep 2013 14:33:42 GMT
You're still in danger of overly broad hits. When you
stem different languages into the _same_ underlying
field, a query can match terms that make sense in one
language but are totally bogus in another.

As far as lots and lots of fields is concerned, if you
want to restrict your searches to only one language
you have a couple of choices here....

Consider a different core per language. Solr easily
handles many cores per server. Now you have no
'wasted' space; the German core simply uses the
DE-specific stemmers in its analysis chain, which
you can extend with German de-compounding etc.
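
A minimal sketch of routing a query to a per-language core from the client side. The base URL and the docs_<lang> core naming scheme are assumptions for illustration only; use whatever names your cores actually have.

```python
from urllib.parse import urlencode

# Hypothetical base URL and core naming scheme (docs_<lang>);
# neither comes from this thread.
SOLR_BASE = "http://localhost:8983/solr"


def core_select_url(lang, query):
    """Build a /select URL against the core holding documents in `lang`."""
    core = f"docs_{lang}"
    return f"{SOLR_BASE}/{core}/select?{urlencode({'q': query})}"


print(core_select_url("de", "haus"))
```

Since each core has its own schema, the field names can stay the same across cores and only the analysis chain differs.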

Alternatively, you can form your queries with some
care. There's nothing that requires, say, edismax to
be specified in solrconfig.xml. Anything you would
put in the defaults section of the config you can
override with parameters on the request URL. So, for instance,
if you knew you were querying in French, you could
form something like (going from memory)
defType=edismax&qf=title_fr+text_fr
or
defType=edismax&qf=title_de+text_de

and so completely avoid cross-language searching.
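
A rough sketch of building those per-language parameters in client code. It assumes the fields follow a <name>_<lang> naming pattern (title_fr, text_fr, ...) as in the example above; the helper name edismax_params is made up for illustration.

```python
from urllib.parse import urlencode


def edismax_params(lang, user_query, base_fields=("title", "text")):
    """Return a query string restricting edismax to one language's fields."""
    # qf takes a whitespace-separated field list; urlencode turns the
    # space into '+'.
    qf = " ".join(f"{name}_{lang}" for name in base_fields)
    return urlencode({"q": user_query, "defType": "edismax", "qf": qf})


print(edismax_params("fr", "maison"))
# q=maison&defType=edismax&qf=title_fr+text_fr
```

The resulting string is appended to the select handler URL after the "?".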

Or you could simply index a field that holds the document's
language and tack on an fq clause like fq=language:de.
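
A small sketch of tacking the filter onto existing request parameters. The field name "language" is an assumption (borrowed from the quoted message below); substitute whatever your schema calls it.

```python
from urllib.parse import urlencode


def with_language_filter(params, lang, field="language"):
    """Return a copy of `params` with an fq restricting hits to `lang`."""
    filtered = dict(params)  # leave the caller's dict untouched
    filtered["fq"] = f"{field}:{lang}"
    return filtered


print(urlencode(with_language_filter({"q": "haus"}, "de")))
# q=haus&fq=language%3Ade
```

Because fq only filters the result set and does not contribute to scoring, the main query can stay exactly as it was.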

But you haven't told us how big your problem is. I wouldn't
worry at all about efficiency at this stage. If you have, say,
10M documents, I'd just try the simplest thing first and
measure.

500M documents is probably another story.

FWIW
Erick


On Wed, Sep 11, 2013 at 9:50 AM, maephisto <my_sky_mc@yahoo.com> wrote:

> Thanks Jack! Indeed, very nice examples in your book.
>
> Inspired by them, here's a crazy idea: would it be possible to build a
> custom processor chain that would detect the language and use it to apply
> filters, like the aforementioned SnowballPorterFilter?
> That would leave at the end a document having as fields: text (with filtered
> content) and language (the one determined by the processor).
> And at search time, always append language=<user selected language>.
>
> Does this make sense? If so, would it affect the performance at index time?
> Thanks!
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Dynamic-analizer-settings-change-tp4089274p4089305.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
