jackrabbit-dev mailing list archives

From "Bertrand Delacretaz" <bdelacre...@apache.org>
Subject Re: IndexingConfiguration jr 1.4 release, analyzing, searching and synonymprovider
Date Wed, 22 Aug 2007 07:34:29 GMT
On 8/21/07, Ard Schrijvers <a.schrijvers@hippo.nl> wrote:

> ...So would you like to see parts like chaining of filters for a indexing a property?
> Think that shouldn't be to hard to implement....

If that's within the scope of your work, that would IMHO be very
useful, to give people precise control on how the various properties
are indexed.

> ...Certainly something like
>
> <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>
> would ofcourse ease the use of implementing synonyms/stopwords yourself....

Yes, given that many Lucene TokenFilters are available, this is useful I think.
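
To make the chaining idea concrete, here is a rough sketch in plain Java.
It deliberately does not use the Lucene or Solr APIs: TokenStep,
lowerCase(), synonyms() and stopWords() are hypothetical names made up
for illustration, and tokens are simply strings split on whitespace.
The point is only that each configured filter consumes the output of
the previous one, in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

class FilterChainSketch {

    /** Hypothetical stand-in for a token filter: token list in, token list out. */
    interface TokenStep extends UnaryOperator<List<String>> {}

    /** Lower-cases every token. */
    static TokenStep lowerCase() {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
            return out;
        };
    }

    /** Replaces each token by its canonical synonym, if one is configured. */
    static TokenStep synonyms(Map<String, String> canonical) {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) out.add(canonical.getOrDefault(t, t));
            return out;
        };
    }

    /** Drops every token found in the stop word set. */
    static TokenStep stopWords(Set<String> stop) {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) if (!stop.contains(t)) out.add(t);
            return out;
        };
    }

    /** Splits on whitespace, then applies the configured steps in order. */
    static List<String> analyze(String text, List<TokenStep> chain) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) if (!t.isEmpty()) tokens.add(t);
        for (TokenStep step : chain) tokens = step.apply(tokens);
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, String> syn = new HashMap<>();
        syn.put("speedy", "quick");
        List<TokenStep> chain = Arrays.asList(
                lowerCase(),
                synonyms(syn),
                stopWords(new HashSet<>(Arrays.asList("the", "a"))));
        System.out.println(analyze("The Speedy fox", chain)); // prints [quick, fox]
    }
}
```

Note that the order of the steps matters: with the stop filter placed
before lowerCase(), "The" would not be removed.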

I see two potential issues that you might want to take into account:

1) With configurable indexing analyzers, people sometimes have a hard
time figuring out how exactly their data is indexed (and why they
don't find it later).

Solr provides an analysis test page for that (see "Solr's content
analysis test page" in [1]). In the case of Jackrabbit, maybe logging
the filtered values of fields at the DEBUG level would help.
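
A minimal sketch of what such logging could look like, under a few
assumptions: it uses java.util.logging (where FINE is the closest
equivalent of DEBUG) only to stay self-contained, the logger name
"indexing.analysis" and the analyzeAndLog() helper are made up for
illustration, and the analysis itself is a trivial
lower-case-and-split stand-in rather than a real analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.logging.Level;
import java.util.logging.Logger;

class AnalysisLoggingSketch {

    // Hypothetical logger name; a real setup would use its own logging framework.
    private static final Logger LOG = Logger.getLogger("indexing.analysis");

    /** Trivial stand-in for a configured analyzer: lower-case, split on whitespace. */
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    /** Analyzes a field value and logs raw value and resulting tokens at FINE. */
    static List<String> analyzeAndLog(String fieldName, String value) {
        List<String> tokens = analyze(value);
        // Guard so the message is only built when debug output is enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("field=" + fieldName + " raw=\"" + value + "\" indexed=" + tokens);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyzeAndLog("title", "Hello  Jackrabbit World"));
    }
}
```

With the log level raised for that one logger, someone who cannot find
their data can compare the raw value with the tokens that were actually
indexed.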

2) As discussed previously, one problem with this is which analyzer to
use when running a query that applies to several fields. In Solr, you
can configure a different analyzer for querying, which is probably the
best solution.

People then have to make sure their config is consistent for indexing
and querying, and might need in some cases to provide their own custom
QueryAnalyzer to achieve this. For example, one that provides fake
synonyms for a token, with each synonym being the result of one of
the analysis methods used. This can get tricky depending on the
configured analysis, when searching in multiple fields.
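
The "fake synonyms" idea can be sketched as follows. This is plain Java
under stated assumptions, not a Lucene QueryAnalyzer: expand() is a
hypothetical helper, each per-field analysis is reduced to a function
from one token to one token, and the "stemmer" just strips a trailing
"s". The query token is run through every analysis in use, and the
distinct results are kept as alternatives to search for.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

class MultiFieldQuerySketch {

    /**
     * Runs the query token through each field's analysis and returns the
     * distinct results, in the order the analyses were given.
     */
    static Set<String> expand(String queryToken,
                              Collection<UnaryOperator<String>> fieldAnalyses) {
        Set<String> variants = new LinkedHashSet<>();
        for (UnaryOperator<String> analysis : fieldAnalyses) {
            String v = analysis.apply(queryToken);
            // An analysis may drop the token entirely (e.g. a stop word).
            if (v != null && !v.isEmpty()) variants.add(v);
        }
        return variants;
    }

    public static void main(String[] args) {
        // Three made-up per-field analyses.
        UnaryOperator<String> lower = t -> t.toLowerCase(Locale.ROOT);
        UnaryOperator<String> exact = t -> t;
        UnaryOperator<String> crudeStem = t -> {
            String l = t.toLowerCase(Locale.ROOT);
            return l.endsWith("s") ? l.substring(0, l.length() - 1) : l;
        };
        List<UnaryOperator<String>> analyses = Arrays.asList(lower, exact, crudeStem);
        System.out.println(expand("Filters", analyses)); // prints [filters, Filters, filter]
    }
}
```

This only covers analyses that map one token to one token; filters that
split or merge tokens are where it gets tricky.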

See also http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
for more info on how Solr manages the analyzers.

-Bertrand

[1] http://www.xml.com/lpt/a/1668
