lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Collator-based facet sorting in Solr
Date Wed, 12 Sep 2012 10:58:11 GMT
On Wed, Sep 12, 2012 at 3:44 AM, Toke Eskildsen <te@statsbiblioteket.dk> wrote:
>
> That would be a serious impediment. For some of our uncontrolled fields,
> the same word can be cased very differently: CD, cd, Cd. To be on the
> safe side, the client would have to ask for 3 times the wanted amount of
> facet information. But if we cannot normalize at index time,
> de-duplication on the server would require changes to the faceting code.

I'll open an issue for this. We should at least fix the analysis
factory APIs to support it, even if the Solr configuration XML
doesn't yet have syntax for it.

>
> Regardless, it sounds like the idea passes the initial sanity check.
> Should I open a JIRA issue for it?

I think you should.

As an ugly workaround for the above problem: you could construct a
Lucene Analyzer with KeywordTokenizer(ICUCollationAtt) followed by
LowerCase/etc. and load it with <analyzer class=....> in Solr. I think
that will work fine.
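
To illustrate why lowercasing before the collation step helps, here is
a minimal sketch that uses only the JDK's java.text.Collator. It is not
the Lucene or ICU code path, and the names CollatedFacets and
normalizeAndSort are made up for illustration. It lowercases each raw
value, merges values that become identical, and returns the merged
terms in collator order:

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene or Solr.
final class CollatedFacets {

    // Lowercases each raw value, counts values that collapse to the same
    // term, and returns "term (count)" strings sorted by the locale's collator.
    static List<String> normalizeAndSort(List<String> rawValues, Locale locale) {
        Collator collator = Collator.getInstance(locale);

        // CollationKey is Comparable, so the TreeMap both sorts the terms in
        // collator order and merges terms the collator considers equal.
        Map<CollationKey, Integer> counts = new TreeMap<>();
        for (String raw : rawValues) {
            // Normalize case first, so CD, cd and Cd become one term.
            String normalized = raw.toLowerCase(locale);
            counts.merge(collator.getCollationKey(normalized), 1, Integer::sum);
        }

        List<String> result = new ArrayList<>();
        for (Map.Entry<CollationKey, Integer> entry : counts.entrySet()) {
            result.add(entry.getKey().getSourceString() + " (" + entry.getValue() + ")");
        }
        return result;
    }
}
```

With the values CD, cd, Cd, book, Book and Locale.ENGLISH, this yields
two entries, "book (2)" and "cd (3)", instead of five. In the Analyzer
workaround the same ordering of steps applies: the case folding runs on
the term text before the collation key is produced.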

-- 
lucidworks.com
