lucene-solr-user mailing list archives

From Michael Della Bitta <michael.della.bi...@appinions.com>
Subject Re: Solr faceting -- sort order
Date Thu, 19 Jul 2012 14:33:09 GMT
Maybe I'm not understanding the problem, but I accomplish this by
having two fields. One for sorting, like so:

<fieldType name="sort" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

And then a string type field for faceting. Use a copyField directive
to get the same data in both, and then sort on the sort field, and
facet on the string field. The MappingCharFilterFactory removes
accents for sorting, so you don't have to worry about accented
characters sorting out of order.
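
As a rough sketch of the two-field setup described above, the field and
copyField declarations in schema.xml might look like the following. The
field names "keyword" and "keyword_sort" are hypothetical (pick whatever
fits your schema), and "string" is assumed to be a solr.StrField type as
in the stock example schema:

```xml
<!-- Hypothetical field names, for illustration only.
     "string" is assumed to be a solr.StrField type;
     "sort" is the fieldType defined earlier in this message. -->

<!-- Facet on this field: the indexed value keeps its original case. -->
<field name="keyword" type="string" indexed="true" stored="true"/>

<!-- Sort on this field: lowercased, with accents mapped away. -->
<field name="keyword_sort" type="sort" indexed="true" stored="false"/>

<!-- Copy the raw value into the sort field at index time. -->
<copyField source="keyword" dest="keyword_sort"/>
```

With fields like these, a request would pass something along the lines of
sort=keyword_sort asc together with facet=true&facet.field=keyword.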

Michael Della Bitta

------------------------------------------------
Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Thu, Jul 19, 2012 at 4:37 AM, Toke Eskildsen <te@statsbiblioteket.dk> wrote:
> On Wed, 2012-07-18 at 20:30 +0200, Christopher Gross wrote:
>> When I do a query, the results that come through retain their original
>> case for this field, like:
>> doc 1
>> keyword: Blah Blah Blah
>> doc 2
>> keyword: Yadda Yadda Yadda
>>
>> But when I pull back facets, I get:
>>
>> blah blah blah (1)
>> yadda yadda yadda (1)
>
> Yes. The results from your query are the stored values, while the
> results from your facets are the indexed ones. That's the way faceting
> works with Solr.
>
> Technically there is nothing wrong with writing a faceting system that
> uses the stored values. We did this some years back, but abandoned the
> idea. As far as I remember, it was a lot slower to initialize the
> internal structures this way. One could also do faceting fully at search
> time, by iterating all the documents and requesting the stored value for
> each of them directly from the index, but that would be very slow.
>
>> I was attempting to fix a sorting problem -- keyword "aaaa" would show
>> up after keyword "Zulu" due to the "index" sorting, so I thought that
>> I could lowercase it all to have it be in the same order.  But now it
>> is all in lower case, and I'd like it to retain the original style.
>
> Currently the lowercase trick is the only solution for plain Solr and
> even that only works as long as your field holds only a-z letters. So no
> foreign names or such.
>
> Looking forward, one solution would be to specify a custom codec for the
> facet field, where the comparator used for sorting is backed by a
> Collator that sorts the terms directly, instead of using CollationKeys.
> It would be a bit slower for index updates, but should do what you
> require. Unfortunately I am not aware of anyone who has created such a
> codec or even how easy it is to get it to work with Solr (4.0 alpha).
>
> We have experimented with a faceting approach that allows for custom
> ordering, but it sorts upon index open and thus has a fairly long start
> up time. Besides, it is not in a proper state for production:
> https://issues.apache.org/jira/browse/SOLR-2412
>
> - Toke Eskildsen, State and University Library, Denmark
>
