lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] Updated: (SOLR-2396) deprecated char[] based collation factories / replace with byte[]-based
Date Mon, 28 Feb 2011 21:22:37 GMT


Robert Muir updated SOLR-2396:

    Attachment: SOLR-2396.patch

Updated patch; it now adds CollatedField and ICUCollatedField instead.

The trick was getting the field type to "use" my internal analyzer. Setting TOKENIZED
and changing SolrQueryParser to check isTokenized() instead of 'instanceof TextField' got
things going.
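
To illustrate why the check matters, here is a minimal sketch of the difference between the two tests. The classes below are hypothetical stand-ins written for this example, not Solr's actual FieldType hierarchy; only the names TextField, CollatedField, and isTokenized() come from the message above. The assumption is that a collated field is not a TextField subclass but still needs its text run through an analyzer.

```java
// Hypothetical stand-ins for illustration only; these are NOT Solr's real classes.
class TokenizedCheckDemo {

    // Base field type: by default the value is used as-is, with no analysis.
    static class FieldType {
        boolean isTokenized() { return false; }
    }

    // Plain string field: never analyzed.
    static class StrField extends FieldType { }

    // Ordinary full-text field: always analyzed.
    static class TextField extends FieldType {
        @Override boolean isTokenized() { return true; }
    }

    // Collated field: not a TextField, but it carries its own internal
    // analyzer (assumed here) and therefore reports itself as tokenized.
    static class CollatedField extends FieldType {
        @Override boolean isTokenized() { return true; }
    }

    // Type-based test: only TextField and its subclasses get analyzed.
    static boolean analyzeByType(FieldType ft) {
        return ft instanceof TextField;
    }

    // Capability-based test: any field type that declares itself tokenized
    // gets analyzed, whatever its concrete class.
    static boolean analyzeByFlag(FieldType ft) {
        return ft.isTokenized();
    }

    public static void main(String[] args) {
        FieldType collated = new CollatedField();
        System.out.println("instanceof TextField: " + analyzeByType(collated));
        System.out.println("isTokenized():        " + analyzeByFlag(collated));
    }
}
```

With the type-based test the collated field's analyzer would be skipped; with the flag-based test it is used, and the behavior for the plain string and text fields is unchanged.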

It still needs unit tests, but locale-sensitive range queries etc. are working here.

> deprecated char[] based collation factories / replace with byte[]-based
> -----------------------------------------------------------------------
>                 Key: SOLR-2396
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 4.0
>         Attachments: SOLR-2396.patch, SOLR-2396.patch
> In LUCENE-2551, collation support was changed to use byte[] keys.
> Previously it encoded sort keys with IndexableBinaryString into char[],
> but this is wasteful with regard to RAM and disk when terms can be byte[].
> A simple solution is to create tokenizer factories which are KeywordTokenizer + [ICU]CollationAttributeFactory.
> A better solution would be [ICU]CollationFieldTypes, as this would allow locale-sensitive
> range queries, but I found this to be more difficult because the indexed
> terms are byte[], not String...
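
The idea behind byte[] collation keys can be sketched with the JDK alone. The example below uses java.text.Collator rather than the Lucene or ICU classes named in the issue, so it is an illustration of the principle under that assumption, not the patch's code. The class and helper names (CollationKeyDemo, sortKey, compareUnsigned) are invented for this sketch. It shows that comparing the raw key bytes gives the same order as the locale-aware comparison, which is what lets an index of byte[] terms answer locale-sensitive range queries.

```java
import java.text.Collator;
import java.util.Locale;

class CollationKeyDemo {

    // Turn a string into its collation sort key for the given collator.
    static byte[] sortKey(Collator collator, String text) {
        return collator.getCollationKey(text).toByteArray();
    }

    // Compare two keys byte by byte, treating each byte as unsigned.
    // This is the kind of ordering a byte-based term index would apply.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);

        // Plain String comparison is by UTF-16 code unit, so "Banana" sorts
        // before "apple" because 'B' (0x42) is less than 'a' (0x61).
        System.out.println("String order:   " + Integer.signum("apple".compareTo("Banana")));

        // The collator applies English ordering rules instead.
        System.out.println("Collator order: " + Integer.signum(collator.compare("apple", "Banana")));

        // The sort keys, compared as raw bytes, agree with the collator.
        byte[] apple = sortKey(collator, "apple");
        byte[] banana = sortKey(collator, "Banana");
        System.out.println("Key byte order: " + Integer.signum(compareUnsigned(apple, banana)));
    }
}
```

Because the keys are opaque bytes rather than readable text, anything that builds a range query has to convert its String endpoints into keys with the same collator first, which is the difficulty the description points at.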
