lucene-java-user mailing list archives

From Luís Filipe Nassif <lfcnas...@gmail.com>
Subject Re: Get original DocValues from ICUCollationDocValuesField
Date Sun, 30 Apr 2017 21:02:29 GMT
Thanks Uwe!

I had considered indexing the values into 2 fields, but wanted to know whether
there is a better way. Maybe I will convert the values to lower case and apply
ASCIIFoldingFilter instead, which would eliminate the need for a Collator and
the overhead of indexing into 2 fields.

Luis
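
A minimal sketch of the folding idea above, using only the JDK so that it runs without Lucene on the classpath. It is a stand-in, not Lucene's ASCIIFoldingFilter: `java.text.Normalizer` only strips combining marks, so characters such as "ß" or "ø" that ASCIIFoldingFilter maps to ASCII are left unchanged here. The class name `FoldedSortKey` and the method `fold` are hypothetical names chosen for illustration.

```java
import java.text.Normalizer;
import java.util.Locale;

public class FoldedSortKey {

    // Decompose accented characters (NFD), drop the combining marks,
    // then lowercase. The result stays human-readable, unlike a
    // collation key, so one field can serve sorting and display-like uses.
    static String fold(String value) {
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "\u00ed" is LATIN SMALL LETTER I WITH ACUTE
        System.out.println(fold("Lu\u00eds Filipe")); // prints "luis filipe"
    }
}
```

The folded string could then be indexed in a plain SortedDocValuesField. The trade-off is that ordering follows the byte order of the folded text rather than true locale rules, and the original casing and accents are lost unless the unfolded value is stored as well.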

2017-04-30 17:36 GMT-03:00 Uwe Schindler <uwe@thetaphi.de>:

> Hi,
>
> No. Collation keys are a one-way function. You need to index the value into
> 2 different fields: once as a collation key for sorting, and once as the
> original value for faceting or display.
>
> Uwe
>
>
> On 30 April 2017 22:29:23 CEST, "Luís Filipe Nassif" <
> lfcnassif@gmail.com> wrote:
>>
>> A related question: is it possible to do faceting on a SortedDocValuesField
>> using collation rules? Or is faceting always case sensitive?
>>
>> Thanks in advance,
>> Luis
>>
>> 2017-04-30 12:35 GMT-03:00 Luís Filipe Nassif <lfcnassif@gmail.com>:
>>
>>>  Hi Lucene community!
>>>
>>>  I can successfully get the original doc values from fields indexed as
>>>  SortedDocValues with code like:
>>>
>>>  BytesRef bref = atomicReader.getSortedDocValues(field).get(doc);
>>>  String value = bref.utf8ToString();
>>>
>>>  But because I need locale-sensitive sorting, I use
>>>  ICUCollationDocValuesField to index several fields. For those fields the
>>>  code above does not work: the value returned is a run of unreadable
>>>  characters. I know this is because the ICU Collator converts the Strings
>>>  to CollationKeys.
>>>
>>>  Is there a way to convert the returned BytesRef to the original doc value?
>>>  Or, in other words, how can I get the original String from an ICU
>>>  RawCollationKey?
>>>
>>>  Any help will be much appreciated!
>>>
>>>  Thanks to the Lucene contributors for such a great project!
>>>  Luis Nassif
>>>  Luis Nassif
>>
>>
>>
> --
> Uwe Schindler
> Achterdiek 19, 28357 Bremen
> https://www.thetaphi.de
>
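
Uwe's point that collation keys are a one-way function can be illustrated with a small sketch. It uses the JDK's `java.text.Collator` rather than the ICU Collator discussed in the thread, purely so that it runs without extra dependencies. The principle is assumed to carry over: once the collator ignores a difference (here, case at primary strength), distinct strings map to the same key bytes, so the original cannot be recovered from the key. The class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollationKeyOneWay {

    // JDK collator that ignores case and accent differences.
    static Collator primaryCollator() {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    // True when both strings produce byte-identical collation keys.
    static boolean sameKey(Collator collator, String a, String b) {
        byte[] keyA = collator.getCollationKey(a).toByteArray();
        byte[] keyB = collator.getCollationKey(b).toByteArray();
        return Arrays.equals(keyA, keyB);
    }

    public static void main(String[] args) {
        Collator collator = primaryCollator();
        // Strings differing only by case share a key at primary strength.
        System.out.println(sameKey(collator, "Luis", "LUIS"));
        // Strings differing by a base letter get different keys.
        System.out.println(sameKey(collator, "Luis", "Luiz"));
    }
}
```

Because two inputs can share one key, a second field holding the original value is the only way to get that value back for faceting or display.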
