lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Alphanumeric sort with alphabets first
Date Fri, 17 Mar 2017 15:03:06 GMT
I would back up further and say that 2500 fields is too much from the
start. Why do you need this many fields? And you say you can sort on
any of them... for a corpus of any decent size this is going to chew
up memory like crazy. Admittedly it's OS memory rather than Java heap
if you use docValues, but it's still memory.

That said, a custom sort function is probably the way to go if you
really need to.

Best,
Erick

On Thu, Mar 16, 2017 at 9:17 PM, Srinivasan Narayanan
<snarayanan@sapient.com> wrote:
> Can someone please respond?
>
> From: Srinivasan Narayanan <snarayanan@sapient.com>
> Date: Monday, March 13, 2017 at 3:51 PM
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Subject: Alphanumeric sort with alphabets first
>
>
> Hello SOLR experts,
>
> I am new to SOLR and I am trying to do an alphanumeric sort on string field(s). However,
> in my case, alphabets should come before numbers. I also have a large number of such fields
> (~2500), any of which can be sorted alphanumerically at runtime. I've explored the following
> concepts in SOLR to arrive at a solution:
>
> 1)      Custom similarity plugin: far-fetched, and probably not even applicable to my
> use case
>
> 2)      Analyzer/tokenizer and regex magic to left-pad numeric parts with 0s: two disadvantages.
> I believe this needs extra (copy) fields to be created, which I cannot do (2500 more fields
> is too much), and this will still push numbers before alphabets
>
> 3)      Custom function (ValueSource) and regex magic to left-pad numeric tokens with
> 0s, and invoke the function for sorting only: a bit better than the previous one, but
> numbers still come before alphabets.
>
> 4)      Custom function (ValueSource) and regex magic to left-pad numeric tokens with
> 0s, prefix numeric tokens with a tilde (~), and invoke the function for sorting only: this is
> where I stand right now. Very ugly, but it works. Because the tilde has a higher ASCII value
> (126) than any letter, it pushes numbers behind alphabets.
>
> There must be a better approach that I am missing. Please help!
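
For readers following the thread, the key transformation described in option 4 of the quoted message can be sketched roughly as below. This is only an illustration of the padding-plus-tilde idea, not the poster's actual ValueSource code, which was not shared. The class name AlphaFirstSortKey, the method toSortKey, and the padding width of 10 are all assumptions made for the example, and the Solr plugin wiring is left out entirely.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (name and width are assumptions, not from the thread).
final class AlphaFirstSortKey {
    // Runs of ASCII digits; everything else is copied through unchanged.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Assumed maximum length of a numeric token. Longer digit runs are
    // left unpadded and will not compare correctly against shorter ones.
    private static final int WIDTH = 10;

    // Builds a key whose plain string order puts ASCII letters before numbers
    // and orders numeric tokens by value rather than character by character.
    static String toSortKey(String value) {
        Matcher m = DIGITS.matcher(value);
        StringBuilder key = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the non-numeric text preceding this digit run.
            key.append(value, last, m.start());
            String digits = m.group();
            // '~' is ASCII 126, above 'z' (122), so numbers sort after letters.
            key.append('~');
            // Left-pad with zeros so that "2" sorts before "10".
            for (int i = digits.length(); i < WIDTH; i++) {
                key.append('0');
            }
            key.append(digits);
            last = m.end();
        }
        // Copy whatever follows the last digit run.
        key.append(value, last, value.length());
        return key.toString();
    }
}
```

With this sketch, "a2" becomes "a~0000000002" and "a10" becomes "a~0000000010", so "a2" sorts before "a10", and "ab" sorts before "a1" because 'b' compares lower than '~'. Characters that compare higher than '~' (most non-ASCII text) would still sort after numbers, so the trick only holds for ASCII letters.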
