lucene-solr-user mailing list archives

From Joel Nylund <jnyl...@yahoo.com>
Subject Re: weird sorting behavior
Date Thu, 31 Dec 2009 17:07:15 GMT
Thanks Erik,

The null problem was introduced when I copied the example below. I now
have the nulls excluded (using sortMissingLast="true") in 1.5 with the
suggested config below, and I'm still not seeing the desired behavior.

It seems to me that the default sort order of the Java Collator using
the ROOT locale (PRIMARY or SECONDARY strength doesn't seem to matter
in this example) is as follows:

empty string
symbols (by this I mean $, &, *, etc.)
numerics
alpha
leading spaces
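
A minimal sketch for reproducing this ordering outside Solr, assuming the
filter ends up using a plain java.text.Collator for the ROOT locale. The
class name RootCollatorProbe and the sample titles are made up for
illustration; the result can vary by JVM, so no expected output is shown.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: sorts sample titles with the ROOT-locale Collator
// so the default ordering can be inspected at different strengths.
public class RootCollatorProbe {

    static List<String> sortWith(int strength, List<String> titles) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(strength);
        List<String> sorted = new ArrayList<>(titles);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        // Sample data only; substitute real titles from the index.
        List<String> samples = Arrays.asList("", "$cash", "42", "apple", "  lead");
        System.out.println("PRIMARY:   " + sortWith(Collator.PRIMARY, samples));
        System.out.println("SECONDARY: " + sortWith(Collator.SECONDARY, samples));
    }
}
```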

My desire is:
alpha
numeric
symbols
leading spaces
empty string

I'm going to try a custom RuleBasedCollator to see if I can make this
happen, as Shalin suggested.
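
A rough sketch of what such a rule set might look like in plain Java,
under some assumptions: only ASCII letters, digits and the symbols $, &
and * are covered, and the class and method names (CustomSortOrder,
buildRules, sortTitles) are invented for this example. A collator alone
cannot put the empty string last, because an empty string has no collation
elements and compares lowest, so the sketch handles that case with a
separate comparator. In Solr the empty/missing case would have to be
handled some other way.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical example: letters < digits < symbols < space, empty last.
public class CustomSortOrder {

    // Builds java.text.RuleBasedCollator rules. "<" is a primary
    // difference, "," a tertiary (case) difference. Symbols and the
    // space are rule-syntax characters, so they are single-quoted.
    static String buildRules() {
        StringBuilder rules = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            rules.append("< ").append(c).append(',')
                 .append(Character.toUpperCase(c)).append(' ');
        }
        for (char c = '0'; c <= '9'; c++) {
            rules.append("< ").append(c).append(' ');
        }
        for (char c : new char[] {'$', '&', '*', ' '}) {
            rules.append("< '").append(c).append("' ");
        }
        return rules.toString();
    }

    static List<String> sortTitles(List<String> titles) {
        RuleBasedCollator collator;
        try {
            collator = new RuleBasedCollator(buildRules());
        } catch (ParseException e) {
            throw new IllegalStateException("bad collation rules", e);
        }
        // Non-empty strings first (false < true), then the custom rules.
        Comparator<String> emptyLast = Comparator.comparing(String::isEmpty);
        List<String> sorted = new ArrayList<>(titles);
        sorted.sort(emptyLast.thenComparing(collator));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortTitles(
            Arrays.asList("", "  lead", "$cash", "42", "Banana", "apple")));
    }
}
```

The string returned by buildRules() is the part that would carry over to a
custom rule set for Solr; the UnicodeCollation wiki page linked below in
the thread describes how custom rules are supplied.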

thanks
Joel



On Dec 31, 2009, at 11:11 AM, Erick Erickson wrote:

> Have you tried setting sortMissingLast="true" in your schema.xml?
> Something like...
>
> <fieldType name="string" class="solr.StrField" sortMissingLast="true"
> omitNorms="true"/>
>
> or perhaps in your individual field definition instead. The schema.xml
> examples have additional information that you really should scan at
> least....
>
> HTH
> Erick
>
> On Thu, Dec 31, 2009 at 8:53 AM, Joel Nylund <jnylund@yahoo.com>  
> wrote:
>
>> Hi,
>>
>> After some further investigation, it turns out that null fields were
>> sorting first, so if the title was null it was coming up first. This is
>> true even with 1.5 and collatedROOT. (I tried on last night's build.)
>>
>> So let me change my question: how do I make items with null values sort
>> last?
>>
>> thanks
>> Joel
>>
>>
>> On Dec 30, 2009, at 3:11 PM, Joel Nylund wrote:
>>
>>> Hi, so this is only available in 1.5?
>>>
>>> I tried in 1.4 and got :
>>>
>>> org.apache.solr.common.SolrException: Error loading class
>>> 'solr.CollationKeyFilterFactory'
>>>
>>> Is there a way to do this in 1.4?
>>>
>>> The link Shalin sent is a 1.5 link I think.
>>>
>>> thanks
>>> Joel
>>>
>>> On Dec 25, 2009, at 10:52 PM, Robert Muir wrote:
>>>
>>>> Hello, as Shalin said, you might want to try CollationKeyFilterFactory.
>>>>
>>>> Below is an example (using the multilingual root locale), where the
>>>> spaces will sort after the letters and numbers as you mentioned, but
>>>> it will still not be case-sensitive. This is because strength is
>>>> 'secondary'.
>>>>
>>>> But are you really sure you want the spaces sorted after the letters
>>>> and numbers? Or instead do you just want them ignored for sorting? If
>>>> this is the case, then try 'primary', so that spaces, punctuation,
>>>> accents and things like that in addition to case are ignored in the
>>>> sort: for example "Test-1234" and "   test1234" sort the same with
>>>> primary, but not with secondary (the one with leading spaces will sort
>>>> last).
>>>>
>>>> If all else fails, you can write custom rules for it too, as Shalin
>>>> mentioned.
>>>>
>>>> <fieldType name="collatedROOT" class="solr.TextField">
>>>> <analyzer>
>>>> <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>> <filter class="solr.CollationKeyFilterFactory"
>>>>     language=""
>>>>     strength="secondary"
>>>> />
>>>> </analyzer>
>>>> </fieldType>
>>>>
>>>> On Fri, Dec 25, 2009 at 5:37 AM, Shalin Shekhar Mangar
>>>> <shalinmangar@gmail.com> wrote:
>>>>
>>>>>
>>>>> On Thu, Dec 24, 2009 at 11:51 PM, Joel Nylund <jnylund@yahoo.com>
>>>>> wrote:
>>>>>
>>>>>> update, I tried changing to datatype string, and it sorts the numerics
>>>>>> better, but the other sorts are not as good.
>>>>>>
>>>>>> Is there a way to control sorting for special chars, for example, I
>>>>>> want blanks to sort after letters and numbers.
>>>>>
>>>>> In the general case, CollationKeyFilterFactory will do the trick. You
>>>>> could create a custom rule set which sorts spaces after letters and
>>>>> numbers. See
>>>>> http://wiki.apache.org/solr/UnicodeCollation
>>>>>
>>>>>
>>>>>> using alphaOnlySort - sorts nicely for alpha, but numbers don't work
>>>>>> string - sorts nicely for numbers and letters, but special chars like
>>>>>> blanks show up first in the list
>>>>>
>>>>> alphaOnlySort has a PatternReplaceFilterFactory which removes all
>>>>> characters except a-z. This is the reason behind those weird results.
>>>>> You could try removing that filter and see if that's what you need.
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Shalin Shekhar Mangar.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Robert Muir
>>>> rcmuir@gmail.com
>>>>
>>>
>>>
>>

