couchdb-dev mailing list archives

From Patrick Antivackis <patrick.antivac...@gmail.com>
Subject Re: View keys case-insensitive?
Date Thu, 09 Apr 2009 21:45:30 GMT
This is quite normal as far as ICU is concerned.
ICU collation is about language, not about ASCII codes.
In ICU, case is only the third level considered in a comparison (the same level as
circled letters in Nordic languages, for example), so it carries little weight.
When you sort words, a and A are both still an "a", so they sort next to each
other. In ICU you can specify whether you prefer a before A or A before a, but
you cannot simply ask for a before b before c ... before A before B before C.

To get that behavior (like ASCII), you need to customize ICU by specifying the
collation you want almost letter by letter.
That would be great for you, but what about Japanese or Arabic users?

So this is definitely the right behaviour of ICU sorting (collation).
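
As a rough illustration of the level-based comparison described above, here is
a minimal Ruby sketch. It is not ICU and calls no ICU API; the helper names
collation_key and icu_like_sort are made up for this example, and it only
handles ASCII letters. It compares base letters first and uses case only to
break ties, which is enough to reproduce a < A < aa.

```ruby
# Simplified multi-level comparison, loosely modelled on the idea that case
# is a low-priority (tertiary) difference. This is an assumption-laden sketch,
# NOT the real ICU algorithm.

# Primary level: the letters themselves, ignoring case.
# Tertiary level: consulted only when the primary level ties;
# lowercase (0) sorts before uppercase (1).
def collation_key(str)
  primary  = str.downcase
  tertiary = str.chars.map { |ch| ch == ch.downcase ? 0 : 1 }
  [primary, tertiary]
end

# Hypothetical helper: sort words by the two-level key above.
def icu_like_sort(words)
  words.sort_by { |w| collation_key(w) }
end

words = %w[B aa A b a]
p icu_like_sort(words)  # => ["a", "A", "aa", "b", "B"]
p words.sort            # plain byte order: ["A", "B", "a", "aa", "b"]
```

"A" and "aa" already differ at the primary level ("a" is a prefix of "aa"), so
the case of "A" is never consulted and A sorts before aa.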


2009/4/9 Brian Candler <B.Candler@pobox.com>

> > I've spent entirely too long on this now and I still can't for the
> > life of me figure out why A < aa.
>
> Time for an experimental, black-box approach:
>
> ----
> require 'rubygems'
> require 'restclient'
> require 'json'
>
> DB="http://127.0.0.1:5984/collator"
>
> RestClient.delete DB rescue nil
> RestClient.put "#{DB}",""
>
> (32..126).each do |c|
>  RestClient.put "#{DB}/#{c.to_s(16)}", {"x"=>c.chr}.to_json
> end
>
> RestClient.put "#{DB}/_design/test", <<EOS
> {
>  "views":{
>    "one":{
>      "map":"function (doc) { emit(doc.x,null); }"
>    }
>  }
> }
> EOS
>
> puts RestClient.get("#{DB}/_design/test/_view/one")
> ----
>
> This shows the collation sequence to be as follows.
>
> {"total_rows":95,"offset":0,"rows":[
> {"id":"20","key":" ","value":null},
> {"id":"60","key":"`","value":null},
> {"id":"5e","key":"^","value":null},
> {"id":"5f","key":"_","value":null},
> {"id":"2d","key":"-","value":null},
> {"id":"2c","key":",","value":null},
> {"id":"3b","key":";","value":null},
> {"id":"3a","key":":","value":null},
> {"id":"21","key":"!","value":null},
> {"id":"3f","key":"?","value":null},
> {"id":"2e","key":".","value":null},
> {"id":"27","key":"'","value":null},
> {"id":"22","key":"\"","value":null},
> {"id":"28","key":"(","value":null},
> {"id":"29","key":")","value":null},
> {"id":"5b","key":"[","value":null},
> {"id":"5d","key":"]","value":null},
> {"id":"7b","key":"{","value":null},
> {"id":"7d","key":"}","value":null},
> {"id":"40","key":"@","value":null},
> {"id":"2a","key":"*","value":null},
> {"id":"2f","key":"/","value":null},
> {"id":"5c","key":"\\","value":null},
> {"id":"26","key":"&","value":null},
> {"id":"23","key":"#","value":null},
> {"id":"25","key":"%","value":null},
> {"id":"2b","key":"+","value":null},
> {"id":"3c","key":"<","value":null},
> {"id":"3d","key":"=","value":null},
> {"id":"3e","key":">","value":null},
> {"id":"7c","key":"|","value":null},
> {"id":"7e","key":"~","value":null},
> {"id":"24","key":"$","value":null},
> {"id":"30","key":"0","value":null},
> {"id":"31","key":"1","value":null},
> {"id":"32","key":"2","value":null},
> {"id":"33","key":"3","value":null},
> {"id":"34","key":"4","value":null},
> {"id":"35","key":"5","value":null},
> {"id":"36","key":"6","value":null},
> {"id":"37","key":"7","value":null},
> {"id":"38","key":"8","value":null},
> {"id":"39","key":"9","value":null},
> {"id":"61","key":"a","value":null},
> {"id":"41","key":"A","value":null},
> {"id":"62","key":"b","value":null},
> {"id":"42","key":"B","value":null},
> {"id":"63","key":"c","value":null},
> {"id":"43","key":"C","value":null},
> {"id":"64","key":"d","value":null},
> {"id":"44","key":"D","value":null},
> {"id":"65","key":"e","value":null},
> {"id":"45","key":"E","value":null},
> {"id":"66","key":"f","value":null},
> {"id":"46","key":"F","value":null},
> {"id":"67","key":"g","value":null},
> {"id":"47","key":"G","value":null},
> {"id":"68","key":"h","value":null},
> {"id":"48","key":"H","value":null},
> {"id":"69","key":"i","value":null},
> {"id":"49","key":"I","value":null},
> {"id":"6a","key":"j","value":null},
> {"id":"4a","key":"J","value":null},
> {"id":"6b","key":"k","value":null},
> {"id":"4b","key":"K","value":null},
> {"id":"6c","key":"l","value":null},
> {"id":"4c","key":"L","value":null},
> {"id":"6d","key":"m","value":null},
> {"id":"4d","key":"M","value":null},
> {"id":"6e","key":"n","value":null},
> {"id":"4e","key":"N","value":null},
> {"id":"6f","key":"o","value":null},
> {"id":"4f","key":"O","value":null},
> {"id":"70","key":"p","value":null},
> {"id":"50","key":"P","value":null},
> {"id":"71","key":"q","value":null},
> {"id":"51","key":"Q","value":null},
> {"id":"72","key":"r","value":null},
> {"id":"52","key":"R","value":null},
> {"id":"73","key":"s","value":null},
> {"id":"53","key":"S","value":null},
> {"id":"74","key":"t","value":null},
> {"id":"54","key":"T","value":null},
> {"id":"75","key":"u","value":null},
> {"id":"55","key":"U","value":null},
> {"id":"76","key":"v","value":null},
> {"id":"56","key":"V","value":null},
> {"id":"77","key":"w","value":null},
> {"id":"57","key":"W","value":null},
> {"id":"78","key":"x","value":null},
> {"id":"58","key":"X","value":null},
> {"id":"79","key":"y","value":null},
> {"id":"59","key":"Y","value":null},
> {"id":"7a","key":"z","value":null},
> {"id":"5a","key":"Z","value":null}
> ]}
>
> I've never seen this sequence before. It's not even EBCDIC :-)
>
> Adding aa into the pot gives:
>
> ...
> {"id":"61","key":"a","value":null},
> {"id":"41","key":"A","value":null},
> {"id":"X","key":"aa","value":null},
> ...
>
> As you say, that is most bizarre.
>
> Cheers,
>
> Brian.
>
